Pyspark Save Dataframe To S3

June 22, 2024

I want to save a DataFrame to S3, but when I save it, an empty file named ${folder_name} is created, where ${folder_name} is the folder in which I want to save the file. Syntax to save the DataFrame: f.write

Solution 1:

I was able to do it using the code below.

df.write.parquet("s3a://bucket-name/shri/test.parquet", mode="overwrite")

Solution 2:

As far as I know, there is no way to control the names of the actual Parquet files. When you write a DataFrame to Parquet, you specify the name of the output directory, and Spark creates the Parquet part files under that directory. The path in Solution 1 (test.parquet) is therefore a directory, not a single file.
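If you need a single file, one common workaround is to ask Spark for a single output partition and then locate the one part file it wrote. Below is a minimal sketch of that idea, not a definitive implementation. The helper names write_single_parquet and pick_part_file are hypothetical, df is assumed to be an existing PySpark DataFrame, and the listing of keys is assumed to come from whatever S3 client you already use.

```python
import posixpath


def write_single_parquet(df, directory):
    """Write df under `directory` as (ideally) one Parquet part file.

    `df` is assumed to be a PySpark DataFrame. coalesce(1) reduces the
    output to a single partition, so Spark writes one part file, but all
    rows pass through one task, which can be slow for large data.
    """
    df.coalesce(1).write.parquet(directory, mode="overwrite")


def pick_part_file(keys):
    """Return the single Parquet part file among the keys under a directory.

    Spark names data files "part-..." and also writes marker files such
    as _SUCCESS, so those are filtered out here.
    """
    parts = [
        key for key in keys
        if posixpath.basename(key).startswith("part-")
        and key.endswith(".parquet")
    ]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]
```

After write_single_parquet(df, "s3a://bucket-name/shri/test.parquet"), you could list the keys under that prefix, pass them to pick_part_file, and copy the returned key to the file name you actually want. Spark itself still chooses the part file name.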
