Saving Spark DataFrames to different file formats

The DataFrameWriter API can be used to save Spark DataFrames to different file formats and external data sources. The common syntax is shown below.

dataframe.write.format(...).mode(...).option(...).partitionBy(...).bucketBy(...).sortBy(...).save(path)

The default output format is Parquet, and files are written to the path supplied to save() (HDFS by default on a Hadoop cluster). The default save mode is errorIfExists, which fails if data already exists at the target path; it can be set to append or overwrite via mode() during save.

Formats and syntax:

Text (the DataFrame must be a single String column to save in text format)
    df1.write.format("text").save("/data/txtout1/")
    df1.write.text("/data/textdata/")

CSV
    df.write.option("header", "true").csv("/data/csvout")
    df.write.format("csv").option("header", "true").save("/data/csvout1")

Parquet
    df.write.save("/data/parquetout")
    df.write.format("parquet").save("/data/parquetout1")
    df.write.parquet("/data/parquetout2")

JSON
    df.write.format("json").save("/data/jsonout")
    df.write.json("/data/jsonout1/")

ORC
    df.write.orc("/data/orcout1")
    df.write.format("orc").save("/data/orcout")

Avro
    df.write.format("avro").save("/data/outavro/")

Hive table
    df.write.saveAsTable("HiveSchema.TableName")
