The DataFrameWriter API saves Spark DataFrames to different file formats and external sources. The common usage syntax is:
dataframe.write.format(...).mode(...).option(...).partitionBy(...).bucketBy(...).sortBy(...).save(path)
The default file format is Parquet, and files are saved to HDFS by default. The save mode can be set to `append` or `overwrite`; the default mode, `errorifexists`, fails if the output path already exists.
| Format | Syntax |
| --- | --- |
| text | DataFrame must be a single string column. `df1.write.format("text").save("/data/txtout1/")` or `df1.write.text("/data/textdata/")` |
| csv | `df.write.option("header", "true").csv("/data/csvout")` or `df.write.format("csv").option("header", "true").save("/data/csvout1")` |
| parquet | `df.write.save("/data/parquetout")`, `df.write.format("parquet").save("/data/parquetout1")`, or `df.write.parquet("/data/parquetout2")` |
| json | `df.write.format("json").save("/data/jsonout")` or `df.write.json("/data/jsonout1/")` |
| orc | `df.write.orc("/data/orcout1")` or `df.write.format("orc").save("/data/orcout")` |
| avro | `df.write.format("avro").save("/data/outavro/")` |
| hive table | `df.write.saveAsTable("HiveSchema.TableName")` |