Spark write or save DataFrames examples

DataFrames in Spark can be saved in different file formats with the DataFrameWriter's write API. Spark supports the text, Parquet, ORC, and JSON file formats; by default it saves in Parquet. You can provide different compression options when saving the output. With mode you…
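A minimal sketch of the write API described above. The function name, the `df` DataFrame, and the output path are placeholders, not from the post:

```python
def save_dataframe(df, path):
    """Save a DataFrame as Snappy-compressed Parquet, replacing old output.

    `df` is an existing DataFrame and `path` a placeholder output location.
    """
    (df.write
       .mode("overwrite")                 # replace the output if it already exists
       .option("compression", "snappy")   # compression codec for the output files
       .parquet(path))                    # Parquet is also Spark's default format
```

Swapping `.parquet(path)` for `.json(path)`, `.orc(path)`, or `.text(path)` selects one of the other supported formats.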

Connect Oracle with Spark

Reading data from an Oracle database with Spark can be done with these steps. Get the JDBC thin driver: download the proper driver, ojdbc6.jar for Oracle 11.2 and ojdbc7.jar for Oracle 12c. Check the compatibility…
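Once the driver jar is on Spark's classpath (e.g. via `--jars`), the JDBC read can be sketched roughly as below. The host, port, service name, and credentials are placeholders, not values from the post:

```python
def read_oracle_table(spark, table, host, port, service, user, password):
    """Load one Oracle table into a DataFrame over the JDBC thin driver.

    `spark` is an active SparkSession; all connection details are placeholders.
    """
    url = "jdbc:oracle:thin:@//{}:{}/{}".format(host, port, service)
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table)         # table (or subquery) to read
            .option("user", user)
            .option("password", password)
            .option("driver", "oracle.jdbc.driver.OracleDriver")
            .load())
```

For large tables, the JDBC source also accepts `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options to parallelize the read.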

Spark and Hadoop Compression Codecs

The table below lists the available compression codecs in the Spark and Hadoop ecosystem.

Compression   Fully qualified class name                       Alias
deflate       org.apache.hadoop.io.compress.DefaultCodec       deflate
gzip          org.apache.hadoop.io.compress.GzipCodec          gzip
bzip2         org.apache.hadoop.io.compress.BZip2Codec         bzip2
lzo           com.hadoop.compression.lzo.LzopCodec             lzo
LZ4           org.apache.hadoop.io.compress.Lz4Codec,
              org.apache.spark.io.LZ4CompressionCodec          lz4
LZF           org.apache.spark.io.LZFCompressionCodec
Snappy        org.apache.hadoop.io.compress.SnappyCodec,
              org.apache.spark.io.SnappyCompressionCodec       snappy
…
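The short aliases can be used in Spark configuration in place of the fully qualified class names. A minimal illustration in spark-defaults.conf, assuming you want Spark's internal compression (shuffle, broadcast, spilled data) to use LZ4 (the choice of codec here is just an example):

```
# spark-defaults.conf (illustrative)
spark.io.compression.codec   lz4
```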

Ways to Create Spark RDD

RDDs (Resilient Distributed Datasets) can be created in many different ways. Reading data from different sources: text file RDDs can be created using SparkContext's textFile method. This method takes a URI for the file (either a local path on the machine,…
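A small sketch of two common creation paths, assuming an active SparkContext `sc`; the function name and the file URI are placeholders:

```python
def build_rdds(sc, path):
    """Create RDDs two ways, given an active SparkContext `sc`.

    `path` is a placeholder URI (local path, hdfs://, s3a://, ...).
    """
    lines = sc.textFile(path)           # from an external file: one record per line
    nums = sc.parallelize(range(1, 6))  # from an in-memory Python collection
    return lines, nums
```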

Spark DataFrame select columns

Spark's select method is used to select specific columns from a DataFrame. It is a transformation operation, lazily evaluated to create a new DataFrame. You should pass a list of column names (strings) or column expressions as arguments. The…
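Both argument styles can be sketched as below. The column names (`name`, `salary`) describe a hypothetical schema, not data from the post:

```python
def select_name_and_raise(df):
    """Select columns by name and by expression, returning new DataFrames.

    Assumes `df` has columns `name` and `salary` (hypothetical schema).
    """
    # by name: plain strings
    names_only = df.select("name", "salary")
    # by expression: Column objects support arithmetic and aliasing
    with_raise = df.select(df["name"], (df["salary"] * 1.1).alias("new_salary"))
    return names_only, with_raise
```

Since select is a transformation, neither call touches the data until an action (such as `show` or `write`) runs on the result.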