Connect Oracle with Spark

Reading data from an Oracle database with Spark can be done in three steps.

1. Get the JDBC thin driver

   Download the proper driver: ojdbc6.jar for Oracle 11.2 and ojdbc7.jar for Oracle 12c.
   Check the compatibility of the driver with your JDK version.

2. Invoke spark-shell with the --jars option

spark-shell --jars  "/home/work/ODBC/ojdbc7.jar"
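If you are building a standalone application rather than working in spark-shell, the same jar can be registered through the `spark.jars` configuration when the session is created. A minimal sketch, reusing the jar path from the spark-shell command above (the app name and local master are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Sketch for a standalone app: supply the Oracle driver jar via spark.jars
// instead of the --jars flag (same jar path as in the spark-shell example).
val spark = SparkSession.builder().
  appName("oracle-read").           // hypothetical application name
  master("local[*]").               // illustrative; use your cluster master
  config("spark.jars", "/home/work/ODBC/ojdbc7.jar").
```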

3. Read the Oracle table through the Spark read function

Use Spark's `` function to connect to the JDBC source with the proper connection string and options.

 val empDF ="jdbc").
             option("url", "jdbc:oracle:thin:@").
             option("dbtable", "employee").
             option("user", "MYEMPDB").
             option("password", "mypassword").
             option("driver", "oracle.jdbc.driver.OracleDriver").

In this example the table employee is passed to the dbtable option. Instead of a table name you can also pass a full query, wrapped in parentheses and followed by an alias.
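A sketch of the query form: Spark's JDBC source accepts a subquery in place of a table name, but it must be parenthesized and aliased. The helper `asDbTable` and the WHERE clause below are illustrative, not part of the Spark API:

```scala
// Wrap a pushdown query so it can be used as the dbtable value.
// Helper name and query contents are hypothetical.
def asDbTable(selectSql: String, alias: String): String = s"($selectSql) $alias"

// Illustrative pushdown query against the employee table from the example:
val query = asDbTable("SELECT empid, empname, title, salary FROM employee WHERE salary >= 5000", "emp")
// Pass it instead of "employee": option("dbtable", query)
```

With a query in dbtable, the filtering runs inside Oracle, so only the matching rows cross the network to Spark.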

Ensure your connection string (URL) is correct for your Oracle setup.
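The thin driver accepts two common URL shapes, shown here as a sketch; the host, port, and service name are placeholder values, not ones from the example above:

```scala
// Oracle thin-driver URL formats (host, port, and service are placeholders):
//   SID form:          jdbc:oracle:thin:@host:port:SID
//   Service-name form: jdbc:oracle:thin:@//host:port/service_name
val host    = "dbhost.example.com" // hypothetical host
val port    = 1521                 // Oracle's default listener port
val service = "ORCLPDB1"           // hypothetical service name
val url = s"jdbc:oracle:thin:@//$host:$port/$service"
```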


This example was tested with Spark 2.4, Oracle 12c, and JDK 8.

scala> val empDF ="jdbc").
     |                 option("url", "jdbc:oracle:thin:@").
     |                 option("dbtable", "employee").
     |                 option("user", "EMPDB").
     |                 option("password", "Pass12@!").
     |                 option("driver", "oracle.jdbc.driver.OracleDriver").
     |                 load()
empDF: org.apache.spark.sql.DataFrame = [EMPID: decimal(38,10), EMPNAME: string ... 2 more fields]

scala> empDF.printSchema
root
 |-- EMPID: decimal(38,10) (nullable = true)
 |-- EMPNAME: string (nullable = true)
 |-- TITLE: string (nullable = true)
 |-- SALARY: decimal(38,10) (nullable = true)

+---------------+-------+-------+---------------+
|          EMPID|EMPNAME|  TITLE|         SALARY|
+---------------+-------+-------+---------------+
|1002.0000000000|  Rahul|     WK|2000.0000000000|
|1002.0000000000|  Raina|     VC|2000.0000000000|
|1001.0000000000|  Dhoni|Captain|9000.0000000000|
+---------------+-------+-------+---------------+
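For larger tables, the same read can be parallelized with Spark's standard JDBC partitioning options. A sketch; the EMPID bounds below are assumptions based on the sample rows, not values from a real table:

```scala
// partitionColumn, lowerBound, upperBound, and numPartitions are standard
// Spark JDBC options: Spark splits the scan into numPartitions range queries
// over the partition column and reads them in parallel.
val partitionOpts = Map(
  "partitionColumn" -> "EMPID",
  "lowerBound"      -> "1000",  // assumed minimum EMPID
  "upperBound"      -> "2000",  // assumed maximum EMPID
  "numPartitions"   -> "4"
)
// Usage sketch:"jdbc").options(partitionOpts). ... .load()
```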

