Using the InterSystems Spark Connector
Introduction


The InterSystems IRIS™ Spark Connector enables an InterSystems IRIS database to function as an Apache Spark data source. It implements a plug-compatible replacement for the standard Spark jdbc data source, allowing a Spark program to retrieve the results of a complex SQL query executed within the database as a Spark Dataset, and to write a Dataset back into the database as a SQL table.
Features
The Spark Connector has intimate knowledge of, and tight integration with, the underlying database server, which gives it several advantages over the standard Spark jdbc data source:
Data Source Provider Class Names
Spark data sources are accessed through provider class names. The standard Spark jdbc data source provider class is named org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, and can be used by specifying it in a call to format() as demonstrated in the following example:
   var df = spark.read
      .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")
      .option("url",url)             // connection url (defined elsewhere); required by the jdbc source
      .option("dbtable","mytable")
      .load()
Since the full class name is very awkward to use, it is normally specified with the short alias jdbc:
   var df = spark.read.format("jdbc").option("dbtable","mytable").load()
The InterSystems Spark Connector data source is referenced in exactly the same way, using the full provider class name com.intersystems.spark or the short alias iris:
   var df = spark.read.format("iris").option("dbtable","mytable").load()
Important:
The terms jdbc and iris (lower case, in the same typography as other class names) are used frequently in this book, and always refer specifically to the data source provider class names, never to Java JDBC or InterSystems IRIS.
Requirements and Configuration
Requirements
The Spark Connector requires the following:
Optional Configuration Settings
The Connector recognizes a number of configuration settings that parameterize its operation. These are parsed from the Apache Spark SparkConf configuration structure at startup, and may be specified in any of the standard ways that Spark configuration properties are supplied: programmatically on the SparkConf, as --conf arguments to spark-submit, or in the spark-defaults.conf file (see the sketch below).
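As a minimal sketch, such defaults might be supplied programmatically when the session is created. Note that the spark.iris.master.* key names and connection values below are assumptions for illustration only; consult the settings described in this section for the exact names:
   import org.apache.spark.sql.SparkSession

   // Supply connector defaults at startup through SparkConf settings.
   // The "spark.iris.master.*" keys and values here are illustrative assumptions.
   val spark = SparkSession.builder()
      .appName("IrisConnectorExample")
      .config("spark.iris.master.url","IRIS://localhost:51773/USER")
      .config("spark.iris.master.user","_SYSTEM")
      .config("spark.iris.master.password","SYS")
      .getOrCreate()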
The url, user, and password options specify connection string values for a read or write. Their default values are derived automatically from the default InterSystems IRIS master instance specified in the SparkConf configuration. Connection options can also be specified explicitly in a read or write operation (see Connection Options) to override the defaults, as shown in the sketch below.
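For example, a single read might override the defaults as follows; the connection url format, credentials, and table name here are illustrative placeholders, not values from this guide:
   // Override the default connection for one read operation.
   var df = spark.read.format("iris")
      .option("url","IRIS://localhost:51773/USER")   // placeholder connection url
      .option("user","_SYSTEM")                      // placeholder credentials
      .option("password","SYS")
      .option("dbtable","mytable")
      .load()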
Default values are also assigned to the following settings:
For more information, see the Apache Spark documentation on Spark Configuration and the org.apache.spark.SparkConf class.