Using the InterSystems Spark Connector
Introduction
The InterSystems IRIS™ Spark Connector enables an InterSystems IRIS database to function as an Apache Spark data source. It implements a plug-compatible replacement for the standard Spark jdbc data source. This allows the results of a complex SQL query executed within the database to be retrieved by the Spark program as a Spark Dataset, and for a Dataset to be written back into the database as a SQL table.
Features
The Spark Connector has intimate knowledge of, and tight integration with, the underlying database server, giving it several advantages over the standard Spark jdbc data source.
Data Source Provider Class Names
Spark data sources are accessed through provider class names. The standard Spark jdbc data source provider class is named org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, and can be used by specifying it in a call to format() as demonstrated in the following example:
   var df = spark.read
      .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")
      .option("dbtable", "mytable")   // the jdbc source also requires a url option identifying the database
      .load()
Since the full class name is very awkward to use, it is normally specified with the short alias jdbc:
   var df = spark.read.format("jdbc").option("dbtable","mytable").load()
The InterSystems Spark Connector data source is referenced in exactly the same way, using the full provider class name com.intersystems.spark or the short alias iris:
   var df = spark.read.format("iris").option("dbtable","mytable").load()
Important:
The terms jdbc and iris (lower case, in the same typography as other class names) are used frequently in this book, and always refer specifically to the data source provider class names, never to Java JDBC or InterSystems IRIS.
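As noted in the introduction, the same provider is used to write a Dataset back to the database as a SQL table. The following is a minimal sketch using the standard Spark DataFrameWriter API; the Dataset df, the table name, and the save mode are illustrative:
   df.write
      .format("iris")                 // the Connector's short provider alias
      .option("dbtable", "mytable")   // target SQL table in the database
      .mode("append")                 // any standard Spark save mode applies
      .save()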
Requirements and Configuration
Requirements
The Spark Connector requires the following:
Optional Configuration Settings
The Connector recognizes a number of configuration settings that parameterize its operation. These are parsed from the Apache Spark SparkConf configuration structure at startup, and may be specified in any of the standard Spark ways: on the spark-submit command line, in spark-defaults.conf, or programmatically through SparkConf.
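As one possibility, the settings can be supplied when the SparkSession is constructed. This is a minimal sketch only: the spark.iris.* key names, URL, and credentials shown here are assumptions for illustration, so consult the Connector's configuration reference for the exact names:
   import org.apache.spark.sql.SparkSession

   // NOTE: the spark.iris.* keys and connection values below are assumed
   // for illustration; check the Connector documentation for the exact names.
   val spark = SparkSession.builder()
      .appName("IrisExample")
      .config("spark.iris.master.url", "IRIS://localhost:51773/USER")
      .config("spark.iris.master.user", "_SYSTEM")
      .config("spark.iris.master.password", "SYS")
      .getOrCreate()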
The url, user, and password options specify connection string values for a read or write. The default values are automatically defined using information from the default InterSystems IRIS master instance specified in the SparkConf configuration. Connection options can be explicitly specified in a read or write operation (see “Connection Options”) to override the defaults.
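For example, a read can supply its own connection values inline, overriding the configured defaults. The host, credentials, and table name below are illustrative:
   var df = spark.read
      .format("iris")
      .option("url", "IRIS://otherhost:51773/USER")   // illustrative connection string
      .option("user", "myuser")
      .option("password", "mypassword")
      .option("dbtable", "mytable")
      .load()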
Default values are also assigned to a number of other settings. For more information, see the Apache Spark documentation on “Spark Configuration” and the org.apache.spark.SparkConf class.

