
Apache Spark Support

The Spark Connector is deprecated

Due to continued improvements in both Apache Spark and the InterSystems JDBC driver, the InterSystems Spark Connector no longer provides significant advantages over the standard Spark JDBC connector. The Spark Connector is deprecated, and will be discontinued in an upcoming 2022 release.

The InterSystems IRIS® Spark Connector is an implementation of the Data Source API for Apache Spark that allows the Spark data processing engine to make optimal use of the InterSystems IRIS® data platform and its distributed data capabilities.

This chapter provides technical details about the InterSystems Spark Connector. The following topics are discussed:

Apache Spark and the InterSystems Spark Connector

Apache Spark is a high-performance Java analytics engine for use in clustered computing environments. At its heart is the Resilient Distributed Dataset (RDD), which represents a distributed, fault-tolerant collection of data that can be operated on in parallel. Spark includes libraries for SQL, machine learning, graph processing, stream processing, and many other functions.

Spark provides a jdbc data source that allows the results of a complex SQL query executed within the database to be retrieved by Spark as a Dataset, and allows a Dataset to be written back into the database as a SQL table.
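
As a rough illustration of that round trip, the following Python (PySpark) sketch builds the option map for Spark's jdbc data source, reads a query result, and writes a result back to a table. In PySpark a Dataset is exposed as a DataFrame. The helper names (jdbc_options, as_subquery, read_query, write_table), the table names, and the credentials are hypothetical. The URL form jdbc:IRIS://host:port/namespace and the driver class com.intersystems.jdbc.IRISDriver are assumptions about the InterSystems JDBC driver and should be checked against your installation.

```python
def jdbc_options(url, table_or_query, user, password, driver=None):
    """Build the option map passed to Spark's generic jdbc data source."""
    opts = {
        "url": url,                 # JDBC connection URL
        "dbtable": table_or_query,  # a table name or a parenthesized subquery
        "user": user,
        "password": password,
    }
    if driver is not None:
        opts["driver"] = driver     # JDBC driver class name (assumed, see lead-in)
    return opts


def as_subquery(sql, alias="q"):
    """Wrap a SELECT so it can be used wherever a table name is expected."""
    return f"({sql}) {alias}"


def read_query(spark, opts):
    """Run the query inside the database and return the result as a DataFrame."""
    return spark.read.format("jdbc").options(**opts).load()


def write_table(df, opts, mode="append"):
    """Write a DataFrame back to the table named in opts['dbtable']."""
    df.write.format("jdbc").options(**opts).mode(mode).save()


# Usage with a live SparkSession named `spark` (all values are placeholders):
#   src = jdbc_options("jdbc:IRIS://localhost:1972/USER",
#                      as_subquery("SELECT Name, Age FROM Sample.Person WHERE Age > 40"),
#                      "myuser", "mypassword",
#                      driver="com.intersystems.jdbc.IRISDriver")
#   df = read_query(spark, src)
#   dst = dict(src, dbtable="Sample.PersonOver40")
#   write_table(df, dst, mode="overwrite")
```

The session is passed in as a parameter so the helpers stay independent of how Spark is started; the WHERE clause in the subquery is evaluated by the database, not by Spark.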

The InterSystems IRIS data platform can connect to Spark using only the jdbc data source, but the InterSystems Spark Connector implements a custom iris data source that provides important enhancements for optimal performance.


The terms jdbc and iris (lower case, in the same typography as other class names) are used frequently in this book, and always refer specifically to the data source provider class names, never to Java JDBC or InterSystems IRIS.

Installation and Configuration

See the following sections in Using the InterSystems Spark Connector for information on installation and configuration:

Also see the following related documents:

Spark Connector Compliance and Compatibility

The InterSystems IRIS Spark Connector is a plug-compatible replacement for the Spark jdbc data source.
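
One way to read "plug-compatible" is that only the data source name changes while the rest of the read call stays the same. The Python (PySpark) sketch below illustrates that idea. The function name load_table and the constant SUPPORTED_PROVIDERS are hypothetical, and the sketch assumes that the iris data source accepts the same url and dbtable options as jdbc and that the connector is available on the Spark classpath; neither assumption is verified here.

```python
SUPPORTED_PROVIDERS = ("jdbc", "iris")  # data source names discussed in this chapter


def load_table(spark, table, url, provider="iris", **extra_options):
    """Load a table through either data source using an identical call shape.

    Assumes (not verified) that "iris" understands the same `url` and
    `dbtable` options as Spark's built-in "jdbc" data source.
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown data source provider: {provider!r}")
    options = {"url": url, "dbtable": table, **extra_options}
    return spark.read.format(provider).options(**options).load()


# Usage with a live SparkSession named `spark` (placeholder values):
#   df_jdbc = load_table(spark, "Sample.Person", "jdbc:IRIS://localhost:1972/USER",
#                        provider="jdbc", user="myuser", password="mypassword")
#   df_iris = load_table(spark, "Sample.Person", "jdbc:IRIS://localhost:1972/USER",
#                        provider="iris", user="myuser", password="mypassword")
```

Keeping the provider name as a single parameter also makes it straightforward to move back to the standard jdbc data source, which matters given the deprecation notice at the top of this chapter.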

Spark Connector Extensions

The InterSystems IRIS Spark Connector provides InterSystems-specific extension methods to improve usability.

Spark is implemented using a combination of Java and Scala, and can run on any JVM. The Data Source API and all extensions are implemented in Scala.
