Using the InterSystems Spark Connector
Spark Connector Quick Reference
This chapter is a quick reference to the InterSystems Spark Connector API methods, which make up a custom Scala interface to the Spark Connector. The API extends the generic Spark interface with a set of implicit Scala classes and types that provide more convenience and better type safety.
Note:
This is not a definitive reference for this API. It is intended as a “cheat sheet” containing only short descriptions of members and parameters, plus links to detailed documentation and examples.
For the most complete and up-to-date information, see the ScalaDoc for the InterSystems Spark Connector API, located in <install-dir>/dev/java/doc/isc-spark/index.html.
The Spark Connector API provides the following extension methods to Spark Scala classes:
Spark Connector Method Reference
Each entry in this section provides a brief method overview including signatures and parameter descriptions. Most entries also include links to examples and more detailed information.
DataFrameReader Extension Methods
iris()
DataFrameReader.iris() executes a query on the cluster or loads the specified table, tunes partitioning, and returns results in a DataFrame. See Using the iris() Read and Write Methods for more information and examples.
def iris(text: String, mfpi: N = 1): DataFrame
def iris(text: String, column: String, lo: Long, hi: Long, partitions: N): DataFrame
See Partition Tuning Options for detailed information and examples concerning partitioning parameters (column, lo, hi, partitions, and mfpi).
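For example (a minimal sketch: it assumes a SparkSession named spark whose configuration points at the default cluster, an Owner.Person table on that cluster, and that the Connector's implicit classes are in scope; the package name shown for the import is an assumption):
import com.intersystems.spark._   // assumed package providing the Connector's implicit extension methods

// Simple read: execute a query using the default partitioning (mfpi defaults to 1)
val people = spark.read.iris("SELECT name, age FROM Owner.Person")

// Partitioned read: split the id column range [0, 100000) across 4 partitions
val partitioned = spark.read.iris("Owner.Person", "id", 0L, 100000L, 4)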
address()
DataFrameReader.address() specifies the connection details of the cluster to read from. Overrides the default instance specified in the Spark configuration for the duration of this read operation.
def address(url: String, user: String = "", password: String = ""): DataFrameReader
See Connection Options for more information and examples.
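For example, to read from a cluster other than the default for the duration of a single operation (the URL, credentials, and query are placeholders):
val df = spark.read
  .address("IRIS://cluster.example.com:51773/USER", "myuser", "mypassword")
  .iris("SELECT * FROM Owner.Person")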
DataFrameWriter Extension Methods
iris()
DataFrameWriter.iris() saves a DataFrame to the specified table on the cluster. See Using the iris() Read and Write Methods for detailed information and examples.
def iris(table: String): Unit
See CREATE TABLE in the InterSystems SQL Reference for more information on table creation options.
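For example, to save an existing DataFrame df (the table name and save mode are illustrative):
df.write
  .mode("overwrite")      // standard Spark save mode, set before calling iris()
  .iris("Owner.People")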
address()
DataFrameWriter.address() specifies the connection details of the cluster to write to. Overrides the default instance specified in the Spark configuration for the duration of this write operation.
def address(url: String, user: String = "", password: String = ""): DataFrameWriter[T]
See Connection Options for more information and examples.
description()
DataFrameWriter.description() specifies a description to document the newly created table.
def description(value: String): DataFrameWriter[T]
See InterSystems IRIS Save Options for more information and examples.
publicRowID()
DataFrameWriter.publicRowID() specifies whether the master RowID field of the newly created table should be publicly visible.
def publicRowID(value: Boolean): DataFrameWriter[T]
See InterSystems IRIS Save Options for more information and examples.
shard()
DataFrameWriter.shard() specifies the user-defined shard key for the newly created table, or specifies whether the newly created table is to be sharded. Has no effect if the table already exists and the save mode is anything other than OVERWRITE.
def shard(fields: String*): DataFrameWriter[T]
def shard(value: Boolean): DataFrameWriter[T]
See InterSystems IRIS Save Options for more information and examples.
autobalance()
DataFrameWriter.autobalance() specifies whether inserted records should be evenly distributed among the available shards of the cluster when saving to a table that is sharded on a system-assigned shard key. Has no effect if the table is not sharded, or is sharded on a user-defined shard key.
def autobalance(value: Boolean): DataFrameWriter[T]
See InterSystems IRIS Save Options for more information and examples.
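The save options above can be chained together with address() and a standard Spark save mode before the final iris() call. A sketch, with all names, credentials, and option values chosen purely for illustration:
df.write
  .address("IRIS://cluster.example.com:51773/USER", "myuser", "mypassword")
  .description("Person data loaded nightly from Spark")
  .publicRowID(false)
  .shard(true)          // create the table sharded on a system-assigned shard key
  .autobalance(true)    // distribute the inserted rows evenly across the shards
  .mode("overwrite")
  .iris("Owner.Person")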
SparkSession and SparkContext Extension Methods
dataset[T]()
SparkSession.dataset[T]() executes a query on the cluster or loads the specified table, optionally tunes partitioning, and returns results in a Dataset[T]. See Using the dataset[T]() and rdd[T]() Methods for more information and examples.
def dataset[T](text: String, mfpi: N = 1)(implicit arg0: Encoder[T]): Dataset[T]
def dataset[T](text: String, column: String, lo: Long, hi: Long, partitions: N)(implicit arg0: Encoder[T]): Dataset[T]
See Partition Tuning Options for detailed information and examples concerning partitioning options.
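For example, mapping query results onto a case class (a sketch; the class and query are illustrative, and importing spark.implicits._ supplies the implicit Encoder[Person]):
case class Person(name: String, age: Int)

import spark.implicits._   // provides the implicit Encoder[Person]

val people = spark.dataset[Person]("SELECT name, age FROM Owner.Person", mfpi = 4)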
rdd[T]()
SparkContext.rdd[T]() executes a query on the cluster or loads the specified table, optionally tunes partitioning, and computes a suitably partitioned RDD whose elements are formatted by the provided formatting function (an instance of Format[T]). See Using the dataset[T]() and rdd[T]() Methods for more information and examples.
def rdd[T](text: String, mfpi: N, format: Format[T])(implicit arg0: ClassTag[T]): RDD[T]
def rdd[T](text: String, column: String, lo: Long, hi: Long, partitions: N, format: Format[T])(implicit arg0: ClassTag[T]): RDD[T]
See Partition Tuning Options for detailed information and examples concerning partitioning parameters (column, lo, hi, partitions, and mfpi).
See Format[T] for more information on creating and using an instance of Format[T].
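A sketch of an rdd[T]() call follows; sc is the SparkContext, and toPerson stands in for a formatting function of type Format[Person] constructed as described under Format[T]:
// Assumes: case class Person(name: String, age: Int) and a value toPerson: Format[Person]
val people = sc.rdd[Person]("SELECT name, age FROM Owner.Person", 1, toPerson)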