Class com.intersys.spark.DefaultSource

final class DefaultSource extends CacheDataSource

Registers the InterSystems IRIS Spark Connector as a Spark SQL data source provider for the format "com.intersys.spark", also known by its short alias "iris".

This allows clients to execute queries against a cluster by calling Spark's generic load and save functions. For example:

spark.read
     .format("com.intersys.spark")
     .option("query","SELECT * FROM Owls")
     .load()

executes the query "SELECT * FROM Owls" on the default cluster, and hands its rows back in the form of an appropriately partitioned DataFrame.

Here read means 'execute a SELECT statement against the database', while write means 'execute batch INSERT statements against a database table'.
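The write direction can be sketched the same way. The format name and option keys below come from this page; the table name "Owls", the batch size, and the save call itself are illustrative (a real call needs a live SparkSession and cluster, so it is shown commented out):

```scala
// Options as they would be handed to Spark's generic save function.
val writeOptions: Map[String, String] = Map(
  "dbtable"   -> "Owls",   // target table for the batch INSERTs
  "batchsize" -> "1000"    // rows per server round trip (the default)
)

// df.write
//   .format("com.intersys.spark")   // or its short alias "iris"
//   .options(writeOptions)
//   .save()
```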

See also

Apache Spark Documentation for more on how to use the generic load and save functions.

Linear Supertypes
CacheDataSource, Logging, RelationProvider, CreatableRelationProvider, DataSourceRegister, AnyRef, Any

Instance Constructors

  1. new DefaultSource()

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def createRelation(sc: SQLContext, mode: SaveMode, options: Map[String, String], df: DataFrame): BaseRelation

    Write the given DataFrame to the database and return a base relation with which it can subsequently be read.

    Arguments are supplied as a collection of (key,value) pairs. They include all of the options described above, as for reading; in particular:

    • dbtable The name of the target table to write to.

    Other optional arguments include:

    • url: A string of the form "Cache://host:port/namespace" that specifies the cluster to which the data is to be written. If omitted, the default cluster specified via the "spark.isc.master.url" configuration setting is used instead.
    • user: The user account with which to make the connection to the cluster named in the "url" option above.
    • password: The password for the given user account.
    • batchsize: The number of rows to insert per server round trip.

    Default = 1000.

    • isolationlevel: The transaction isolation level.

    Can be one of NONE, REPEATABLE_READ, READ_COMMITTED, READ_UNCOMMITTED, or SERIALIZABLE.

    Corresponds to the standard transaction isolation levels specified by the JDBC Connection object.

    Default = READ_UNCOMMITTED.

    • description: An optional description for the newly created table.
    • publicrowid: Specifies whether or not the master row ID column for the newly created table is to be made publicly visible.
    • shard: Indicates the records of the table are to be distributed across the instances of the cluster; the optional value specifies the shard key to use - see Caché documentation for CREATE TABLE for more details.
    • autobalance: When writing a dataset to a table that is sharded on a system-assigned shard key, the value true specifies that the inserted records are to be evenly distributed amongst the available shards, while the value false specifies that they be sent to whichever shard is closest to where the partitions of the dataset reside.

    Default = true.
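As a sketch of how the options above fit together, here is an illustrative option map for a write; the url, user, password, and table name are placeholders, not real endpoints, and the `require` check mirrors the documented isolation levels:

```scala
// Illustrative write options drawn from the list above.
val opts: Map[String, String] = Map(
  "url"            -> "Cache://host:port/namespace",
  "user"           -> "spark",
  "password"       -> "secret",
  "dbtable"        -> "Owls",
  "batchsize"      -> "5000",
  "isolationlevel" -> "READ_COMMITTED",
  "autobalance"    -> "true"
)

// The documented isolation levels; per the "Exceptions thrown" note,
// an invalid value would surface as an IllegalArgumentException.
val levels = Set("NONE", "REPEATABLE_READ", "READ_COMMITTED",
                 "READ_UNCOMMITTED", "SERIALIZABLE")
require(levels.contains(opts("isolationlevel")))
```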

    sc

    The context in which to create the new relation.

    mode

    Describes the behavior if the target table already exists.

    options

    The arguments with which to initialize the new relation.

    df

    The source data frame that is to be written.

    returns

    A new CacheRelation with which the newly written data can subsequently be read.

    Definition Classes
    CacheDataSource → CreatableRelationProvider
    Exceptions thrown

    Exception if the proposed write operation would certainly fail.

    IllegalArgumentException if passed an invalid parameter value.

    See also

    Using JDBC Transaction Isolation Levels

    InterSystems Caché SQL Reference for more information on the options supported by the CREATE TABLE statement.

  7. def createRelation(sc: SQLContext, options: Map[String, String]): BaseRelation

    Constructs a BaseRelation from the given arguments. These are supplied as a collection of (key,value) pairs, and must include a value for either:

    • query: The text of a query to be executed on the cluster, or
    • dbtable: The name of a database table within the cluster, in which case the entire table is loaded.

    Other optional arguments include:

    • url: A string of the form "Cache://host:port/namespace" that specifies the cluster from which the data is to be read. If omitted, the default cluster specified via the "spark.isc.master.url" configuration setting is used instead.
    • user: The user account with which to make the connection to the cluster named in the "url" option above.
    • password: The password for the given user account.
    • mfpi: The maximum number of partitions per server instance to include in any implicit query factorization performed by the server.

    Default = 1.

    • fetchsize: The number of rows to fetch per server round trip.

    Default = 1000.

    • partitionColumn, lowerBound, upperBound, and numPartitions: An explicit description of how to partition the queries sent to each distinct instance; these have identical semantics to the similarly named arguments for the JDBC data source that is built into Spark.

    If both mfpi and partitionColumn arguments are given, then the explicit partitioning specification takes precedence.
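The explicit partitioning options can be sketched as follows; the table name, split column, and bounds are illustrative values, not defaults, and the commented read call needs a live SparkSession:

```scala
// Explicit partitioning of a read, as described above. Spark's built-in
// JDBC data source uses the same four option names with the same meaning.
val readOptions: Map[String, String] = Map(
  "dbtable"         -> "Owls",
  "partitionColumn" -> "ID",     // numeric column to split on
  "lowerBound"      -> "1",
  "upperBound"      -> "100000",
  "numPartitions"   -> "8"       // eight roughly equal ID ranges
)

// spark.read
//   .format("com.intersys.spark")
//   .options(readOptions)
//   .load()
```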

    sc

    The context in which to create the new relation.

    options

    The arguments with which to initialize the new relation.

    returns

    A new CacheRelation that can be called upon to execute the given query on the Caché cluster as needed.

    Definition Classes
    CacheDataSource → RelationProvider
    Exceptions thrown

    IllegalArgumentException if passed an invalid parameter value.

    See also

    JDBC to Other Databases for more on the semantics of the partitionColumn, lowerBound, upperBound, and numPartitions parameters.

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. lazy val log: Logger

    A dedicated logger that instances of this class can write to.

    Definition Classes
    Logging
  15. def logName: String

    The name of the logger that instances of this class can write to.

    Definition Classes
    Logging
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. val shortName: String

    A short user-friendly name for the data source.

    Definition Classes
    CacheDataSource → DataSourceRegister
  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def toString(): String

    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
