Overview of the Java Persister

The InterSystems IRIS® Persister for Java is designed to ingest data streams and persist them to a database at extremely high speed. Each thread-safe Persister instance consumes a data stream, serializes each record, and writes each serialized record to an output buffer or pool of buffers. Each buffer in a pool maintains a separate connection to an InterSystems IRIS server.
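The buffering described above can be pictured with a toy model in plain Java. This is a sketch under stated assumptions, not the Persister implementation. The class BufferPoolSketch, its round-robin choice of buffer, and the use of UTF-8 bytes as the "serialized" record are all hypothetical, and no server connections are opened. The sketch shows why many threads can feed one instance safely: an atomic counter picks a buffer, and each buffer synchronizes its own writes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical toy model: serialize records and spread them over a pool of buffers. */
class BufferPoolSketch {
    // Each Buffer stands in for one buffer with its own server connection (assumption).
    static final class Buffer {
        private final List<byte[]> pending = new ArrayList<>();
        synchronized void write(byte[] record) { pending.add(record); }
        synchronized int size() { return pending.size(); }
    }

    private final List<Buffer> pool = new ArrayList<>();
    private final AtomicLong next = new AtomicLong();

    BufferPoolSketch(int buffers) {
        for (int i = 0; i < buffers; i++) pool.add(new Buffer());
    }

    // Safe to call from many threads: the counter picks a buffer round-robin,
    // and the chosen buffer synchronizes its own write.
    void insert(String value) {
        byte[] serialized = value.getBytes(StandardCharsets.UTF_8); // toy "serialization"
        int slot = (int) (next.getAndIncrement() % pool.size());
        pool.get(slot).write(serialized);
    }

    int totalBuffered() {
        int total = 0;
        for (Buffer b : pool) total += b.size();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        BufferPoolSketch sketch = new BufferPoolSketch(4);
        ExecutorService threads = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            threads.submit(() -> sketch.insert("record-" + n));
        }
        threads.shutdown();
        threads.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(sketch.totalBuffered()); // 1000
    }
}
```

A real Persister also flushes buffers to the server and exposes monitoring of buffer queues; the sketch only accumulates bytes in memory.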

The Persister SDK uses a format based on the Apache Avro schema-based data serialization format, which enables extremely fast, flexible, reliable data persistence. The format consists of two parts:

a schema, which describes the structure of the data
data records, serialized in a compact format without repetitive structural information

Since the data and the schema are stored together, records can always be serialized or deserialized without any previous knowledge of their structure.
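The split between schema and compact records can be illustrated with a small round trip. In this hypothetical sketch the "schema" is just an ordered list of field names and every field is a string; values are written with DataOutputStream.writeUTF (a 2-byte length prefix followed by the characters) in place of Avro's own binary encoding. The point is that the encoded record contains no field names: the decoder recovers them from the schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration: the schema carries field names, each record carries only values. */
class CompactRecordSketch {
    // Writes the record's values in schema order; field names are never written.
    static byte[] encode(List<String> schemaFields, Map<String, String> record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : schemaFields) {
                out.writeUTF(record.get(field)); // 2-byte length, then modified UTF-8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads values back in schema order; each name comes from the schema, not the data.
    static Map<String, String> decode(List<String> schemaFields, byte[] data) {
        Map<String, String> record = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (String field : schemaFields) {
                record.put(field, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("greeting");
        byte[] data = encode(schema, Map.of("greeting", "Hello"));
        System.out.println(data.length);          // 7: 2-byte prefix + 5 bytes of "Hello"
        System.out.println(decode(schema, data)); // {greeting=Hello}
    }
}
```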

The Persister SDK provides several ways to create schemas. They can be generated from an existing ObjectScript class, created by inference from source data, or designed using the SchemaBuilder class.

When data is serialized to an InterSystems IRIS database, schemas are stored on the server in a Schema Registry, where they are available to any Persister application. Each schema in the registry defines a corresponding ObjectScript class, and records are stored as serialized instances of those classes.

The serialized classes are immediately usable through standard access methods, including ObjectScript and SQL. Indexes can be generated during serialization or deferred until later.
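The trade-off between building indexes during serialization and deferring them can be shown with an in-memory toy. DeferredIndexSketch is hypothetical and says nothing about how InterSystems IRIS stores or indexes an extent; it only contrasts updating an index on every insert with building it in one pass after loading, and shows that a deferred index returns nothing until it is built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy: immediate versus deferred index maintenance. */
class DeferredIndexSketch {
    private final boolean deferred;
    private final List<String> extent = new ArrayList<>();            // stored values; row id = position
    private final Map<String, List<Integer>> index = new HashMap<>(); // value -> row ids

    DeferredIndexSketch(boolean deferred) { this.deferred = deferred; }

    void insert(String value) {
        extent.add(value);
        if (!deferred) addToIndex(value, extent.size() - 1); // immediate mode: index on every insert
    }

    // Deferred mode: one pass over the whole extent once loading is finished.
    void buildIndexes() {
        index.clear();
        for (int id = 0; id < extent.size(); id++) addToIndex(extent.get(id), id);
    }

    private void addToIndex(String value, int id) {
        index.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
    }

    List<Integer> lookup(String value) {
        return index.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        DeferredIndexSketch store = new DeferredIndexSketch(true);
        for (String g : new String[]{"Hello", "Bonjour", "Hello"}) store.insert(g);
        System.out.println(store.lookup("Hello")); // [] : index not built yet
        store.buildIndexes();
        System.out.println(store.lookup("Hello")); // [0, 2]
    }
}
```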

The Persister SDK has three main classes:

Persister handles all aspects of reading, serializing, buffering, and writing data. Each instance of Persister is bound to one specific schema and its associated database extent. Persisters are thread-safe and allow precise control and monitoring of buffers and buffer queues. See Serializing Data with Persister for details.
SchemaManager implements schemas and makes them available to Persisters. It maintains a local cache of schemas for the current application, and synchronizes the cache with a persistent Schema Registry on the server. It provides Persisters with a connection to the database, and includes tools for creating and changing both schemas and their corresponding database extents. See Implementing Schemas with SchemaManager for details.
SchemaBuilder is a utility that provides methods to construct schema definitions, which are returned as JSON strings. All construction methods are static and calls can be nested. Field types can be specified directly, or can be inferred from Java types, classes, or objects. See Designing Schemas with SchemaBuilder for details.

The following code fragments demonstrate the basic steps from schema creation to data serialization. The code uses the three main Persister SDK classes to create a schema, synchronize it to the server, and persist the serialized data (see Hello Persister in the Java Persister Examples section for a complete listing of the source application).

Persister Workflow

The test data for this example consists of three String[] objects that are used to create a stream for the Persister to ingest.

  String[][] data = new String[][]{{"Hello"},{"Bonjour"},{"Guten Tag"}};
  Stream<Object[]> stream = Arrays.stream(data);

SchemaBuilder examines the first data record to infer the data structure and returns the resulting schema as the JSON string schemaJson. The schema name is Demo.Hello.

  String[] fieldnames = new String[]{"greeting"};
  String schemaJson = SchemaBuilder.infer(data[0],"Demo.Hello",fieldnames);
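To make the idea of inference concrete, the following sketch maps each value in a sample row to an Avro primitive type name and emits an Avro-style record schema. The helpers inferSchema and avroType are hypothetical and cover only a few Java types; the exact JSON that SchemaBuilder.infer produces may differ (for example in how it handles the name Demo.Hello or which extra attributes it adds).

```java
import java.util.StringJoiner;

/** Hypothetical sketch of schema inference from one sample row. */
class InferSchemaSketch {
    // Maps a Java value to an Avro primitive type name (small subset only).
    static String avroType(Object value) {
        if (value instanceof String) return "string";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Double) return "double";
        if (value instanceof Boolean) return "boolean";
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    // Splits "Demo.Hello" into Avro namespace "Demo" and name "Hello".
    static String inferSchema(Object[] sample, String fullName, String[] fieldNames) {
        int dot = fullName.lastIndexOf('.');
        String namespace = dot < 0 ? "" : fullName.substring(0, dot);
        String name = fullName.substring(dot + 1);
        StringJoiner fields = new StringJoiner(",");
        for (int i = 0; i < sample.length; i++) {
            fields.add("{\"name\":\"" + fieldNames[i] + "\",\"type\":\"" + avroType(sample[i]) + "\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + name + "\",\"namespace\":\"" + namespace
                + "\",\"fields\":[" + fields + "]}";
    }

    public static void main(String[] args) {
        String json = inferSchema(new String[]{"Hello"}, "Demo.Hello", new String[]{"greeting"});
        // {"type":"record","name":"Hello","namespace":"Demo","fields":[{"name":"greeting","type":"string"}]}
        System.out.println(json);
    }
}
```

Inspecting only the first record is fast but assumes every later record has the same shape; a value of a different type in a later row would not be reflected in the schema.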

Next, a SchemaManager is created and connected to the InterSystems IRIS server (this example assumes that JDBC connection object irisConn already exists). The schema manager makes the JSON schema definition available to the application by synchronizing it to the Schema Registry on the server.

  SchemaManager manager = new SchemaManager(irisConn);
  RecordSchema schemaRec = manager.synchronizeSchema(schemaJson);

The synchronized schema is returned as schemaRec, a canonical RecordSchema object that identifies the extent of the associated ObjectScript class on the server. Synchronizing a schema creates a new extent if one does not already exist. The ObjectScript class has the same name as the schema, Demo.Hello.
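The "create the extent only if it does not already exist" behavior amounts to an idempotent get-or-create. The sketch below models it with ConcurrentHashMap.computeIfAbsent, which runs the creation function at most once per key. SchemaRegistrySketch and its Extent class are hypothetical; the real SchemaManager also compares schema definitions and keeps a local cache in step with the server, which this toy ignores.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical toy: synchronizing a schema creates its extent only once. */
class SchemaRegistrySketch {
    static final class Extent {
        final String className; // same name as the schema, as with Demo.Hello
        Extent(String className) { this.className = className; }
    }

    private final Map<String, Extent> extents = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Returns the existing extent, or creates it on first use (at most once per name).
    Extent synchronize(String schemaName) {
        return extents.computeIfAbsent(schemaName, name -> {
            created.incrementAndGet();
            return new Extent(name);
        });
    }

    int extentsCreated() { return created.get(); }

    public static void main(String[] args) {
        SchemaRegistrySketch registry = new SchemaRegistrySketch();
        Extent first = registry.synchronize("Demo.Hello");
        Extent second = registry.synchronize("Demo.Hello");
        System.out.println(first == second);            // true
        System.out.println(registry.extentsCreated());  // 1
    }
}
```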

Finally, a Persister object is created. It is bound to the Demo.Hello extent identified by schemaRec, and accesses the server through the connection provided by manager.

  Persister persister = Persister.createPersister(manager, schemaRec, 
    Persister.INDEX_MODE_DEFERRED);

Each item in the data stream is passed to persister, which serializes the data and inserts each serialized record into the database extent.

  persister.deleteExtent();  // delete old test data
  stream.map(d -> new ArrayRecord(d, schemaRec)).forEach(persister::insert);

Once the data has been persisted, it can be retrieved by standard database access methods such as an SQL query.

  Statement statement = irisConn.createStatement();
  ResultSet rs = statement.executeQuery( "SELECT %ID, * FROM Demo.Hello");
  while (rs.next()) { 
    System.out.printf( "\n Greeting: %s", rs.getString("greeting"));
  }

The following sections discuss how the main Persister SDK classes are typically used:

Serializing Data with Persister — describes how a Persister reads, serializes, buffers, and writes data.
Implementing Schemas with SchemaManager — describes how a SchemaManager implements a schema on the server and makes it available to Persisters.
Designing Schemas with SchemaBuilder — describes the structure of a schema and demonstrates how to create one with the SchemaBuilder utility.

See Java Persister Examples for complete program listings that are the source for many of the examples shown in other parts of this document.

Serializing Data with Persister

The InterSystems Persister for Java