
Introducing InterSystems IRIS Document Database (DocDB)

InterSystems IRIS® data platform DocDB is a facility for storing and retrieving database data. It is compatible with, but separate from, traditional SQL table and field (class and property) data storage and retrieval. It is based on JSON (JavaScript Object Notation), which provides support for web-based data exchange. InterSystems IRIS supports developing DocDB databases and applications in REST and in ObjectScript, and provides SQL support for creating or querying DocDB data.

By its nature, InterSystems IRIS Document Database is a schema-less data structure. That means that each document has its own structure, which may differ from other documents in the same database. This has several benefits when compared with SQL, which requires a pre-defined data structure.

The word “document” is used here as a specific industry-wide technical term, as a dynamic data storage structure. “Document”, as used in DocDB, should not be confused with a text document, or with documentation.

Features and Benefits

Some of the key features of InterSystems IRIS DocDB include:

    Application Flexibility: Documents do not require a predefined schema. This allows applications to rapidly set up their data environments and easily adapt to changes in data structure. It also allows for rapid capture of data: Document Database can begin capturing data immediately, without having to define a structure for that data. This is ideal for unpredictable data feeds, as are often found in web-based and social media data sources. If, in capturing a body of data, structures within that data become evident or emerge as useful, your document data structure can evolve. Existing captured data can co-exist with this more-structured data representation. It is up to your application to determine the data structure for each document and process it appropriately. One way to do this is to establish a key:value pair representing the document structure version. Thus, conversion of data from one JSON structure to another can be performed gradually, without interrupting data capture or access, or can be skipped altogether.

    Sparse Data Efficiency: Documents are very efficient at storing sparse data because attributes with a particular key can appear in some documents in a collection but not in others. A document may have one set of defined keys; another document in the same collection may have a very different set of defined keys. In contrast, SQL requires that every record contain every key; in sparse data many records have keys with NULL values. For example, an SQL patient medical record provides fields for many diagnoses, conditions, and tests; for most patients most of these fields are NULL. The system allocates space for all of these unused fields. In a DocDB patient medical record only those keys that contain actual data are present.

    Hierarchical Data Storage: DocDB is very efficient at storing hierarchically structured data. In a key:value pair, the value can itself contain nested data, to an unlimited number of levels. This means that hierarchical data can be stored de-normalized. In the SQL relational model, hierarchical data is stored normalized by using multiple tables.

    Dynamic Data Types: A key does not have a defined data type. The value assigned to the key has an associated data type. Therefore a key:value pair in one document may have one data type; a key:value pair for the same key in another document may have a different data type. Because data types are not fixed, you can change the data type of a key:value pair in a document at runtime by assigning a new value that has a different data type.

These features of DocDB have important implications for application development. In a traditional SQL environment, the database design establishes data structure that is followed in developing applications. In DocDB, data structure is largely provided in the applications themselves.
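
The structure-version approach mentioned under Application Flexibility can be sketched as follows. This is a minimal illustration, not a DocDB convention: the key name schemaVersion and the two document layouts are assumptions made for this example only.

```objectscript
  // "schemaVersion" is a hypothetical key chosen for this sketch; DocDB does not reserve it
  SET oldDoc = {"schemaVersion":1,"FullName":"John Smith"}
  SET newDoc = {"schemaVersion":2,"Name":{"first":"John","last":"Smith"}}
  FOR doc = oldDoc,newDoc {
    IF doc.schemaVersion = 1 {
      // version 1 stores the name as a single string
      WRITE doc.FullName,!
    } ELSEIF doc.schemaVersion = 2 {
      // version 2 stores the name as a nested object
      WRITE doc.Name.first," ",doc.Name.last,!
    }
  }
```

Because the application inspects the version key for each document, documents in the old and new layouts can be read side by side while any conversion proceeds at its own pace.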

Components of DocDB

The package name for DocDB is %DocDB. It contains the following classes:

    %DocDB.Database: an ObjectScript persistent class used to manage the documents. A Database is a set of Documents, implemented by a persistent class that extends %DocDB.Document. You use methods of this class to create a database, retrieve an existing database, or delete a database, and within a database to insert a document, retrieve a document, or delete a document.

    %DocDB.Document: a structure used to store document data. It consists of the Document ID, the last modified date, and the document contents. The document content is stored in the %Doc property. Data is stored either as a JSON dynamic object or as a JSON dynamic array. A Document consists of multiple key:value pairs (an object) or an ordered list of values (an array).

    %DocDB.REST implements DocDB REST APIs to access the document database.

A related class is %Library.DynamicAbstractObject, which is used to contain the JSON structures and has subclasses for JSON arrays and JSON key:value objects.

Creating a Database

A Database is an ObjectScript persistent class that extends the abstract class %DocDB.Document. You must instantiate a Database for each namespace used for DocDB. Only one database is required per namespace. Commonly, it is assigned the same name as the namespace name.

The following example shows how to create a Database through class definition:

Class MyDBs.People Extends %DocDB.Document [ DdlAllowed ]

The following example shows how to create a Database using the %CreateDatabase() method, specifying a package name:

  SET personDB = ##class(%DocDB.Database).%CreateDatabase("MyDBs.People")

The following example shows how to create a Database using the %CreateDatabase() method, taking the ISC.DM default package name:

  SET personDB = ##class(%DocDB.Database).%CreateDatabase("People")

The %SYSTEM.DocDB class provides an interface for managing Document Databases.

Refer to the “Managing Documents” chapter for a description of the API methods used to create or get a document database, to populate a database with documents, and to retrieve data from those documents.
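
Because %CreateDatabase() is intended for a database that does not yet exist, code that may run more than once commonly checks for the database first. The following sketch assumes the Exists() method of %SYSTEM.DocDB and the %GetDatabase() method of %DocDB.Database; consult the class reference for their exact behavior in your version.

```objectscript
  // Reuse the database if it already exists in this namespace; otherwise create it
  IF $SYSTEM.DocDB.Exists("People") {
    SET personDB = ##class(%DocDB.Database).%GetDatabase("People")
  } ELSE {
    SET personDB = ##class(%DocDB.Database).%CreateDatabase("People")
  }
  WRITE "Database oref: ",personDB,!
```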

JSON Structure

The InterSystems IRIS Document Database supports JSON dynamic objects and JSON dynamic arrays. You can create these JSON structures using the SET command.

The following example shows how hierarchical data can be stored using JSON. The first SET creates a dynamic abstract object containing nested JSON-structured key:value pairs and arrays. The example then converts the dynamic abstract object to a JSON string, then inserts that JSON string into an existing document database as a document.

  SET dynAbObj = {
   "FullName":"John Smith",
   "FirstName":"John",
   "Address":{
              "street":"101 Main Street",
              "city":"Mapleville",
              "state":"NY",
              "postal code":10234
             },
   "PhoneNumber":
              [
               {"type":"home","number":"212-456-9876"},
               {"type":"cell","number":"401-123-4567"},
               {"type":"work","number":"212-444-5000"}
              ]
  }
  SET jstring = dynAbObj.%ToJSON() // dynamic abstract object to JSON string
  DO personDB.%FromJSON(jstring)   // JSON string inserted into document database

In this example, FullName is stored as a simple key:value pair. Address has a substructure which is stored as an object consisting of key:value pairs. PhoneNumber has a substructure which is stored as an array.

For further details refer to “Creating and Modifying Dynamic Entities” in Using JSON.
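
As a brief sketch of how the nested structure can be read back, the following example navigates a smaller version of the same object. It uses dot syntax for nested objects, %Get() for keys and array positions, and %GetIterator() to walk the array. The variable names are arbitrary.

```objectscript
  SET person = {
     "FullName":"John Smith",
     "Address":{"city":"Mapleville","postal code":10234},
     "PhoneNumber":[
        {"type":"home","number":"212-456-9876"},
        {"type":"cell","number":"401-123-4567"}
     ]
  }
  // Dot syntax follows nested objects
  WRITE person.Address.city,!
  // Keys that are not valid identifiers can be read with %Get()
  WRITE person.Address.%Get("postal code"),!
  // Array elements are numbered from 0
  WRITE person.PhoneNumber.%Get(0).number,!
  // An iterator visits each array element in order
  SET iter = person.PhoneNumber.%GetIterator()
  WHILE iter.%GetNext(.index, .phone) {
    WRITE index,": ",phone.type," ",phone.number,!
  }
```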

Document

A Document is stored in the %Doc property of an instance of the Database class you create. This is shown in the following example, which stores a JSON array in the %Doc property:

   SET jarry = ["Anne","Bradford","Charles","Deborah"]
   SET myoref = ##class(MyDBs.DB1).%New()
   SET myoref.%Doc = jarry
   SET docoref = myoref.%Doc
   WRITE "%Doc property oref: ",docoref,!
   WRITE "%Doc Property value: ",docoref.%ToJSON()

By default, the %Doc data type is %Library.DynamicAbstractObject, which is the data type used to store a JSON object or a JSON array. You can specify a different data type in the %CreateDatabase() method.

Other Database properties:

    %DocumentId is an IDENTITY property containing a unique integer that identifies a document; %DocumentId counts from 1. In most cases, %DocumentId values are system-assigned. A %DocumentId must be unique; %DocumentId values are not necessarily assigned sequentially, so gaps may occur in an assignment sequence. Document Database also automatically generates an IdKey index for %DocumentId values.

    %LastModified records a UTC timestamp when the Document instance was defined.
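
The following sketch shows one way these properties might be inspected after a document is stored. It assumes that a database named People already exists in the current namespace, and that %SaveDocument() returns the stored document instance and %GetDocument() retrieves a document by its %DocumentId, as described in the “Managing Documents” chapter.

```objectscript
  // Assumes a database named "People" already exists in the current namespace
  SET personDB = ##class(%DocDB.Database).%GetDatabase("People")
  // %SaveDocument() stores the dynamic object and returns the document instance
  SET docref = personDB.%SaveDocument({"FullName":"John Smith"})
  WRITE "ID: ",docref.%DocumentId,!
  WRITE "Last modified: ",docref.%LastModified,!
  // %GetDocument() retrieves the stored document by its %DocumentId
  SET stored = personDB.%GetDocument(docref.%DocumentId)
  WRITE stored.%ToJSON(),!
```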

De-Normalized Data Structure

The following is a JSON example of a traditional SQL normalized relational data structure. It consists of two documents, which might be contained in two different collections:

{
   "id":123,
   "Name":"John Smith",
   "DOB":"1990-11-23",
   "Address":555
}
{
   "id":555,
   "street":"101 Main Street",
   "city":"Mapleville",
   "state":"NY",
   "postal code":10234
 }

The following is the same data de-normalized, specified as a single document in a collection containing a nested data structure:

{
   "id":123,
   "Name":"John Smith",
   "DOB":"1990-11-23",
   "Address":{
              "street":"101 Main Street",
              "city":"Mapleville",
              "state":"NY",
              "postal code":10234
             }
 }

In SQL, converting from the first data structure to the second would involve changing the table data definition, then migrating the data.

In DocDB, because there is no fixed schema, these two data structures can co-exist as different representations of the same data. The application code must specify which data structure it will access. You can either migrate the data to the new data structure, or leave the data unchanged in the old data structure format, in which case DocDB migrates data each time it accesses it using the new data structure.

For further details on JSON data structure, refer to the “Flexible Data Structure” chapter of this manual.
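
As an illustration of application code handling both structures, the following sketch checks whether Address holds a reference (a number) or a nested object, and embeds the address in the former case. It operates on standalone dynamic objects rather than on stored documents, and the matching logic is an assumption made for this example.

```objectscript
  SET person = {"id":123,"Name":"John Smith","DOB":"1990-11-23","Address":555}
  SET address = {"id":555,"street":"101 Main Street","city":"Mapleville","state":"NY","postal code":10234}
  // A numeric Address is a reference (first structure); an object is already nested
  IF person.%GetTypeOf("Address") = "number" {
    IF person.Address = address.id {
      // Drop the now-redundant id and embed the address in the person document
      DO address.%Remove("id")
      DO person.%Set("Address", address)
    }
  }
  WRITE person.%GetTypeOf("Address"),!
  WRITE person.%ToJSON(),!
```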

Data Types and Values

In DocDB, a key does not have a data type. However, a data value imported to DocDB may have an associated data type. Because the data type is associated with the specific value, replacing the value with another value may result in changing the data type of the key:value pair for that record.

InterSystems IRIS DocDB does not have any reserved words or any special naming conventions. In a key:value pair, any string can be used as a key; any string or number can be used as a value. The key name can be the same as the value: "name":"name". A key name can be the same as its index name.
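
The association of data type with value rather than key can be observed with the %GetTypeOf() method of a dynamic object. The following is a small sketch; the key names are arbitrary.

```objectscript
  SET doc1 = {"postal code":10234}
  SET doc2 = {"postal code":"10234-0001"}
  WRITE doc1.%GetTypeOf("postal code"),!   // number
  WRITE doc2.%GetTypeOf("postal code"),!   // string
  // Assigning a new value changes the type for that document only
  DO doc1.%Set("postal code","N/A")
  WRITE doc1.%GetTypeOf("postal code"),!   // string
  // A key that is absent from the document reports "unassigned"
  WRITE doc1.%GetTypeOf("country"),!
```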

InterSystems IRIS DocDB represents data values as JSON values, as shown in the following list:

    Strings: String

    Numbers: Numbers are represented in canonical form, with the following exception: JSON fractional numbers between 1 and -1 are represented with a leading zero integer (for example, 0.007); the corresponding InterSystems IRIS numbers are represented without the leading zero integer (for example, .007).

    $DOUBLE numbers: Represented as IEEE double-precision (64-bit) floating point numbers.

    Non-printing characters: JSON provides escape code representations of the following non-printing characters:

      $CHAR(8): "\b"
      $CHAR(9): "\t"
      $CHAR(10): "\n"
      $CHAR(12): "\f"
      $CHAR(13): "\r"

    All other non-printable characters are represented by an escaped hexadecimal notation. For example, $CHAR(11) is represented as "\u000b". Printable characters can also be represented using escaped hexadecimal (Unicode) notation. For example, the Greek lowercase letter alpha can be represented as "\u03b1".

    Other escaped characters: JSON escapes two printable characters, the double quote character and the backslash character:

      $CHAR(34): "\""
      $CHAR(92): "\\"
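
The representations listed above can be observed by placing such values in a dynamic object and serializing it with %ToJSON(). The following is a minimal sketch with arbitrary key names. In the serialized string, the tab, double quote, and backslash characters should appear in their escaped forms, and the fractional number should gain a leading zero.

```objectscript
  SET obj = {}
  SET obj.tabbed = "a"_$CHAR(9)_"b"                  // contains a tab
  SET obj.quoted = "say "_$CHAR(34)_"hi"_$CHAR(34)   // contains double quotes
  SET obj.path = "C:"_$CHAR(92)_"temp"               // contains a backslash
  SET obj.fraction = .007                            // ObjectScript canonical form
  WRITE obj.%ToJSON(),!
```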

JSON Special Values

JSON special values can only be used within JSON objects and JSON arrays. They are different from the corresponding ObjectScript special values. JSON special values are specified without quotation marks (the same values within quotation marks are ordinary data values). They can be specified in any combination of uppercase and lowercase letters; they are stored as all lowercase letters.

    JSON represents the absence of a value by using the null special value. Because Document Database does not normally include a key:value pair unless there is an actual value, null is only used in special circumstances, such as a placeholder for an expected value. This use of null is shown in the following example:

      SET jsonobj = {"name":"Fred","spouse":null}
      WRITE jsonobj.%ToJSON()

    JSON represents a boolean value by using the true and false special values. This use of boolean values is shown in the following example:

      SET jsonobj = {"name":"Fred","married":false}
      WRITE jsonobj.%ToJSON()

    ObjectScript specifies boolean values using 0 and 1. (Actually “true” can be represented by 1 or by any non-zero number.) These values are not supported as boolean values within JSON documents.
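
The following sketch shows how the JSON special values relate to ObjectScript values in practice. Reading a JSON boolean from a dynamic object yields an ObjectScript 0 or 1, while %GetTypeOf() reports the JSON type. To store an ObjectScript 0 or 1 as a JSON boolean rather than as a number, the third argument of %Set() can name the type. The variable isMarried is introduced only for this example.

```objectscript
  SET jsonobj = {"name":"Fred","married":false,"spouse":null}
  // Reading JSON special values yields ObjectScript values
  WRITE jsonobj.%GetTypeOf("married"),!   // boolean
  WRITE jsonobj.married,!                 // 0
  WRITE jsonobj.%GetTypeOf("spouse"),!    // null
  // Store an ObjectScript 1 as a JSON boolean by naming the type in %Set()
  SET isMarried = 1
  DO jsonobj.%Set("married", isMarried, "boolean")
  WRITE jsonobj.%ToJSON(),!
```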

In a few special cases, JSON uses parentheses to clarify syntax:

    If you define a local variable with the name null, true, or false, you must use parentheses within JSON to have it treated as a local variable rather than a JSON special value. This is shown in the following example:

      SET true=1
      SET jsonobj = {"bool":true,"notbool":(true)}
      WRITE jsonobj.%ToJSON()

    If you use the ObjectScript Follows operator (]) within an expression, you must use parentheses within JSON to have it treated as this operator, rather than as a JSON array terminator. In the following example, the expression b]a tests whether b follows a in the collation sequence, and returns an ObjectScript boolean value. The Follows expression must be enclosed in parentheses:

      SET a="a",b="b"
      SET jsonarray=[(b]a)]
      WRITE jsonarray.%ToJSON()