Using Caché Globals
SQL and Object Use of Multidimensional Storage
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

This chapter describes how the Caché object and SQL engines make use of multidimensional storage (globals) for storing persistent objects, relational tables, and indices.

Though the Caché object and SQL engines automatically provide and manage data storage structures, it can be useful to understand the details of how this works.
The storage structures used by the object and relational view of data are identical. For simplicity, this document only describes storage from the object perspective.
Data
Every persistent class that uses the %CacheStorage storage class (the default) can store instances of itself within the Caché database using one or more nodes of multidimensional storage (globals).
Every persistent class has a storage definition that defines how its properties are stored within global nodes. This storage definition (referred to as “default structure”) is managed automatically by the class compiler. (You can modify this storage definition or even provide alternate versions of it if you like. This is not discussed in this document.)
Default Structure
The default structure used for storing persistent objects is quite simple:
For example, suppose we define a simple persistent class, MyApp.Person, with two literal properties:
Class MyApp.Person Extends %Persistent
{
Property Name As %String;
Property Age As %Integer;
}
If we create and save two instances of this class, the resulting global will be similar to:
 ^MyApp.PersonD = 2  // counter node
 ^MyApp.PersonD(1) = $LB("",530,"Abraham")
 ^MyApp.PersonD(2) = $LB("",680,"Philip")
Note that the first piece of the $List structure stored in each node is empty; this is reserved for a class name. If we define any subclasses of this Person class, this slot contains the subclass name. The %OpenId method (provided by the %Persistent class) uses this information to polymorphically open the correct type of object when multiple objects are stored within the same extent. This slot shows up in the class storage definition as a property named “%%CLASSNAME”.
For more details, refer to the section on subclasses below.
IDKEY
The IDKEY mechanism allows you to explicitly define the value used as an object ID. To do this, you simply add an IDKEY index definition to your class and specify the property or properties that will provide the ID value. Note that once you save an object, its object ID value cannot change. This means that after you save an object that uses the IDKEY mechanism, you can no longer modify any of the properties on which the object ID is based.
For example, we can modify the Person class used in the previous example to use an IDKEY index:
Class MyApp.Person Extends %Persistent
{
Index IDKEY On Name [ Idkey ];

Property Name As %String;
Property Age As %Integer;
}
If we create and save two instances of the Person class, the resulting global is now similar to:
 ^MyApp.PersonD("Abraham") = $LB("",530,"Abraham")
 ^MyApp.PersonD("Philip") = $LB("",680,"Philip")
Note that there is no longer any counter node defined. Also note that by basing the object ID on the Name property, we have implied that the value of Name must be unique for each object.
If the IDKEY index is based on multiple properties, then the main data nodes has multiple subscripts. For example:
Class MyApp.Person Extends %Persistent
{
Index IDKEY On (Name,Age) [ Idkey ];

Property Name As %String;
Property Age As %Integer;
}
In this case, the resulting global will now be similar to:
 ^MyApp.PersonD("Abraham",530) = $LB("",530,"Abraham")
 ^MyApp.PersonD("Philip",680) = $LB("",680,"Philip")
Important:
There must not be a sequential pair of vertical bars (||) within the values of any property used by an IDKEY index, unless that property is a valid reference to an instance of a persistent class. This restriction is imposed by the way in which the Caché SQL mechanism works. The use of || in IDKey properties can result in unpredictable behavior.
Subclasses
By default, any fields introduced by a subclass of a persistent object are stored in an additional node. The name of the subclass is used as an additional subscript value.
For example, suppose we define a simple persistent MyApp.Person class with two literal properties:
Class MyApp.Person Extends %Persistent
{
Property Name As %String;
Property Age As %Integer;
}
Now we define a persistent subclass, MyApp.Student, that introduces two additional literal properties:
Class MyApp.Student Extends Person
{
Property Major As %String;
Property GPA As %Float;
}
If we create and save two instances of this MyApp.Student class, the resulting global will be similar to:
^MyApp.PersonD = 2  // counter node
^MyApp.PersonD(1) = $LB("Student",19,"Jack")
^MyApp.PersonD(1,"Student") = $LB(3.2,"Physics")

^MyApp.PersonD(2) = $LB("Student",20,"Jill")
^MyApp.PersonD(2,"Student") = $LB(3.8,"Chemistry")
The properties inherited from the Person class are stored in the main node, and those introduced by the Student class are stored in an additional subnode. This structure ensures that the Student data can be used interchangeably as Person data. For example, an SQL query listing names of all Person objects correctly picks up both Person and Student data. This structure also makes it easier for the Class Compiler to maintain data compatibility as properties are added to either the super- or subclasses.
Note that the first piece of the main node contains the string “Student” — this identifies nodes containing Student data.
Parent-Child Relationships
Within parent-child relationships, instances of child objects are stored as subnodes of the parent object to which they belong. This structure ensures that child instance data is physically clustered along with parent data.
For example, here is the definition for two related classes, Invoice:
/// An Invoice class
Class MyApp.Invoice Extends %Persistent
{
Property CustomerName As %String;

/// an Invoice has CHILDREN that are LineItems
Relationship Items As LineItem  [inverse = TheInvoice, cardinality = CHILDREN];
}
and LineItem:
/// A LineItem class
Class MyApp.LineItem Extends %Persistent
{
Property Product As %String;
Property Quantity As %Integer;

/// a LineItem has a PARENT that is an Invoice
Relationship TheInvoice As Invoice [inverse = Items, cardinality = PARENT];
}
If we store several instances of Invoice object, each with associated LineItem objects, the resulting global will be similar to:
^MyApp.InvoiceD = 2  // invoice counter node
^MyApp.InvoiceD(1) = $LB("","Wiley Coyote")
^MyApp.InvoiceD(1,"Items",1) = $LB("","Rocket Roller Skates",2)
^MyApp.InvoiceD(1,"Items",2) = $LB("","Acme Magnet",1)

^MyApp.InvoiceD(2) = $LB("","Road Runner")
^MyApp.InvoiceD(2,"Items",1) = $LB("","Birdseed",30)
For more information on relationships, refer to the Relationships chapter in the Using Caché Objects.
Embedded Objects
Embedded objects are stored by first converting them to a serialized state (by default a $List structure containing the object’s properties) and then storing this serial state in the same way as any other property.
For example, suppose we define a simple serial (embeddable) class with two literal properties:
Class MyApp.MyAddress Extends %SerialObject
{
Property City As %String;
Property State As %String;
}
We now modify our earlier example to add an embedded Home address property:
Class MyApp.MyClass Extends %Persistent
{
Property Name As %String;
Property Age As %Integer;
Property Home As MyAddress;
}
If we create and save two instances of this class, the resulting global is equivalent to:
 ^MyApp.MyClassD = 2  // counter node
 ^MyApp.MyClassD(1) = $LB(530,"Abraham",$LB("UR","Mesopotamia"))
 ^MyApp.MyClassD(2) = $LB(680,"Philip",$LB("Bethsaida","Israel"))
Streams
Global streams are stored within globals by splitting their data into a series of chunks, each smaller than 32K bytes, and writing the chunks into a series of sequential nodes. File streams are stored in external files.
Indices
Persistent classes can define one or more indices; additional data structures are used to make operations (such as sorting or conditional searches) more efficient. Caché SQL makes use of such indices when executing queries. Caché Object and SQL automatically maintain the correct values within indices as insert, update, and delete operations are carried out.
Storage Structure of Standard Indices
A standard index associates an ordered set of one or more property values with the object ID values of the object containing the properties.
For example, suppose we define a simple persistent MyApp.Person class with two literal properties and an index on its Name property:
Class MyApp.Person Extends %Persistent
{
Index NameIdx On Name;

Property Name As %String;
Property Age As %Integer;
}
If we create and save several instances of this Person class, the resulting data and index globals is similar to:
 // data global
 ^MyApp.PersonD = 3  // counter node
 ^MyApp.PersonD(1) = $LB("",34,"Jones")
 ^MyApp.PersonD(2) = $LB("",22,"Smith")
 ^MyApp.PersonD(3) = $LB("",45,"Jones")


 // index global
 ^MyApp.PersonI("NameIdx"," JONES",1) = ""
 ^MyApp.PersonI("NameIdx"," JONES",3) = ""
 ^MyApp.PersonI("NameIdx"," SMITH",2) = ""
Note the following things about the index global:
  1. By default, it is placed in a global whose name is the class name with an “I” (for Index) appended to it.
  2. By default, the first subscript is the index name; this allows multiple indices to be stored in the same global without conflict.
  3. The second subscript contains the collated data value. In this case, the data is collated using the default SQLUPPER collation function. This converts all characters to uppercase (to sort without regard to case) and prepends a space character (to force all data to collate as strings).
  4. The third subscript contains the Object ID value of the object that contains the indexed data value.
  5. The nodes themselves are empty; all the needed data is held within the subscripts. Note that if an index definition specifies that data should be stored along with the index, it is placed in the nodes of the index global.
This index contains enough information to satisfy a number of queries, such as listing all Person class order by Name.
Bitmap Indices
A bitmap index is similar to a standard index except that it uses a series of bitstrings to store the set of object ID values that correspond to the indexed value.
Logical Operation of Bitmap Indices
A bitstring is a string containing a set of bits (0 and 1 values) in a special compressed format. Caché includes a set of functions to efficiently create and work with bitstrings. These are listed in the following table:
BitString Operations
Function Description
$Bit Set or get a bit within a bitstring.
$BitCount Count the number of bits within a bitstring.
$BitFind Find the next occurrence of a bit within a bitstring.
$BitLogic Perform logical (AND, OR) operations on two or more bitstrings.
Within a bitmap index, ordinal positions within a bitstring correspond to rows (Object ID number) within the indexed table. For a given value, a bitmap index maintains a bitstring that contains 1 for each row in which the given value is present, and contains 0 for every row in which it is absent. Note that bitmap indices only work for objects that use the default storage structure with system-assigned, numeric Object ID values.
For example, suppose we have a table similar to the following:
ID State Product
1 MA Hat
2 NY Hat
3 NY Chair
4 MA Chair
5 MA Hat
If the State and Product columns have bitmap indices, then they contain the following values:
A bitmap index on the State column contains the following bitstring values:
MA 1 0 0 1 1
NY 0 1 1 0 0
Note that for the value, “MA”, there is a 1 in the positions (1, 4, and 5) that correspond to the table rows with State equal to “MA”.
Similarly, a bitmap index on the Product column contains the following bitstring values (note that the values are collated to uppercase within the index):
CHAIR 0 0 1 1 0
HAT 1 1 0 0 1
The Caché SQL Engine can execute a number of operations by iterating over, counting the bits within, or performing logical combinations (AND, OR) on the bitstrings maintained by these indices. For example, to find all rows that have State equal to “MA” and Product equal to “HAT”, the SQL Engine can simply combine the appropriate bitstrings together with logical AND.
In addition to these indices, the system maintains an additional index, called an “extent index,” that contains a 1 for every row that exists and a 0 for rows that do not (such as deleted rows). This is used for certain operations, such as negation.
Storage Structure of Bitmap Indices
A bitmap index associates an ordered set of one or more property values with one or more bitstrings containing the Object ID values corresponding to the property values.
For example, suppose we define a simple persistent MyApp.Person class with two literal properties and a bitmap index on its Age property:
Class MyApp.Person Extends %Persistent
{
Index AgeIdx On Age [Type = bitmap];

Property Name As %String;
Property Age As %Integer;
}
If we create and save several instances of this Person class, the resulting data and index globals is similar to:
 // data global
 ^MyApp.PersonD = 3  // counter node
 ^MyApp.PersonD(1) = $LB("",34,"Jones")
 ^MyApp.PersonD(2) = $LB("",34,"Smith")
 ^MyApp.PersonD(3) = $LB("",45,"Jones")

 // index global
 ^MyApp.PersonI("AgeIdx",34,1) = 110...
 ^MyApp.PersonI("AgeIdx",45,1) = 001...

 // extent index global
 ^MyApp.PersonI("$Person",1) = 111...
 ^MyApp.PersonI("$Person",2) = 111...
Note the following things about the index global:
  1. By default, it is placed in a global whose name is the class name with an “I” (for Index) appended to it.
  2. By default, the first subscript is the index name; this allows multiple indices to be stored in the same global without conflict.
  3. The second subscript contains the collated data value. In this case, a collation function is not applied as this is an index on numeric data.
  4. The third subscript contains a chunk number; for efficiency, bitmap indices are divided into a series of bitstrings each containing information for about 64000 rows from the table. Each of these bitstrings are referred to as a chunk.
  5. The nodes contain the bitstrings.
Also note: because this table has a bitmap index, an extent index is automatically maintained. This extent index is stored within the index global and uses the class name, with a “$” character prepended to it, as its first subscript.
Direct Access of Bitmap Indices
The following example uses a class extent index to compute the total number of stored object instances (rows). Note that it uses $Order to iterate over the chunks of the extent index (each chunk contains information for about 64000 rows):
/// Return the number of objects for this class.<BR>
/// Equivalent to SELECT COUNT(*) FROM Person
ClassMethod Count() As %Integer
{
    New total,chunk,data
    Set total = 0
    
    Set chunk = $Order(^MyApp.PersonI("$Person",""),1,data)
    While (chunk '= "") {
        Set total = total + $bitcount(data,1)
        Set chunk = $Order(^MyApp.PersonI("$Person",chunk),1,data)
    }

    Quit total
}