Using Caché Globals
This chapter describes the logical (programmatic) view of globals and provides an overview of how globals are physically stored on disk. Its sections are:
Logical Structure of Globals
A global is a named multidimensional array that is stored within a physical Caché database. Within an application, the mapping of globals to physical databases is based on the current namespace a namespace provides a logical, unified view of one or more physical databases.
Topics related to the logical structure of globals are:
Global Naming Conventions and Limits
The name of a global specifies its purpose and use. There are two types of globals and a separate set of variables, called process-private globals
A global This is what might be called a standard global; typically, these are simply referred to as globals. It is a persistent, multidimensional array that resides in the current namespace.
The naming conventions for globals are:
A global name begins with a caret character (^) prefix. This caret distinguishes a global from a local variable.
The first character after the caret (^) prefix in a global name can be:
The other characters of a global name may be letters, numbers, or the period (.) character. The percent (%) character cannot be used, except as the first character of a global name. The period (.) character cannot be used as the last character of a global name.
A global name may be up to 31 characters long (exclusive of the caret character prefix). You can specify global names that are significantly longer, but Caché treats only the first 31 characters as significant.
Global names are case-sensitive.
In the CACHESYS
database, InterSystems reserves to itself all global names except those starting with z
, and %Z
. In all other databases, InterSystems reserves all global names starting with ISC.
Sample Global Names and Their Uses
The following are examples of the various kinds of global names and how each is used:
Global names can contain only valid identifier characters; by default, these are as specified above. However, your NLS (National Language Support) settings may define a different set of valid identifier characters. Global names cannot contain Unicode characters.
Thus, the following are all valid global names:
SET ^a="The quick "
SET ^A="brown fox "
SET ^A7="jumped over "
SET ^A.7="the lazy "
SET ^A1B2C3="dog's back."
KILL ^a,^A,^A7,^A.7,^A1B2C3 // keeps the database clean
Introduction to Global Nodes and Subscripts
A global typically has multiple nodes
, generally identified by a subscript
or set of subscripts. For a basic example:
This statement refers to the global node ^Demo(1)
, which is a node within the ^Demo
global. This node is identified by one subscript.
This statement refers to the global node ^Demo
, which does not use any subscripts.
The nodes of a global form a hierarchical structure. Caché ObjectScript provides commands that take advantage of this structure. You can, for example, remove a node or remove a node and all its children. For a full discussion, see the next chapter
The following sections provide details on the rules for subscripts
and for global nodes
Subscripts have the following rules:
Subscript values are case-sensitive.
A subscript value can be any Caché ObjectScript expression, provided that the expression does not evaluate to the null string ("").
The value can include characters of all types, including blank spaces, non-printing characters, and (if your installation is a Unicode installation) Unicode characters. (Note that non-printing characters are less practical in subscript values.)
Before resolving a global reference, Caché evaluates each subscript in the same way it evaluates any other expression. In the following example, we set one node of the ^Demo
global, and then we refer to that node in several equivalent ways:
SAMPLES>s ^Demo(1+2+3)="a value"
The preceding rules apply for all Caché supported collations
. For older collations still in use for compatibility reasons, such as pre-ISM-6.1
, the rules for subscripts are more restrictive. For example, character subscripts cannot have a control character as their initial character; and there are limitations on the number of digits that can be used in integer subscripts.
Because of restrictions such as these, there is no guarantee that subscripts used for supported collations will be valid in pre-Caché collations.
Within applications, nodes typically contain the following types of structure:
String or numeric data. With a Unicode version of Caché, string data may contain native Unicode characters.
A string with multiple fields delimited by a special character:
^Data(10) = "Smith^John^Boston"
You can use the Caché ObjectScript $PIECE
function to pull such data apart.
Multiple fields contained within a Caché $LIST
structure. The $LIST
structure is a string containing multiple length-encoded values. It requires no special delimiter characters.
A null string (""). In cases where the subscripts are themselves used as the data, no data is stored in the actual node.
A bitstring. In cases where a global is used to store part of a bitmap index, the value stored within a node is a bitstring. A bitstring is a string containing a logical, compressed set of 1 and 0 values. You can construct a bitstring using the $BIT
Part of a larger set of data. For example, the object and SQL engines store streams
) as a sequential series of 32K nodes within a global. By means of the stream interface, users of streams are unaware that streams are stored in this fashion.
Within a global, nodes are stored in a collated (sorted) order.
Applications typically control the order in which nodes are sorted by applying a conversion to values used as subscripts. For example, the SQL engine, when creating an index on string values, converts all string values to uppercase letters and prepends a space character to make sure that the index is both not case-sensitive and collates as text (even if numeric values are stored as strings).
Maximum Length of a Global Reference
The total length of a global reference that is, the reference to a specific global node or subtree is limited to 511 encoded characters (which may be fewer than 511 typed characters).
For a conservative determination of the size of a given global reference, use the following guidelines:
For the global name: add 1 for each character.
For a purely numeric subscript: add 1 for each digit, sign, or decimal point.
For a subscript that includes nonnumeric characters: add 3 for each character.
If a subscript is not purely numeric, the actual length of the subscript varies depending on the character set used to encode the string. A multibyte character can take up to 3 bytes.
Note that an ASCII character can take up 1 or 2 bytes. If the collation does case folding, an ASCII character can take 1 byte for the character and 1 byte for the disambiguation byte. If the collation does not perform case folding, an ASCII character takes 1 byte.
For each subscript, add 1.
If the sum of these numbers is greater than 511, the reference may be too long.
Because of the way that the limitation is determined, if you must have long subscript or global names, it is helpful to avoid a large number of subscript levels. Conversely, if you are using multiple subscript levels, avoid long global names and long subscripts. Because you may not be able to control the character set(s) you are using, it is useful to keep global names and subscripts shorter.
When there are doubts about particular references, it is useful to create test versions of global references that are of equivalent length to the longest expected global reference (or even a little longer). Data from these tests provides guidance on possible revisions to your naming conventions prior to building your application.
Physical Structure of Globals
Globals are stored within physical files using a highly optimized structure. The code that manages this data structure is also highly optimized for every platform that Caché runs on. These optimizations ensure that operations on globals have high throughput (number of operations per unit of time), high concurrency (total number of concurrent users), efficient use of cache memory, and require no ongoing performance-related maintenance (such as frequent rebuilding, re-indexing, or compaction).
The physical structure used to store globals is completely encapsulated; applications do not worry about physical data structure in any way.
Globals are stored on disk within a series of data blocks; the size of each block (typically 8KB) is determined when the physical database is created. To provide efficient access to data, Caché maintains a sophisticated B-tree-like structure that uses a set of pointer blocks to link together related data blocks. Caché maintains a buffer pool an in-memory cache of frequently referenced blocks to reduce the cost of fetching blocks from disk.
While many database technologies use B-tree-like structures for data storage, Caché is unique in many ways:
The storage mechanism is exposed via a safe, easy-to-use interface.
Subscripts and data are compressed to save disk space as well as valuable in-memory cache space.
The storage engine is optimized for transaction processing operations: inserts, updates, and deletes are all fast. Unlike relational systems, Caché never requires rebuilding indices or data in order to restore performance.
The storage engine is optimized for maximum concurrent access.
Data is automatically clustered for efficient retrieval.
Within data blocks, globals are stored sequentially. Both subscripts and data are stored together. There is a special case for large node values (long strings) which are stored within separate blocks. A pointer to this separate block is stored along with the node subscript.
For example, suppose you have a global with the following contents:
^Data(1999) = 100
^Data(1999,1) = "January"
^Data(1999,2) = "February"
^Data(2000) = 300
^Data(2000,1) = "January"
^Data(2000,2) = "February"
Most likely, this data would be stored within a single data block with a contiguous structure similar to (the real representation is a series of bytes):
An operation on ^Data
can retrieve its entire contents with a minimum number of disk operations.
There are a number of additional techniques used to ensure that inserts, updates, and deletes are performed efficiently.
A global resides within a particular Caché database. Portions of a global can reside in different databases if appropriate mappings are used. A database can be physically located on the current system, or on a remote system accessed through Caché networking. The term dataset
refers to the system and the directory that contain a Caché database. For further details on networking, see the Distributed Data Management Guide
is a logical definition of the datasets and global mappings that together form a set of related information.
A simple global reference applies to the currently selected namespace. The namespace definition can cause this to physically access a database on the local system or a remote system. Different globals can be mapped to different locations or datasets (where a dataset refers to the system and the directory that contain a Caché database).
For example, to create a simple reference to the global ORDER in the namespace to which it currently has been mapped, use the following syntax:
This section describes two topics:
You can map globals and routines from one database to another on the same or different systems. This allows simple references to data which can exist anywhere and is the primary feature of a namespace. You can map whole globals or pieces of globals; mapping a piece of a global (or a subscript) is known as subscript-level mapping (SLM)
. Because you can map global subscripts, data can easily span disks.
Global mapping is applied hierarchically. For example, if the NSX namespace has an associated DBX database, but maps the ^x
global to the DBY database and ^x(1)
to the DBZ database, then any subscripted form of the ^x global except those that are part of the ^x(1)
hierarchy is mapped to DBY; those globals that are part of the ^x(1)
hierarchy are mapped to DBZ. The following diagram illustrates this hierarchy:
Simple Subscript-level Mapping
In this diagram, the globals and their hierarchy appear in gray, and the databases to which they are mapped appear in black.
It is also possible to map part of a mapped, subscripted global to another database, or even back to the database to which the initial global is mapped. Suppose that the previous example had the additional mapping of the ^x(1,2)
global back to the DBY database. This would appear as follows:
More Complex Subscript-level Mapping
Again, the globals and their hierarchy appear in gray, and the databases to which they are mapped appear in black.
Once you have mapped a global from one namespace to another, you can reference the mapped global as if it were in the current namespace with a simple reference, such as ^ORDER
When establishing subscript-level mapping ranges, the behavior of string subscripts differs from that of integer subscripts. For strings, the first character determines the range, while the range for integers uses numeric values. For example, a subscript range of ("A"):("C") contains not only AA but also AC and ABCDEF; by contrast, a subscript range of (1):(2) does not contain 11.
Using Distinct Ranges of Globals and Subscripts
Each of a namespace’s mappings must refer to distinct ranges of globals or subscripts. Mapping validation prevents the establishment of any kind of overlap. For example, if you attempt to use the Management Portal to create a new mapping that overlaps with an existing mapping, the Portal prevents this from occurring and displays an error message.
Extended Global References
There are two forms of extended references:
Explicit namespace reference You specify the name of the namespace where the global is located as part of the syntax of the global reference.
Implied namespace reference You specify the directory and, optionally, the system name as part of the syntax of the global reference. In this case, no global mappings apply, since the physical dataset (directory and system) is given as part of the global reference.
The use of explicit namespaces is preferred, because this allows for redefinition of logical mappings externally, as requirements change, without altering your application code.
Caché supports two forms of extended references:
Bracket syntax, which encloses the extended reference with square brackets ([ ]).
Environment syntax, which encloses the extended reference with vertical bars (| |).
The examples of extended globals references use the Windows directory structure. In practice, the form of such references is operating-system dependent.
You can use bracket syntax to specify an extended global reference with either an explicit namespace or an implied namespace:
In an explicit namespace reference, nspace
is a defined namespace that the global glob
has not currently been mapped or replicated to. In an implied namespace reference, dir
is a directory (the name of which includes a trailing backslash: \
is a system, and glob
is a global within that directory. If nspace
is specified as a carat (^
), the reference is to a process-private global.
You must include quotation marks around the directory and system names or the namespace name unless you specify them as variables. The directory and system together comprise an implied namespace. An implied namespace can reference either:
The specified directory on the specified system.
The specified directory on your local system, if you do not specify a system name in the reference. If you omit the system name from an implied namespace reference, you must supply a double caret (^^) within the directory reference to indicate the omitted system name.
To specify an implied namespace on a remote system:
To specify an implied namespace on the local system:
For example, to access the global ORDER in the C:\BUSINESS\
directory on a machine called SALES:
SET x = ^["C:\BUSINESS\","SALES"]ORDER
To access the global ORDER in the C:\BUSINESS\
directory on your local machine:
SET x = ^["^^C:\BUSINESS\"]ORDER
To access the global ORDER in the defined namespace MARKETING:
SET x = ^["MARKETING"]ORDER
To access the process-private global ORDER:
When creating an implied namespace extended reference involving a mirrored database, you can use its mirrored database path, in the format :mirror:mirror_name:mirror_DB_name
. For example, when referring to the database with the mirror database name mirdb1 in the mirror CORPMIR, you could form an implied reference as follows:
The mirrored database path can be used for both local and remote databases.
The environment syntax is defined as:
can have one of five formats:
The null string ("") The current namespace on the local system.
A defined namespace that global
is not currently mapped to. Namespace names are not case-sensitive. If namespace
has the special value of "^"
, it is a process-private global.
An implied namespace whose default directory is the specified directory on your local system, where dir
includes a trailing backslash (\
An implied namespace whose default directory is the specified directory on the specified remote system, where dir
includes a trailing backslash (\
omitted If there is no "env"
at all, it is a process-private global.
To access the global ORDER in your current namespace on your current system, when no mapping has been defined for ORDER, use the following syntax:
This is the same as the simple global reference:
To access the global ORDER mapped to the defined namespace MARKETING:
SET x = ^|"MARKETING"|ORDER
You can use an implied namespace to access the global ORDER in the directory C:\BUSINESS\
on your local system:
SET x = ^|"^^C:\BUSINESS\"|ORDER
You can use an implied namespace to access the global ORDER in the directory C:\BUSINESS
on a remote system named SALES:
SET x = ^|"^SALES^C:\BUSINESS\"|ORDER
To access the process-private global ORDER:
SET x = ^||ORDER