Using Caché Globals
Global Structure
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

This chapter describes the logical (programmatic) view of globals and provides an overview of how globals are physically stored on disk. Its sections are:

Logical Structure of Globals
A global is a named multidimensional array that is stored within a physical Caché database. Within an application, the mapping of globals to physical databases is based on the current namespace — a namespace provides a logical, unified view of one or more physical databases.
Topics related to the logical structure of globals are:
Global Naming Conventions and Limits
The name of a global specifies its purpose and use. There are two types of globals and a separate set of variables, called “process-private globals”:
The naming conventions for globals are:
In the CACHESYS database, InterSystems reserves to itself all global names except those starting with “z”, “Z”, “%z”, and “%Z”. In all other databases, InterSystems reserves all global names starting with “ISC.” and “%ISC.”.
Sample Global Names and Their Uses
The following are examples of the various kinds of global names and how each is used:
Note:
Global names can contain only valid identifier characters; by default, these are as specified above. However, your NLS (National Language Support) settings may define a different set of valid identifier characters. Global names cannot contain Unicode characters.
Thus, the following are all valid global names:
   SET ^a="The quick "
   SET ^A="brown fox "
   SET ^A7="jumped over "
   SET ^A.7="the lazy "
   SET ^A1B2C3="dog's back."
   WRITE ^a,^A,^A7,!,^A.7,^A1B2C3
   KILL ^a,^A,^A7,^A.7,^A1B2C3 // keeps the database clean 
 
Introduction to Global Nodes and Subscripts
A global typically has multiple nodes, generally identified by a subscript or set of subscripts. For a basic example:
 set ^Demo(1)="Cleopatra"
This statement refers to the global node ^Demo(1), which is a node within the ^Demo global. This node is identified by one subscript.
For another example:
 set ^Demo("subscript1","subscript2","subscript3")=12
This statement refers to the global node ^Demo("subscript1","subscript2","subscript3"), which is another node within the same global. This node is identified by three subscripts.
For yet another example:
 set ^Demo="hello world"
This statement refers to the global node ^Demo, which does not use any subscripts.
The nodes of a global form a hierarchical structure. ObjectScript provides commands that take advantage of this structure. You can, for example, remove a node or remove a node and all its children. For a full discussion, see the next chapter.
The following sections provide details on the rules for subscripts and for global nodes.
Global Subscripts
Subscripts have the following rules:
Caution:
The preceding rules apply for all Caché supported collations. For older collations still in use for compatibility reasons, such as “pre-ISM-6.1”, the rules for subscripts are more restrictive. For example, character subscripts cannot have a control character as their initial character; and there are limitations on the number of digits that can be used in integer subscripts.
Because of restrictions such as these, there is no guarantee that subscripts used for supported collations will be valid in pre-Caché collations.
Global Nodes
Unless long strings are enabled for your installation, each global node can contain approximately 32K characters of text. If long strings are enabled, the limit is much larger. (See General System Limits in the Caché Programming Orientation Guide.)
Within applications, nodes typically contain the following types of structure:
  1. String or numeric data. With a Unicode version of Caché, string data may contain native Unicode characters.
  2. A string with multiple fields delimited by a special character:
    ^Data(10) = "Smith^John^Boston"
    You can use the ObjectScript $PIECE function to pull such data apart.
  3. Multiple fields contained within a Caché $LIST structure. The $LIST structure is a string containing multiple length-encoded values. It requires no special delimiter characters.
  4. A null string (""). In cases where the subscripts are themselves used as the data, no data is stored in the actual node.
  5. A bitstring. In cases where a global is used to store part of a bitmap index, the value stored within a node is a bitstring. A bitstring is a string containing a logical, compressed set of 1 and 0 values. You can construct a bitstring using the $BIT functions.
  6. Part of a larger set of data. For example, the object and SQL engines store streams (BLOBs) as a sequential series of 32K nodes within a global. By means of the stream interface, users of streams are unaware that streams are stored in this fashion.
Collation of Globals
Within a global, nodes are stored in a collated (sorted) order.
Applications typically control the order in which nodes are sorted by applying a conversion to values used as subscripts. For example, the SQL engine, when creating an index on string values, converts all string values to uppercase letters and prepends a space character to make sure that the index is both not case-sensitive and collates as text (even if numeric values are stored as strings).
Maximum Length of a Global Reference
The total length of a global reference — that is, the reference to a specific global node or subtree — is limited to 511 encoded characters (which may be fewer than 511 typed characters).
For a conservative determination of the size of a given global reference, use the following guidelines:
  1. For the global name: add 1 for each character.
  2. For a purely numeric subscript: add 1 for each digit, sign, or decimal point.
  3. For a subscript that includes nonnumeric characters: add 3 for each character.
    If a subscript is not purely numeric, the actual length of the subscript varies depending on the character set used to encode the string. A multibyte character can take up to 3 bytes.
    Note that an ASCII character can take up 1 or 2 bytes. If the collation does case folding, an ASCII character can take 1 byte for the character and 1 byte for the disambiguation byte. If the collation does not perform case folding, an ASCII character takes 1 byte.
  4. For each subscript, add 1.
If the sum of these numbers is greater than 511, the reference may be too long.
Because of the way that the limitation is determined, if you must have long subscript or global names, it is helpful to avoid a large number of subscript levels. Conversely, if you are using multiple subscript levels, avoid long global names and long subscripts. Because you may not be able to control the character set(s) you are using, it is useful to keep global names and subscripts shorter.
When there are doubts about particular references, it is useful to create test versions of global references that are of equivalent length to the longest expected global reference (or even a little longer). Data from these tests provides guidance on possible revisions to your naming conventions prior to building your application.
Physical Structure of Globals
Globals are stored within physical files using a highly optimized structure. The code that manages this data structure is also highly optimized for every platform that Caché runs on. These optimizations ensure that operations on globals have high throughput (number of operations per unit of time), high concurrency (total number of concurrent users), efficient use of cache memory, and require no ongoing performance-related maintenance (such as frequent rebuilding, re-indexing, or compaction).
The physical structure used to store globals is completely encapsulated; applications do not worry about physical data structure in any way.
Globals are stored on disk within a series of data blocks; the size of each block (typically 8KB) is determined when the physical database is created. To provide efficient access to data, Caché maintains a sophisticated B-tree-like structure that uses a set of pointer blocks to link together related data blocks. Caché maintains a buffer pool — an in-memory cache of frequently referenced blocks — to reduce the cost of fetching blocks from disk.
While many database technologies use B-tree-like structures for data storage, Caché is unique in many ways:
How Globals Are Stored
Within data blocks, globals are stored sequentially. Both subscripts and data are stored together. There is a special case for large node values (long strings) which are stored within separate blocks. A pointer to this separate block is stored along with the node subscript.
For example, suppose you have a global with the following contents:
^Data(1999) = 100
^Data(1999,1) = "January"
^Data(1999,2) = "February"
^Data(2000) = 300
^Data(2000,1) = "January"
^Data(2000,2) = "February"
Most likely, this data would be stored within a single data block with a contiguous structure similar to (the real representation is a series of bytes):
Data(1999):100|1:January|2:February|2000:300|1:January|2:February|...
An operation on ^Data can retrieve its entire contents with a minimum number of disk operations.
There are a number of additional techniques used to ensure that inserts, updates, and deletes are performed efficiently.
Referencing Globals
A global resides within a particular Caché database. Portions of a global can reside in different databases if appropriate mappings are used. A database can be physically located on the current system, or on a remote system accessed through Caché networking. The term dataset refers to the system and the directory that contain a Caché database. For further details on networking, see the Distributed Data Management Guide.
A namespace is a logical definition of the datasets and global mappings that together form a set of related information.
A simple global reference applies to the currently selected namespace. The namespace definition can cause this to physically access a database on the local system or a remote system. Different globals can be mapped to different locations or datasets (where a dataset refers to the system and the directory that contain a Caché database).
For example, to create a simple reference to the global ORDER in the namespace to which it currently has been mapped, use the following syntax:
^ORDER
This section describes two topics:
Setting Global Mappings
You can map globals and routines from one database to another on the same or different systems. This allows simple references to data which can exist anywhere and is the primary feature of a namespace. You can map whole globals or pieces of globals; mapping a piece of a global (or a subscript) is known as subscript-level mapping (SLM). Because you can map global subscripts, data can easily span disks.
To establish this type of mapping, see the Add Global, Routine, and Package Mapping to a Namespace section of the “Configuring Caché” chapter of the Caché System Administration Guide.
Global mapping is applied hierarchically. For example, if the NSX namespace has an associated DBX database, but maps the ^x global to the DBY database and ^x(1) to the DBZ database, then any subscripted form of the ^x global — except those that are part of the ^x(1) hierarchy — is mapped to DBY; those globals that are part of the ^x(1) hierarchy are mapped to DBZ. The following diagram illustrates this hierarchy:
Simple Subscript-level Mapping
In this diagram, the globals and their hierarchy appear in gray, and the databases to which they are mapped appear in black.
It is also possible to map part of a mapped, subscripted global to another database, or even back to the database to which the initial global is mapped. Suppose that the previous example had the additional mapping of the ^x(1,2) global back to the DBY database. This would appear as follows:
More Complex Subscript-level Mapping
Again, the globals and their hierarchy appear in gray, and the databases to which they are mapped appear in black.
Once you have mapped a global from one namespace to another, you can reference the mapped global as if it were in the current namespace — with a simple reference, such as ^ORDER or ^X(1).
Important:
When establishing subscript-level mapping ranges, the behavior of string subscripts differs from that of integer subscripts. For strings, the first character determines the range, while the range for integers uses numeric values. For example, a subscript range of ("A"):("C") contains not only AA but also AC and ABCDEF; by contrast, a subscript range of (1):(2) does not contain 11.
Using Distinct Ranges of Globals and Subscripts
Each of a namespace’s mappings must refer to distinct ranges of globals or subscripts. Mapping validation prevents the establishment of any kind of overlap. For example, if you attempt to use the Management Portal to create a new mapping that overlaps with an existing mapping, the Portal prevents this from occurring and displays an error message.
Logging Changes
Successful changes to the mappings through the Portal are also logged in cconsole.log; unsuccessful changes are not logged. Any failed attempts to establish mappings by hand-editing the Caché parameter (CPF) file are logged in cconsole.log; for details on editing the CPF file, see the section Editing the Active CPF File in the “Introduction to the Caché Parameter File” chapter of the Caché Parameter File Reference.
Extended Global References
You can refer to a global located in a namespace other than the current namespace. This is known as an extended global reference or simply an extended reference.
There are two forms of extended references:
The use of explicit namespaces is preferred, because this allows for redefinition of logical mappings externally, as requirements change, without altering your application code.
Caché supports two forms of extended references:
Note:
The examples of extended globals references use the Windows directory structure. In practice, the form of such references is operating-system dependent.
Bracket Syntax
You can use bracket syntax to specify an extended global reference with either an explicit namespace or an implied namespace:
Explicit namespace:
^[nspace]glob
Implied namespace:
^[dir,sys]glob
In an explicit namespace reference, nspace is a defined namespace that the global glob has not currently been mapped or replicated to. In an implied namespace reference, dir is a directory (the name of which includes a trailing backslash: “\”), sys is a system, and glob is a global within that directory. If nspace or dir is specified as a carat (“^”), the reference is to a process-private global.
You must include quotation marks around the directory and system names or the namespace name unless you specify them as variables. The directory and system together comprise an implied namespace. An implied namespace can reference either:
To specify an implied namespace on a remote system:
["dir","sys"]
To specify an implied namespace on the local system:
["^^dir"]
For example, to access the global ORDER in the C:\BUSINESS\ directory on a machine called SALES:
  SET x = ^["C:\BUSINESS\","SALES"]ORDER
To access the global ORDER in the C:\BUSINESS\ directory on your local machine:
   SET x = ^["^^C:\BUSINESS\"]ORDER
To access the global ORDER in the defined namespace MARKETING:
   SET x = ^["MARKETING"]ORDER
To access the process-private global ORDER:
   SET x = ^["^"]ORDER
Note:
When creating an implied namespace extended reference involving a mirrored database, you can use its mirrored database path, in the format :mirror:mirror_name:mirror_DB_name. For example, when referring to the database with the mirror database name mirdb1 in the mirror CORPMIR, you could form an implied reference as follows:
["^^:mirror:CORPMIR:mirdb1"]
The mirrored database path can be used for both local and remote databases.
Environment Syntax
The environment syntax is defined as:
^|"env"|global
"env" can have one of five formats:
To access the global ORDER in your current namespace on your current system, when no mapping has been defined for ORDER, use the following syntax:
   SET x = ^|""|ORDER
This is the same as the simple global reference:
   SET x = ^ORDER
To access the global ORDER mapped to the defined namespace MARKETING:
   SET x = ^|"MARKETING"|ORDER
You can use an implied namespace to access the global ORDER in the directory C:\BUSINESS\ on your local system:
   SET x = ^|"^^C:\BUSINESS\"|ORDER
You can use an implied namespace to access the global ORDER in the directory C:\BUSINESS on a remote system named SALES:
   SET x = ^|"^SALES^C:\BUSINESS\"|ORDER
To access the process-private global ORDER:
   SET x = ^||ORDER
   SET x=^|"^"|ORDER