Multidimensional Data Model

Introduction to the Multidimensional Data Model in Caché

Rich Multidimensional Data Structure

Caché’s high-performance database uses a multidimensional data model that allows efficient and compact storage of data in a rich data structure. With Caché, it is possible to access or update data without performing the complicated and time-consuming joins required by relational databases.

Although the Caché storage model is sometimes described as a “hyper-cube” or “n-dimensional space,” it is more accurately described as a collection of sparse multidimensional arrays called “globals.” Data can be stored in a global with any number of subscripts. Moreover, subscripts are typeless, so they can be strings, integers, floating-point numbers, and so on. One subscript might be an integer, such as 34, while another might be a meaningful name, like “LineItems,” even at the same subscript level.
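The following minimal Python sketch models one subscript level that holds integer, floating-point, and string subscripts side by side. It is an in-memory illustration and not Caché’s API. The `collate` helper and the sample data are hypothetical. The ordering it uses, with numbers in numeric order followed by strings in string order, is an assumption meant to approximate Caché’s default collation. The sketch does not model how Caché treats numeric strings such as “34”.

```python
from typing import Union

Subscript = Union[int, float, str]


def collate(sub: Subscript) -> tuple:
    """Sort key (assumed ordering): numeric subscripts first, then strings."""
    if isinstance(sub, (int, float)):
        return (0, sub, "")
    return (1, 0, sub)


# One subscript level of a hypothetical global: integers, a float, and
# meaningful names coexist as sibling subscripts.
level = {
    34: "stored under an integer",
    2.5: "stored under a float",
    "LineItems": "stored under a name",
    10: "another integer",
    "Customer": "another name",
}

for sub in sorted(level, key=collate):
    print(repr(sub), "->", level[sub])
```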

For example, a stock inventory application that provides information about item, size, color, and pattern might have a structure like this:

^Stock(item,size,color,pattern) = quantity

Here’s some sample data:

^Stock("slip dress",4,"blue","floral")=3

With this structure, it is very easy to determine whether there are any size 4 blue slip dresses with a floral pattern: simply access that data node. If a customer wants a size 4 slip dress and is uncertain about color and pattern, it is easy to list every available combination by cycling through all of the data nodes below ^Stock("slip dress",4).
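The direct lookup and the level-by-level cycling described above can be sketched in Python as follows. In ObjectScript, the $ORDER function performs this kind of “next subscript” traversal. The Python code only models the idea: a global is represented as a dictionary keyed by subscript tuples. The stock data, `child_subscripts`, and the assumed numbers-before-strings ordering are hypothetical.

```python
from typing import Any


def collate(sub: Any) -> tuple:
    # Assumed ordering: numbers first (numerically), then strings.
    return (0, sub, "") if isinstance(sub, (int, float)) else (1, 0, sub)


# Hypothetical ^Stock contents: subscript tuple -> quantity.
stock = {
    ("slip dress", 4, "blue", "floral"): 3,
    ("slip dress", 4, "blue", "plain"): 1,
    ("slip dress", 4, "red", "striped"): 2,
    ("slip dress", 6, "blue", "floral"): 5,
    ("tunic", 4, "green", "plain"): 7,
}


def child_subscripts(glob: dict, prefix: tuple) -> list:
    """Distinct subscripts one level below prefix, in collation order."""
    depth = len(prefix)
    found = {k[depth] for k in glob if len(k) > depth and k[:depth] == prefix}
    return sorted(found, key=collate)


# Direct access: is there a size 4 blue floral slip dress?
print(stock.get(("slip dress", 4, "blue", "floral"), 0))

# Cycle through colors, then patterns, below ("slip dress", 4).
for color in child_subscripts(stock, ("slip dress", 4)):
    for pattern in child_subscripts(stock, ("slip dress", 4, color)):
        print(color, pattern, stock[("slip dress", 4, color, pattern)])
```

Iterating one subscript level at a time, instead of scanning every node, mirrors how a program walks a global with $ORDER.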

In this example, all the data nodes were of a similar nature (they stored a quantity), and they were all stored at the same subscript level (4 subscripts) with similar subscripts (the 3rd subscript was always text representing a color). However, they do not have to be. Not all data nodes have to have the same number or type of subscripts, and they may contain different types of data.

Here is an example of a more complex global with invoice data that has different types of data stored at different subscript levels:

^Invoice(invoice #,"Customer") = Customer information
^Invoice(invoice #,"Date") = Invoice date
^Invoice(invoice #,"Items") = # of Items in the invoice
^Invoice(invoice #,"Items",1,"PartNum") = part number of 1st Item
^Invoice(invoice #,"Items",1,"Quantity") = quantity of 1st Item
^Invoice(invoice #,"Items",1,"Price") = price of 1st Item
^Invoice(invoice #,"Items",2,"PartNum") = part number of 2nd Item
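In the invoice layout above, the "Items" node stores a value (the item count) and also has child nodes. The following Python sketch models that layout and includes a `data_status` helper that imitates the return values of ObjectScript’s $DATA function: 0 means undefined, 1 means a value with no descendants, 10 means descendants with no value, and 11 means both. The invoice number, customer name, date string, and prices (stored in cents) are hypothetical sample data. `data_status` and `invoice_total` are illustrative names, not Caché APIs.

```python
# Hypothetical contents of ^Invoice for one invoice (prices in cents).
invoice = {
    (1001, "Customer"): "Acme Corp",
    (1001, "Date"): "2024-03-15",
    (1001, "Items"): 2,
    (1001, "Items", 1, "PartNum"): "P-100",
    (1001, "Items", 1, "Quantity"): 3,
    (1001, "Items", 1, "Price"): 1995,
    (1001, "Items", 2, "PartNum"): "P-200",
    (1001, "Items", 2, "Quantity"): 1,
    (1001, "Items", 2, "Price"): 500,
}


def data_status(glob: dict, subs: tuple) -> int:
    """Mimic $DATA: 0 undefined, 1 value only, 10 children only, 11 both."""
    has_value = subs in glob
    depth = len(subs)
    has_children = any(len(k) > depth and k[:depth] == subs for k in glob)
    return (10 if has_children else 0) + (1 if has_value else 0)


def invoice_total(glob: dict, inv: int) -> int:
    """Sum quantity * price over the item count stored at the "Items" node."""
    count = glob[(inv, "Items")]
    return sum(
        glob[(inv, "Items", i, "Quantity")] * glob[(inv, "Items", i, "Price")]
        for i in range(1, count + 1)
    )


print(data_status(invoice, (1001, "Items")))  # 11: value and children
print(invoice_total(invoice, 1001))           # 6485 cents
```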

Multiple Data Elements per Node

Often only a single data element is stored in a data node, such as a date or quantity, but sometimes it is useful to store multiple data elements together as a single data node. This is particularly useful when there is a set of related data that is often accessed together. It can also improve performance by requiring fewer accesses of the database, especially when the data is accessed over a network.

For example, in the previous invoice example, each item’s part number, quantity, and price were stored as separate nodes, but they could instead be stored in a single node:

^Invoice(invoice #,"LineItems",item #) = $LB(PartNum,Quantity,Price)

To make this simple, Caché provides list functions, such as $LISTBUILD (abbreviated $LB) and $LIST, that assemble multiple data elements into a length-delimited byte string and later disassemble them, preserving each element’s datatype.
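The actual byte layout of $LISTBUILD is internal to Caché and is not reproduced here. The following Python sketch uses a made-up format instead: each element has a one-byte type tag, a 4-byte big-endian length, and then the payload. The sketch shows why a length-delimited list can hold any characters, including commas, and can return each element with its original type. `build_list`, `parse_list`, and the tag values are hypothetical.

```python
import struct

# Hypothetical type tags for this sketch; not Caché's encoding.
TAG_INT, TAG_FLOAT, TAG_STR = 1, 2, 3
HEADER = struct.Struct(">BI")  # 1-byte tag + 4-byte big-endian payload length


def build_list(*items) -> bytes:
    """Pack items into one byte string, each prefixed with its type and length."""
    out = bytearray()
    for item in items:
        if isinstance(item, bool) or not isinstance(item, (int, float, str)):
            raise TypeError(f"unsupported element: {item!r}")
        if isinstance(item, int):
            tag, payload = TAG_INT, str(item).encode("ascii")
        elif isinstance(item, float):
            tag, payload = TAG_FLOAT, struct.pack(">d", item)
        else:
            tag, payload = TAG_STR, item.encode("utf-8")
        out += HEADER.pack(tag, len(payload)) + payload
    return bytes(out)


def parse_list(data: bytes) -> list:
    """Split a byte string from build_list back into typed elements."""
    items, pos = [], 0
    while pos < len(data):
        tag, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated element")
        pos += length
        if tag == TAG_INT:
            items.append(int(payload.decode("ascii")))
        elif tag == TAG_FLOAT:
            items.append(struct.unpack(">d", payload)[0])
        elif tag == TAG_STR:
            items.append(payload.decode("utf-8"))
        else:
            raise ValueError(f"unknown tag {tag}")
    return items


# One line item stored as a single node value, modeling
# ^Invoice(invoice #,"LineItems",item #) = $LB(PartNum,Quantity,Price).
line_items = {(1001, "LineItems", 1): build_list("P-100, blue", 3, 19.95)}
print(parse_list(line_items[(1001, "LineItems", 1)]))
```

Because every element carries its own length, the comma inside "P-100, blue" is never mistaken for a separator, which could happen with a simple character-delimited string.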

Transaction Processing with a Large Number of Users

Efficient access to data makes the multidimensional model a natural fit for transaction processing. Because Caché processes do not have to spend time joining multiple tables, they run faster.

Logical Locking Promotes High Concurrency

In systems with thousands of users, reducing conflicts between competing processes is critical to providing high performance. One of the biggest conflicts is between transactions wishing to access the same data.

Caché processes do not lock entire pages of data while performing updates. Because transactions typically read or change small amounts of data, Caché locks data at a logical level (for example, a single invoice) rather than at the page level. Database conflicts are further reduced by atomic addition and subtraction operations, which do not require locking. (These operations are particularly useful for incrementing counters that allocate ID numbers and for updating statistics counters. Both are common “hot spots” in a database that would otherwise cause frequent conflicts between competing transactions.)
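In ObjectScript, these roles are filled by the LOCK command and the $INCREMENT function. The following Python sketch only models the two ideas. First, locks are keyed by a logical name (a global name plus subscripts), so processes working on different invoices never collide. Second, a counter can be incremented without the caller taking any lock. The class names, the `("^Invoice", 1001)` keys, and the thread counts are hypothetical. The short internal mutex in `Counters` is an assumption that stands in for an atomic operation; it does not describe how Caché implements one.

```python
import threading


class LogicalLocks:
    """Hypothetical lock table keyed by global reference, not by storage page."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table itself
        self._owners = {}

    def try_lock(self, ref: tuple, owner: str) -> bool:
        with self._guard:
            holder = self._owners.get(ref)
            if holder is None or holder == owner:
                self._owners[ref] = owner
                return True
            return False

    def unlock(self, ref: tuple, owner: str) -> None:
        with self._guard:
            if self._owners.get(ref) == owner:
                del self._owners[ref]


class Counters:
    """Hypothetical atomic counters; callers never take a logical lock."""

    def __init__(self):
        self._guard = threading.Lock()  # emulates atomicity in this model only
        self._values = {}

    def increment(self, ref: tuple, by: int = 1) -> int:
        with self._guard:
            value = self._values.get(ref, 0) + by
            self._values[ref] = value
            return value


locks = LogicalLocks()
# Two processes update different invoices without conflicting...
a_ok = locks.try_lock(("^Invoice", 1001), "process A")
b_ok = locks.try_lock(("^Invoice", 1002), "process B")
# ...and conflict only when they want the same logical item.
b_blocked = not locks.try_lock(("^Invoice", 1001), "process B")

counters = Counters()
results = [[] for _ in range(8)]


def allocate(slot: int) -> None:
    # Each worker allocates 1000 invoice IDs from the shared counter.
    for _ in range(1000):
        results[slot].append(counters.increment(("^Invoice",)))


threads = [threading.Thread(target=allocate, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_ids = sorted(i for r in results for i in r)
print(a_ok, b_ok, b_blocked, all_ids[-1])
```

All 8,000 allocated IDs are distinct even though eight threads share one counter, and neither invoice lock blocks work on the other invoice.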

With Caché, individual transactions run faster, and more transactions can run concurrently.

Multidimensional Model Enables Realistic Description of Data

The multidimensional model is also a natural fit for describing and storing complex data. Developers can create data structures that accurately represent real-world data, thus making it faster to develop applications and easier to maintain them.

Variable Length Data in Sparse Arrays

Because Caché data is inherently of variable length and is stored in sparse arrays, Caché often requires less than half of the space needed by a relational database. In addition to reducing disk requirements, compact data storage enhances performance because more data can be read or written with a single I/O operation, and data can be cached more efficiently.
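The following Python sketch illustrates the sparse-array part of this argument by counting storage slots, not bytes. A fixed grid over every item, size, color, and pattern combination needs a slot for each combination, while a sparse array stores only the nodes that were actually set. The stock data is hypothetical, and the sketch does not measure the space comparison with relational databases.

```python
from math import prod

# Hypothetical ^Stock contents: only combinations that actually exist are stored.
stock = {
    ("slip dress", 4, "blue", "floral"): 3,
    ("slip dress", 4, "blue", "plain"): 1,
    ("slip dress", 4, "red", "striped"): 2,
    ("slip dress", 6, "blue", "floral"): 5,
    ("tunic", 4, "green", "plain"): 7,
}

# Distinct values seen at each of the four subscript levels.
levels = [{key[i] for key in stock} for i in range(4)]

dense_cells = prod(len(values) for values in levels)  # fixed grid: every combination
sparse_nodes = len(stock)                             # sparse array: defined nodes only
print(dense_cells, sparse_nodes)  # 36 5
```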

Declarations and Definitions are Not Required

No declarations, definitions, or allocations of storage are required to directly access or store data in the database, and there is no need to specify the number or type of subscripts or the type or size of data. The multidimensional arrays are inherently typeless, both in their data and subscripts. Global data simply pops into existence as data is inserted with the SET command.

However, to make use of the object access and SQL access of the database, data dictionary information is required. In specifying the data dictionary for objects and SQL, developers have a choice of letting wizards automatically select the multidimensional data structure best suited to their data, or they can directly specify the mapping.

Namespaces

In Caché, data and ObjectScript code are stored in disk files with the name CACHE.DAT (only one per directory). Each such file contains numerous “globals” (multidimensional arrays). Within a file, each global name must be unique, but different files may contain the same global name. These files may be loosely thought of as databases.

Rather than specifying which CACHE.DAT file to use, each Caché process uses a “namespace” to access data. A namespace is a logical map from the names of multidimensional global arrays and routine code to CACHE.DAT files, including the Data Server and directory name for each file. If a file is moved from one disk drive or computer to another, only the namespace map needs to change; code that accesses data through the namespace is unaffected.

Usually a namespace shares certain system information with other namespaces, and the rest of the namespace’s data is in a single CACHE.DAT file used only by that namespace. However, this is a flexible structure that allows arbitrary mapping, and it is not unusual for a namespace to map the contents of several CACHE.DAT files.
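The following Python sketch is a toy model of namespace resolution. Each namespace maps global names to a database, identified by server and directory, and falls back to its own default database. Two namespaces share one database for a common global. The names `Database`, `Namespace`, `locate`, the paths, and the ^Config global are all hypothetical. Real namespace definitions also map routine code and offer more options than this sketch shows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Database:
    server: str      # hypothetical data server name
    directory: str   # directory holding the CACHE.DAT file


@dataclass
class Namespace:
    default_db: Database
    global_map: dict = field(default_factory=dict)  # global name -> Database

    def locate(self, global_name: str) -> Database:
        """Resolve a global name to the database that stores it."""
        return self.global_map.get(global_name, self.default_db)


shared = Database("local", "/cache/shared/")
sales = Namespace(Database("local", "/cache/sales/"), {"^Config": shared})
hr = Namespace(Database("local", "/cache/hr/"), {"^Config": shared})

print(sales.locate("^Invoice"))   # the namespace's own database
print(sales.locate("^Config"))    # shared with the hr namespace

# Moving the sales database changes only the map; code keeps using "^Invoice".
sales.default_db = Database("server2", "/cache/sales/")
print(sales.locate("^Invoice"))
```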