This chapter contains performance tips for DeepSee.
For any cube that uses more than 64,000 records (by default), DeepSee maintains and uses a result cache. When you update a cube in any way, parts of the result cache are considered invalid and are cleared. The details depend upon options in the cube definition (see Cache Buckets and Fact Order,
later in this chapter). Therefore, it is not generally desirable to update the cubes constantly.
The result cache works as follows: Each time a user executes a query (via the Analyzer for example), DeepSee caches the results for that query. The next time any user runs that query, DeepSee checks to see if the cache is still valid. If so, DeepSee then uses the cached values. Otherwise, DeepSee re-executes the query, uses the new values, and caches the new values. The net effect is that performance improves over time as more users run more queries.
When users evaluate pivot tables, DeepSee computes and caches aggregate values that it later reuses whenever possible. To determine whether DeepSee can reuse a cache, DeepSee uses the following logic:
It examines the IDs of the records used in a given scenario (for example, for a given pivot table cell).
It checks the buckets to which those IDs belong. A bucket is a large number of contiguous records in the fact table (details given later).
If the bucket has been updated (because there was a change for at least one ID in the bucket), DeepSee discards any corresponding cache associated with that bucket and regenerates the result.
If the bucket has not been updated, DeepSee reuses the appropriate cache (if available) or generates the result (if not).
In some scenarios, changes to the source records (and the corresponding updates to any cubes) occur primarily in the most recent source records. In such scenarios, it is useful to make sure that you build the fact table in order by age of the records, with the oldest records first. This approach means that the caches for the older rows would not be made invalid by changes to the data. (In contrast, if the older rows and newer rows were mixed throughout the fact table, all the caches would potentially become invalid when changes occurred to newer records.)
To speed up initial
performance as well, you can precompute and cache specific aggregate values that are used in your pivot tables, especially wherever performance is a concern. The feature works as follows:
Within the cube class, you specify an additional XData block (CellCache
) that specifies cube cells that should be precomputed and cached. For details, see the first subsection.
You programmatically precompute these cube cells by using a utility method. See the second subsection.
You must do this after
building the cube.
A simpler option is to simply run any queries ahead of time (that is, before any users work with them).
Your cube class can contain an additional XData block (CellCache
) that specifies cube cells that can be precomputed and cached, which speeds up the initial performance of DeepSee. The following shows an example:
/// This xml document defines aggregates to be precomputed.
XData CellCache [ XMLNamespace = " http://www.intersystems.com/deepsee/cellCache" ]
<cellCache xmlns= "http://www.intersystems.com/deepsee/cellCache" >
<group name= "BS">
<element >[Measures].[Big Sale Count]</element >
<group name= "G1">
<element >[UnitsPerTransaction].[H1].[UnitsSold]</ element>
<element >[Measures].[Amount Sold]</element >
<fact >DxUnitsSold</fact >
<element >[Measures].[Amount Sold]</element >
element is as follows:
It has a name
attribute, which you use later when specifying which groups of cells to precompute.
It contains one or more <item>
element represents a combination of cube indices and corresponds to the information returned by %SHOWPLAN. An <item>
element consists of one or more <element>
can include one or more of either of the following structures, in any combination:
is an MDX expression that evaluates to a member. This can be either a member of a level or it can be a measure name (each measure is a member of the special MEASURES dimension).
This expression cannot be a calculated member.
Each group defines a set of intersections. The number of intersections in a group affects the processing speed when you precompute the cube cells.
classmethod %ComputeAggregateGroup(pCubeName As %String,
pGroupName As %String,
pVerbose As %Boolean = 1) as %Status
is the name of the cube, pGroupName
is the name of the cube, and pVerbose
specifies whether to write progress information while the method is running. For pGroupName
, you can use "*"
to precompute all groups for this cube.
If you use this method, you must first build the cube.
The method processes each group by looping over the fact table and computing the intersections defined by the items within the group. Processing is faster with fewer intersections in a group. The processing is single-threaded, which allows querying in the foreground.