DeepSee Implementation Guide
Performance Tips

This chapter contains performance tips for DeepSee.

Also see the section “Placing the DeepSee Globals in a Separate Database,” earlier in this book.
Result Caching and Cube Updates
For any cube that uses more than 512,000 records (by default), DeepSee maintains and uses a result cache. When you update a cube in any way, parts of the result cache are considered invalid and are cleared. The details depend upon options in the cube definition (see “Cache Buckets and Fact Order,” later in this chapter). Therefore, it is not generally desirable to update the cubes constantly.
The result cache works as follows: Each time a user executes a query (via the Analyzer for example), DeepSee caches the results for that query. The next time any user runs that query, DeepSee checks to see if the cache is still valid. If so, DeepSee then uses the cached values. Otherwise, DeepSee re-executes the query, uses the new values, and caches the new values. The net effect is that performance improves over time as more users run more queries.
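For example, a query can be executed programmatically with the %DeepSee.ResultSet class; the first execution computes and caches the results, and later executions of the same query can reuse the cache. This is a minimal sketch; the cube name Patients is hypothetical:

```
 // execute an MDX query; DeepSee caches the results it computes
 set tRS = ##class(%DeepSee.ResultSet).%New()
 set tSC = tRS.%PrepareMDX("SELECT MEASURES.[%COUNT] ON 0 FROM [Patients]")
 if $$$ISOK(tSC) {
     set tSC = tRS.%Execute()
     do tRS.%Print()
 }
```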
Specifying the Agent Count
DeepSee sets up a pool of agents that execute queries. This pool consists of a set of agents with high priority and the same number of agents with low priority. You can control the number of agents, which are also used when cubes are built. For details, see Specifying the Agent Count in the chapter “Compiling and Building Cubes” in Defining DeepSee Models.
Cache Buckets and Fact Order
As noted earlier, for large data sets, DeepSee maintains and uses a result cache. In this case, it can be useful to control the order of rows in the fact table, because this affects how DeepSee creates and uses the cache. To do this, you can specify the Initial build order option for the cube; see Other Cube Options in Defining DeepSee Models.
When users evaluate pivot tables, DeepSee computes and caches aggregate values that it later reuses whenever possible. To determine whether DeepSee can reuse a cache, DeepSee uses the following logic:
  1. It examines the IDs of the records used in a given scenario (for example, for a given pivot table cell).
  2. It checks the buckets to which those IDs belong. A bucket is a large number of contiguous records in the fact table (details given later).
In some scenarios, changes to the source records (and the corresponding updates to any cubes) occur primarily in the most recent source records. In such scenarios, it is useful to make sure that you build the fact table in order by age of the records, with the oldest records first. This approach means that the caches for the older rows would not be made invalid by changes to the data. (In contrast, if the older rows and newer rows were mixed throughout the fact table, all the caches would potentially become invalid when changes occurred to newer records.)
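As an illustrative sketch, the Initial build order option is specified in the cube definition; assuming it is expressed as the initialBuildOrder attribute of the <cube> element (and using hypothetical cube and class names), an oldest-first build order might look like this:

```
<cube name="Patients" sourceClass="MyApp.Patient"
      initialBuildOrder="%ID ASC">
   <!-- oldest records first, so their cache buckets stay valid
        when changes occur only in the newest source records -->
   <!-- measures, dimensions, and so on -->
</cube>
```

See Other Cube Options in Defining DeepSee Models for the exact syntax.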
For more information, see “How the DeepSee Query Engine Works,” later in this book.
Removing Inactive Cache Buckets
When a cache bucket is invalidated (as described in the previous section), it is marked as inactive but is not removed. To remove the inactive cache buckets, call the %PurgeObsoleteCache() method of %DeepSee.Utils. For example:
d ##class(%DeepSee.Utils).%PurgeObsoleteCache("patients")
Precomputing Cube Cells
As noted earlier, when users evaluate pivot tables, DeepSee computes and caches aggregate values that it later reuses whenever possible. This caching means that the more users work with DeepSee, the more quickly it runs. (For details, see “How the DeepSee Query Engine Works,” later in this book.)
To speed up initial performance as well, you can precompute and cache specific aggregate values that are used in your pivot tables, especially wherever performance is a concern. The feature works as follows:
Important: A simpler option is to run any queries of interest ahead of time (that is, before any users work with them).
Defining the Cell Cache
Your cube class can contain an additional XData block (CellCache) that specifies cube cells that can be precomputed and cached, which speeds up the initial performance of DeepSee. The following shows an example:
/// This XData block defines aggregates to be precomputed.
XData CellCache [ XMLNamespace = "http://www.intersystems.com/deepsee/cellCache" ]
{
<cellCache xmlns="http://www.intersystems.com/deepsee/cellCache">
   <group name="BS">
      <item>
         <element>[Measures].[Big Sale Count]</element>
      </item>
   </group>
   <group name="G1">
      <item>
         <element>[UnitsPerTransaction].[H1].[UnitsSold]</element>
         <element>[Measures].[Amount Sold]</element>
      </item>
      <item>
         <fact>DxUnitsSold</fact>
         <element>[Measures].[Amount Sold]</element>
      </item>
   </group>
</cellCache>
}
The <cellCache> element contains one or more <group> elements. Each <group> element has a name attribute and contains one or more <item> elements.
Each <item> element represents a combination of cube indices and corresponds to the information returned by %SHOWPLAN. An <item> element can include one or more of either of the following structures, in any combination:
<fact>fact_table_field_name</fact>
Or:
<element>mdx_member_expression</element>
Where fact_table_field_name is the name of a field in the fact table, and mdx_member_expression is an MDX expression that identifies a member of the cube.
Note: Each group defines a set of intersections. The number of intersections in a group affects the processing speed when you precompute the cube cells.
Precomputing the Cube Cells
To precompute the aggregate values specified by a <group>, use the %ComputeAggregateGroup() method of %DeepSee.Utils. This method is as follows:
classmethod %ComputeAggregateGroup(pCubeName As %String,
                                   pGroupName As %String,
                                   pVerbose As %Boolean = 1) as %Status
Where pCubeName is the name of the cube, pGroupName is the name of the group, and pVerbose specifies whether to write progress information while the method is running. For pGroupName, you can use "*" to precompute all groups for this cube.
If you use this method, you must first build the cube.
The method processes each group by looping over the fact table and computing the intersections defined by the items within the group. Processing is faster when a group contains fewer intersections. The processing is single-threaded, so users can continue to run queries while it executes.
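For example, you might build the cube and then precompute all the groups defined in its CellCache XData block. This is a sketch; the cube name HoleFoods is hypothetical:

```
 // build the cube first (required before precomputing)
 do ##class(%DeepSee.Utils).%BuildCube("HoleFoods")
 // precompute every <group> in the cube's CellCache XData block,
 // writing progress information as the method runs
 do ##class(%DeepSee.Utils).%ComputeAggregateGroup("HoleFoods","*",1)
```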