DeepSee Implementation Guide
Performance Tips

This chapter contains performance tips for DeepSee.

Also see the section “Placing the DeepSee Globals in a Separate Database,” earlier in this book.
Result Caching and Cube Updates
For any cube that uses more than 512,000 records (by default), DeepSee maintains and uses a result cache. When you update a cube in any way, parts of the result cache are considered invalid and are cleared. The details depend upon options in the cube definition (see “Cache Buckets and Fact Order,” later in this chapter). Therefore, it is not generally desirable to update the cubes constantly.
The result cache works as follows: Each time a user executes a query (via the Analyzer for example), DeepSee caches the results for that query. The next time any user runs that query, DeepSee checks to see if the cache is still valid. If so, DeepSee then uses the cached values. Otherwise, DeepSee re-executes the query, uses the new values, and caches the new values. The net effect is that performance improves over time as more users run more queries.
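For example, a query can be executed programmatically with the %DeepSee.ResultSet class; the first execution computes and caches the results, and later executions of the same query can reuse the cache. This is a minimal sketch; the cube name Patients is hypothetical:

```
 // execute an MDX query; DeepSee caches the results it computes
 set tRS = ##class(%DeepSee.ResultSet).%New()
 set tSC = tRS.%PrepareMDX("SELECT MEASURES.[%COUNT] ON 0 FROM [Patients]")
 if $$$ISOK(tSC) {
     set tSC = tRS.%Execute()
     do tRS.%Print()
 }
```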
Specifying the Agent Count
DeepSee sets up a pool of agents that execute queries. This pool consists of a set of agents with high priority and the same number of agents with low priority. You can control the number of agents, which are also used when cubes are built. For details, see Specifying the Agent Count in the chapter “Compiling and Building Cubes” in Defining DeepSee Models.
Cache Buckets and Fact Order
As noted earlier, for large data sets, DeepSee maintains and uses a result cache. In this case, it can be useful to control the order of rows in the fact table, because this affects how DeepSee creates and uses the cache. To do this, you can specify the Initial build order option for the cube; see Other Cube Options in Defining DeepSee Models.
When users evaluate pivot tables, DeepSee computes and caches aggregate values that it later reuses whenever possible. To determine whether DeepSee can reuse a cache, DeepSee uses the following logic:
  1. It examines the IDs of the records used in a given scenario (for example, for a given pivot table cell).
  2. It checks the buckets to which those IDs belong. A bucket is a large number of contiguous records in the fact table (details given later).
In some scenarios, changes to the source records (and the corresponding updates to any cubes) occur primarily in the most recent source records. In such scenarios, it is useful to make sure that you build the fact table in order by age of the records, with the oldest records first. This approach means that the caches for the older rows would not be made invalid by changes to the data. (In contrast, if the older rows and newer rows were mixed throughout the fact table, all the caches would potentially become invalid when changes occurred to newer records.)
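As an illustrative sketch, the Initial build order option is specified in the cube definition; assuming it is expressed as the initialBuildOrder attribute of the <cube> element (and using hypothetical cube and class names), an oldest-first build order might look like this:

```
<cube name="Patients" sourceClass="MyApp.Patient"
      initialBuildOrder="%ID ASC">
   <!-- oldest records first, so their cache buckets stay valid
        when changes occur only in the newest source records -->
   <!-- measures, dimensions, and so on -->
</cube>
```

See Other Cube Options in Defining DeepSee Models for the exact syntax.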
For more information, see “How the DeepSee Query Engine Works,” later in this book.
Removing Inactive Cache Buckets
When a cache bucket is invalidated (as described in the previous section), it is marked as inactive but is not removed. To remove the inactive cache buckets, call the %PurgeObsoleteCache() method of %DeepSee.Utils. For example:
d ##class(%DeepSee.Utils).%PurgeObsoleteCache("patients")
Precomputing Cube Cells
As noted earlier, when users evaluate pivot tables, DeepSee computes and caches aggregate values that it later reuses whenever possible. This caching means that the more users work with DeepSee, the more quickly it runs. (For details, see “How the DeepSee Query Engine Works,” later in this book.)
To speed up initial performance as well, you can precompute and cache specific aggregate values that are used in your pivot tables, especially wherever performance is a concern. The feature works as follows:
Important: A simpler option is to run any queries of interest ahead of time (that is, before any users work with them).
Defining the Cell Cache
Your cube class can contain an additional XData block (CellCache) that specifies cube cells that can be precomputed and cached, which speeds up the initial performance of DeepSee. The following shows an example:
/// This XData block defines aggregates to be precomputed.
XData CellCache [ XMLNamespace = "http://www.intersystems.com/deepsee/cellCache" ]
{
<cellCache xmlns="http://www.intersystems.com/deepsee/cellCache">
   <group name="BS">
      <item>
         <element>[Measures].[Big Sale Count]</element>
      </item>
   </group>
   <group name="G1">
      <item>
         <element>[UnitsPerTransaction].[H1].[UnitsSold]</element>
         <element>[Measures].[Amount Sold]</element>
      </item>
      <item>
         <fact>DxUnitsSold</fact>
         <element>[Measures].[Amount Sold]</element>
      </item>
   </group>
</cellCache>
}
The <cellCache> element contains one or more <group> elements. Each <group> element has a name attribute and contains one or more <item> elements.
Each <item> element represents a combination of cube indices and corresponds to the information returned by %SHOWPLAN. An <item> element can include one or more of either of the following structures, in any combination:
<fact>fact_table_field_name</fact>
Or:
<element>mdx_member_expression</element>
Where fact_table_field_name is the name of a field in the fact table, and mdx_member_expression is an MDX expression that identifies a member of the cube.
Note: Each group defines a set of intersections. The number of intersections in a group affects the processing speed when you precompute the cube cells.
Precomputing the Cube Cells
To precompute the aggregate values specified by a <group>, use the %ComputeAggregateGroup() method of %DeepSee.Utils. This method is as follows:
classmethod %ComputeAggregateGroup(pCubeName As %String,
                                   pGroupName As %String,
                                   pVerbose As %Boolean = 1) as %Status
Where pCubeName is the name of the cube, pGroupName is the name of the group, and pVerbose specifies whether to write progress information while the method is running. For pGroupName, you can use "*" to precompute all groups for this cube.
If you use this method, you must first build the cube.
The method processes each group by looping over the fact table and computing the intersections defined by the items within the group. Processing is faster when a group contains fewer intersections. The processing is single-threaded, so users can continue to run queries while it executes.
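For example, you might build the cube and then precompute all the groups defined in its CellCache XData block. This is a sketch; the cube name HoleFoods is hypothetical:

```
 // build the cube first (required before precomputing)
 do ##class(%DeepSee.Utils).%BuildCube("HoleFoods")
 // precompute every <group> in the cube's CellCache XData block,
 // writing progress information as the method runs
 do ##class(%DeepSee.Utils).%ComputeAggregateGroup("HoleFoods","*",1)
```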