Using PMML Models in Caché
This article discusses how to use Caché PMML (Predictive Model Markup Language) runtime support.
PMML (Predictive Model Markup Language) is an XML-based standard that expresses analytics models. It provides a way for applications to define statistical and data mining models so that they can be easily reused and shared. The standard is particularly helpful because the analytics tools used to generate models (tools such as PMML:R, KNIME, SAS, and SPSS) are very different in architecture from the tools used in a Caché or Ensemble production environment.
In a typical scenario, data scientists use an analytical tool to produce a data mining model from large amounts of historical data and then export that model to PMML. The model can then be deployed in a runtime environment and executed on incoming observations, predicting values for the model’s target metrics.
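Caché provides runtime support for PMML 4.1 as follows:
You define a class that extends %DeepSee.PMML.Definition and holds a PMML definition in an XData block.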
When that class is compiled, the system generates the code needed to execute the model or models described in it.
Caché provides an API for executing the models, based on the data input that you provide.
Caché provides a sample test page that uses the API.
Caché supports PMML 4.1 and the following PMML models:
namespace provides a sample that you can use to become better acquainted with PMML.
This sample includes a copy of the Iris dataset, a well-known sample used in predictive analytics. The Iris dataset provides petal and sepal measurements for approximately 50 flowers in each of three iris species. These measurements are strongly predictive of the iris species.
To set up this sample, use the following command:
Then you can use the PMML models in the DataMining.PMML.Iris class. This class contains a PMML definition that includes the following models:
A tree model that predicts the iris species, based on petal and sepal measurements
A general regression model that predicts the sepal length, based on the sepal width, petal measurements, and species
To create a class that contains PMML models:
For Class name, type a fully qualified class name.
The XData block that contains the PMML definition specifies XMLNamespace = "http://www.intersystems.com/deepsee/pmml", as in the following example:
Class DataMining.PMML.Iris Extends %DeepSee.PMML.Definition
XData PMML [ XMLNamespace = "http://www.intersystems.com/deepsee/pmml" ]
<X-SQLDataSource name="Analysis dataset">
<X-FieldMap fieldName="PetalLength" spec="PetalLength" />
<X-FieldMap fieldName="PetalWidth" spec="PetalWidth" />
<X-FieldMap fieldName="SepalLength" spec="SepalLength" />
<X-FieldMap fieldName="SepalWidth" spec="SepalWidth" />
<X-FieldMap fieldName="Species" spec="Species" />
<X-SQL>SELECT PetalLength, PetalWidth, SepalLength, SepalWidth, UPPER(Species) Species
<DataField name="PetalLength" displayName="PetalLength" optype="continuous" dataType="double" />
<DataField name="PetalWidth" displayName="PetalWidth" optype="continuous" dataType="double" />
<DataField name="SepalLength" displayName="SepalLength" optype="continuous" dataType="double" />
<DataField name="SepalWidth" displayName="SepalWidth" optype="continuous" dataType="double" />
<DataField name="Species" displayName="Species" optype="categorical" dataType="string">
<Value value="IRIS-SETOSA" property="valid" />
<Value value="IRIS-VERSICOLOR" property="valid" />
<Value value="IRIS-VIRGINICA" property="valid" />
The <X-SQLDataSource> element defines a data source in terms of an SQL query. Its <X-FieldMap> elements define a mapping from the SQL fields to the data fields in the PMML definition.
A corresponding cube-based data source element defines a mapping from the measures and dimensions of a given cube to the data fields in the PMML definition.
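The mapping from SQL fields to PMML data fields can be illustrated outside Caché. The following is a minimal Python sketch, not the Caché implementation: the XML fragment is a simplified version of the listing above (namespaces omitted, enclosing elements assumed), the spec attribute is treated as a plain column name, and the helper names load_dictionary, load_field_map, and map_row are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modeled on the listing above. XML namespaces are omitted
# and the placement of the elements is an assumption made for this sketch.
PMML_FRAGMENT = """
<PMML>
  <DataDictionary>
    <DataField name="PetalLength" optype="continuous" dataType="double" />
    <DataField name="Species" optype="categorical" dataType="string">
      <Value value="IRIS-SETOSA" property="valid" />
      <Value value="IRIS-VERSICOLOR" property="valid" />
      <Value value="IRIS-VIRGINICA" property="valid" />
    </DataField>
  </DataDictionary>
  <X-SQLDataSource name="Analysis dataset">
    <X-FieldMap fieldName="PetalLength" spec="PetalLength" />
    <X-FieldMap fieldName="Species" spec="Species" />
  </X-SQLDataSource>
</PMML>
"""

def load_dictionary(root):
    """Return {field name: (dataType, set of valid values)}."""
    fields = {}
    for field in root.iter("DataField"):
        valid = {v.get("value") for v in field.findall("Value")
                 if v.get("property", "valid") == "valid"}
        fields[field.get("name")] = (field.get("dataType"), valid)
    return fields

def load_field_map(root):
    """Return {PMML field name: SQL column} from the X-FieldMap elements."""
    return {m.get("fieldName"): m.get("spec") for m in root.iter("X-FieldMap")}

def map_row(row, field_map, dictionary):
    """Convert one SQL result row (a dict) into PMML data field values."""
    record = {}
    for name, column in field_map.items():
        data_type, valid = dictionary[name]
        value = row[column]
        if data_type == "double":
            value = float(value)
        elif valid and value not in valid:
            raise ValueError(f"{value!r} is not a valid value for {name}")
        record[name] = value
    return record

root = ET.fromstring(PMML_FRAGMENT)
dictionary = load_dictionary(root)
field_map = load_field_map(root)
record = map_row({"PetalLength": "1.4", "Species": "IRIS-SETOSA"},
                 field_map, dictionary)
print(record)
```

The sketch rejects a categorical value that is not listed as valid in the data dictionary, which is one reason the sample query upper-cases Species before the values reach the model.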
Caché uses these classes to execute the model or models.
Open the Management Portal.
Switch to the appropriate namespace.
The system then displays a page like the following (partially shown):
Select a model class, and then click OK.
Click a model from the Model drop-down list.
Click an option from the Data source drop-down list. Options include:
Caché iterates through the records and then displays a summary of the results. The details depend upon the model. The following shows an example:
You can also test the model with a single input record. To do so, click Test, which displays a dialog box like the following:
The fields listed in Data object correspond to the data fields in your PMML definition.
To use this page, select a model from the Model drop-down list. The model determines which fields are input fields and which are output fields. Then enter values into the input fields. When you have entered all the input values, the page displays the predicted value for the output field for the given model. For example:
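To make the single-record behavior concrete, here is a small Python sketch of scoring one record against a tree model. It runs outside Caché and is only an illustration: the tree structure and split thresholds are hypothetical values often quoted for the Iris dataset, not those of the sample model, and predict is an invented helper name.

```python
# Hypothetical decision tree: the split thresholds are illustrative values
# commonly quoted for the Iris dataset, not those of the Caché sample model.
TREE = {
    "field": "PetalLength", "threshold": 2.45,
    "low": {"score": "IRIS-SETOSA"},
    "high": {
        "field": "PetalWidth", "threshold": 1.75,
        "low": {"score": "IRIS-VERSICOLOR"},
        "high": {"score": "IRIS-VIRGINICA"},
    },
}

# Input fields this particular tree needs before it can produce a prediction.
INPUT_FIELDS = ("PetalLength", "PetalWidth")

def predict(record, node=TREE):
    """Walk the tree until a leaf is reached and return its score.

    Returns None while any required input value is missing, similar to a
    form that shows no prediction until every input has been entered.
    """
    if any(record.get(name) is None for name in INPUT_FIELDS):
        return None
    while "score" not in node:
        branch = "low" if record[node["field"]] < node["threshold"] else "high"
        node = node[branch]
    return node["score"]

print(predict({"PetalLength": 1.4, "PetalWidth": 0.2}))
print(predict({"PetalLength": 4.7, "PetalWidth": 1.4}))
print(predict({"PetalLength": 5.1}))
```

The first two calls print IRIS-SETOSA and IRIS-VERSICOLOR; the third prints None because PetalWidth has not been supplied.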
Caché also provides a direct API that you can use to execute PMML models.
To run a predictive model for a single record:
Create an instance of the generated class PackageName.ClassName.Data and set its properties. The purpose of this instance is to contain the input values.
Method %ExecuteModel(ByRef pData As %DeepSee.PMML.Data,
    Output pOutput As %DeepSee.PMML.ModelOutput) As %Status
For pData, use the data object that you created in step 2.
To see the details for the output, use ZWRITE. The pOutput object includes one property for each <OutputField> in the <Output> element of the model definition. If there is no <Output> element, pOutput includes a single field named after the predicted <MiningField> element.
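The rule for which fields appear in the output can be sketched in Python as follows. This is an illustration only and does not call any Caché API: the two model fragments, their field names, and the output_field_names helper are hypothetical, and XML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def output_field_names(model):
    """Return the field names a result object would expose for one model."""
    output = model.find("Output")
    if output is not None:
        # One entry per <OutputField> in the <Output> element.
        return [f.get("name") for f in output.findall("OutputField")]
    # No <Output> element: fall back to the predicted <MiningField>.
    return [f.get("name")
            for f in model.findall("MiningSchema/MiningField")
            if f.get("usageType") == "predicted"]

# Hypothetical model fragments, reduced to the elements that matter here.
WITH_OUTPUT = ET.fromstring("""
<TreeModel modelName="WithOutput" functionName="classification">
  <MiningSchema>
    <MiningField name="PetalLength" />
    <MiningField name="Species" usageType="predicted" />
  </MiningSchema>
  <Output>
    <OutputField name="PredictedSpecies" feature="predictedValue" />
    <OutputField name="Confidence" feature="probability" />
  </Output>
</TreeModel>
""")

WITHOUT_OUTPUT = ET.fromstring("""
<TreeModel modelName="WithoutOutput" functionName="classification">
  <MiningSchema>
    <MiningField name="PetalLength" />
    <MiningField name="Species" usageType="predicted" />
  </MiningSchema>
</TreeModel>
""")

print(output_field_names(WITH_OUTPUT))
print(output_field_names(WITHOUT_OUTPUT))
```

The first call lists the two declared output fields; the second falls back to Species, the predicted mining field.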
This section discusses some additional options:
is the quoted name of an output field of a Caché PMML model class.
is the optional number of a series (row) in the plugin. Specify 1 or omit this argument.
Specifies the cube on which this KPI is executed.
Specifies the name of the model to execute. If specified, this must be a model in the given model class. If left blank, the first model in the class will be executed.
Note that not all aggregations might make sense for each output field.
Specifies whether or not to include null predictions when aggregating results. Available values are "ignore" (the default) and "count".
The order in which you list the parameters does not affect the results.
You can specify up to 16 parameters and their values.
Use the special %CONTEXT parameter to cause the plugin to consider the context of the query, which is otherwise ignored. For details, see the reference for the %KPI function in the DeepSee MDX Reference.
For example, use the following syntax to get the average value for the output field MyField for a PMML model class named Test.MyModel, which contains only a single model:
To include record-level predictions in a DeepSee listing, you can use the $$$PMML token in the listing query. This token takes the PMML definition class name and the model name as its primary parameters. As an optional third argument, you can pass the name of the predicted feature you wish to include in the query (this argument defaults to "predictedValue").
The following shows the definition of a listing query that uses this token:
UserID, TotalWagered, PercentLost "Lost %" , $$$PMML[MyPMML.Poker,PercentLost] "Predicted Loss %"
After you run a predictive model with a batch of input records, you can export the results to a DeepSee cube, which lets you visualize the results in a different way. The cube contains two levels: ActualValue and PredictedValue.
To export the results to a cube, use the PMML test page and click Export. Caché prompts you for the following information:
Result class name
Specify the name of the persistent class to which the results are written. This is used as the source class for the cube.
Link to source class
Specify the class that contains the source records. The result class includes a property named Record that points to this class.
Select this if you want to empty the result class (Result class name) before performing the export. Clear this if you want to append the newly exported data to the end of the result class table.
Specify the logical name of the cube.
Select this if you have performed this export earlier and now want to overwrite the classes with new data and definitions.
The system then displays the Build Cube dialog box, where you can build the given cube. To do so, click Build. You can also later access this cube via the DeepSee Architect and build it there.
After you build the cube, use the DeepSee Analyzer to examine it. The following shows an example. The ActualValue level is used as rows and the PredictedValue level is used as columns:
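A pivot of actual values against predicted values is a cross-tabulation of the exported records. The Python sketch below shows the same idea outside DeepSee. The result pairs are made-up illustration data, not output from the sample model, and cross_tab and format_table are hypothetical helper names.

```python
from collections import Counter

def cross_tab(results):
    """Count (actual, predicted) pairs and collect the row and column labels."""
    counts = Counter(results)
    actuals = sorted({actual for actual, _ in counts})
    predicteds = sorted({predicted for _, predicted in counts})
    return actuals, predicteds, counts

def format_table(actuals, predicteds, counts):
    """Render actual values as rows and predicted values as columns."""
    width = max(len(label) for label in actuals + predicteds) + 2
    lines = ["".ljust(width) + "".join(p.ljust(width) for p in predicteds)]
    for actual in actuals:
        cells = "".join(str(counts[(actual, p)]).ljust(width) for p in predicteds)
        lines.append(actual.ljust(width) + cells)
    return "\n".join(lines)

# Made-up (actual, predicted) pairs standing in for exported result records.
RESULTS = [
    ("IRIS-SETOSA", "IRIS-SETOSA"),
    ("IRIS-SETOSA", "IRIS-SETOSA"),
    ("IRIS-VERSICOLOR", "IRIS-VERSICOLOR"),
    ("IRIS-VERSICOLOR", "IRIS-VIRGINICA"),
    ("IRIS-VIRGINICA", "IRIS-VIRGINICA"),
]

actuals, predicteds, counts = cross_tab(RESULTS)
print(format_table(actuals, predicteds, counts))
```

Cells on the diagonal count correct predictions; any count off the diagonal is a misclassification, such as the single IRIS-VERSICOLOR record predicted as IRIS-VIRGINICA in this made-up data.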