PROBABILITY (SQL)

A function that applies a specified trained model to return the probability that the specified label is the predicted label value. This allows you to evaluate the relative strength of predictions of that value.

Synopsis

PROBABILITY ( model-name FOR label-value)

PROBABILITY ( model-name USE  trained-model-name
  FOR label-value ) 

PROBABILITY ( model-name FOR label-value 
  WITH  feature-columns-clause ) 

PROBABILITY ( model-name USE  trained-model-name 
  FOR label-value WITH feature-columns-clause ] )

Description

The PROBABILITY function applies a given model to a given table, returning the probability that, for each row in the table, the model would predict the specified value. This probability is returned as a value from 0 to 1. This function can only be used with classification models (not regression models).

FOR

FOR provides the output value that PROBABILITY finds the probability of.

For example:

SELECT * FROM flower_dataset WHERE PROBABILITY(iris_flower FOR 'iris-setosa') > 0.6

Uses the iris_flower model to return each row in flower_dataset where the probability of the result being “iris-setosa” is greater than 0.6.

Omitting FOR implies a value of 1. For example:

SELECT PROBABILITY(IsSpam) FROM email_data

Implicitly forms this query:

SELECT PROBABILITY(IsSpam FOR 1) FROM email_data

When the value provided for FOR is invalid for the specified trained model, there is a SQLCODE –400 error with the following message:

[%msg: <PREDICT execution error: ERROR #2853: Specified positive label value not found in the dataset.>]

USE

If a trained model is not explicitly named by USE, PROBABILITY uses the default trained model for the specified model definition.

For example, if multiple models are trained:

CREATE MODEL MyModel PREDICTING( label ) FROM data
TRAIN MODEL MyModel AS FirstModel
TRAIN MODEL MyModel AS SecondModel NOT DEFAULT

FirstModel is the default model for MyModel. This means that PROBABILITY queries would use FirstModel for predictions. To specify use of SecondModel:

PROBABILITY( MyModel FOR label-value USE SecondModel)

WITH

PROBABILITY is a smart function, mapping the feature columns of the specified dataset to those in the model implicitly when there is no WITH clause. You can use a WITH clause to specify the mapping of columns between the dataset and your model. For example:

SELECT PROBABILITY(iris_flower FOR 'iris-setosa' WITH petal_length = length_petal) FROM flower_dataset

This query matches the petal_length column from the iris_flower model to the length_petal column from flower_dataset.

Required Security Privileges

Calling PROBABILITY requires %USE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %USE_MODEL privileges, use the GRANT command.

Arguments

model-name

The name of the trained model.

FOR label-value

The output value. See details above.

USE trained-model-name

An optional argument. The name of a non-default trained model. See details above.

WITH feature-columns-clause

An optional argument. The specific columns to provide as input for your trained model. See details above.

Examples

CREATE MODEL PatientReadmission PREDICTING ( IsReadmitted ) FROM patient_data
TRAIN MODEL PatientReadmission
SELECT * FROM new_patient_data WHERE PROBABILITY( PatientReadmission FOR 1) > 0.8