PROBABILITY (SQL)
Synopsis
PROBABILITY ( model-name FOR label-value)
PROBABILITY ( model-name USE trained-model-name
FOR label-value )
PROBABILITY ( model-name FOR label-value
WITH feature-columns-clause )
PROBABILITY ( model-name USE trained-model-name
FOR label-value WITH feature-columns-clause ] )
Description
The PROBABILITY function applies a given model to a given table, returning the probability that, for each row in the table, the model would predict the specified value. This probability is returned as a value from 0 to 1. This function can only be used with classification models (not regression models).
FOR
FOR provides the output value that PROBABILITY finds the probability of.
For example:
SELECT * FROM flower_dataset WHERE PROBABILITY(iris_flower FOR 'iris-setosa') > 0.6
Uses the iris_flower model to return each row in flower_dataset where the probability of the result being “iris-setosa” is greater than 0.6.
Omitting FOR implies a value of 1. For example:
SELECT PROBABILITY(IsSpam) FROM email_data
Implicitly forms this query:
SELECT PROBABILITY(IsSpam FOR 1) FROM email_data
When the value provided for FOR is invalid for the specified trained model, there is a SQLCODE –400 error with the following message:
[%msg: <PREDICT execution error: ERROR #2853: Specified positive label value not found in the dataset.>]
USE
If a trained model is not explicitly named by USE, PROBABILITY uses the default trained model for the specified model definition.
For example, if multiple models are trained:
CREATE MODEL MyModel PREDICTING( label ) FROM data
TRAIN MODEL MyModel AS FirstModel
TRAIN MODEL MyModel AS SecondModel NOT DEFAULT
FirstModel is the default model for MyModel. This means that PROBABILITY queries would use FirstModel for predictions. To specify use of SecondModel:
PROBABILITY( MyModel FOR label-value USE SecondModel)
WITH
PROBABILITY is a smart function, mapping the feature columns of the specified dataset to those in the model implicitly when there is no WITH clause. You can use a WITH clause to specify the mapping of columns between the dataset and your model. For example:
SELECT PROBABILITY(iris_flower FOR 'iris-setosa' WITH petal_length = length_petal) FROM flower_dataset
This query matches the petal_length column from the iris_flower model to the length_petal column from flower_dataset.
Required Security Privileges
Calling PROBABILITY requires %USE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %USE_MODEL privileges, use the GRANT command.
Arguments
model-name
The name of the trained model.
FOR label-value
The output value. See details above.
USE trained-model-name
An optional argument. The name of a non-default trained model. See details above.
WITH feature-columns-clause
An optional argument. The specific columns to provide as input for your trained model. See details above.
Examples
CREATE MODEL PatientReadmission PREDICTING ( IsReadmitted ) FROM patient_data
TRAIN MODEL PatientReadmission
SELECT * FROM new_patient_data WHERE PROBABILITY( PatientReadmission FOR 1) > 0.8