VALIDATE MODEL (SQL)
Synopsis
VALIDATE MODEL model-name [ AS validation-run-name ]
[ USE trained-model-name ]
[ WITH feature-column-clause ]
FROM model-source
Arguments
Argument | Description
---|---
model-name | The name of the model to validate.
AS validation-run-name | Optional — A name to save your validation run as. See details below.
USE trained-model-name | Optional — The name of a non-default trained model to validate. See details below.
WITH feature-column-clause | Optional — The specific columns from your dataset that you want to use for validating your model.
FROM model-source | The table or view from which the model is validated. This can be a table, a view, or the results of a join. See details below.
Description
The VALIDATE MODEL command calculates validation metrics for a given trained model, based on its performance on a specified testing dataset. Each command creates a validation run.
Naming
AS allows you to explicitly name your validation run.
If a validation run is not explicitly named with AS, its name is the name of the trained model with a sequential integer appended. We can see the difference by querying the INFORMATION_SCHEMA.ML_VALIDATION_RUNS table:
CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel
VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel AS TitanicValidation FROM IntegratedML_dataset_titanic.passenger
SELECT MODEL_NAME, TRAINED_MODEL_NAME, VALIDATION_RUN_NAME FROM INFORMATION_SCHEMA.ML_VALIDATION_RUNS
MODEL_NAME | TRAINED_MODEL_NAME | VALIDATION_RUN_NAME
---|---|---
TitanicModel | TitanicModel_t1 | TitanicModel_t1_v1
TitanicModel | TitanicModel_t1 | TitanicModel_t1_v2
TitanicModel | TitanicModel_t1 | TitanicModel_t1_v3
TitanicModel | TitanicModel_t1 | TitanicValidation
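Because the last run was explicitly named, you can retrieve it directly by that name. For example:

SELECT MODEL_NAME, TRAINED_MODEL_NAME, VALIDATION_RUN_NAME FROM INFORMATION_SCHEMA.ML_VALIDATION_RUNS WHERE VALIDATION_RUN_NAME = 'TitanicValidation'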
USE
USE allows you to specify the trained model to perform validation on. If a trained model is not explicitly named by USE, the statement validates the default trained model for the specified model definition, which is the most recently trained model (ThirdModel in the example below).
We can see the difference by querying the INFORMATION_SCHEMA.ML_VALIDATION_RUNS table:
CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel AS FirstModel
TRAIN MODEL TitanicModel AS SecondModel
TRAIN MODEL TitanicModel AS ThirdModel
VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel USE FirstModel FROM IntegratedML_dataset_titanic.passenger
VALIDATE MODEL TitanicModel USE SecondModel FROM IntegratedML_dataset_titanic.passenger
SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_VALIDATION_RUNS
MODEL_NAME | TRAINED_MODEL_NAME
---|---
TitanicModel | ThirdModel
TitanicModel | ThirdModel
TitanicModel | FirstModel
TitanicModel | SecondModel
FROM Considerations
While you used a training set to train your model, you should validate the model with different data: a testing data set. Validating a model against its own training data only evaluates goodness of fit, not the model's predictive performance on unseen data.
This testing data should have the same schema as your training data, including the feature columns and the label column. One simple way to hold out a testing set is sketched below.
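For example, you could define views over disjoint subsets of the source table, then train on one and validate on the other. This is only a sketch: the ID column and the split point at 1000 are assumptions for illustration, not columns or values guaranteed to exist in the sample dataset.

CREATE VIEW passenger_train AS SELECT * FROM IntegratedML_dataset_titanic.passenger WHERE ID <= 1000
CREATE VIEW passenger_test AS SELECT * FROM IntegratedML_dataset_titanic.passenger WHERE ID > 1000
TRAIN MODEL TitanicModel FROM passenger_train
VALIDATE MODEL TitanicModel FROM passenger_test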
Required Security Privileges
Calling VALIDATE MODEL requires %USE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %USE_MODEL privileges, use the GRANT command.
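For example, where DataAnalyst is a hypothetical user:

GRANT %USE_MODEL TO DataAnalyst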
Validation Metrics
The output of VALIDATE MODEL is a set of validation metrics, which are viewable in the INFORMATION_SCHEMA.ML_VALIDATION_METRICS table (a sample query appears after the metric lists below).
For regression models, the following metrics are saved:
- Variance
- R-squared
- Mean squared error
- Root mean squared error
For classification models, the following metrics are saved:
- Precision — This is calculated by dividing the number of true positives by the number of predicted positives (the sum of true positives and false positives).
- Recall — This is calculated by dividing the number of true positives by the number of actual positives (the sum of true positives and false negatives).
- F-Measure — This is calculated by the following expression:
  F = 2 * (precision * recall) / (precision + recall)
- Accuracy — This is calculated by dividing the number of true positives and true negatives by the total number of rows (the sum of true positives, false positives, true negatives, and false negatives) across the entire test set.
- Area under the ROC curve — This is the computed area under the receiver operating characteristic curve. The higher this value, the better the model is at distinguishing between classes.
- Support — This is the number of times a label appears in the validation data set, which puts the other metrics in the context of how frequently each label occurs.
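A sketch of retrieving these metrics for the named run from the Naming example above (the METRIC_NAME and METRIC_VALUE column names are assumptions about this table's layout, not confirmed by this page):

SELECT MODEL_NAME, VALIDATION_RUN_NAME, METRIC_NAME, METRIC_VALUE FROM INFORMATION_SCHEMA.ML_VALIDATION_METRICS WHERE VALIDATION_RUN_NAME = 'TitanicValidation'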
Examples
VALIDATE MODEL PatientReadmission FROM Patient_test
VALIDATE MODEL PatientReadmission AS PatientValidation USE PatientReadmission_Model FROM Patient_test
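The WITH feature-column-clause does not appear in the examples above. As a hedged sketch, assuming the clause takes the same parenthesized column list as CREATE MODEL, and assuming Age and BloodPressure are feature columns in Patient_test, it might look like this:

VALIDATE MODEL PatientReadmission WITH (Age, BloodPressure) FROM Patient_test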