Skip to main content

VALIDATE MODEL (SQL)

Validates a model.

Synopsis

VALIDATE MODEL model-name [ AS validation-run-name ]
       [ USE trained-model-name]
       [ WITH feature-column-clause ]
       FROM model-source 

Arguments

model-name The name of a model to validate.
AS validation-run-name Optional — A name to save your validation run as. See details below.
USE trained-model-name Optional — The name of a non-default trained model to be validated. See details below.
WITH feature-column-clause Optional — The specific columns from your dataset that you want to use for validating your model.
FROM model-source The table or view from which the model is being validated. This can be a table, view, or results of a join. See details below.

Description

The VALIDATE MODEL command calculates validation metrics for a given trained model, based on its performance on a specified testing dataset. Each command creates a validation run.

Naming

AS allows you to explicitly name your validation run.

If a validation run is not explicitly named with AS, its name consists of the trained model with an appended running integer. We can see the difference by querying the INFORMATION_SCHEMA.ML_VALIDATION_RUNS table:

CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger

TRAIN MODEL TitanicModel

VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger

VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger

VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger

VALIDATE MODEL TitanicModel AS TitanicValidation FROM IntegratedML_dataset_titanic.passenger

SELECT MODEL_NAME, TRAINED_MODEL_NAME, VALIDATION_RUN_NAME FROM INFORMATION_SCHEMA.ML_VALIDATION_RUNS
MODEL_NAME TRAINED_MODEL_NAME VALIDATION_RUN_NAME
TitanicModel TitanicModel_t1 TitanicModel_t1_v1
TitanicModel TitanicModel_t1 TitanicModel_t1_v2
TitanicModel TitanicModel_t1 TitanicModel_t1_v3
TitanicModel TitanicModel_t1 TitanicValidation

USE

USE allows you to specify the trained model to perform validation on. If a trained model is not explicitly named by USE, the statement validates the default trained model for the specified model definition.

We can see the difference by querying the INFORMATION_SCHEMA.ML_VALIDATION_RUNS table:

CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger

TRAIN MODEL TitanicModel AS FirstModel

TRAIN MODEL TitanicModel AS SecondModel

TRAIN MODEL TitanicModel AS ThirdModel

VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger

VALIDATE MODEL TitanicModel FROM IntegratedML_dataset_titanic.passenger

VALIDATE MODEL TitanicModel USE FirstModel FROM IntegratedML_dataset_ti
tanic.passenger

VALIDATE MODEL TitanicModel USE SecondModel FROM IntegratedML_dataset_titanic.passenger

SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_VALIDATION_RUNS
MODEL_NAME TRAINED_MODEL_NAME
TitanicModel ThirdModel
TitanicModel ThirdModel
TitanicModel FirstModel
TitanicModel SecondModel

FROM Considerations

While you used a training set to train your model, you should use other data, a testing data set, to validate your model. Using your training data to validate a model only evaluates goodness of fit, as opposed to evaluating the model’s predictive performance on other data.

This data should be of the same schema as your training data, including the feature columns and label column.

Required Security Privileges

Calling VALIDATE MODEL requires %USE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %USE_MODEL privileges, use the GRANT command.

Validation Metrics

The output of VALIDATE MODEL is a set of validation metrics that is viewable in the INFORMATION_SCHEMA.ML_VALIDATION_METRICS table.

For regression models, the following metrics are saved:

  • Variance

  • R-squared

  • Mean squared error

  • Root mean squared error

For classification models, the following metrics are saved:

  • Precision — This is calculated by dividing the number of true positives by the number of predicted positives (sum of true positives and false positives).

  • Recall — This is calculated by dividing the number of true positives by the number of actual positives (sum of true positives and false negatives).

  • F-Measure — This is calculated by the following expression:

    F = 2 * (precision * recall) / (precision + recall)

  • Accuracy — This is calculated by dividing the number of true positives and true negatives by the total number of rows (sum of true positives, false positives, true negatives, and false negatives) across the entire test set.

  • Area under the ROC curve — This is the value of the computed area under the receiver operator characteristic curve. The higher this value is, the better the model is at recognizing differences between classes.

  • Support — This is the number of instances that a label appears in the validation data set, allowing users to understand the validation metrics within the context of how frequently the label appears.

Examples

VALIDATE MODEL PatientReadmission FROM Patient_test

VALIDATE MODEL PatientReadmission AS PatientValidation USE PatientReadmission_Model FROM Patient_test

See Also

FeedbackOpens in a new tab