Trains a machine learning model.
Description
The TRAIN MODEL statement tells a provider to train a model using the specified model definition. The provider is specified by the ML configuration.
FROM
The FROM clause supplies the data for training your model.
Examples highlighting acceptable use and omission of FROM:
FROM in TRAIN MODEL
CREATE MODEL model_b PREDICTING ( label ) WITH ( column_1, column_2, column_3)
TRAIN MODEL model_b FROM table
FROM in CREATE MODEL
CREATE MODEL model_a PREDICTING ( label ) FROM table
TRAIN MODEL model_a
Note:
Omitting FROM from your TRAIN MODEL statement means that you use the default query from CREATE MODEL.
WITH
WITH allows you to explicitly match the feature columns in your data to the model definition schema. Each column is a standard identifier.
FOR
FOR allows you to explicitly match the label column in your data to the model definition schema. For example, if your label column in your model definition is named column_a but is named column_b in your training data, you can match the columns as follows:
CREATE MODEL model_a PREDICTING ( column_a ) FROM table_a
TRAIN MODEL model_a FOR column_b FROM table_b
Naming
AS allows you to explicitly name your trained model.
Model definitions and trained models exist in the same schema. If a trained model is not explicitly named with AS, its name consists of the model definition name with an appended running integer. We can see the difference by querying the INFORMATION_SCHEMA.ML_TRAINED_MODELS table:
CREATE MODEL TitanicModel PREDICTING (Survived binary) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel AS TrainedTitanic
SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_TRAINED_MODELS
MODEL_NAME |
TRAINED_MODEL_NAME |
TitanicModel |
TitanicModel_t1 |
TitanicModel |
TitanicModel_t2 |
TitanicModel |
TitanicModel_t3 |
TitanicModel |
TrainedTitanic |
Not Default
Each model definition has a default trained model. Without user-specification, the most recently trained model becomes the default. Using the NOT DEFAULT clause allows you to train a new model without the result becoming the default trained model:
CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel As FirstModel
TRAIN MODEL TitanicModel As SecondModel NOT DEFAULT
SELECT MODEL_NAME, DEFAULT_TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_MODELS
MODEL_NAME |
DEFAULT_TRAINED_MODEL_NAME |
TitanicModel |
FirstModel |
Without using NOT DEFAULT, the DEFAULT_TRAINED_MODEL field would otherwise read “SecondModel”
USING Clause Considerations
You can pass provider-specific parameters in a USING clause for a more customized training run. This clause accepts a JSON string with one or more key-value pairs. The list of parameters that you can use depends on the provider.
For instance, when training with AutoML as your provider you can change the random seed:
TRAIN MODEL IsSpam USING {"seed": 3}
See Providers for information about which parameters you can pass for each provider.
Passing NULL Values
Passing data with NULL values in the label column, in a TRAIN MODEL statement, will result in a trained model with undefined behavior. Users should carefully screen for NULL values as part of their data preparation process.
Required Security Privileges
Calling TRAIN MODEL requires %MANAGE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.