TRAIN MODEL (SQL)
Synopsis
TRAIN MODEL model-name
[ AS preferred-name ]
[ NOT DEFAULT]
[ FOR label-column ]
[ WITH feature-column-clause ]
[ FROM model-source ]
[ USING json-object ]
Arguments
model-name | The name of the machine learning model to train. |
AS preferred-name | Optional — An alternative name to save the trained model as. See details below. |
NOT DEFAULT | Optional — A clause to train a model without setting it as the default trained model. See details below. |
FOR label-column | Optional — The name of the column being predicted, aka, the label column. See details below. |
WITH feature-column-clause | Optional — Inputs to the model, aka the feature columns, as either the name of a column or as a comma-separated list of the names of columns. |
FROM model-source | The table or view from which the model is being built. This can be a table, view, or results of a join. See details below. |
USING json-object-string | Optional — A JSON string specifying one or more key-value pairs. See details below. |
Description
The TRAIN MODEL statement tells a provider to train a model using the specified model definition. The provider is specified by the ML configuration.
FROM
The FROM clause supplies the data for training your model.
-
This clause is required if your CREATE MODEL statement did NOT specify a FROM clause.
-
This clause is optional if your CREATE MODEL statement specified a FROM clause.
Examples highlighting acceptable use and omission of FROM:
FROM in TRAIN MODEL
CREATE MODEL model_b PREDICTING ( label ) WITH ( column_1, column_2, column_3)
TRAIN MODEL model_b FROM table
FROM in CREATE MODEL
CREATE MODEL model_a PREDICTING ( label ) FROM table
TRAIN MODEL model_a
Omitting FROM from your TRAIN MODEL statement means that you use the default query from CREATE MODEL.
WITH
WITH allows you to explicitly match the feature columns in your data to the model definition schema. Each column is a standard identifier.
FOR
FOR allows you to explicitly match the label column in your data to the model definition schema. For example, if your label column in your model definition is named column_a but is named column_b in your training data, you can match the columns as follows:
CREATE MODEL model_a PREDICTING ( column_a ) FROM table_a
TRAIN MODEL model_a FOR column_b FROM table_b
Naming
AS allows you to explicitly name your trained model.
Model definitions and trained models exist in the same schema. If a trained model is not explicitly named with AS, its name consists of the model definition name with an appended running integer. We can see the difference by querying the INFORMATION_SCHEMA.ML_TRAINED_MODELS table:
CREATE MODEL TitanicModel PREDICTING (Survived binary) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel AS TrainedTitanic
SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_TRAINED_MODELS
MODEL_NAME | TRAINED_MODEL_NAME |
---|---|
TitanicModel | TitanicModel_t1 |
TitanicModel | TitanicModel_t2 |
TitanicModel | TitanicModel_t3 |
TitanicModel | TrainedTitanic |
Not Default
Each model definition has a default trained model. Without user-specification, the most recently trained model becomes the default. Using the NOT DEFAULT clause allows you to train a new model without the result becoming the default trained model:
CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel As FirstModel
TRAIN MODEL TitanicModel As SecondModel NOT DEFAULT
SELECT MODEL_NAME, DEFAULT_TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_MODELS
MODEL_NAME | DEFAULT_TRAINED_MODEL_NAME |
---|---|
TitanicModel | FirstModel |
Without using NOT DEFAULT, the DEFAULT_TRAINED_MODEL field would otherwise read “SecondModel”
USING Clause Considerations
You can pass provider-specific parameters in a USING clause for a more customized training run. This clause accepts a JSON string with one or more key-value pairs. The list of parameters that you can use depends on the provider.
For instance, when training with AutoML as your provider you can change the random seed:
TRAIN MODEL IsSpam USING {"seed": 3}
See Providers for information about which parameters you can pass for each provider.
Passing NULL Values
Passing data with NULL values in the label column, in a TRAIN MODEL statement, will result in a trained model with undefined behavior. Users should carefully screen for NULL values as part of their data preparation process.
Required Security Privileges
Calling TRAIN MODEL requires %MANAGE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.
Examples
TRAIN MODEL EmailFilter
TRAIN MODEL model_5 AS MyModel USING {"seed": 3}
TRAIN MODEL LoanDefault FROM LoanData