Description
The TRAIN MODEL statement tells a provider to train a model
using the specified model definition. The provider is specified by
the ML configuration.
FROM
The FROM clause supplies the data for training
your model.
Examples highlighting acceptable use and omission of FROM:
FROM in TRAIN MODEL
CREATE MODEL model_b PREDICTING ( label ) WITH ( column_1, column_2, column_3)
TRAIN MODEL model_b FROM table
FROM in CREATE MODEL
CREATE MODEL model_a PREDICTING ( label ) FROM table
TRAIN MODEL model_a
Note:
Omitting FROM from your TRAIN MODEL statement means that you use the default query from CREATE
MODEL.
WITH
WITH allows you to explicitly match the feature
columns in your data to the model definition schema. Each column is
a standard identifier.
FOR
FOR allows you to explicitly match the label
column in your data to the model definition schema. For example, if
your label column in your model definition is named column_a but is named column_b in your training data,
you can match the columns as follows:
CREATE MODEL model_a PREDICTING ( column_a ) FROM table_a
TRAIN MODEL model_a FOR column_b FROM table_b
Naming
AS allows you to explicitly name your trained
model.
Model definitions and trained models exist in the same schema.
If a trained model is not explicitly named with AS, its name consists of the model definition name with an appended
running integer. We can see the difference by querying the INFORMATION_SCHEMA.ML_TRAINED_MODELS
table:
CREATE MODEL TitanicModel PREDICTING (Survived binary) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel AS TrainedTitanic
SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_TRAINED_MODELS
MODEL_NAME |
TRAINED_MODEL_NAME |
TitanicModel |
TitanicModel_t1 |
TitanicModel |
TitanicModel_t2 |
TitanicModel |
TitanicModel_t3 |
TitanicModel |
TrainedTitanic |
Not Default
Each model definition has a default trained model. Without user-specification,
the most recently trained model becomes the default. Using the NOT DEFAULT clause allows you to train a new model without
the result becoming the default trained model:
CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel As FirstModel
TRAIN MODEL TitanicModel As SecondModel NOT DEFAULT
SELECT MODEL_NAME, DEFAULT_TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_MODELS
MODEL_NAME |
DEFAULT_TRAINED_MODEL_NAME |
TitanicModel |
FirstModel |
Without using NOT DEFAULT, the DEFAULT_TRAINED_MODEL
field would otherwise read “SecondModel”
USING Clause Considerations
You can pass provider-specific parameters in a USING clause for a more customized training run. This clause accepts a
JSON string with one or more key-value pairs. The list of parameters
that you can use depends on the provider.
For instance, when training with AutoML as your provider you
can change the random seed:
TRAIN MODEL IsSpam USING {"seed": 3}
See Providers for information about which parameters you can pass for each provider.
Passing NULL Values
Passing data with NULL values in the label column, in a TRAIN MODEL statement, will result in a trained model with
undefined behavior. Users should carefully screen for NULL values
as part of their data preparation process.
Required Security Privileges
Calling TRAIN MODEL requires %MANAGE_MODEL
privileges; otherwise, there is a SQLCODE –99 error (Privilege
Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.