Skip to main content

TRAIN MODEL (SQL)

Trains a machine learning model.

Synopsis

TRAIN MODEL model-name
      [ AS preferred-name ]
      [ NOT DEFAULT]
      [ FOR label-column ]
      [ WITH feature-column-clause ]
      [ FROM model-source ]
      [ USING json-object ] 

Arguments

model-name The name of the machine learning model to train.
AS preferred-name Optional — An alternative name to save the trained model as. See details below.
NOT DEFAULT Optional — A clause to train a model without setting it as the default trained model. See details below.
FOR label-column Optional — The name of the column being predicted, aka, the label column. See details below.
WITH feature-column-clause Optional — Inputs to the model, aka the feature columns, as either the name of a column or as a comma-separated list of the names of columns.
FROM model-source The table or view from which the model is being built. This can be a table, view, or results of a join. See details below.
USING json-object-string Optional — A JSON string specifying one or more key-value pairs. See details below.

Description

The TRAIN MODEL statement tells a provider to train a model using the specified model definition. The provider is specified by the ML configuration.

FROM

The FROM clause supplies the data for training your model.

  • This clause is required if your CREATE MODEL statement did NOT specify a FROM clause.

  • This clause is optional if your CREATE MODEL statement specified a FROM clause.

Examples highlighting acceptable use and omission of FROM:

FROM in TRAIN MODEL

CREATE MODEL model_b PREDICTING ( label ) WITH ( column_1, column_2, column_3)
TRAIN MODEL model_b FROM table

FROM in CREATE MODEL

CREATE MODEL model_a PREDICTING ( label ) FROM table
TRAIN MODEL model_a
Note:

Omitting FROM from your TRAIN MODEL statement means that you use the default query from CREATE MODEL.

WITH

WITH allows you to explicitly match the feature columns in your data to the model definition schema. Each column is a standard identifier.

FOR

FOR allows you to explicitly match the label column in your data to the model definition schema. For example, if your label column in your model definition is named column_a but is named column_b in your training data, you can match the columns as follows:

CREATE MODEL model_a PREDICTING ( column_a ) FROM table_a
TRAIN MODEL model_a FOR column_b FROM table_b

Naming

AS allows you to explicitly name your trained model.

Model definitions and trained models exist in the same schema. If a trained model is not explicitly named with AS, its name consists of the model definition name with an appended running integer. We can see the difference by querying the INFORMATION_SCHEMA.ML_TRAINED_MODELS table:

CREATE MODEL TitanicModel PREDICTING (Survived binary) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel
TRAIN MODEL TitanicModel AS TrainedTitanic
SELECT MODEL_NAME, TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_TRAINED_MODELS
MODEL_NAME TRAINED_MODEL_NAME
TitanicModel TitanicModel_t1
TitanicModel TitanicModel_t2
TitanicModel TitanicModel_t3
TitanicModel TrainedTitanic

Not Default

Each model definition has a default trained model. Without user-specification, the most recently trained model becomes the default. Using the NOT DEFAULT clause allows you to train a new model without the result becoming the default trained model:

CREATE MODEL TitanicModel PREDICTING (Survived) FROM IntegratedML_dataset_titanic.passenger
TRAIN MODEL TitanicModel As FirstModel
TRAIN MODEL TitanicModel As SecondModel NOT DEFAULT
SELECT MODEL_NAME, DEFAULT_TRAINED_MODEL_NAME FROM INFORMATION_SCHEMA.ML_MODELS
MODEL_NAME DEFAULT_TRAINED_MODEL_NAME
TitanicModel FirstModel

Without using NOT DEFAULT, the DEFAULT_TRAINED_MODEL field would otherwise read “SecondModel”

USING Clause Considerations

You can pass provider-specific parameters in a USING clause for a more customized training run. This clause accepts a JSON string with one or more key-value pairs. The list of parameters that you can use depends on the provider.

For instance, when training with AutoML as your provider you can change the random seed:

TRAIN MODEL IsSpam USING {"seed": 3}

See Providers for information about which parameters you can pass for each provider.

Passing NULL Values

Passing data with NULL values in the label column, in a TRAIN MODEL statement, will result in a trained model with undefined behavior. Users should carefully screen for NULL values as part of their data preparation process.

Required Security Privileges

Calling TRAIN MODEL requires %MANAGE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.

Examples

TRAIN MODEL EmailFilter
TRAIN MODEL model_5 AS MyModel USING {"seed": 3}
TRAIN MODEL LoanDefault FROM LoanData

See Also

FeedbackOpens in a new tab