CREATE MODEL (SQL)
Synopsis
CREATE MODEL [ IF NOT EXISTS ] model-name PREDICTING ( label-column ) FROM model-source [ USING json-object ]
CREATE MODEL [ IF NOT EXISTS ] model-name PREDICTING ( label-column ) WITH feature-column-clause [ USING json-object ]
CREATE MODEL [ IF NOT EXISTS ] model-name PREDICTING ( label-column ) WITH feature-column-clause FROM model-source [ USING json-object ]
Arguments
This synopsis shows the valid forms of CREATE MODEL. The CREATE MODEL command must have either a FROM or WITH clause (or both).
model-name | The name for the model definition being created. A valid identifier, subject to the same additional naming restrictions as a table name. A model name is unqualified (modelname). An unqualified model name takes the default schema name. |
PREDICTING ( label-column ) | The name of the column being predicted, aka, the label column. A standard identifier. See details below. |
WITH feature-column-clause | Inputs to the model, aka the feature columns, as either the name of a column and it’s datatype or as a comma-separated list of the names of columns and datatypes. Each column name is a standard identifier. |
FROM model-source | The table or view from which the model is being built. This can be a table, view, or results of a join. |
USING json-object-string | Optional — A JSON string specifying one or more key-value pairs. See more details below. |
Description
The CREATE MODEL command creates a model definition of the structure specified. This includes, at a minimum:
-
The model name
-
The label column
-
The feature column(s)
Predicting
You must specify the output column (or label column) that your model predicts, given the input columns (or feature columns). For example, if you are designing a SpamFilter model which identifies emails that are spam mail, you may have a label column named IsSpam, which is a boolean value designating whether a given email is spam or not. You can also specify the data type of this column; otherwise, IntegratedML infers the type:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
CREATE MODEL SpamFilter PREDICTING (IsSpam binary) FROM EmailData
WITH and FROM
A model definition must contain a WITH and/or FROM to specify the schema characteristics of the model.
Using WITH, you can specify which input columns (features) to include in your model definition. Unless you use FROM in your statement, you must also specify the data type of each column:
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length int, subject_title varchar)
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length, subject_title) FROM EmailData
FROM allows you to use every single column from a specified table or view, without having to identify each column individually:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
This clause is fully general, and can specify any subquery expression. IntegratedML infers the data types of each column. By using FROM, you supply a default data set for future TRAIN MODEL statements using this model definition. You can use FROM along with WITH to both supply a default data set and to explicitly name feature columns.
Without a WITH clause, IntegratedML infers the data types of each column, and implicitly uses the result of the FROM clause as if it were the following query:
SELECT * FROM model-source
USING
You can specify a default USING clause for your model definition. This clause accepts a JSON string with one or more key-value pairs. When TRAIN MODEL is executed, by default the USING clause of the model definition is used. All parameters specified in the USING clause of your ML configuration overwrite those same parameters in the USING clause of your model definition.
You must make sure that the parameters you specify are recognized by the provider you select. Failing to do so may result in an error when training.
Required Security Privileges
Calling CREATE MODEL requires %MANAGE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.
Model Naming Conventions
Model names follow identifier conventions, subject to the restrictions below. By default, model names are simple identifiers. A model name should not exceed 256 characters. Model names are not case-sensitive. For further details, see the “Identifiers” chapter of Using InterSystems SQL.
InterSystems IRIS uses the model name to generate a corresponding class name. A class name contains only alphanumeric characters (letters and numbers) and must be unique within the first 96 characters. To generate this class name, InterSystems IRIS first strips punctuation characters from the model name, and then generates an identifier that is unique within the first 96 characters, substituting an integer (beginning with 0) for the final character when needed to create a unique class name. InterSystems IRIS generates a unique class name from a valid model name, but this name generation imposes the following restrictions on the naming of models:
-
A model name must include at least one letter. Either the first character of the view name or the first character after initial punctuation characters must be a letter
-
InterSystems IRIS supports 16-bit (wide) characters for model names. A character is a valid letter if it passes the $ZNAME test.
-
If the first character of the model name is a punctuation character, the second character cannot be a number. This results in an SQLCODE -400 error, with a %msg value of “ERROR #5053: Class name 'schema.name' is invalid” (without the punctuation character). For example, specifying the model name %7A generates the %msg “ERROR #5053: Class name 'User.7A' is invalid”.
-
Because generated class names do not include punctuation characters, it is not advisable (though possible) to create a model name that differs from an existing model name only in its punctuation characters. In this case, InterSystems IRIS substitutes an integer (beginning with 0) for the final character of the name to create a unique class name.
-
A model name may be much longer than 96 characters, but model names that differ in their first 96 alphanumeric characters are much easier to work with.
A model name can only be unqualified. An unqualified model name (viewname ) takes the system-wide default schema name.
Examples
CREATE MODEL PatientReadmit PREDICTING (IsReadmitted) FROM patient_table USING {"seed": 3}
CREATE MODEL PatientReadmit PREDICTING (IsReadmitted) WITH (age, gender, encounter_type, admit_reason, starttime, endtime, prior_visits, diagnosis, comorbitities)