Creates a model definition.
Description
The CREATE MODEL command creates a model definition of the
structure specified. This includes, at a minimum:
-
The model name
-
The label column
-
The feature column(s)
Predicting
You must specify the output column (or label column) that your
model predicts, given the input columns (or feature columns). For
example, if you are designing a SpamFilter model
which identifies emails that are spam mail, you may have a label column
named IsSpam, which is a boolean value designating
whether a given email is spam or not. You can also specify the data
type of this column; otherwise, IntegratedML infers the type:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
CREATE MODEL SpamFilter PREDICTING (IsSpam binary) FROM EmailData
WITH and FROM
A model definition must contain a WITH and/or FROM to specify the schema characteristics of the model.
WITH
Using WITH, you can specify which input columns
(features) to include in your model definition. Note that you must
specify the data type of each column, even when using a FROM clause in your statement:
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length int, subject_title varchar)
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length int, subject_title varchar) FROM EmailData
FROM
FROM allows you to use every single column
from a specified table or view, without having to identify each column
individually:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
This clause is fully general, and can specify any subquery expression.
IntegratedML infers the data types of each column. By using FROM, you supply a default data set for future TRAIN MODEL statements using this model definition. You
can use FROM along with WITH to both supply a default data set and to explicitly name feature
columns.
Without a WITH clause, IntegratedML infers
the data types of each column, and implicitly uses the result of the FROM clause as if it were the following query:
SELECT * FROM model-source
USING
You can specify a default USING clause for
your model definition. This clause accepts a JSON string with one
or more key-value pairs. When TRAIN MODEL is executed,
by default the USING clause of the model definition
is used. All parameters specified in the USING clause
of your ML configuration overwrite those same parameters in the USING clause of your model definition.
You must make sure that the parameters you specify are recognized
by the provider you select. Failing to do so may result in an error
when training.
Required Security Privileges
Calling CREATE MODEL requires %MANAGE_MODEL
privileges; otherwise, there is a SQLCODE –99 error (Privilege
Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.
Model Naming Conventions
Model names follow identifier conventions, subject to the restrictions
below. By default, model names are simple identifiers. A model name
should not exceed 256 characters. Model names are not case-sensitive.
For further details, see the “Identifiers” chapter of Using InterSystems SQL.
InterSystems IRIS uses the model name to generate a corresponding
class name. A class name contains only alphanumeric characters (letters
and numbers) and must be unique within the first 96 characters. To
generate this class name, InterSystems IRIS first strips punctuation
characters from the model name, and then generates an identifier that
is unique within the first 96 characters, substituting an integer
(beginning with 0) for the final character when needed to create a
unique class name. InterSystems IRIS generates a unique class name
from a valid model name, but this name generation imposes the following
restrictions on the naming of models:
-
A model name must include at least one letter. Either
the first character of the view name or the first character after
initial punctuation characters must be a letter
-
InterSystems IRIS supports 16-bit (wide) characters
for model names. A character is a valid letter if it passes the $ZNAME test.
-
If the first character of the model name is a punctuation
character, the second character cannot be a number. This results in
an SQLCODE -400 error, with a %msg value of “ERROR #5053: Class
name 'schema.name' is invalid” (without the punctuation character).
For example, specifying the model name %7A generates the %msg “ERROR
#5053: Class name 'User.7A' is invalid”.
-
Because generated class names do not include punctuation
characters, it is not advisable (though possible) to create a model
name that differs from an existing model name only in its punctuation
characters. In this case, InterSystems IRIS substitutes an integer
(beginning with 0) for the final character of the name to create a
unique class name.
-
A model name may be much longer than 96 characters,
but model names that differ in their first 96 alphanumeric characters
are much easier to work with.
A model name can only be unqualified. An unqualified model name
(viewname ) takes the system-wide default schema name.