Time series models are available in InterSystems IRIS 2023.2 as an Experimental Feature. This means they are not supported for production environments. However, the feature is well-tested and InterSystems believes it can add significant value to customers.
The CREATE MODEL command creates a model definition of the structure specified. This includes, at a minimum:
Regression and classification models are largely created in the same way and have the same considerations. However, time series models employ a slightly different syntax because they require different considerations. These differences between these types of models are enumerated in the applicable clauses below.
You must specify the output column (or label column) that your model predicts, given the input columns (or feature columns). For example, if you are designing a SpamFilter model which identifies emails that are spam mail, you may have a label column named IsSpam, which is a boolean value designating whether a given email is spam or not. You can also specify the data type of this column; otherwise, IntegratedML infers the type:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
CREATE MODEL SpamFilter PREDICTING (IsSpam binary) FROM EmailData
When creating a time series model, you will often want to predict values for multiple columns. To do so, specify the names of the columns that you would like to predict in a comma-separated list. You may also specify the data type of this column; otherwise, IntegratedML infers the type. To specify that the model should predict values for every column in the table, use an asterisk (*).
CREATE TIME SERIES MODEL WeatherForecast PREDICTING (Temp, Precipitation, Humidity, UVIndex) BY (Date) FROM WeatherData
CREATE TIME SERIES MODEL WeatherForecast PREDICTING (*) BY (DATE) FROM WeatherData
WITH and FROM
A classification or regression model definition must contain a WITH or FROM or both to specify the schema characteristics of the model. A time series model must contain a FROM clause and cannot have a WITH.
Using WITH, you can specify which input columns (features) to include in your model definition. Note that you must specify the data type of each column, even when using a FROM clause in your statement:
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length int, subject_title varchar)
CREATE MODEL SpamFilter PREDICTING (IsSpam) WITH (email_length int, subject_title varchar) FROM EmailData
FROM allows you to use every single column from a specified table or view, without having to identify each column individually:
CREATE MODEL SpamFilter PREDICTING (IsSpam) FROM EmailData
This clause is fully general, and can specify any subquery expression. IntegratedML infers the data types of each column. By using FROM, you supply a default data set for future TRAIN MODEL statements using this model definition. You can use FROM along with WITH to both supply a default data set and to explicitly name feature columns.
Without a WITH clause, IntegratedML infers the data types of each column, and implicitly uses the result of the FROM clause as if it were the following query:
SELECT * FROM model-source
You can specify a default USING clause for your model definition. This clause accepts a JSON string with one or more key-value pairs. When TRAIN MODEL is executed, by default the USING clause of the model definition is used. All parameters specified in the USING clause of your ML configuration overwrite those same parameters in the USING clause of your model definition.
You must make sure that the parameters you specify are recognized by the provider you select. Failing to do so may result in an error when training.
Time Series Parameters
Time series models also support four optional parameters in a USING clause:
FORWARD specifies the number of timesteps in the future that you would like to predict as a positive integer. Predicted rows will appear after the latest time or date in the original dataset. You may specify both this and the BACKWARD setting at the same time.
BACKWARD specifies the number of timesteps in the past that you would like to predict as a positive integer. Predicted rows will appear before the earliest time or date in the original dataset. You may specify both this and the FORWARD setting at the same time. This option is only used with the DataRobot provider.
FREQUENCY specifies both the size and unit of the predicted timesteps as a positive integer followed by a letter that denotes the unit of time. If this value is not specified, the most common timestep in the data is supplied.
The letter abbreviations for units of time are outlined in the following table:
||Unit of Time
Required Security Privileges
Calling CREATE MODEL requires %MANAGE_MODEL privileges; otherwise, there is a SQLCODE –99 error (Privilege Violation). To assign %MANAGE_MODEL privileges, use the GRANT command.
Model Naming Conventions
Model names follow identifier conventions, subject to the restrictions below. By default, model names are simple identifiers. A model name should not exceed 256 characters. Model names are not case-sensitive. For further details, see the “Identifiers” chapter of Using InterSystems SQL.
InterSystems IRIS uses the model name to generate a corresponding class name. A class name contains only alphanumeric characters (letters and numbers) and must be unique within the first 96 characters. To generate this class name, InterSystems IRIS first strips punctuation characters from the model name, and then generates an identifier that is unique within the first 96 characters, substituting an integer (beginning with 0) for the final character when needed to create a unique class name. InterSystems IRIS generates a unique class name from a valid model name, but this name generation imposes the following restrictions on the naming of models:
A model name must include at least one letter. Either the first character of the view name or the first character after initial punctuation characters must be a letter
InterSystems IRIS supports 16-bit (wide) characters for model names. A character is a valid letter if it passes the $ZNAME test.
If the first character of the model name is a punctuation character, the second character cannot be a number. This results in an SQLCODE -400 error, with a %msg value of “ERROR #5053: Class name 'schema.name' is invalid” (without the punctuation character). For example, specifying the model name %7A generates the %msg “ERROR #5053: Class name 'User.7A' is invalid”.
Because generated class names do not include punctuation characters, it is not advisable (though possible) to create a model name that differs from an existing model name only in its punctuation characters. In this case, InterSystems IRIS substitutes an integer (beginning with 0) for the final character of the name to create a unique class name.
A model name may be much longer than 96 characters, but model names that differ in their first 96 alphanumeric characters are much easier to work with.
A model name can only be unqualified. An unqualified model name (viewname) takes the system-wide default schema name.