Domain Architect

Important:

InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.

InterSystems IRIS provides the Domain Architect as an interactive interface for creating and populating NLP domains and performing analysis on the indexed data. Domain Architect is accessed using the InterSystems IRIS Management Portal.

The Management Portal interface for NLP consists of three tools:

Domain Architect (discussed on this page): for creating an NLP domain and populating it with source text data.
Domain Explorer: for analyzing the data in an NLP domain by looking at specific entities.
Indexing Results: for displaying how NLP analyzed the text data in a source, using highlighting to show different types of entities.

All functionality provided through the Domain Architect is also available by using ObjectScript to invoke NLP class methods and properties.

Accessing Architect

The starting point for accessing the Domain Architect is the Management Portal Analytics option. From there you select the Text Analytics option. This displays the Domain Architect option.

All NLP domains exist within a specific namespace.

A namespace must be enabled for NLP before it can be used. Selecting an enabled namespace displays the NLP Domain Architect option.

Note:

If selecting an enabled namespace does not display the Domain Architect option, you do not have a valid license for NLP. Look at Licensed to in the Management Portal header. Review or activate your license key.

Enabling a Namespace

A namespace must be enabled for NLP before it can be used with Domain Architect.

If the current namespace is enabled, the Domain Architect option displays the Domain Architect.
To select another enabled namespace, use the Switch option at the top of the Management Portal interface page. This displays the list of available namespaces, from which you can make your selection.
If the current namespace is not enabled, the Analytics option displays a list of analytics-enabled namespaces. Select one of these displayed namespaces.
If no namespaces are enabled, the Analytics option does not display any options.

To enable a namespace for NLP from the Management Portal, select System Administration, Security, Applications, Web Applications. This displays a list of web applications; the third column indicates if a listed item is a namespace (“Yes”) or not. Select the desired namespace name from the list. This displays the Edit Web Application page. In the Enable section of the page, select the Analytics check box. Click the Save button.

You cannot enable the %SYS namespace. This is because you cannot create NLP domains in the %SYS namespace.

You can set your Management Portal default namespace. From the Management Portal select System Administration, Security, Users. Select the name of the desired user. This allows you to edit the user definition. From the General tab, select a Startup Namespace from the drop-down list. Click Save.

Creating a Domain

From the Domain Architect press the New button to define a domain. You specify the following domain values (in the specified order):

Domain name: The name you assign to a domain must be unique for the current namespace (not just unique within its package class). A domain name may be of any length and contain any typeable characters, including spaces (the % character is valid, but should be avoided). Domain names are not case-sensitive. However, because Domain Architect uses the domain name to generate a default domain definition class name, it is recommended that you follow class naming conventions when naming a domain, unless there are compelling reasons to do otherwise.
Definition class name: the domain definition package name and class name, separated by a period. If you first specified the domain name, clicking on the Definition class name generates default names for the domain definition package and class. The package name defaults to User. The class name defaults to the domain name, stripped of non-alphanumeric characters. You can accept or modify this default.
A valid package name and a valid class name can contain only alphanumeric characters, and are case-sensitive. Specifying a package name that differs from an existing package name only in lettercase results in an error. Within a package, specifying a class name that differs from an existing class name only in lettercase results in an error.
Allow Custom Updates: optionally select this box if you wish to enable adding data or dictionaries to this domain manually; the default is to not allow custom updates.
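
The naming rules above can be illustrated with a minimal Python sketch. It is not the Architect's implementation: the function names are hypothetical, and treating "alphanumeric" as Python's str.isalnum() is an assumption.

```python
from typing import Iterable


def default_definition_class(domain_name: str, package: str = "User") -> str:
    """Sketch of the documented default: package 'User' plus the domain
    name with every non-alphanumeric character removed.
    Assumption: str.isalnum() approximates the Architect's notion of
    'alphanumeric'."""
    class_name = "".join(ch for ch in domain_name if ch.isalnum())
    return f"{package}.{class_name}"


def is_duplicate_domain(domain_name: str, existing: Iterable[str]) -> bool:
    """Domain names are not case-sensitive, so compare case-insensitively."""
    wanted = domain_name.casefold()
    return any(name.casefold() == wanted for name in existing)


print(default_definition_class("Crash Reports 2024"))  # User.CrashReports2024
print(is_duplicate_domain("MYDOMAIN", ["mydomain"]))   # True
```

A domain name that already follows class naming conventions passes through unchanged, which is why the text recommends naming domains that way.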

Click the Finish button to create the domain. This displays the Model Elements selection screen.

You must Save and Compile a newly created domain before exiting that domain.

If you attempt to create a duplicate domain name, the Domain Architect issues a “Domain name already in use” error.

For other ways to create a domain, refer to NLP Domains. Note that Domain Architect is the only domain creation interface that allows you to define a domain definition package name and class name.

Opening a Domain

Creating a domain using the Management Portal interface immediately opens the domain, so you can begin managing the new domain right away.

To manage an existing domain, click the Open button to list all existing domains in the namespace. This display lists the packages that contain domains. Select a package to display its domains. Select an existing domain. This displays the Model Elements selection screen.

Modifying a Domain Definition

Creating or opening a domain displays the Model Elements window. If you click on the domain name in this window, the Details tab displays the Domain Name field, the Domain Tables Package field, and the Allow Custom Updates and Disabled check boxes. You can modify these characteristics of the domain:

Changing the Domain Name changes the Model Elements display name. It does not change the Definition class name displayed by the Open domain button.
Specifying the Domain Tables Package causes table projections of the domain’s data to be generated in the specified package.
Selecting the Allow Custom Updates check box allows the manual loading of data sources and dictionaries into this domain using interfaces other than Domain Architect.
Selecting the Disabled check box prevents the loading of all data (source data, metadata, dictionary matching data) during the Build operation. Each of these types of data also has its own Disabled check box that allows you to disable loading of each type of data separately.

You must Save and Compile a renamed domain before exiting that domain.

Deleting a Domain

To delete the current domain, click the Delete button. This displays the Drop domain data window. You can either delete just the domain contents or delete the domain definition. Click Drop domain & definition class to delete the domain and its associated class definition, including the specifications of data sources, skiplists, and other model elements.

Model Elements

After creating a domain, or opening an existing domain, you can define model elements for the domain. To add or modify model elements, click on the expansion triangle next to one of the headings. Initially, no expansion occurs. Once you have defined some model elements, clicking the expansion triangle shows the model elements you have defined.

To add a model element, click the heading. Then click the Add button shown in the Details tab on the right side. Specify the name and values. The model element is automatically generated when you leave the Details area. Model elements are listed in the order of their creation, with the most-recently-created element at the top of the list; modifying a model element does not change its position in the list.

To modify a model element, expand the heading, then click a defined model element. The current values are shown in the Details tab on the right side. Modify the name and/or values as desired. The model element is automatically re-generated when you leave the Details area.

Once you have created model elements, clicking on the Expand All button (or one of the expansion triangles) displays these defined values. The Element Type column shows the type of each model element. Clicking on the red “X” deletes that model element.

The Save button saves all changes. The Domain Architect page heading is followed by an asterisk (*) if there are unsaved changes. Click Save to save your changes.

The Undo button reverses the most recent unsaved change. You can click Undo repeatedly to reverse unsaved changes in the reverse order that they were made. Once changes are saved, this button disappears.

The following Model Elements are provided:

Domain Settings

This model element allows you to modify the characteristics of the domain. All Domain Settings are optional and take default values. Domain Settings provides the following options:

Languages: select one or more languages that you wish NLP to identify in the text data. If you check more than one language, automatic language identification is activated. This increases the processing required for texts. Therefore, you should not select multiple languages unless there is a real likelihood that texts in each selected language will be part of the data set. The default language is English.
Add Parameter: this button allows you to specify a domain parameter value. You can only add a domain parameter to an empty domain; this means that you must add all desired domain parameters before you Build the domain with Data Locations specified. If the domain already contains data, a Compile that adds, modifies, or deletes domain parameters fails with an error message. In that case, use the Delete button to drop the domain contents; you can then add, modify, or delete domain parameters.
To add a parameter, specify the domain parameter name and the new value. Domain parameter names are case-sensitive. You can use either name form. For example, Name=SortField, Value=1 or Name=$$$IKPSORTFIELD, Value=1. No validation is performed. All unspecified domain parameters take their default values. To view the parameters that you have added, expand the Domain Settings heading.
Maximum Concept Length: the largest number of words that should be indexed as a concept. This option is provided to prevent a long sequence of words from being indexed as a concept. The default (0) uses the language-specific default for the maximum number of words. This default should be used unless there are compelling reasons to modify it.
Manage User Dictionary: this button displays a “Manage User Dictionary” box that allows you to add one or more strings to the user dictionary. Each entry specifies either a string to be rewritten as a new string, or a string to which you assign an attribute label from a drop-down list.

Metadata Fields

Add Metadata: this button allows you to specify a source metadata field. For each metadata field you specify the field name, the data type (String, Number, or Date), the supported operators, and the storage type. After creating a domain, you can optionally specify one or more metadata fields that you can use as criteria for filtering sources. A metadata field is data associated with an NLP data source that is not itself NLP indexed data. For example, the date and time that a text source was loaded is a metadata field for that source. Metadata fields must be defined before loading text data sources into a domain.

Case Sensitive check box: By default, a metadata field is not case-sensitive; you can select this check box to make it case-sensitive.

Disabled check box: You can select the Disabled check box to disable all metadata fields, or you can select the Disabled check box displayed with an individual metadata field to disable just that metadata field. A disabled field is not loaded during the Build operation.

The metadata fields that you specify here appear in the Data Locations Add data from table and Add data from query details under the title “Metadata mappings”.

Data Locations

Specifies the source for adding data. Options are Add data from table, Add data from query, Add data from files, Add RSS data, and Add data from global.

The Drop existing data before build check box allows you to specify whether source text data already indexed in this NLP domain should be deleted before adding the source text data specified here. To use this check box to drop data, data loading must not be disabled. To drop existing data without loading new data, use the Delete button Drop domain contents only option.
The Disabled check box allows you to disable source indexing; disabled source data is not loaded during the Build operation. If data loading is disabled, the Drop existing data before build check box is ignored.
A Build operation for a large number of texts may take some time. If you have already loaded the data locations and wish to add or modify metadata or a matching dictionary you can click the Data Locations Disabled check box to index these model elements without reloading the data locations.
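
The interaction between the two check boxes can be summarized in a short Python sketch. The function and step names are hypothetical; the sketch only models the behavior described above and is not the Build implementation.

```python
def plan_data_load(drop_before_build: bool, disabled: bool) -> list:
    """Return the data-location steps a Build would perform, as described
    in the text. Hypothetical helper for illustration only."""
    if disabled:
        # Data loading is disabled: nothing is loaded, and the
        # "Drop existing data before build" setting is ignored.
        return []
    steps = []
    if drop_before_build:
        steps.append("drop existing source data")
    steps.append("load data locations")
    return steps


print(plan_data_load(drop_before_build=True, disabled=False))
print(plan_data_load(drop_before_build=True, disabled=True))
```

The second call returns an empty list, which matches the rule that dropping data without loading new data requires the Delete button's Drop domain contents only option.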

After specifying data locations, you must Save and Compile the domain, then select the Build button to build the data indexes.

Add Data from Table

This option allows you to specify data stored in an existing SQL table in the current namespace. It provides the following fields:

Name: you can either specify a name or take the default name for the extracted result set table. Follows SQL table naming conventions. The default name is Table_1 (with the integer incrementing for each additional extracted result set table you define).
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
Schema: from this drop-down list select an existing schema in the current namespace.
Table Name: from this drop-down list select an existing table in the selected schema.
ID Field: from this drop-down list select a field from the selected table to serve as the ID field (primary record identifier). An ID field must contain unique, non-null values.
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
Group Field: an SQL select-item expression that retrieves a secondary record identifier from the selected table. This field defaults to the initial ID Field selection.
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
Data Field: from this drop-down list select a field from the selected table to serve as the data field. The data field contains the text data loaded for NLP indexing. You can specify a field of data type %String or %Stream.GlobalCharacter (character stream data).
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
Where Clause: you can optionally specify an SQL WHERE clause to limit which records are included in the result set table. Do not include the WHERE keyword. The Where Clause is not validated.
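
As a rough illustration of how these fields relate to one another, the following Python sketch assembles them into a single SELECT statement. This is an assumption made for explanatory purposes: the query that the Architect actually generates is not shown in this documentation, and the table and field names are hypothetical.

```python
def table_source_query(schema: str, table: str, id_field: str,
                       group_field: str, data_field: str,
                       where_clause: str = "") -> str:
    """Combine the Add data from table fields into one SELECT statement.
    Illustrative only; not the SQL generated by Domain Architect."""
    sql = f"SELECT {id_field}, {group_field}, {data_field} FROM {schema}.{table}"
    clause = where_clause.strip()
    if clause:
        # The Where Clause field must not include the WHERE keyword.
        if clause.upper().startswith("WHERE "):
            raise ValueError("omit the WHERE keyword from the Where Clause")
        sql += f" WHERE {clause}"
    return sql


# Hypothetical schema, table, and field names.
print(table_source_query("Sample", "Report", "ID", "ReportYear",
                         "ReportText", "ReportYear > 2020"))
```

Because the Architect does not validate the Where Clause, it may be worth testing the equivalent query in an SQL interface before building the domain.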

If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to specify a metadata field for this table. From the drop-down list you can select a field from the selected table, select – not mapped –, or select – custom –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.

If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.

Add Data from Query

Add data from query is similar to Add data from table, but allows you to specify a fully-formed SQL query for an existing table (or tables). It provides the following fields:

Name: you can either specify a name or take the default name for the extracted result set table. Follows SQL table naming conventions. The default name is Query_1 (with the integer incrementing for each additional extracted result set table you define).
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
SQL: the query text, an InterSystems SQL SELECT statement. Defining a query allows you to select fields from more than one table by using JOIN syntax. When specifying more than one table, assign column aliases to selected fields. Defining a query also allows you to specify an expression field that you can use as the Group field.
If the query text contains a syntax error or references a non-existent item, the Save button displays the SQLCODE error and message. When you acknowledge the error, the query is saved. You can go back into the SQL query text field to correct the error.
The following field selection drop-down lists display the selected fields. They do not display table alias prefixes. If the field has a column alias, this alias is listed rather than the field name.
ID Field: from this drop-down list select a field from the selected table to serve as the ID field. An ID field must contain unique, non-null values.
Group Field: from this drop-down list select a select-item expression (such as an SQL function expression) from the query to serve as a secondary record identifier (group field). For example, YEAR(EventDate).
Data Field: from this drop-down list select a field from the selected table to serve as the data field. The data field contains the text data loaded for NLP indexing.
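
Because an ID field must contain unique, non-null values, it can be useful to check a query's result rows before using one of its columns as the ID field. The following Python sketch shows one way to do that, assuming the rows have already been fetched as dictionaries; the helper name and sample data are hypothetical.

```python
def check_id_field(rows: list, id_column: str) -> list:
    """Report rows whose candidate ID value is null or repeated.
    Hypothetical pre-check; Domain Architect does not provide it."""
    seen = set()
    problems = []
    for index, row in enumerate(rows):
        value = row.get(id_column)
        if value is None:
            problems.append(f"row {index}: {id_column} is null")
        elif value in seen:
            problems.append(f"row {index}: duplicate {id_column} {value!r}")
        else:
            seen.add(value)
    return problems


# Hypothetical result rows.
rows = [
    {"EventId": 1, "Narrative": "first report"},
    {"EventId": 1, "Narrative": "second report"},
    {"EventId": None, "Narrative": "third report"},
]
for problem in check_id_field(rows, "EventId"):
    print(problem)
```

An empty result means the column satisfies both requirements for the sampled rows.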

If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to select either – not mapped – or – custom – for each defined metadata field. The default is – not mapped –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.

If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.

The Model Elements window Element Type column displays a truncated form of the query you defined; the query is truncated after the first table name in the FROM clause. The full query is shown in the Details window.

Add Data from Files

This option allows you to specify data stored in files. It provides the following fields:

Name: you can either specify a name or take the default name for the extracted data file. The default name is File_1 (with the integer incrementing for each additional extracted data file you define).
Path: the complete path to the directory containing the desired files. The Path syntax is filesystem dependent; on a Windows system it might look like the following: C:\temp\NLPSources\
Extensions: the file extension, such as txt or xml. Do not include the dot prefix when specifying the file extension. Specify multiple extensions as a comma-separated list with no dots and no spaces; for example, txt,xml. If specified, only files with the specified extensions are included in the resulting extracted data. If the Extensions field is left blank (the default) all files are included, regardless of their extensions.
Encoding: a drop-down list of the types of character set encoding to use to process the files.
Filter Condition: a condition used to restrict which files are to be included in the resulting extracted data.
Recursive: a check box indicating whether to select files recursively. When checked, data can be extracted from the files in the specified directory and files in all of its subdirectories, and their sub-subdirectories, etc. When not checked, data can be extracted only from files in the specified directory. The default is non-recursive (check box not checked).
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
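
The Path, Extensions, and Recursive settings can be previewed outside the Architect with a small Python sketch that lists the files such a selection would cover. It is an approximation under stated assumptions: the function names are hypothetical, extensions are matched exactly (case handling in NLP is not stated here), and Encoding and Filter Condition are not modeled.

```python
import tempfile
from pathlib import Path


def parse_extensions(spec: str) -> set:
    """Split an Extensions value such as 'txt,xml' (no dots, no spaces)."""
    return {ext for ext in spec.split(",") if ext}


def select_files(path, extensions: str = "", recursive: bool = False) -> list:
    """List the files under path that match the given settings.
    A blank extensions value selects all files."""
    wanted = parse_extensions(extensions)
    root = Path(path)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        p for p in candidates
        if p.is_file() and (not wanted or p.suffix[1:] in wanted)
    )


def demo() -> dict:
    """Build a throwaway directory tree and run three selections on it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ("a.txt", "b.xml", "c.log", "sub/d.txt"):
            (root / name).write_text("sample text")
        return {
            "txt,xml": [p.name for p in select_files(root, "txt,xml")],
            "recursive": [p.name for p in select_files(root, "txt", recursive=True)],
            "blank": [p.name for p in select_files(root)],
        }


print(demo())
```

In the demo, only the recursive selection picks up the file in the subdirectory, and the blank Extensions value includes the .log file.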

Add RSS Data

This option allows you to specify data from an RSS stream feed. It provides the following fields:

Name: you can either specify a name or take the default name for the extracted data. The default name is RSS_1 (with the integer incrementing for each additional RSS source you define).
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
Server Name: the name of the host server on which the URL is found.
URL: the navigation path within the server address to the actual RSS feed.
Text Elements: a comma-separated list of text elements to load from the RSS feed; for example, title,description. Leave blank for defaults.

Add Data from Global

This option allows you to specify data from an InterSystems IRIS global. It provides the following fields:

Name: you can either specify a name or take the default name for the extracted data. The default name is Global_1 (with the integer incrementing for each additional global source you define).
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
Global Reference: The global from which you wish to extract the source data.
Begin Subscript: the first global subscript in a range of subscripts to include.
End Subscript: the last global subscript in a range of subscripts to include.
Filter Condition: a condition used to restrict which files are to be included in the resulting extracted data.

Skiplists

Define skiplists: After creating a domain, you can optionally create one or more skiplists for that domain. A skiplist is a list of terms (words or phrases) that you do not want a query to return. Thus a skiplist allows you to perform NLP operations that ignore specific terms in data sources loaded in the domain.

Name: specify the name of a new skiplist, or take the default name. Skiplist names are not case-sensitive. Specifying a duplicate skiplist name results in a compile error. The default name is Skiplist_1 (with the integer incrementing for each additional skiplist you define).
Entries: specify terms to include in the skiplist, one term per line. Terms should be in lower case. Duplicate terms are permitted. You can copy/paste terms from one skiplist to another. You can include blank lines to separate groups of terms. A line return at the end of your list of terms is optional; blank lines are not counted as entries.
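
The Entries rules can be illustrated with a minimal Python sketch that reads the contents of an Entries box and then filters a list of terms against the result. The helper names are hypothetical, and lowercasing the entries is an assumption based on the recommendation above rather than documented Architect behavior.

```python
def parse_skiplist_entries(text: str) -> list:
    """One term per line; blank lines are not counted as entries.
    Duplicates are kept because duplicate terms are permitted.
    Assumption: entries are normalized to lower case."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def apply_skiplist(terms: list, skiplist: list) -> list:
    """Drop every term that appears in the skiplist."""
    skipped = set(skiplist)
    return [term for term in terms if term.lower() not in skipped]


entries_box = "pilot\nairplane\n\nflight\npilot\n"
skiplist = parse_skiplist_entries(entries_box)
print(skiplist)  # ['pilot', 'airplane', 'flight', 'pilot']
print(apply_skiplist(["engine", "pilot", "left wing", "flight"], skiplist))
# ['engine', 'left wing']
```

The blank line and the trailing line return in the sample input produce no entries, while the repeated term is retained.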

If you add, modify, or delete a skiplist, you must Save and Compile the domain for this change to take effect.

Because defining skiplists has no effect on how data is loaded into a domain, changes to skiplists do not require re-building the domain.

Skiplists are compiled, then supplied to the Domain Explorer, which allows you to specify none, one, or multiple skiplists when performing analysis of source text data loaded into the domain. A skiplist is applied to some (but not all) Domain Explorer analytics.

Matching

The Matching option provides the Add Dictionary option to define a dictionary and specify its items and terms.

The Matching option provides four check box options, as follows:

Disabled: You can select the Disabled check box to disable building of all dictionaries, or you can select the Disabled check box displayed with an individual dictionary to disable the building of that dictionary. Selecting Disabled check boxes allows you to build only those dictionaries that you have changed. The default is off.
Drop Before Build: default on
Auto Execute: default on
Ignore Dictionary Errors: default on

Add Dictionary

The Add Dictionary button displays the dictionary definition options: dictionary name (with a supplied default), an optional description, the dictionary language selected from a drop-down list of NLP supported languages, and the disabled check box. The default name is Dictionary_1 (with the integer incrementing for each additional dictionary you define).

The Add Item button displays the item definition options: item name (with a supplied default), a uri name (with a supplied default), the item language selected from a drop-down list of NLP supported languages, and the disabled check box. To define more items, select the dictionary name. Items are listed in order of creation, with the most recent at the top of the list. Within each item you can define one or more terms. The default name is Item_1, the default uri name is uri:1 (with the integer incrementing for each additional item you define for this dictionary).

The Add Term button displays the term definition options: a string specifying the term, the term language selected from a drop-down list of NLP supported languages, and the disabled check box. To define more terms, select the item name. Terms are listed in order of creation, with the most recent at the top of the list.

Save, Compile, and Build

You must save, compile, and build a domain (in that order) using the buttons provided. You must save and compile a domain after adding, modifying, or deleting any Model Elements.

The Save button saves the current domain definition. Architect greys out (disables) the Save button if no domain definition is open. Architect does not issue an error if you save a domain definition without changing it.

The Compile button compiles the current domain definition. It compiles all of the classes and routines that comprise the domain definition. If you have not saved changes that you made to the domain definition, the compile operation prompts you to save the domain definition before compiling.

The Build button loads the specified sources into the current domain. If you have made changes to the Data Locations, Metadata Fields, or Matching dictionaries, you must build the domain. The Build Domain window displays progress messages such as the following:

13:50:48: Loading data...
13:51:49: Finished loading 3 sources
13:51:49: Creating dictionaries and profiles...
13:51:49: Finished creating 1 dictionaries, 1 items, 3 terms and 0 formats
13:51:49: Matching sources...
13:51:50: Finished matching sources
13:51:50: Successfully built domain 'mydomain'
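
If you copy these progress messages out of the Build Domain window, a short Python sketch such as the following can split them into timestamps and messages and report the total build time. The parsing is based only on the sample format shown above and assumes the build does not cross midnight; the names are hypothetical.

```python
import re
from datetime import datetime

SAMPLE_LOG = """\
13:50:48: Loading data...
13:51:49: Finished loading 3 sources
13:51:49: Creating dictionaries and profiles...
13:51:49: Finished creating 1 dictionaries, 1 items, 3 terms and 0 formats
13:51:49: Matching sources...
13:51:50: Finished matching sources
13:51:50: Successfully built domain 'mydomain'
"""

# Each progress line has the form HH:MM:SS: message
LINE_PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2}): (.*)$")


def parse_build_log(text: str) -> list:
    """Return (time, message) pairs for every line matching the format."""
    entries = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def elapsed_seconds(entries: list) -> int:
    """Seconds between the first and last message (same-day builds only)."""
    return int((entries[-1][0] - entries[0][0]).total_seconds())


entries = parse_build_log(SAMPLE_LOG)
print(len(entries), "messages,", elapsed_seconds(entries), "seconds")
# 7 messages, 62 seconds
```

In the sample, almost all of the elapsed time falls between the first two messages, that is, in loading the sources.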

The build operation can be time-consuming. If a Disabled check box is checked for a model element, the Build operation does not load the corresponding sources. Selecting Disabled check boxes allows you to build only those model elements that you have changed.
