docs.intersystems.com


Using InterSystems Natural Language Processing (NLP)
Domain Architect


InterSystems IRIS Data Platform™ provides the Domain Architect as an interactive interface for creating and populating NLP domains and performing analysis on the indexed data. Domain Architect is accessed using the InterSystems IRIS Management Portal.
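It consists of three tools: the Domain Architect itself, for defining and building domains; the Domain Explorer, for analyzing the data indexed in a domain; and Indexing Results, for viewing how an individual source was indexed.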
All functionality provided through the Domain Architect is also available by using ObjectScript to invoke NLP class methods and properties.
Accessing Architect
To access the Domain Architect, select the Analytics option in the Management Portal, then select Text Analytics. This displays the Domain Architect option.
All NLP domains exist within a specific namespace. Therefore, you must specify which namespace you wish to use by selecting the Switch option at the top of any Management Portal interface page. This displays the list of available namespaces, from which you can make your selection.
A namespace must be enabled for NLP before it can be used. Selecting an enabled namespace displays the NLP Domain Architect option.
Note:
If selecting an enabled namespace does not display the Domain Architect option, you do not have a valid license for NLP. Check the Licensed to field in the Management Portal header, and review or activate your license key.
Enabling a Namespace
A namespace must be enabled for NLP before it can be used with Domain Architect.
To enable a namespace for NLP from the Management Portal, select System Administration, Security, Applications, Web Applications. This displays a list of web applications; the third column indicates whether a listed item is a namespace (“Yes”) or not. Select the desired namespace name from the list. This displays the Edit Web Application page. In the Enable section of the page, select the Analytics check box, then click the Save button.
You cannot enable the %SYS namespace, because NLP domains cannot be created in %SYS.
You can set your Management Portal default namespace. From the Management Portal select System Administration, Security, Users. Select the name of the desired user. This allows you to edit the user definition. From the General tab, select a Startup Namespace from the drop-down list. Click Save.
Creating a Domain
From the Domain Architect press the New button to define a domain. You specify the following domain values (in the specified order):
Click the Finish button to create the domain. This displays the Model Elements selection screen.
You must Save and Compile a newly created domain before exiting that domain.
If you attempt to create a duplicate domain name, the Domain Architect issues a “Domain name already in use” error.
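The naming rules above can be modeled as a small registry. This is a minimal Python sketch, not an InterSystems API. The class `DomainRegistry` and its methods are hypothetical. The sketch assumes that domain names are unique within a namespace and compared exactly as typed. It also encodes the earlier rule that no domains can be created in %SYS.

```python
from __future__ import annotations


class DomainRegistry:
    """Hypothetical in-memory model of NLP domain names per namespace."""

    def __init__(self) -> None:
        # namespace -> set of domain names already defined in it
        self._domains: dict[str, set[str]] = {}

    def create(self, namespace: str, name: str) -> None:
        # Domains cannot be created in %SYS (that namespace cannot be enabled).
        if namespace.upper() == "%SYS":
            raise ValueError("NLP domains cannot be created in %SYS")
        names = self._domains.setdefault(namespace, set())
        # Assumption: uniqueness is checked within one namespace, by exact name.
        if name in names:
            raise ValueError("Domain name already in use")
        names.add(name)

    def names(self, namespace: str) -> list[str]:
        return sorted(self._domains.get(namespace, set()))


registry = DomainRegistry()
registry.create("USER", "mydomain")
try:
    registry.create("USER", "mydomain")
except ValueError as err:
    print(err)  # Domain name already in use
```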
For other ways to create a domain, refer to NLP Domains. Note that Domain Architect is the only domain creation interface that allows you to define a domain definition package name and class name.
Opening a Domain
Creating a domain in the Management Portal opens it immediately, so you can begin managing the new domain right away.
To manage an existing domain, click the Open button. This displays a list of the packages in the namespace that contain domains. Select a package to display its domains, then select an existing domain. This displays the Model Elements selection screen.
Changing the Domain Name and Check Boxes
Creating or opening a domain displays the Model Elements window. If you click on the domain name in this window, the Details tab displays the Domain Name field, the Domain Tables Package field, and the Allow Custom Updates and Disabled check boxes. You can modify these characteristics of the domain. Changing the Domain Name does not change the Definition class name.
Checking the Allow Custom Updates check box allows the manual loading of data sources and dictionaries into this domain using interfaces other than Domain Architect.
Checking the Disabled check box prevents the loading of all data (source data, metadata, dictionary matching data) during the Build operation. Each of these types of data also has its own Disabled check box that allows you to disable loading of each type of data separately.
You must Save and Compile a renamed domain before exiting that domain.
Deleting a Domain
To delete the current domain, click the Delete button. This displays the Drop domain data window, where you can delete either just the domain contents or the domain definition as well. Click Drop domain & definition class to delete the domain and its associated class definition, including the specifications of data sources, blacklists, and other model elements.
Model Elements
After creating a domain, or opening an existing domain, you can define model elements for the domain. To add or modify model elements, click the expansion triangle next to one of the headings. Until you have defined model elements under a heading, expanding it shows nothing; once you have defined some, clicking the expansion triangle lists them.
To add a model element, click the heading. Then click the Add button shown in the Details tab on the right side. Specify the name and values. The model element is automatically generated when you leave the Details area. Model elements are listed in the order of their creation, with the most-recently-created element at the top of the list; modifying a model element does not change its position in the list.
To modify a model element, expand the heading, then click a defined model element. The current values are shown in the Details tab on the right side. Modify the name and/or values as desired. The model element is automatically re-generated when you leave the Details area.
Once you have created model elements, clicking on the Expand All button (or one of the expansion triangles) displays these defined values. The Element Type column shows the type of each model element. Clicking on the red “X” deletes that model element.
The Save button saves all changes. An asterisk (*) follows the Domain Architect page heading when there are unsaved changes.
The Undo button reverses the most recent unsaved change. You can click Undo repeatedly to reverse unsaved changes in the reverse order that they were made. Once changes are saved, this button disappears.
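The ordering, Undo, and Save rules above behave like a list with an undo stack of snapshots. The following is a minimal Python sketch under that assumption. `ModelElements` and its methods are hypothetical names and do not correspond to anything in the product. For simplicity, the sketch drops the asterisk once every unsaved change has been undone, which the real page may not do.

```python
from __future__ import annotations

import copy


class ModelElements:
    """Hypothetical sketch of Model Elements editing: ordering, Undo, and Save."""

    def __init__(self) -> None:
        self.elements: list[dict] = []        # most recently created first
        self._history: list[list[dict]] = []  # snapshots taken before unsaved changes

    def _checkpoint(self) -> None:
        self._history.append(copy.deepcopy(self.elements))

    def add(self, name: str, **values) -> None:
        self._checkpoint()
        self.elements.insert(0, {"name": name, **values})  # new element goes on top

    def modify(self, name: str, **values) -> None:
        for element in self.elements:
            if element["name"] == name:
                self._checkpoint()
                element.update(values)  # position in the list is unchanged
                return
        raise KeyError(name)

    def delete(self, name: str) -> None:
        remaining = [e for e in self.elements if e["name"] != name]
        if len(remaining) == len(self.elements):
            raise KeyError(name)
        self._checkpoint()
        self.elements = remaining

    def undo(self) -> None:
        # Reverse unsaved changes one at a time, newest first.
        if not self._history:
            raise IndexError("no unsaved changes to undo")
        self.elements = self._history.pop()

    def save(self) -> None:
        self._history.clear()  # saved changes can no longer be undone

    def heading(self) -> str:
        # "*" marks unsaved changes, as in the page heading
        return "Domain Architect" + ("*" if self._history else "")


model = ModelElements()
model.add("Reports", kind="data location")
model.add("Noise", kind="blacklist")
model.modify("Reports", table="Sample.Reports")
print([e["name"] for e in model.elements], model.heading())
# ['Noise', 'Reports'] Domain Architect*
model.undo()  # removes the table setting; "Reports" keeps its position
```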
The following Model Elements are provided:
Domain Settings
This model element allows you to modify the characteristics of the domain. All Domain Settings are optional and take default values. Domain Settings provides the following options:
Metadata Fields
Add Metadata: A metadata field is data associated with an NLP data source that is not itself NLP indexed data. For example, the date and time that a text source was loaded is a metadata field for that source. After creating a domain, you can optionally use this button to specify one or more metadata fields that you can use as criteria for filtering sources. For each metadata field you specify the field name, the data type (String, Number, or Date), the supported operators, and the storage type. Metadata fields must be defined before loading text data sources into a domain.
Case Sensitive check box: By default, a metadata field is not case-sensitive; you can select this check box to make it case-sensitive.
Disabled check box: You can select the Disabled check box to disable all metadata fields, or you can select the Disabled check box displayed with an individual metadata field to disable just that metadata field. A disabled field is not loaded during the Build operation.
The metadata fields that you specify here appear under the title “Metadata mappings” in the details of the Data Locations Add data from table and Add data from query options.
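A metadata field definition combines a data type, a set of allowed operators, case sensitivity, and a Disabled flag. The following Python sketch models those settings and shows how they affect a filter comparison and what a Build would load. It is not the NLP implementation, and several parts are hypothetical:

- the class `MetadataField`,
- the operator symbols,
- the `storage` label,
- the helper `fields_to_load`.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from datetime import date

# Hypothetical operator symbols; the operators NLP actually offers may differ.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass
class MetadataField:
    name: str
    data_type: str = "String"                     # "String", "Number", or "Date"
    operators: frozenset = frozenset({"=", "!="})
    storage: str = "default"                      # hypothetical label; not modeled
    case_sensitive: bool = False                  # fields are case-insensitive by default
    disabled: bool = False

    def __post_init__(self) -> None:
        if self.data_type not in ("String", "Number", "Date"):
            raise ValueError(f"unknown data type {self.data_type!r}")

    def matches(self, value, op: str, target) -> bool:
        """Evaluate 'value op target' for a filter on this field."""
        if op not in self.operators:
            raise ValueError(f"operator {op!r} not supported for {self.name}")
        if self.data_type == "String" and not self.case_sensitive:
            value, target = value.lower(), target.lower()
        return OPERATORS[op](value, target)


def fields_to_load(fields: list[MetadataField], all_disabled: bool = False) -> list[str]:
    """Names of the metadata fields a Build would load (hypothetical helper)."""
    if all_disabled:  # the section-level Disabled check box
        return []
    return [f.name for f in fields if not f.disabled]


author = MetadataField("Author")
loaded = MetadataField("LoadDate", "Date", frozenset({"<", ">"}))
print(author.matches("Smith", "=", "SMITH"))                    # True
print(loaded.matches(date(2024, 5, 1), ">", date(2024, 1, 1)))  # True
```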
Data Locations
Specifies the source for adding data. Options are Add data from table, Add data from query, Add data from files, Add RSS data, and Add data from global.
After specifying data locations, you must Save and Compile the domain, then select the Build button to build the data indices.
Add Data from Table
This option allows you to specify data stored in an existing SQL table in the current namespace. It provides the following fields:
If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to specify a metadata field for this table. From the drop-down list you can select a field from the selected table, select – not mapped –, or select – custom –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.
If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.
Add Data from Query
Add data from query is similar to Add data from table, but allows you to specify a fully-formed SQL query against an existing table (or tables). It provides the following fields:
If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to select either – not mapped – or – custom – for each defined metadata field. The default is – not mapped –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.
If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.
The Model Elements window Element Type column displays a truncated form of the query you defined; the query is truncated after the first table name in the FROM clause. The full query is shown in the Details window.
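The truncation rule can be approximated with a regular expression that keeps everything up to and including the first table name after `FROM`. The following Python sketch assumes that approach and uses a hypothetical helper, `element_type_label`, and hypothetical tables `Sample.Reports` and `Sample.Authors`. It is not how the Architect computes the label, and a `FROM` inside an earlier subquery would mislead it.

```python
import re

# First identifier after FROM; stops at whitespace, a comma, or a parenthesis.
_FIRST_FROM_TABLE = re.compile(r"\bFROM\s+([^\s,()]+)", re.IGNORECASE)


def element_type_label(query: str) -> str:
    """Approximate the truncated query text shown in the Element Type column.

    Hypothetical helper: keeps everything up to and including the first table
    name after FROM; returns the query unchanged if there is no FROM clause.
    """
    match = _FIRST_FROM_TABLE.search(query)
    return query[: match.end()] if match else query


query = ("SELECT r.ID, r.ReportText FROM Sample.Reports r "
         "JOIN Sample.Authors a ON r.AuthorID = a.ID")
print(element_type_label(query))
# SELECT r.ID, r.ReportText FROM Sample.Reports
```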
Add Data from File
This option allows you to specify data stored in files. It provides the following fields:
Add RSS Data
This option allows you to specify data from an RSS stream feed. It provides the following fields:
Add Data from Global
This option allows you to specify data from an InterSystems IRIS global. It provides the following fields:
Blacklists
Define blacklists: After creating a domain, you can optionally create one or more blacklists for that domain. A blacklist is a list of terms (words or phrases) that you do not want a query to return. Thus a blacklist allows you to perform NLP operations that ignore specific terms in data sources loaded in the domain.
If you add, modify, or delete a blacklist, you must Save and Compile the domain for this change to take effect.
Because defining blacklists has no effect on how data is loaded into a domain, changes to blacklists do not require re-building the domain.
Blacklists are compiled, then supplied to the Domain Explorer, which allows you to specify none, one, or multiple blacklists when performing analysis of source text data loaded into the domain. A blacklist is applied to some (but not all) Domain Explorer analytics.
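Because a blacklist filters query results rather than the indexed data, it can be pictured as a post-processing step on a result list. The following Python sketch assumes this model. The helper `apply_blacklists` and the sample terms are hypothetical. Whole-term, case-insensitive matching is also an assumption, not documented NLP behavior.

```python
from __future__ import annotations


def apply_blacklists(results: list[tuple[str, int]],
                     *blacklists: set[str]) -> list[tuple[str, int]]:
    """Drop results whose term appears in any of the selected blacklists.

    Assumption: terms are matched whole and case-insensitively.
    The indexed data is untouched; only this result list is filtered.
    """
    blocked = {term.lower() for blacklist in blacklists for term in blacklist}
    return [(term, score) for term, score in results if term.lower() not in blocked]


top_concepts = [("patient", 40), ("hospital", 25), ("discharge summary", 12)]
noise = {"Patient"}
print(apply_blacklists(top_concepts, noise))
# [('hospital', 25), ('discharge summary', 12)]
print(apply_blacklists(top_concepts) == top_concepts)  # True: no blacklist selected
```

Filtering at query time explains why changing a blacklist requires only Save and Compile, not a rebuild.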
Matching
The Matching option provides the Add Dictionary option to define a dictionary and specify its items and terms.
The Matching option provides four check box options, as follows:
Add Dictionary
The Add Dictionary button displays the dictionary definition options: dictionary name (with a supplied default), an optional description, the dictionary language selected from a drop-down list of NLP supported languages, and the disabled check box. The default name is Dictionary_1 (with the integer incrementing for each additional dictionary you define).
The Add Item button displays the item definition options: item name (with a supplied default), a uri name (with a supplied default), the item language selected from a drop-down list of NLP supported languages, and the disabled check box. To define more items, select the dictionary name. Items are listed in order of creation, with the most recent at the top of the list. Within each item you can define one or more terms. The default name is Item_1, the default uri name is uri:1 (with the integer incrementing for each additional item you define for this dictionary).
The Add Term button displays the term definition options: a string specifying the term, the term language selected from a drop-down list of NLP supported languages, and the disabled check box. To define more terms, select the item name. Terms are listed in order of creation, with the most recent at the top of the list.
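The dictionary, item, and term hierarchy, with its default names and newest-first ordering, can be sketched as nested dataclasses. The following is a minimal Python sketch. All class names are hypothetical, and so is the default language code `"en"`. Deriving the default numbers from the current count of dictionaries or items is an assumption; the Architect may number them differently after deletions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    string: str
    language: str = "en"  # hypothetical default language code
    disabled: bool = False


@dataclass
class Item:
    name: str
    uri: str
    language: str = "en"
    disabled: bool = False
    terms: list[Term] = field(default_factory=list)

    def add_term(self, string: str, language: str = "en") -> Term:
        term = Term(string, language)
        self.terms.insert(0, term)  # most recently created term at the top
        return term


@dataclass
class Dictionary:
    name: str
    description: str = ""
    language: str = "en"
    disabled: bool = False
    items: list[Item] = field(default_factory=list)

    def add_item(self, name: str | None = None, uri: str | None = None,
                 language: str = "en") -> Item:
        # Defaults follow the Item_n / uri:n pattern; counting items is an assumption.
        n = len(self.items) + 1
        item = Item(name or f"Item_{n}", uri or f"uri:{n}", language)
        self.items.insert(0, item)  # most recently created item at the top
        return item


class Matching:
    def __init__(self) -> None:
        self.dictionaries: list[Dictionary] = []

    def add_dictionary(self, name: str | None = None, **options) -> Dictionary:
        n = len(self.dictionaries) + 1
        dictionary = Dictionary(name or f"Dictionary_{n}", **options)
        self.dictionaries.append(dictionary)
        return dictionary


matching = Matching()
diseases = matching.add_dictionary(description="Disease names")
flu = diseases.add_item()
flu.add_term("influenza")
flu.add_term("flu")
covid = diseases.add_item()
print(diseases.name, [(i.name, i.uri) for i in diseases.items])
# Dictionary_1 [('Item_2', 'uri:2'), ('Item_1', 'uri:1')]
print([t.string for t in flu.terms])  # ['flu', 'influenza']
```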
Save, Compile, and Build
You must save, compile, and build a domain (in that order) using the buttons provided. You must save and compile a domain after adding, modifying, or deleting any Model Elements.
The Save button saves the current domain definition. Architect greys out (disables) the Save button if no domain definition is open. Architect does not issue an error if you save a domain definition without changing it.
The Compile button compiles the current domain definition. It compiles all of the classes and routines that comprise the domain definition. If you have not saved changes that you made to the domain definition, the compile operation prompts you to save the domain definition before compiling.
The Build button loads the specified sources into the current domain. If you have made changes to the Data Locations, Metadata Fields, or Matching dictionaries, you must build the domain. The Build Domain window displays progress messages such as the following:
13:50:48: Loading data...
13:51:49: Finished loading 3 sources
13:51:49: Creating dictionaries and profiles...
13:51:49: Finished creating 1 dictionaries, 1 items, 3 terms and 0 formats
13:51:49: Matching sources...
13:51:50: Finished matching sources
13:51:50: Successfully built domain 'mydomain'
The build operation can be time-consuming. If a Disabled check box is checked for a model element, the Build operation does not load the corresponding sources. Selecting Disabled check boxes allows you to build only those model elements that you have changed.
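The required Save, Compile, Build order and the effect of Disabled check boxes can be modeled as a small state machine. This Python sketch is hypothetical (`DomainWorkflow`, `change`, and `confirm_save` are not InterSystems names). It treats the prompt to save before compiling as a callback and reports how many sources each enabled data location would load.

```python
from __future__ import annotations

from typing import Callable


class DomainWorkflow:
    """Hypothetical model of the Save -> Compile -> Build order (not an InterSystems API)."""

    def __init__(self, locations: dict[str, dict]) -> None:
        # locations: data location name -> {"sources": int, "disabled": bool}
        self.locations = locations
        self.domain_disabled = False  # the domain-level Disabled check box
        self.saved = False
        self.compiled = False

    def change(self, name: str, **values) -> None:
        # Any model element change must be saved and compiled again.
        self.locations[name].update(values)
        self.saved = self.compiled = False

    def save(self) -> None:
        self.saved = True

    def compile(self, confirm_save: Callable[[], bool] = lambda: True) -> bool:
        if not self.saved:
            # Compile prompts you to save unsaved changes first.
            if not confirm_save():
                return False
            self.save()
        self.compiled = True
        return True

    def build(self) -> dict[str, int]:
        if not (self.saved and self.compiled):
            raise RuntimeError("save and compile the domain before building")
        if self.domain_disabled:
            return {}  # nothing is loaded
        # Disabled data locations are skipped during the Build.
        return {name: loc["sources"] for name, loc in self.locations.items()
                if not loc["disabled"]}


wf = DomainWorkflow({
    "Reports": {"sources": 3, "disabled": False},
    "Archive": {"sources": 500, "disabled": True},
})
wf.compile()       # saves (after the prompt), then compiles
print(wf.build())  # {'Reports': 3}
```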
Domain Explorer
There are two ways to access the Domain Explorer:
The Domain Explorer is a Zen page that displays query results about the source text data indexed in a domain. It initially displays a list of either the top (most frequently occurring) concepts or the dominant (highest dominance) concepts. You can toggle between these two lists.
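The two concept lists rank the same concepts by different measures, so their order can differ. The following Python sketch illustrates the toggle by sorting hypothetical concepts by either field, highest first. `Concept` and `concept_list` are hypothetical names. The frequency and dominance numbers are invented, and the sketch does not model how NLP computes dominance.

```python
from __future__ import annotations

from typing import NamedTuple


class Concept(NamedTuple):
    value: str
    frequency: int    # number of occurrences in the domain
    dominance: float  # dominance score (its computation is not modeled here)


def concept_list(concepts: list[Concept], by: str = "frequency", limit: int = 10) -> list[str]:
    """Return concept values ranked by the selected measure, highest first."""
    if by not in ("frequency", "dominance"):
        raise ValueError("by must be 'frequency' or 'dominance'")
    ranked = sorted(concepts, key=lambda c: getattr(c, by), reverse=True)
    return [c.value for c in ranked[:limit]]


concepts = [
    Concept("engine", 12, 300.0),
    Concept("fuel", 20, 150.0),
    Concept("pilot", 5, 420.0),
]
print(concept_list(concepts, "frequency"))  # ['fuel', 'engine', 'pilot']
print(concept_list(concepts, "dominance"))  # ['pilot', 'engine', 'fuel']
```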
If you select an entity, the Domain Explorer provides analysis of similar entities and related concepts, and analysis of the appearance of the specified entity in larger text units (sources, paths, and CRCs). This provides a contextual at-a-glance view of what's in your data.
The Domain Explorer provides generic filters that support selecting subsets of the sources in a domain based on metadata criteria. This interface provides a sample of how NLP Smart Indexing can be used to quickly overview and navigate a large set of documents.
Selecting a Domain
By default, the Domain Explorer displays analysis of the domain that was current in Domain Architect when you invoked the Domain Explorer.
To select another domain:
  1. Select the Gear icon at the upper right of the Domain Explorer. This displays the Settings box.
  2. The Settings box contains the Switch domain drop-down list. Select a domain from this list.
The number at the top right of the Domain Explorer is the number of sources loaded in the selected domain that are available for data analysis. This number can be limited by applying filters.
Listing All Concepts
The Domain Explorer initially provides concept analysis of the data sources loaded in the domain. There are two ways to list concepts, by frequency or by dominance. You can toggle between these two by selecting the frequency or dominance button:
Analyzing a Specified Entity
There are two ways to display analysis of a specific entity:
Selecting an entity displays two kinds of analysis of that entity: associated entities and specified entity in context.
Associated Entities
Selecting an entity displays the following listings:
Selecting an entity from the Similar Entities, Related Concepts, or Proximity Profile listings changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.
Entity in Context
Selecting an entity also displays the following listings of that entity in context:
Note:
If Japanese is the only language supported for the domain, the Domain Explorer display differs as follows: the Related Concepts and CRCs listings are not shown. An Entity Vectors listing is substituted for the Paths listing.
Full Text Box
The Eye icon displays the full text of a selected source. The text box is identified by the external ID of the source, for example :SQL:1171:1171.
The source text is tagged as follows:
This full text box provides the following option buttons:
Limiting the Sources to Analyze
You can limit the scope of your data analysis by using filters. A filter includes or excludes data sources that are loaded in the domain from analysis. By default, the Domain Explorer analyzes all data sources loaded in the domain.
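An include or exclude filter can be pictured as a predicate over each source's metadata; the number of sources that pass is what the count at the top right reflects. This Python sketch is illustrative only: `filter_sources`, `is_recent`, the source IDs, and the `LoadDate` field are hypothetical.

```python
from __future__ import annotations

from datetime import date


def filter_sources(sources: list[dict], predicate, exclude: bool = False) -> list[dict]:
    """Include sources whose metadata satisfies predicate, or exclude them if exclude=True."""
    return [s for s in sources if predicate(s["metadata"]) != exclude]


def is_recent(metadata: dict) -> bool:
    # Hypothetical criterion on a hypothetical LoadDate metadata field.
    return metadata["LoadDate"] >= date(2024, 1, 1)


sources = [
    {"id": "src-1", "metadata": {"LoadDate": date(2023, 3, 1)}},
    {"id": "src-2", "metadata": {"LoadDate": date(2024, 2, 9)}},
    {"id": "src-3", "metadata": {"LoadDate": date(2024, 6, 30)}},
]
print(len(filter_sources(sources, is_recent)))                # 2
print(len(filter_sources(sources, is_recent, exclude=True)))  # 1
```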
Indexing Results
You can access the Indexing Results tool in two ways:
At the top left of the Indexing Results window is a drop-down list that shows the sources loaded into the domain. (The domain is shown in the drop-down list at the top right.) Select a data source from the drop-down list, then press the manual input button.
This displays three listings: Indexed Sentences, Concepts, and CRCs.
Indexed Sentences
The sentences in the source are listed in order, one sentence per line, with NLP indexing indicated by highlighting. The sentence text is highlighted as follows:
Concepts and CRCs
The Indexing Results tool displays two listings: one of all concepts in the source and one of all CRCs in the source.
The sort by buttons at the top of the window allow you to toggle the Concepts and CRCs listings to display either frequency counts or dominance values in descending order.
In the Concepts listing, the most dominant concept(s) are given a dominance of 1000. Less dominant concepts are given smaller integer values, with larger sources tending to have lower least-dominant values. For example, a source containing 25 concepts might have a dominance range between 1000 and 83; a source containing 300 concepts might have a dominance range between 1000 and 2.
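The dominance formula itself is not described here. The following Python sketch only illustrates the reported scale, in which the top concept or concepts get 1000 and the others get smaller integers. It does this by linearly rescaling invented raw scores. The helper `to_dominance_scale`, the raw scores, and the linear rescaling are all assumptions. They are not NLP's method.

```python
from __future__ import annotations


def to_dominance_scale(raw: dict[str, float]) -> dict[str, int]:
    """Rescale hypothetical raw scores so the top concept(s) get 1000.

    Assumption for illustration only: NLP's real dominance formula is not shown here.
    """
    top = max(raw.values())
    # Linear rescale to integers; keep every value at least 1.
    return {concept: max(1, round(1000 * score / top)) for concept, score in raw.items()}


print(to_dominance_scale({"engine": 8.0, "fuel": 8.0, "wing": 2.0, "tire": 1.0}))
# {'engine': 1000, 'fuel': 1000, 'wing': 250, 'tire': 125}
```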
Note:
If Japanese is the only language supported for the domain, the Indexing Results display substitutes a single Entities listing for the Concepts and CRCs listings.