InterSystems IRIS Natural Language Processing (NLP) Tools

Important:

InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.

The primary NLP interface for creating a domain and populating it with data is the Domain Architect. The principal interface for analyzing NLP data is through APIs containing class object methods and properties which may be invoked from ObjectScript programs. The NLP tools described in this chapter are meant to assist in maintaining NLP functionality and in testing and examining NLP data. All NLP functionality provided here is also available using the iKnow APIs.

The NLP tools described in this chapter are the NLP Shell Interface and the NLP Data Upgrade Utility.

NLP Shell Interface

The InterSystems IRIS Natural Language Processing Shell can be used to return information about existing domains and indexed sources.

All NLP operations occur within a namespace. Therefore, before invoking the NLP Shell, you should specify the desired namespace using the $namespace variable.
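A minimal sketch of setting the namespace before starting the Shell. The namespace name MYNLPDATA is hypothetical; substitute the namespace that holds your domains.

```objectscript
  // MYNLPDATA is a hypothetical namespace name used only for illustration
  SET $namespace="MYNLPDATA"
  // Confirm the current namespace before invoking the NLP Shell
  WRITE $namespace,!
```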

From the Terminal you can activate the NLP Shell interface as follows:

USER>DO $System.iKnow.Shell()

This returns the NLP Shell prompt. Typing ? at the NLP Shell prompt displays a list of the NLP Shell commands. Typing an NLP Shell command followed by space and a ? displays information about that command.

The NLP Shell commands and options are case-sensitive and must be specified in all lowercase letters.

List, Show, and Summarize Sources

The following NLP Shell example makes use of an existing domain named “mydomain” in the current namespace. The domain “mydomain” contains 100 sources. The use domain mydomain command specifies using “mydomain”, making it the current domain for the NLP Shell. The list source command lists the sources in the current domain. By default, a list command lists the first page, with a page size of ten items. To list additional pages, you can specify the > command, as shown in this example. To change the page size to 20 items, specify use pagesize 20.

The show source 92 command lists the contents of source 92, broken into sentences. It lists the first page of 10 sentences; you can use the > command to list additional pages of sentences. The text of each sentence is preceded by its sentence Id and followed by a boolean indicating whether the Shell had to truncate the displayed sentence text. (For the purpose of this example, line wraps have been added to the sentence texts.)

The show summary 92 6 command summarizes the contents of source 92 to 6 sentences and displays these 6 sentences. If the summary number is larger than the number of sentences in the source, show summary displays all of the sentences in the source.

USER>DO $System.iKnow.Shell()
 
Welcome to the iKnow shell
Type '?' for help
Type 'quit' to exit
 
iKnow> use domain mydomain
Current domain: mydomain (1)
iKnow> list source
 srcId          externalId
   100    :SQL:Accident:96
    99    :SQL:Accident:98
    98    :SQL:Accident:94
    97   :SQL:Accident:100
    96    :SQL:Accident:80
    95    :SQL:Accident:99
    94    :SQL:Accident:95
    93    :SQL:Accident:88
    92    :SQL:Accident:97
    91    :SQL:Incident:90
iKnow> >
 srcId         externalId
    90   :SQL:Accident:93
    89   :SQL:Accident:92
    88   :SQL:Accident:91
    87   :SQL:Accident:85
    86   :SQL:Accident:86
    85   :SQL:Accident:89
    84   :SQL:Accident:87
    83   :SQL:Accident:83
    82   :SQL:Accident:78
    81   :SQL:Accident:82
iKnow> show source 92
 sentId       sentenceValue                                                                  sentenceIsTruncated
   5090       On March 7, 2001, about 1500 Alaska standard time, a wheel/ski equipped 
              Cessna 180 airplane, N9383C, sustained substantial damage during takeoff from 
              a snow-covered area at Ophir, Alaska.                                          0
   5091       The airplane was being operated as a visual flight rules (VFR) cross-country
              personal flight to McGrath, Alaska, when the accident occurred.                0
   5092       The airplane was operated by the pilot.                                        0
   5093       The commercial certificated pilot, and the sole passenger, were not injured.   0
   5094       Visual meteorological conditions prevailed.                                    0
   5095       During a telephone conversation with the National Transportation Safety Board
              (NTSB) investigator-in-charge (IIC), on March 8, 2001, the pilot reported he
              landed near Ophir earlier in the day.                                          0
   5096       When he was planning to depart, the surface of the snow had become crusty.     0
   5097       The pilot said he began a takeoff run toward the south, but the airplane 
              did not become airborne until it was within about 50 yards from several trees. 0
   5098       During the initial climb, the left horizontal stabilizer collided with a 
              spruce tree about 25 feet above the ground.                                    0
   5099       The airplane began a descending left turn toward the ground, and collided
              with several trees while the pilot was making an emergency landing.            0
iKnow> show summary 92 6
 sentId       sentenceValue                                                                  sentenceIsTruncated
   5090       On March 7, 2001, about 1500 Alaska standard time, a wheel/ski equipped 
              Cessna 180 airplane, N9383C, sustained substantial damage during takeoff from 
              a snow-covered area at Ophir, Alaska.                                          0
   5091       The airplane was being operated as a visual flight rules (VFR) cross-country
              personal flight to McGrath, Alaska, when the accident occurred.                0
   5095       During a telephone conversation with the National Transportation Safety Board
              (NTSB) investigator-in-charge (IIC), on March 8, 2001, the pilot reported he
              landed near Ophir earlier in the day.                                          0
   5097       The pilot said he began a takeoff run toward the south, but the airplane 
              did not become airborne until it was within about 50 yards from several trees. 0
   5099       The airplane began a descending left turn toward the ground, and collided
              with several trees while the pilot was making an emergency landing.            0
   5100       The airplane received damage to the left main landing gear, the wings, 
              and the left stabilizer.                                                       0
iKnow> quit
Bye bye
 
USER>

Filter Sources

The following NLP Shell example makes use of an existing domain named “mydomain” in the current namespace. The domain “mydomain” contains 100 sources. The use domain mydomain command specifies using “mydomain”, making it the current domain for the NLP Shell. The list source command lists the first 10 sources in the current domain. The filter source 92 94 97 as myfilter command defines a filter named “myfilter” which filters out all sources except those specified by source Id. The use filter myfilter command establishes “myfilter” as the current filter. Now when the NLP Shell issues a list source command, it applies “myfilter” and lists only the three sources specified in “myfilter”:

USER>DO $System.iKnow.Shell()
 
Welcome to the iKnow shell
Type '?' for help
Type 'quit' to exit
 
iKnow> use domain mydomain
Current domain: mydomain (1)
iKnow> list source
 srcId          externalId
   100    :SQL:Accident:96
    99    :SQL:Accident:98
    98    :SQL:Accident:94
    97   :SQL:Accident:100
    96    :SQL:Accident:80
    95    :SQL:Accident:99
    94    :SQL:Accident:95
    93    :SQL:Accident:88
    92    :SQL:Accident:97
    91    :SQL:Incident:90
iKnow> filter source 92 94 97 as myfilter
iKnow> use filter myfilter
Current filter: myfilter
 
iKnow> list source
 srcId          externalId
    97   :SQL:Accident:100
    94    :SQL:Accident:95
    92    :SQL:Accident:97
iKnow> quit
Bye bye
 
USER>

After a filter has been applied, a subsequent use filter filtername replaces the current filter with the new filter. To disable a filter, specify use filter 0.

NLP Data Upgrade Utility

Each version of the NLP data structures is assigned a system version number. Each NLP domain is assigned a Version property value. All new domains are created with the same Version property as the current system version. Therefore, these two integer values are usually the same.

  • When you upgrade to a newer version of InterSystems IRIS® data platform, the NLP data structures system version increments only if NLP indexing has changed. Therefore, upgrading to a newer version of InterSystems IRIS does not necessarily increment the NLP data structures system version.

  • When you create a domain, it always takes the current data structures system version as its Version property value. Therefore, existing domains created under an earlier system version have the Version property value of that earlier system version.

The initial InterSystems IRIS version number is 5. Earlier version numbers may be present if you port existing domains to InterSystems IRIS.

You can use the GetCurrentSystemVersion() method of the %iKnow.Domain class to determine the NLP data structures system version for the current InterSystems IRIS instance. You can use the GetAllDomains query to list all domains with their domain Version numbers, as shown in the Listing All Domains section of the “Setting Up the NLP Environment” chapter.
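As a rough sketch, these two checks might look like the following in an ObjectScript routine. GetCurrentSystemVersion() and the GetAllDomains query are named above. Running the query through %SQL.Statement and opening a domain with NameIndexOpen() to read its Version are illustrative assumptions, and "mydomain" is the example domain name used earlier in this chapter.

```objectscript
  // System version of the NLP data structures on this instance
  SET sysver=##class(%iKnow.Domain).GetCurrentSystemVersion()
  WRITE "NLP data structures system version: ",sysver,!

  // List all domains (with their Version values) using the GetAllDomains class query
  SET stmt=##class(%SQL.Statement).%New()
  SET status=stmt.%PrepareClassQuery("%iKnow.Domain","GetAllDomains")
  IF $System.Status.IsError(status) {
    DO $System.Status.DisplayError(status)
    QUIT
  }
  SET rset=stmt.%Execute()
  DO rset.%Display()

  // Assumption: open the example domain by name and compare its Version
  SET domoref=##class(%iKnow.Domain).NameIndexOpen("mydomain")
  IF $ISOBJECT(domoref),domoref.Version<sysver {
    WRITE !,"mydomain is at Version ",domoref.Version,", below system version ",sysver,!
  }
```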

If the NLP data structures system version does not match a domain's Version, that older domain remains operational but cannot take advantage of the NLP features and performance improvements introduced with the new system version until you upgrade the domain. Upgrading a domain increments its Version property. The upgrade automatically re-indexes the domain data; it does not require access to the original source texts. Each domain must be upgraded individually.

To upgrade a domain, use the UpgradeDomain() method of the %iKnow.Utils.UpgradeUtils class. Further details are provided in the InterSystems Class Reference documentation.

Note that the re-indexing that occurs when you upgrade a domain changes the domain Id, but does not change the domain name. Thus upgrading a domain may, in some cases, require changes to programs that reference the domain by a specific domain Id integer. For this reason, a domain should always be referenced by its Id property (or domain name). Coding practices that reference a domain by a literal integer Id value should be avoided.
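A minimal sketch of that practice: resolve the domain by name at run time and read its Id property rather than hard-coding the integer. NameIndexOpen() is assumed here as the way to open a domain by name, and "mydomain" is the example domain name used earlier in this chapter.

```objectscript
  // Open the domain by name instead of relying on a literal integer Id
  SET domoref=##class(%iKnow.Domain).NameIndexOpen("mydomain")
  IF '$ISOBJECT(domoref) {
    WRITE "Domain mydomain not found in this namespace",!
    QUIT
  }
  // Read the Id at run time; it may differ after the domain has been upgraded
  SET domId=domoref.Id
  WRITE "Current Id of mydomain: ",domId,!
```

Code that obtains the Id this way should keep working after an upgrade, because the domain name does not change.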
