
Smart Matching: Using a Dictionary

Important:

InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.

Smart Matching means combining the results of the NLP Indexing process with external knowledge that you have in the form of a dictionary, taxonomy, or ontology. What makes NLP matching “smart” is that those Indexing results help you judge the quality of a match, because they identify which words belong together to form concepts and relations. For example, NLP can identify whether a match for your dictionary term “flu” actually refers to the concept “flu”, “bird flu”, or “no flu symptoms” in your indexed text source. In the first case, the match is a full (exact) match: the dictionary term corresponds exactly to the entity in the indexed text source. In the second case, which is called a partial match, the match should or could be treated differently than a full match. In the third case, the match is identified as a partial match with negation.

To perform Smart Matching, you must create or acquire a dictionary. Creating and populating a dictionary is described in the previous chapter. Once you have a populated dictionary, you can perform matching operations using the contents of the dictionary.

How Dictionary Matching Works

NLP matches each term in the dictionary with the same level construct in the source texts (a Concept term is matched against a Concept in source text; a CRC term is matched against a CRC in source text). These matches can be exact or partial. If the match between term and source text is exact, NLP tags the source text passage with the dictionary item associated with that term. If the match between term and source text is not exact, NLP scores the degree of match between them. This match scoring involves calculating a match score for each component entity (concept or relation), then, if required, using these entity match scores to calculate the match score for a CRC, path, or sentence. If the score of a partial match achieves a configured minimal match score, NLP tags the source text passage with the dictionary item associated with the term (see the MinimalMatchScore property in %iKnow.Matching.MatchingProfile).

Match Scoring

NLP generates a match score, a floating point number, which is calculated from all of the entity matches detected between the dictionary term and the unit of source text. For an entity match, this match score can range between 0 (no match) and 1 (an exact match). For CRC or path matches, the match score covers the same range, but can also be greater than 1. The algorithm used to calculate this score is complex, but includes the following considerations:

  • A full match of an entity can be exact (same words in same order) or scattered (same words in different order). The match score for a scattered match is determined by multiplying 1 (exact match) by the value of the ScatteredMatchMultiplier property of the matching profile.

  • A partial match of an entity (concept or relation) is assigned a percentage based on the percentage of the source text string that matches the dictionary term.

  • In a partial match of an entity, a matching relation is assigned only half as much value as a matching concept. You can change this ratio (for example, to an equal evaluation of relation and concept matches) by setting the RelationshipScoreMultiplier property of the matching profile.

  • When matching a CRC, path, or sentence, the match scores from the entity matches are added, then divided by the length of the dictionary term, multiplied by the number of matching entities, then multiplied by the DisorderMultiplier property, a value representing the degree of disorder (difference in the sequence of entities) between the unit of source text and the dictionary term.

  • An entity match score can be modified if the entity is part of a negation. By default, the NegationMultiplier property value is 1, which causes the match score calculation to treat positive entities and negated entities as equivalent. This default is generally recommended. You can set the NegationMultiplier to 0, which causes the match score calculation to skip negated entities. In most cases, a 0 value leads to these matches being skipped altogether, unless the match is a composite match with enough non-negated matched entities to raise the score above the MinimalMatchScore threshold. You can also set this property to a value between 0 and 1, which modifies the entity-level match scores for negated entities, causing them to be considered partial matches. For example, a value of 0.5 halves the entity-level score for a negated entity.

The above is not an exact formula for obtaining a match score. It is provided to show the principal considerations used when NLP calculates a match score.
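As a rough illustration of how two of these multipliers scale an entity-level score, the following sketch reads and sets properties on a matching profile instance. The arithmetic is a deliberate simplification, not the calculation NLP actually performs, and the 0.5 value for NegationMultiplier is chosen only for this example:

```objectscript
  // Illustrative only: NLP's real score calculation is more involved
  SET prof=##class(%iKnow.Matching.MatchingProfile).%New()
  // A scattered full match: 1 (exact match) times ScatteredMatchMultiplier
  SET scattered=1*prof.ScatteredMatchMultiplier
  WRITE "scattered full match score=",scattered,!
  // Example value: halve the entity-level score of negated entities
  SET prof.NegationMultiplier=0.5
  SET negated=1*prof.NegationMultiplier
  WRITE "negated exact match score=",negated,!
```

The profile in this sketch is neither saved nor assigned to a dictionary; it is used only to inspect and set property values.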

NLP provides a matching profile (%iKnow.Matching.MatchingProfile), which consists of the numeric properties mentioned above, and others. NLP uses this matching profile when calculating a match score. NLP provides default property values for the matching profile. Unless otherwise specified, the default matching profile is assigned to each dictionary. This default matching profile provides accurate matching for most applications. Creating and assigning a custom matching profile is described later in this chapter.

Matching A String

You can use the %iKnow.Matching.MatchingAPI class to perform matches between a text string and a populated dictionary (or multiple dictionaries). There are two types of string matches:

  • Matching an entity-length string

  • Matching a string containing multiple entities. This string may contain one or more sentences.

Matching an Entity String

The GetDictionaryMatches() method matches a single-entity string against a dictionary and returns the match items. Because this method treats the string as a single entity, it can provide very specific match information. Because GetDictionaryMatches() takes a string variable, you do not have to index a single-entity string to match it against a dictionary.

  SET domn="entitytestdomain"
  IF (##class(%iKnow.Domain).NameIndexExists(domn))
     { SET domo=##class(%iKnow.Domain).NameIndexOpen(domn)
       SET domId=domo.Id
     }
  ELSE {
     SET domo=##class(%iKnow.Domain).%New(domn)
     DO domo.%Save()
     SET domId=domo.Id }
  /* ... */
CreateDictionary
  SET dictname="AviationTerms"
  SET dictdesc="A dictionary of aviation terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created a dictionary ",dictId,!}
PopulateDictionary
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(domId,dictId,
       "aircraft",domId_dictId_"aircraft")
    SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "airplane")
    SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "single-engine airplane")
    SET term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "helicopter")
DisplayDictionary
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryItemsAndTerms(.result,domId,dictId)
  SET i=1
  WHILE $DATA(result(i)) {
      WRITE $LISTTOSTRING(result(i),",",1),!
      SET i=i+1 }
  WRITE "End of items in dictionary ",dictId,!!
DoMatching
  SET mystring="A small single-engine two-person airplane cabin"
  SET stat=##class(%iKnow.Matching.MatchingAPI).GetDictionaryMatches(.num,domId,mystring,$LB(dictId))
  IF stat'=1 {WRITE "get matches status is:",$System.Status.DisplayError(stat),!
             QUIT }
  WRITE "The string is: ",mystring,!
  WRITE "The matches are:",!
  SET j=1  
  WHILE $DATA(num(j)) {
      WRITE "match number ",j," is ",$LISTTOSTRING(num(j)),!
      SET j=j+1 }
  WRITE "End of match items for dictionary ",dictId,!!
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }

GetDictionaryMatches() provides a matched word bit map that shows the match of each word in the string with the dictionary term. For example, 00101 shows the match of the dictionary term “single-engine airplane” to the string “A small single-engine two-person airplane / cabin”. The bit map stops when the match completes, so there is no bit for the word “cabin”. In this example, the isScattered boolean is 0 because the words “single-engine” and “airplane” are in the same order in the dictionary term and in the string.

Each dictionary match returns a list of match elements. The match in the previous example returns a list of match elements as follows (your id numbers may differ):

Position | Meaning | Value in Example
1 | dictionary Id | 6
2 | item Id | 5
3 | dictionary item URI (domain Id + dict Id + item value) | 26aircraft
4 | term Id | 13
5 | term value | single-engine airplane
6 | element Id (same as term Id) | 13
7 | type of match (term, format, or unknown) | term
8 | match score | .333333
9 | matched word bits | 00101
10 | is scattered boolean | 0
11 | format output | null
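The following sketch shows one way to read these elements by position and to interpret the matched word bits. The match element is built by hand to mirror the example values listed above (your Ids will differ), and it assumes that the words of the string are separated by single spaces, which is a simplification of how NLP tokenizes text:

```objectscript
  // Hand-built match element mirroring the example values; not returned by NLP here
  SET match=$LB(6,5,"26aircraft",13,"single-engine airplane",13,"term",.333333,"00101",0,"")
  SET mystring="A small single-engine two-person airplane cabin"
  WRITE "term: ",$LISTGET(match,5),!
  WRITE "score: ",$LISTGET(match,8),!
  WRITE "scattered? ",$LISTGET(match,10),!
  // Walk the bit map: a 1 at position i marks the i-th word as matched
  SET bits=$LISTGET(match,9)
  FOR i=1:1:$LENGTH(bits) {
     IF $EXTRACT(bits,i) { WRITE "matched word ",i,": ",$PIECE(mystring," ",i),! }
  }
```

With the example values, the loop reports word 3 (single-engine) and word 5 (airplane) as matched.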

Matching a Sentence String

The GetMatchesBySource() method matches a multiple-entity string against a dictionary and returns the match items. Commonly, such a string is sentence-length or longer. You must first index the string, then match it against the dictionary. The following example matches a string against the AviationTerms dictionary:

DomainCreateOrOpen
  SET dname="onestringdomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE 
     { SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       GOTO LoadString }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO LoadString }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
LoadString
  SET domId=domoref.Id
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
  SET ^mystring="A single-engine airplane reported poor visibility and strong gusty winds. "_
               "No winds were predicted by local airport ground personnel."
  DO myloader.BufferSource("ref",^mystring)
  DO myloader.ProcessBuffer()
GetExtId
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20)
  SET i=1
  WHILE $DATA(result(i)) {
     SET extId = $LISTGET(result(i),2)
     SET i=i+1 }
CreateDictionary
  SET dictname="AviationTerms"
  SET dictdesc="A dictionary of aviation terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created a dictionary ",dictId,!}
PopulateDictionaryItem1
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(domId,dictId,
       "aircraft",domId_dictId_"aircraft")
    SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "single-engine airplane")
    SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "helicopter")
PopulateDictionaryItem2
 SET itemId2=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItemAndTerm(domId,dictId,
        "weather",domId_dictId_"weather")
    SET i2term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "strong winds")
    SET i2term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "visibility")
    SET i2term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "winds")
DisplayDictionary
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryItemsAndTerms(.result,domId,dictId)
  SET i=1
  WHILE $DATA(result(i)) {
      WRITE $LISTTOSTRING(result(i),",",1),!
      SET i=i+1 }
  WRITE "End of items in dictionary ",dictId,!!
DoMatching
  SET stat=##class(%iKnow.Matching.MatchingAPI).GetMatchesBySource(.num,domId,extId,$LB(dictId))
  IF stat'=1 {WRITE "get matches status is:",$System.Status.DisplayError(stat),!
             QUIT }
  WRITE "The string is: ",^mystring,!
  WRITE "The matches are:",!
  SET j=1  
  WHILE $DATA(num(j)) {
      WRITE "match ",j,": ditem ",$LISTGET(num(j),4),
            " dterm ",$LISTGET(num(j),5),
            " matchscore ",$LISTGET(num(j),8),
            " negated? ",$LISTGET(num(j),15),!
      SET j=j+1 }
  WRITE "End of match items for dictionary ",dictId,!!
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }
  ELSE    { WRITE "DropDictionary error ",$System.Status.DisplayError(stat) } 

The following is a match between “No winds” in the string and dictionary term “winds” in item “weather”. (This match string is returned in %List format, here shown as a comma-separated string): 5,6,12,26weather,33,0,6,.5,1,0,1,1,1,1,1. These element values are explained below:

Position | Meaning | Value in Example
1 | match number | 5
2 | dictionary Id | 6
3 | item Id | 12
4 | dictionary item URI (domain Id + dict Id + item value) | 26weather
5 | term Id | 33
6 | target type | 0
7 | target Id | 6
8 | match score | .5
9 | matching concept count | 1
10 | matching relation count | 0
11 | partial match count | 1
12 | first matched position in path | 1
13 | last matched position in path | 1
14 | is ordered | 1
15 | negated entity count | 1
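As a minimal sketch of how these positions can be used, the following code flags a match as partial or negated by reading elements 11 and 15. The match element is built by hand from the example values above, so it runs without a domain or dictionary; in practice you would read each element from the array returned by GetMatchesBySource():

```objectscript
  // Hand-built match element mirroring the example values; not returned by NLP here
  SET match=$LB(5,6,12,"26weather",33,0,6,.5,1,0,1,1,1,1,1)
  SET partial=$LISTGET(match,11)
  SET negcount=$LISTGET(match,15)
  WRITE "item URI ",$LISTGET(match,4)," score ",$LISTGET(match,8)
  IF partial>0 { WRITE " (partial)" }
  IF negcount>0 { WRITE " (negated entities: ",negcount,")" }
  WRITE !
```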

Matching Sources

Refer to A Note on Program Examples for details on the coding and data used in the examples in this book.

The GetTotalItemScoresBySource() method matches a source against a dictionary and returns the match scores for each dictionary item. The following two examples use a domain-independent dictionary. In the dictionary definition, and in any %iKnow.Matching.DictionaryAPI method referencing the dictionary, specify the domain Id as 0. In any %iKnow.Matching.MatchingAPI method using the dictionary within a domain, specify the dictionary Id as a negative number. For example: SET dictId=-^mydictId.

The following example matches all of the sources in the domain against the AviationTermsND dictionary and returns each source’s match scores for each dictionary item. You must create the dictionary before running this program.

CreateDictionary
  SET ^mydictname="AviationTermsND"_$HOROLOG
  SET dictdesc="A dictionary of aviation terms"
  SET ^mydictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(0,^mydictname,dictdesc)
  IF ^mydictId=-1 {WRITE "Dictionary ",^mydictname," already exists",!
                QUIT }
  ELSE {WRITE "created a dictionary ",^mydictId,!}
PopulateDictionaryItem1
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(0,^mydictId,
       "aircraft",0_^mydictId_"aircraft")
    SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId,
       "single-engine airplane")
    SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId,
       "helicopter")
PopulateDictionaryItem2
 SET itemId2=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItemAndTerm(0,^mydictId,
        "weather",0_^mydictId_"weather")
    SET i2term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "strong winds")
    SET i2term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "visibility")
    SET i2term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "winds")
PopulateDictionaryItem3
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(0,^mydictId,
       "flight plan",0_^mydictId_"flight plan")
    SET term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId,
       "flight plan")
DisplayDictionary1
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryItemsAndTerms(.result,0,^mydictId)
  WRITE "Status is: ",stat,!
  SET i=1
  WHILE $DATA(result(i)) {
      WRITE $LISTTOSTRING(result(i),",",1),!
      SET i=i+1 }
  WRITE "End of items in dictionary ",^mydictId,!!
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 10 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DoMatching
  SET dictId=-^mydictId
  SET num=0
  FOR j=1:1:10 {SET extId=##class(%iKnow.Queries.SourceAPI).GetExternalId(domId,j)
   SET mstat=##class(%iKnow.Matching.MatchingAPI).GetTotalItemScoresBySource(.mresult,domId,
                     extId,$LB(dictId))
    IF mstat '=1 {WRITE "End of sources",!  QUIT }
    SET k=1
    IF $DATA(mresult(k))=0 {WRITE "no dictionary matches for this source",!}
    ELSE {
    WHILE $DATA(mresult(k)) {
       WRITE $PIECE($LISTTOSTRING(mresult(k)),",",2)," "
       WRITE $PIECE($LISTTOSTRING(mresult(k)),",",4)
       WRITE " matches: ",$PIECE($LISTTOSTRING(mresult(k)),",",6)
       WRITE " score: ",$PIECE($LISTTOSTRING(mresult(k)),",",7),!
       SET k=k+1 }
    }
    SET srcname=$PIECE($PIECE(extId,":",3,4),"\",$l(extId,"\"))
    WRITE "End of ",srcname," match items for dictionary ",dictId,!!
  }

The GetMatchesBySource() method matches each source against a dictionary and returns the match items.

The following example matches all of the sources in the domain against the domain-independent AviationTermsND dictionary (defined above) and returns each source’s match items, with match score and negation (if present):

DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
        GOTO DeleteOldData }
  ELSE 
     { SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 10 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
DoMatching
  SET dictId=-^mydictId
  FOR j=1:1:10 {SET extId=##class(%iKnow.Queries.SourceAPI).GetExternalId(domId,j)
       SET stat=##class(%iKnow.Matching.MatchingAPI).GetMatchesBySource(.num,domId,extId,$LB(dictId))
       IF stat'=1 {WRITE "get matches status is:",$System.Status.DisplayError(stat),!
             QUIT }
       WRITE "The matches are:",!
       SET k=1  
       WHILE $DATA(num(k)) {
         IF $LISTGET(num(k),15)>0 {WRITE "neg. "}
         WRITE "match ",k,": ditem ",$LISTGET(num(k),4),
            " dterm ",$LISTGET(num(k),5),
            " matchscore ",$LISTGET(num(k),8),!
         SET k=k+1 }
    WRITE !,"Next Source",!
    }
  WRITE "End of match items for dictionary ",dictId,!

Defining a Matching Profile

Whether a dictionary term and a unit of source text constitute a match is determined by the matching profile properties found in %iKnow.Matching.MatchingProfile. There are eight properties that together determine whether something is a match or not. All of these properties take an appropriate default value. By default, every dictionary is assigned the default matching profile.

You can create a custom matching profile with one or more properties whose value differs from the default. You can then assign this custom matching profile to a dictionary. Any properties not specified in the custom matching profile take default values. You can create any number of custom matching profiles. The same matching profile can be applied to multiple dictionaries. You can define the matching profile to be specific to a domain, or to be available to all domains in the namespace.

The following example creates three custom matching profiles. The first is specific to a domain and specifies the optional matching profile name; NLP assigns it a positive integer Id. The second and third are available to all domains in the namespace; NLP assigns each of them a negative integer Id. Because the third specifies the optional matching profile name, it must specify 0 as a placeholder for the domain Id parameter:

  SET domn="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(domn))
     { SET domo=##class(%iKnow.Domain).NameIndexOpen(domn)
       SET domId=domo.Id
     }
  ELSE {
     SET domo=##class(%iKnow.Domain).%New(domn)
     DO domo.%Save()
     SET domId=domo.Id }
MatchingProfile
  SET domprof=##class(%iKnow.Matching.MatchingProfile).%New(domId,"mydomainCMP")
     WRITE "profile Id=",domprof.ProfileId,!
     WRITE "profile domain=",domprof.DomainId,!
     WRITE "profile name=",domprof.Name,!!
  SET allprof1=##class(%iKnow.Matching.MatchingProfile).%New()
     WRITE "profile Id=",allprof1.ProfileId,!
     WRITE "profile domain=",allprof1.DomainId,!
     WRITE "profile name=",allprof1.Name,!!
 SET allprof2=##class(%iKnow.Matching.MatchingProfile).%New(0,"namespaceCMP")
     WRITE "profile Id=",allprof2.ProfileId,!
     WRITE "profile domain=",allprof2.DomainId,!
     WRITE "profile name=",allprof2.Name

The following example shows how to define a custom matching profile and assign it to a dictionary. It specifies the custom matching profile instance oref (in this case, customprofile) to the CreateDictionary() method to override the default.

#include %IKPublic
  SET domn="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(domn))
     { SET domo=##class(%iKnow.Domain).NameIndexOpen(domn)
       SET domId=domo.Id
     }
  ELSE {
     SET domo=##class(%iKnow.Domain).%New(domn)
     DO domo.%Save()
     SET domId=domo.Id }
CustomizeMatchingProfile
  SET customprofile=##class(%iKnow.Matching.MatchingProfile).%New()
    WRITE "MinimalMatchScore initial value=",customprofile.MinimalMatchScore,!
    SET customprofile.MinimalMatchScore=".4"
    WRITE "MinimalMatchScore custom value=",customprofile.MinimalMatchScore,!
CreateDictionary
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,"mydict","","en",customprofile)
  WRITE "created a dictionary with ID=",dictId,!!
  /* . . . */
CleanUpForNextTime
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "Dropped the dictionary"}
  ELSE {WRITE "DropDictionary error",$System.Status.DisplayError(stat)}

Matching Profile Properties

Whether the terms in a dictionary and a unit of source text constitute a match is determined by the matching profile property values you define in your custom matching profile. These properties determine the match score for each unit of text and the threshold that specifies the minimum match score to report as a match. Because the contents of a dictionary tend to vary depending on the use case, you may wish to customize one or more matching profile properties for your dictionary. Changing any of these properties may dramatically change the number of matches reported. Therefore, you might want to experiment with changes to property values while testing on a small subset of your data before registering the matching profile to match the dictionary to the whole dataset.

The meanings of the properties in %iKnow.Matching.MatchingProfile are explained in the class documentation.

MinimalMatchScore

The most common matching profile property to tune is MinimalMatchScore. This property value is the lower threshold for matches to be saved. You can set it to a fractional value between 1 (only perfect matches saved) and 0 (all potential matches saved); the default is 0.33.

  • By increasing this property value you can filter out low-quality matches if you have an excessive number of match candidates. This would be appropriate if your dictionary is fairly generic and contains many common terms. You may only wish to report CRCs and paths that are a very close match to several of these common terms.

  • By decreasing this property value you can increase the number of match candidates. This would be appropriate if your dictionary is highly specific, consisting only of critical terms that should be flagged at all times. You may wish to report CRCs and paths that loosely match with a dictionary of technical terms, so as to avoid missing a loose, but significant, match.

Setting MinimalMatchScore to 0 returns all possible match results. If you're starting to work on matching a new dictionary to a small subset of your data, you can start by setting MinimalMatchScore to 0, then gradually increase it, filtering out low-quality matches, until you get a reasonable number of results.
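To get a feel for this kind of tuning, you can count how many matches would survive at several candidate thresholds. The following sketch does this over a hand-made list of scores. The score values are hypothetical, standing in for element 8 of each GetMatchesBySource() result obtained with MinimalMatchScore set to 0, and the sketch assumes that a match is kept when its score is greater than or equal to the threshold:

```objectscript
  // Hypothetical match scores; in practice, collect these from your match results
  SET scores=$LB(.2,.33,.5,.5,1)
  FOR threshold=0,.33,.5,1 {
     SET count=0
     FOR i=1:1:$LISTLENGTH(scores) {
        // Assumption: a match is kept when its score reaches the threshold
        IF $LIST(scores,i)>=threshold { SET count=count+1 }
     }
     WRITE "MinimalMatchScore ",threshold," would keep ",count," of ",$LISTLENGTH(scores)," matches",!
  }
```

With these sample scores, the four thresholds keep 5, 4, 3, and 1 matches respectively.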

Domain Default Matching Profile

You can specify a different domain-wide default matching profile by setting the MAT:DefaultProfile ($$$IKPMATDEFAULTPROFILE) domain parameter. (The “MAT:” prefix indicates a domain parameter specific to matching operations.) The following example defines a domain-specific custom matching profile, then assigns it as the default matching profile for the mydomain domain:

#include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domo=##class(%iKnow.Domain).NameIndexOpen(dname) }
  ELSE 
     { SET domo=##class(%iKnow.Domain).%New(dname) DO domo.%Save()}
   SET domId=domo.Id
   WRITE "domain ",dname," has Id ",domId,!!
CreateProfile
  SET domprof=##class(%iKnow.Matching.MatchingProfile).%New(domId,"mydomainCMP")
     WRITE "profile Id=",domprof.ProfileId,!
     WRITE "profile domain assignment=",domprof.DomainId,!
     WRITE "profile name=",domprof.Name,!
     WRITE "profile properties:",!
     WRITE "default MinimalMatchScore=",domprof.MinimalMatchScore,!
     SET domprof.MinimalMatchScore=.27
     WRITE "changed to MinimalMatchScore=",domprof.MinimalMatchScore,!!
     DO domprof.%Save()
MakeProfileDomainDefault
     SET str=domo.GetParameter($$$IKPMATDEFAULTPROFILE,.dpin)
     WRITE "domain ",domId," DefaultProfile before SET=",dpin,!
     SET sc=domo.SetParameter($$$IKPMATDEFAULTPROFILE,"mydomainCMP")
     IF sc=1 {
     DO domo.GetParameter($$$IKPMATDEFAULTPROFILE,.dpout)
     WRITE "domain ",domId," DefaultProfile after SET=",dpout,! }
     ELSE {WRITE "SetParameter error",! }
CleanUp
   DO ##class(%iKnow.Domain).%DeleteId(domo.Id)
   WRITE "All done"

The following example defines a namespace custom matching profile (not specific to a domain), then assigns it as the default matching profile for the mydomain domain. Note the 0: preface to the profile name in SetParameter($$$IKPMATDEFAULTPROFILE,"0:nodomainCMP"):

#include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
      { SET domo=##class(%iKnow.Domain).NameIndexOpen(dname) }
  ELSE 
     { SET domo=##class(%iKnow.Domain).%New(dname) DO domo.%Save()}
   SET domId=domo.Id
   WRITE "domain ",dname," has Id ",domId,!!
CreateProfile
  SET domprof=##class(%iKnow.Matching.MatchingProfile).%New(0,"nodomainCMP")
     WRITE "profile Id=",domprof.ProfileId,!
     WRITE "profile domain assignment=",domprof.DomainId,!
     WRITE "profile name=",domprof.Name,!
     WRITE "profile properties:",!
     WRITE "default MinimalMatchScore=",domprof.MinimalMatchScore,!
     SET domprof.MinimalMatchScore=.27
     WRITE "changed to MinimalMatchScore=",domprof.MinimalMatchScore,!!
     DO domprof.%Save()
MakeProfileDomainDefault
     SET str=domo.GetParameter($$$IKPMATDEFAULTPROFILE,.dpin)
     WRITE "domain ",domId," DefaultProfile before SET=",dpin,!
     SET sc=domo.SetParameter($$$IKPMATDEFAULTPROFILE,"0:nodomainCMP")
     IF sc=1 {
     DO domo.GetParameter($$$IKPMATDEFAULTPROFILE,.dpout)
     WRITE "domain ",domId," DefaultProfile after SET=",dpout,! }
     ELSE {WRITE "SetParameter error",! }
CleanUp
   DO ##class(%iKnow.Domain).%DeleteId(domo.Id)
   WRITE "All done"

You can also set individual domain parameters that influence matching operations.

Matching Single Relations

By default, the NLP matching algorithm skips dictionary terms that match a single relation entity. For example, if you have a dictionary term "to", NLP does not attempt to match the entity "goes to" in the sentence "Pete goes to work". This default optimizes matching performance when your dictionary primarily contains and targets concepts.

However, if your dictionary deliberately targets single relation elements, you can change this domain parameter default by setting the MAT:SkipRelations ($$$IKPMATSKIPRELS) domain parameter to 0. This causes all entities to be matched against all dictionary terms, regardless of the type of entity.

This option is a domain parameter, not a matching profile property, because the skipping step occurs before NLP knows which terms might match, so no dictionary-specific profile can be applied at that point. As a domain parameter, it applies to all dictionaries within the domain.
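A minimal sketch of changing this setting follows. It reuses the mydomain domain name from the earlier examples and assumes that GetParameter() returns the current value of the parameter:

```objectscript
#include %IKPublic
  SET dname="mydomain"
  IF '##class(%iKnow.Domain).NameIndexExists(dname) {
     WRITE "The ",dname," domain does not exist",!
     QUIT }
  SET domo=##class(%iKnow.Domain).NameIndexOpen(dname)
  // Match single relation entities against dictionary terms too
  SET sc=domo.SetParameter($$$IKPMATSKIPRELS,0)
  // Assumption: GetParameter() returns the parameter's current value
  IF sc=1 { WRITE "MAT:SkipRelations is now ",domo.GetParameter($$$IKPMATSKIPRELS),! }
  ELSE { WRITE "SetParameter error",! }
```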

Other Matching Operation Domain Parameters

Several of the settable domain parameters can influence matching operations: MAT:SkipRelations, MatchScoreMargin, FullMatchOnly, and EntityLevelMatchOnly. If you change these domain parameters, any sources matched using a previous domain parameter setting will have to be explicitly re-matched to reflect the new domain parameter value.

For further information on these domain parameters, refer to the Domain Parameters appendix to this manual.
