Using iKnow
Smart Matching: Creating a Dictionary

Smart Matching means combining the results of the iKnow Indexing process with some external knowledge you have in the form of a dictionary, taxonomy, or ontology. What makes iKnow matching “smart” is that those Indexing results help you judge the quality of a match because they identify which words belong together to form concepts and relations. For example, iKnow can identify if a match for your dictionary term "flu" is actually referring to the concept "flu" or the concept "bird flu" in your indexed text source. In the latter case, which is called a partial match, it is clear the match should or could be treated differently than the full match where the dictionary term corresponds exactly to the entity in the indexed text source.
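The full versus partial distinction can be illustrated with a small sketch. This is a conceptual model only, not the iKnow scoring algorithm: the function name classify_match and the word-based comparison are assumptions made for illustration.

```python
def classify_match(term: str, entity: str) -> str:
    """Classify a dictionary term against one indexed entity.

    Returns "full" when the term is the whole entity, "partial" when the
    term is a contiguous run of words inside a longer entity, else "none".
    """
    term_words = term.lower().split()
    entity_words = entity.lower().split()
    if not term_words:
        return "none"
    if term_words == entity_words:
        return "full"
    n = len(term_words)
    for i in range(len(entity_words) - n + 1):
        if entity_words[i:i + n] == term_words:
            return "partial"
    return "none"


print(classify_match("flu", "flu"))       # the entity is exactly the term
print(classify_match("flu", "bird flu"))  # the term is only part of the entity
```

An application might weight "partial" results lower than "full" ones, or route them to a separate review step.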

To perform Smart Matching, you must create or acquire a dictionary. If you are creating a dictionary, you must then populate it with the items and terms that you wish to use for matching. Once you have a populated dictionary, you can perform matching operations using the contents of the dictionary.
Note:
Dictionary definition is not supported for Japanese at this time.
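This chapter describes dictionary structure and matching, creating a dictionary, listing and copying dictionaries, and extending dictionary constructs.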
Introducing Dictionary Structure and Matching
To populate an iKnow dictionary, you first create an item, then associate one or more terms with that item. Commonly a dictionary consists of multiple items, with each item associated with multiple terms. An item is a word or phrase that is a relevant tag for many entities in the source texts. When an entity in the source texts is determined to be a match, it is tagged with the item. For example, the item “ship” is a relevant tag for “ship”, “boat”, “sail”, “oars”, and so forth.
To perform this matching, you populate each item in the dictionary with match terms. A term can be a single entity (like “motor boat”) or a phrase or sentence (like “boats are rowed with oars or paddles”). iKnow indexes each term in the dictionary using the same language model used for the source texts. iKnow then matches each term with the same kind of content unit in the source texts (a Concept term is matched against a Concept in source text; a CRC term is matched against a CRC in source text). If iKnow identifies a match between a term and a unit of source text, iKnow tags the source text passage with the associated dictionary item. The match is frequently not exact, so iKnow uses a scoring algorithm to determine whether the term and source text warrant being tagged as a match.
The iKnow dictionary facility supports stemming if stemming has been activated for the current domain. This means that a single dictionary term can match any other form of the same word in a source text.
Terminology
A Dictionary is a way to group different terms that have something to do with one another in a logical way. A dictionary could for example be Cities, ICD10 codes, or French wines. Because a dictionary is the level of aggregation used within the matching APIs, the use case determines what level of real-world grouping should correspond to a dictionary. Taking a higher level (such as "all ICD10 codes") will yield better performance and use less disk space, but a lower level (such as "a separate one for each ICD10 category") might offer grouped results with greater granularity. Each dictionary has a name and a description.
A Dictionary Item is a uniquely identifiable item in your dictionary. Examples of a dictionary item could be cities, the individual codes in ICD10 or individual chateaux. Each dictionary typically has many dictionary items (lots of small dictionaries with few items can decrease performance). A dictionary item has a URI, which should be unique within the domain and can be used as an external identifier, and an optional description. This URI can be used when building rules to interpret matching results later on.
A Dictionary Term is a string that could appear somewhere in a text and represent the Dictionary Item it belongs to. For example, "Antwerp", "Anvers" and "Antwerpen" could be different terms associated with the same dictionary item representing the city of Antwerp. Dictionary terms are the free text strings on which the actual matching is based when doing string-based matching and could be different spellings, translations or synonyms of what your Dictionary Item stands for. These strings are passed through the engine and, when containing more than just a single entity, will automatically be transformed into a more complex structure to be able to match across the boundaries of a single concept (CRC or Path). A dictionary term should also have a language associated with it, if it needs to be processed by the engine.
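The relationship between a dictionary, its items, and their terms can be pictured with a minimal data model. This sketch is not the iKnow storage structure: the class names, the term_index() helper, and the URI ":city:antwerp" are hypothetical, chosen only to mirror the Antwerp example above.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryItem:
    uri: str                                   # external identifier for the item
    terms: list = field(default_factory=list)  # strings that stand for the item


@dataclass
class Dictionary:
    name: str
    items: list = field(default_factory=list)

    def term_index(self) -> dict:
        """Map each lower-cased term string to the URI of its item."""
        return {t.lower(): item.uri for item in self.items for t in item.terms}


cities = Dictionary("Cities", [
    DictionaryItem(":city:antwerp", ["Antwerp", "Anvers", "Antwerpen"]),
])
index = cities.term_index()
print(index["anvers"])  # all three spellings resolve to the same item URI
```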
When processing a new dictionary term by passing it through the iKnow engine, one or more Dictionary Elements are generated to represent the different entities identified within the term. For example, a dictionary term "failure of the liver" would be translated into the three elements "failure", "of" and "liver", with "the" being discarded as non-relevant. These elements are generated and managed automatically and only figure in some types of output, so you shouldn't worry too much about them.
If you want to identify dates, numbers or other formatted pieces of string, you can use Dictionary Formats to specify them, and these can then be included in a Dictionary Term, either representing the complete term, or just a single element within a more complex one. A format is a meaningful pattern of characters, such as a date format. You could associate the formats “nn/nn/nnnn” and “nnnn-nn-nn” with the item named Date. iKnow tags any occurrence of these formats in the source texts with the Date item.
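As a rough illustration of what a format does, the sketch below turns the two date patterns into regular expressions and tags entities that fit either one. It assumes that each "n" stands for exactly one digit and that every other character is literal. The helper names are hypothetical, and this is not how iKnow format classes are implemented.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a pattern such as "nn/nn/nnnn" into a compiled regex.

    Assumption: "n" means one digit; every other character is literal.
    """
    parts = [r"\d" if ch == "n" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


DATE_FORMATS = [pattern_to_regex(p) for p in ("nn/nn/nnnn", "nnnn-nn-nn")]


def tag_entity(entity: str):
    """Return the item name "Date" if the whole entity fits a date format."""
    if any(rx.fullmatch(entity) for rx in DATE_FORMATS):
        return "Date"
    return None


for candidate in ("12/31/2024", "2024-12-31", "12/31/24"):
    print(candidate, "->", tag_entity(candidate))
```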
Creating a Dictionary
To define and populate a dictionary, use the %iKnow.Matching.DictionaryAPI class methods, as described in this section. You can define a dictionary specifically for a domain, or define a dictionary that is domain-independent and can be used by any domain in the current namespace.
%iKnow.Matching.DictionaryAPI has a number of methods to create a new dictionary and to assign it items, terms, and formats. The examples in this section use CreateDictionary(), CreateDictionaryItem(), CreateDictionaryTerm(), CreateDictionaryItemAndTerm(), and CreateDictionaryTermFormat().
Dictionaries and Domains
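Each dictionary you create can either be specific to a domain, or can be domain-independent and usable by any domain in the current namespace. The examples in this section create a domain-specific dictionary by passing a domain ID as the first argument to CreateDictionary(), and a domain-independent dictionary by passing 0 in its place.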
Just as several domains can all have a domain-specific dictionary with the same dictId value, both a domain-specific dictionary and a domain-independent dictionary can have the same integer dictId value. Dictionary match operations can use any combination of domain-specific dictionaries (specified as positive integer IDs) and domain-independent dictionaries (specified as negative integer IDs).
Queries in the Matching API returning matching results will return negative identifiers (for the dictId, itemId, and termId) when the match corresponds to an entry in a domain-independent dictionary. All queries will return the combined results for domain-specific and domain-independent dictionary matches, with the exception of GetDictionaryMatches() and GetDictionaryMatchesById(), which only return results for either domain-specific or domain-independent dictionaries, depending on the values specified in the dictIds parameter. The default is domain-specific dictionary matches.
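The sign convention means that a caller can separate the two kinds of dictionary from a list of returned identifiers. The sketch below assumes the identifiers are already available as plain integers. The function name split_by_scope is hypothetical and is not part of the Matching API.

```python
def split_by_scope(dict_ids):
    """Split signed dictionary IDs into two lists.

    Positive IDs refer to domain-specific dictionaries. Negative IDs refer
    to domain-independent dictionaries; the absolute value is the
    dictionary's own ID.
    """
    domain_specific = [d for d in dict_ids if d > 0]
    domain_independent = [-d for d in dict_ids if d < 0]
    return domain_specific, domain_independent


specific, independent = split_by_scope([1, -1, 3, -2])
print("domain-specific:", specific)
print("domain-independent:", independent)
```

Note that 1 and -1 in this input refer to two different dictionaries that happen to share the same integer ID.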
Dictionary Creation Examples
The following example creates a dictionary named "AviationTerms" and populates it with two items and their associated terms. This dictionary is assigned to a specific domain.
  SET domId=##class(%iKnow.Domain).GetOrCreateId("mydomain")
  /* ... */
CreateDictionary
  SET dictname="AviationTerms"
  SET dictdesc="A dictionary of aviation terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created a dictionary ",dictId,!}
PopulateDictionaryItem1
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(domId,dictId,
       "aircraft",domId_dictId_"aircraft")
    SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "airplane")
    SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId,
       "helicopter")
PopulateDictionaryItem2
 SET itemId2=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItemAndTerm(domId,dictId,
        "weather",domId_dictId_"weather")
    SET i2term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "meteorological information")
    SET i2term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "visibility")
    SET i2term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,itemId2,
        "winds")
DisplayDictionary
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryItemsAndTerms(.result,domId,dictId)
  SET i=1
  WHILE $DATA(result(i)) {
      WRITE $LISTTOSTRING(result(i),",",1),!
      SET i=i+1 }
  WRITE "End of items in dictionary ",dictId,!!
   /* ... */
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }
  ELSE    { WRITE "DropDictionary error ",$System.Status.DisplayError(stat) } 
 
The following example creates the same dictionary as the previous example, except that this dictionary can be used by any domain within the current namespace:
CreateDictionary
  SET dictname="AviationTerms"
  SET dictdesc="A dictionary of aviation terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(0,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created a dictionary ",dictId,!}
PopulateDictionaryItem1
  SET itemId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(0,dictId,
       "aircraft",0_dictId_"aircraft")
    SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId,
       "airplane")
    SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId,
       "helicopter")
PopulateDictionaryItem2
 SET itemId2=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItemAndTerm(0,dictId,
        "weather",0_dictId_"weather")
    SET i2term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "meteorological information")
    SET i2term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "visibility")
    SET i2term3Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(0,itemId2,
        "winds")
DisplayDictionary
  SET domId=##class(%iKnow.Domain).GetOrCreateId("mydomain")
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryItemsAndTerms(.result,0,dictId)
  SET i=1
  WHILE $DATA(result(i)) {
      WRITE $LISTTOSTRING(result(i),",",1),!
      SET i=i+1 }
  WRITE "End of items in dictionary ",dictId,!!
   /* ... */
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(0,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(0,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }
  ELSE    { WRITE "DropDictionary error ",$System.Status.DisplayError(stat) } 
 
Defining a Format Term
The %iKnow.Matching.Formats package provides three simple format classes:
You can create additional format classes as needed.
The following example uses %iKnow.Matching.Formats.SimpleSuffixFormat. It first defines a dictionary containing one item: speed. The “speed” item contains two terms: “excessive speed” and the suffix format term “mph” (miles per hour). This suffix format will match any entity that ends with the suffix “mph”, for example “65mph”:
  SET domId=##class(%iKnow.Domain).GetOrCreateId("mydomain")
  /* ... */
CreateDictionary
  SET dictname="Traffic"
  SET dictdesc="A dictionary of traffic enforcement terms"
  SET dictId=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname,dictdesc)
  IF dictId=-1 {WRITE "Dictionary ",dictname," already exists",!
                GOTO ResetForNextTime }
  ELSE {WRITE "created a dictionary ",dictId,!}
CreateDictionaryItemAndTerms
  SET item1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryItem(domId,dictId,"speed",domId_dictId_"speed")
  SET term1Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTerm(domId,item1Id,
            "excessive speed")
  SET term2Id=##class(%iKnow.Matching.DictionaryAPI).CreateDictionaryTermFormat(domId,
            item1Id,"%iKnow.Matching.Formats.SimpleSuffixFormat",$LB("mph",0,3))
  WRITE "dictionary=",dictId,!,"item=",item1Id,!,"terms=",term1Id," ",term2Id,!!
   /* ... */
ResetForNextTime
  IF dictId = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname)}
  SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
  IF stat {WRITE "deleted dictionary ",dictId,! }
  ELSE { WRITE "DropDictionary error ",$System.Status.DisplayError(stat) }
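
The effect of the suffix format in the example above can be pictured with a very small sketch. It models only the suffix test itself; the additional list values passed in the example are not modelled, and the case_sensitive parameter is an assumption added for illustration.

```python
def matches_suffix(entity: str, suffix: str, case_sensitive: bool = False) -> bool:
    """Return True when the entity ends with the given suffix."""
    if not case_sensitive:
        entity, suffix = entity.lower(), suffix.lower()
    return entity.endswith(suffix)


for candidate in ("65mph", "65MPH", "mphase"):
    print(candidate, "->", matches_suffix(candidate, "mph"))
```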
 
Multiple Formats in a Dictionary Term
You can input dictionary formats directly as part of a dictionary term. This allows you to create a dictionary term containing multiple elements, including one or more format elements, as well as string elements.
To use this feature, you specify a "coded" description of the format as part of the string submitted to the CreateDictionaryTerm() method. This coded description has the following format:
@@@User.MyFormatClass@@@param1@@@param2@@@
This description consists of the full class name of the format class (implementing %iKnow.Matching.Formats.Format), a @@@ separator, and a @@@-delimited list of the format parameters to be passed to the format class. The entire description is delimited with @@@ markers at the beginning and end.
If the format class takes no parameters, or the defaults are to be used, specify the format class name delimited by @@@ markers.
When including this format in a dictionary term string, you must make sure that iKnow will recognize it as a single entity. For example, the term "was born in @@@User.MyYearFormat@@@" is interpreted as a single entity, but the term "was born in the year @@@User.MyYearFormat@@@" is not.
If iKnow cannot find the specified format class, the @@@ usage is considered intentional and the whole entity is treated as a simple string element.
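The layout of the coded description can be made concrete with a short parsing sketch. It only shows how the string breaks down into a class name and a parameter list; it does not reproduce how iKnow itself parses the term or resolves the class. The function name parse_coded_format is hypothetical.

```python
MARK = "@@@"


def parse_coded_format(entity: str):
    """Return (class_name, params) for a coded format description.

    Returns None when the entity is not wrapped in @@@ markers or carries
    no class name, in which case it would be an ordinary string element.
    """
    if len(entity) <= 2 * len(MARK):
        return None
    if not (entity.startswith(MARK) and entity.endswith(MARK)):
        return None
    inner = entity[len(MARK):-len(MARK)]
    class_name, *params = inner.split(MARK)
    if not class_name:
        return None
    return class_name, params


print(parse_coded_format("@@@User.MyFormatClass@@@param1@@@param2@@@"))
print(parse_coded_format("@@@User.MyYearFormat@@@"))
print(parse_coded_format("born"))
```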
Using this syntax makes it easier to load dictionaries from files or tables without requiring separate steps or actions for the formats.
Listing and Copying Dictionaries
The %iKnow.Matching.DictionaryAPI class has a number of methods to count or list existing dictionaries and their items and terms.
The %iKnow.Utils.CopyUtils class has a number of methods to copy a dictionary or all dictionaries from one domain to another.
Listing Existing Dictionaries
The following example lists all of the dictionaries in the domain. For the purpose of demonstration, this example first creates two empty dictionaries, one in English (the default language) and one in French:
  SET domId=##class(%iKnow.Domain).GetOrCreateId("mydomain")
  SET dictname1="Diseases",dictname2="Maladies"
  SET dictdesc1="English disease terms",dictdesc2="French disease terms"
CreateFirstDictionary
  SET dictId1=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname1,dictdesc1)
  IF dictId1 = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname1)
     SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
     IF stat '= 1 { WRITE "DropDictionary error ",$System.Status.DisplayError(stat)
                    QUIT }
     GOTO CreateFirstDictionary }
    ELSE {WRITE "created a dictionary ",dictId1,!}
CreateSecondDictionary
  SET dictId2=##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(domId,dictname2,dictdesc2,"fr")
  IF dictId2 = -1 {
     SET dictId=##class(%iKnow.Matching.DictionaryAPI).GetDictionaryId(domId,dictname2)
     SET stat=##class(%iKnow.Matching.DictionaryAPI).DropDictionary(domId,dictId)
     IF stat '= 1 { WRITE "DropDictionary error ",$System.Status.DisplayError(stat)
                    QUIT }
     GOTO CreateSecondDictionary }
  ELSE {WRITE "created a dictionary ",dictId2,!}

GetDictionaries
  SET stat=##class(%iKnow.Matching.DictionaryAPI).GetDictionaries(.dicts,domId)
  WRITE "get dictionaries status is:",$System.Status.DisplayError(stat),!!
  SET k=1
  WHILE $DATA(dicts(k)) {
      WRITE $LISTTOSTRING(dicts(k)),!
      SET k=k+1 }
  WRITE "End of list of dictionaries"
 
GetDictionaries() lists the Id, name, description, and language for each dictionary.
Copying Dictionaries
You can copy dictionaries from one domain to another within the current namespace.
Extending Dictionary Constructs
Though iKnow only describes simple dictionaries in the Matching API, this does not restrict you from using more advanced tools like ontologies, taxonomies or other more hierarchical constructs. The goal of the Matching API is to provide the hooks for just the matching, rather than yet another generic structure that tries to cover every construct. Therefore, you should just flatten the structure of the ontology or taxonomy you have. By appropriately choosing your dictionary item URIs, you'll be able to reconstruct or interpret the matching results within the context of your ontology or taxonomy.
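One way to flatten a hierarchy while keeping it recoverable is to encode each node's path in its item URI. The sketch below is an illustration under assumptions: the nested-mapping input, the ":"-separated URI scheme, and both helper names are hypothetical, and nothing here is prescribed by the Matching API.

```python
def flatten(taxonomy: dict, prefix: str = ""):
    """Yield (uri, label) for every node of a nested {label: children} mapping."""
    for label, children in taxonomy.items():
        uri = f"{prefix}:{label}"
        yield uri, label
        yield from flatten(children, uri)


def ancestors(uri: str):
    """Recover the URIs of all ancestors of a node from its own URI."""
    parts = uri.strip(":").split(":")
    return [":" + ":".join(parts[:i]) for i in range(1, len(parts))]


wines = {"france": {"bordeaux": {"margaux": {}, "pauillac": {}}, "burgundy": {}}}
items = list(flatten(wines))
for uri, label in items:
    print(uri, label)
print(ancestors(":france:bordeaux:margaux"))
```

Each (uri, label) pair would become one dictionary item, with the label as a first term. A match on ":france:bordeaux:margaux" can then be rolled up to ":france:bordeaux" or ":france" when interpreting results.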
In the Matching API, formats are pluggable: you can provide your own format class, for example one that performs regular expression matching, by implementing the %iKnow.Matching.Formats.Format interface.