NLP UserDictionary

A UserDictionary specifies a set of user-defined paired terms applied to the source texts. NLP substitutes each occurrence of the first term of the pair with the second term as part of source text listing. This operation changes the source text used by NLP; all subsequent NLP operations see only the substituted term. For example, if UserDictionary replaces the abbreviation “Dr.” with “Doctor”, every occurrence of “Dr.” is replaced by the word “Doctor” in the data indexed by NLP. The original source file is not changed, but all representations of the source text within NLP contain this substitution. Unlike all other components of NLP, UserDictionary changes the source content before listing and loading.

You can use the UserDictionary to substitute one term for another, to expand acronyms and abbreviations (or the reverse), or to avoid or cause a sentence break.

Substitution pairs are applied before NLP text normalization, which converts the NLP internal text representation to lowercase letters. For this reason, substitution pairs are case-sensitive. Thus, to replace all instances of “physician” with “doctor” you will need the substitution pairs "physician","doctor", "Physician","Doctor", and perhaps "PHYSICIAN","DOCTOR".
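The effect of case sensitivity can be illustrated outside NLP with a small Python model. This is only a sketch: the `apply_pairs` function and the final `lower()` call are hypothetical stand-ins for UserDictionary substitution and NLP normalization, not part of the %iKnow API.

```python
def apply_pairs(text, pairs):
    """Apply each (old, new) pair as a plain, case-sensitive string replacement."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


source = "The Physician met a physician. PHYSICIAN NOTES follow."

# A single lowercase pair misses the capitalized and all-caps forms.
one_pair = apply_pairs(source, [("physician", "doctor")])

# One pair per case variant covers all three occurrences.
all_pairs = apply_pairs(source, [
    ("physician", "doctor"),
    ("Physician", "Doctor"),
    ("PHYSICIAN", "DOCTOR"),
])

# Hypothetical stand-in for NLP normalization, which runs after substitution.
normalized_one = one_pair.lower()
normalized_all = all_pairs.lower()

print(normalized_one)
print(normalized_all)
```

With only the lowercase pair, the normalized text still contains "physician"; with all three pairs it does not.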

A UserDictionary is also used for user-defined attribute terms, such as terms that indicate a positive sentiment or a negative sentiment attribute.

Defining a UserDictionary is optional. A UserDictionary exists independent of any specific configuration or domain. A defined UserDictionary can be assigned as a Configuration property. Only one UserDictionary can be assigned to a Configuration. The same UserDictionary can be assigned to multiple Configurations.

A defined UserDictionary can also be supplied to the NormalizeWithParams() method, independent of any Configuration.

Note:

You cannot modify an existing configuration; a %New() does not delete/replace an existing configuration. Therefore, to add a UserDictionary to an existing configuration you must explicitly delete then re-create the named configuration. Alternatively, you can create a new configuration with a new configuration name.

The UserDictionary is applied to sources when the sources are listed; already indexed sources are not affected by changes to UserDictionary.

UserDictionary Format

UserDictionary pairs often perform the simple substitution of a term for an equivalent term, for example, replacing every occurrence of “physician” with “doctor”. Using the backslash character provides additional formatting options:

Format    Meaning
\         Only perform substitution if a blank space occurs here.
\noend    Do not issue a sentence break.
\end      Issue a sentence break.

These are shown in the following sample UserDictionary pairs:

\UK,United Kingdom
\+\,plus
Fr.,\noend
\STOP,\end
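One way to read the sample pairs above is to classify each entry by what it matches and what it does. The following Python sketch is a hypothetical model, not NLP's parser: `parse_entry` is an invented name, splitting on the first comma is an assumption, and the sketch does not model what happens to the matched blank space after substitution, which the text does not specify.

```python
def parse_entry(line):
    """Split one 'old,new' line and interpret the backslash markers.

    Returns (match, action, replacement):
      match       - text to look for, with each backslash shown as a space
      action      - 'replace', 'noend', or 'end'
      replacement - substitute text, or None for sentence-break actions
    """
    old, new = line.split(",", 1)  # assumption: the first comma separates the pair
    match = old.replace("\\", " ")
    if new == r"\noend":
        return match, "noend", None
    if new == r"\end":
        return match, "end", None
    return match, "replace", new


# Raw strings keep the backslashes exactly as they appear in the dictionary.
samples = [r"\UK,United Kingdom", r"\+\,plus", r"Fr.,\noend", r"\STOP,\end"]
parsed = [parse_entry(s) for s in samples]

for entry in parsed:
    print(entry)
```

Under this reading, `\UK` matches "UK" only when preceded by a blank space, `\+\` matches "+" only with a blank space on both sides, "Fr." suppresses a sentence break, and "STOP" preceded by a blank space forces one.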

Defining a UserDictionary in Domain Architect

You can define a UserDictionary as part of Domain Settings when creating a domain using the interactive Domain Architect tool.

Defining a UserDictionary as an Object Instance

You must first create a UserDictionary object, then populate that instance.

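  // Create a named UserDictionary and save it
  SET udict=##class(%iKnow.UserDictionary).%New("MyUserDict")
  DO udict.%Save()
  // Add case-sensitive substitution pairs: AddEntry(oldstring,newstring)
  DO udict.AddEntry("Dr.","Doctor")
  DO udict.AddEntry("physician","doctor")
  DO udict.AddEntry("Physician","Doctor")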

To populate a UserDictionary object, you use the AddEntry() method to specify substitution pairs. Each substitution pair requires a separate AddEntry() call with the following format: AddEntry(oldstring,newstring). Note that substitution is string substitution, and that pairs are case-sensitive. You can optionally specify the position at which to add the UserDictionary entry; by default, the entry is added at the end of the UserDictionary. Because NLP applies substitution pairs in UserDictionary order, you can use position to perform additive substitutions. For example, first replace “PA” with “physician’s assistant”, then replace “physician” with “doctor”, so that “PA” ends up as “doctor’s assistant”.
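The order dependence described above can be modeled with a short Python sketch. Everything here is hypothetical: `add_entry` and `apply_in_order` are invented helpers, and the 1-based `position` is an assumption made for illustration, not the documented behavior of AddEntry().

```python
def add_entry(entries, old, new, position=None):
    """Add a pair at a 1-based position (an assumption), or append by default."""
    if position is None:
        entries.append((old, new))
    else:
        entries.insert(position - 1, (old, new))


def apply_in_order(text, entries):
    """Apply the pairs one after another, in list order."""
    for old, new in entries:
        text = text.replace(old, new)
    return text


entries = []
add_entry(entries, "physician", "doctor")
# Insert the "PA" pair ahead of the existing pair so its output is substituted again.
add_entry(entries, "PA", "physician's assistant", position=1)

text = "The PA paged the physician."
chained = apply_in_order(text, entries)

# The same pairs in the opposite order do not chain.
unchained = apply_in_order(text, list(reversed(entries)))

print(chained)
print(unchained)
```

With the "PA" pair first, the expansion is itself rewritten by the second pair; with the order reversed, "physician's assistant" is left as is.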

To add user-defined attribute terms, such as Sentiment attributes, you use the appropriate instance method, as shown in the following example:

  SET udict=##class(%iKnow.UserDictionary).%New("SentimentUserDict") 
  DO udict.%Save()
  DO udict.AddNegativeSentimentTerm("bad")
  DO udict.AddNegativeSentimentTerm("horrible")
  DO udict.AddPositiveSentimentTerm("good")
  DO udict.AddPositiveSentimentTerm("excellent")

The same UserDictionary can contain both substitution pairs and attribute terms.

To assign a UserDictionary object, you supply the UserDictionary name as the 4th argument in the Configuration %New() method:

  SET cfg=##class(%iKnow.Configuration).%New("MyConfig",0,$LISTBUILD("en"),"MyUserDict",1)
  DO cfg.%Save()

Defining a UserDictionary as a File

You must first create a UserDictionary file, populate it, then assign this UserDictionary file to a Configuration.

A UserDictionary file must be a UTF-8 encoded text file.

To populate a UserDictionary file, you specify substitution pairs in a text file. Each substitution pair is a separate line with the following format: oldstring,newstring. Note that substitution is string substitution, and that pairs are case-sensitive. The following is a sample UserDictionary file:

Mr.,Mister
Dr.,Doctor
Fr.,Fr
\UK,United Kingdom
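If the file is generated by a script rather than typed by hand, the encoding can be set explicitly. The following Python sketch writes the sample pairs above, one per line, as UTF-8 and reads them back; the temporary directory and the choice of "\n" line endings are assumptions for illustration only.

```python
import os
import tempfile

# Sample pairs from the file shown above; the raw string keeps the backslash literal.
pairs = [
    ("Mr.", "Mister"),
    ("Dr.", "Doctor"),
    ("Fr.", "Fr"),
    (r"\UK", "United Kingdom"),
]

# Hypothetical location; use whatever path you later pass to the Configuration.
path = os.path.join(tempfile.mkdtemp(), "udict.txt")

# Write one "oldstring,newstring" pair per line, explicitly encoded as UTF-8.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    for old, new in pairs:
        f.write(f"{old},{new}\n")

# Read the file back to confirm its contents.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```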

To assign a UserDictionary file, you supply the full pathname as the 4th argument in the Configuration %New() method:

  SET myconfig="MyFileConfig"   // illustrative configuration name
  SET cfg=##class(%iKnow.Configuration).%New(myconfig,0,$LISTBUILD("en"),"C:\temp\udict.txt",1)
  DO cfg.%Save()