Semantic Attributes

InterSystems NLP identifies concepts and their context in a natural language text.

Concepts are indivisible word groups that have a meaning on their own, and are often associated with other concepts in a sentence by relations. For example, in the sentence “Patient is being treated for acute pulmonary hypertension,” NLP identifies the word groups “patient” and “acute pulmonary hypertension” as concepts. These concepts are associated by the relation “is being treated for”.

Context is provided by associating a concept with a semantic attribute. For example, in the sentence “Patient is not being treated for acute pulmonary hypertension,” the concept “acute pulmonary hypertension” has the same meaning, but its context is clearly different. The context for this concept is that it appears in a part of the sentence that is negated. An application created with NLP can provide much more focused results by differentiating instances of a concept that are not negated from those that are.

Negation is one of several semantic attributes that associate concepts with specific contexts.

NLP supports the following semantic attributes:

  • Negation: (red) The negation attribute identifies a part of a sentence as expressing negation. A negation word or phrase negates the meaning of the associated concept or path. For example, “no”, “do not”, “never fully”, “would not”, “without”, “neither”, “nowhere”, “nothing”.

  • Measurement: (purple) The measurement attribute identifies a part of a sentence as having an associated numeric measurement. A measurement attribute can be a count, or can identify dimensions, volume, weight, currency amounts, or other numerically quantified concepts. For example, “single item”, “three groups”, “12 feet”, “half”, “$200”, “more than 1.5 billion people”, “a few thousand years old”, “a pair of scientists”, “several widgets”, “fewer adverse effects”.

  • Time: (orange) The time attribute identifies a part of a sentence as specifying a date or time. This can be a specific time, such as “the year 1848”, “Thursday”, “last January”, “summer”, “evening”, “the 1990s”, “5:00pm”, “this year’s award”, “today”, or it can be a relative time sequence, such as “now”, “then”, “later”, “at that time”, “eventually”.

  • Duration: (green) The duration attribute identifies a length of time. For example, “150 years”, “several hours”, “days”, “more than two years”, “the period when”, “long-term”, “temporary”, “permanent”, “a few thousand years”.

  • Frequency: (yellow) The frequency attribute identifies instances or repetitions in time, and identifies ordinal numbers that indicate a specific instance. For example, “sometimes”, “first time”, “often”, “usually”, “twice”, “daily”, “every month”, “annually”, “the fourth attempt”.

  • Sentiment: The two sentiment attributes identify a part of a sentence as having either a positive sentiment or a negative sentiment. For example, the words “avoid”, “harm”, “reject” typically convey a negative sentiment in many (but not all) contexts; the words “approve”, “accept”, “beneficial” convey a positive sentiment. Because sentiment terms are highly dependent on the kind of texts being analyzed, NLP does not automatically identify sentiment terms. You can create a list of positive and negative sentiment terms using the NLP UserDictionary. These terms are then identified in the texts by the NLP engine, which determines which parts of the sentence or path are affected by them.

An NLP attribute flag is associated with one or more marker terms (words or short phrases) that affect the interpretation of a path or sentence. A part of a path or sentence is flagged as being affected by that attribute, and can thus be separated from similar parts of paths or sentences that do not have the attribute.

Negation

Negation is the process that turns an affirmative sentence (or part of a sentence) into its opposite, a denial. For example, the sentence “I am a doctor.” can be negated as “I am not a doctor.” In analyzing text it is often important to separate affirmative statements about a topic from negative statements about that topic.

NLP provides a means to determine if a sentence or path is negated. During source indexing NLP associates the attribute “negation” with a sentence and indicates which part of the text is negated.

While in its simplest form negation is simply associating “no” or “not” with an affirmative statement or phrase, in actual language usage negation is a more complex and language-specific operation. There are two basic types of negation:

  • Formal, or grammatical, negation is always indicated by a specific morphological element in the text. For example, “no”, “not”, “don’t” and other specific negating terms. These negating elements can be part of a concept (“He has no knowledge of”) or part of a relation (“He doesn’t know anything about”). Formal negation is always binary: a sentence (or part of a sentence) either contains a negating element (and is thus negated), or it is affirmative.

  • Semantic negation is a complex, context-dependent form of negation that is not indicated by any specific morphological element in the text. Semantic negation depends upon the specific meaning of a word or word group in a specific context, or results from a specific combination of meaning and tense (for example, conjunctive and subjunctive tenses in Romance languages). For example, “Fred would have been ready if he had stayed awake” and “Fred would have been ready if the need had arisen” say opposite things about Fred’s readiness. Semantic negation is not a binary principle; it is almost never absolute, but is subject to contextual and cultural insights.

The NLP language models contain a variety of language-specific negation words and structures. Using these language models the NLP analysis engine is able to automatically identify and flag for future use most instances of formal negation as part of the source loading operation. However, NLP cannot identify instances of semantic negation.

The largest unit of negation in NLP is a path; NLP cannot identify negations in text units larger than a path. Many, but not all, sentences comprise a single path.

Properties of Formal Negation

Formal negation can be defined by three properties:

  • Negation markers: formal negation is always marked by one or more negation markers. These negation markers can be part of a concept or a relation. Some examples of negation markers in English are no, not, doesn’t, isn’t, hasn’t, neither, nor, never, nothing, none, nobody, nowhere. NLP always identifies a negation marker as part of a concept or part of a relation.

  • Negation span: negation is always expressed in the broader context of a statement or a sentence. The effect of formal negation is that the statement or sentence (or some part of it) is negated. Therefore, it is important to determine the span of the negation, the part of the sentence that is made negative by the negation marker(s). The maximum span of a formal negation is a full sentence.

  • Negation stopper: in many cases the span of the negation is not a full sentence. The span of the negation is terminated by a negation stopper, such as the words “but” and “or”. NLP identifies negation stoppers and uses this information to limit negation span.

NLP uses these properties to identify negated units of text. Negation markers are tagged at the entity (concept or relation) level by assigning a negation attribute. Negation span is tagged at the path level with negation-begin and negation-end tags.
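The interplay of markers, span, and stoppers can be illustrated with a toy sketch. It is written in Python for brevity and does not call NLP. It assumes a small hard-coded marker and stopper list, and it assumes the span simply runs forward from a marker until a stopper or the end of the sentence. The real engine works on entities and paths using its language models, so `MARKERS`, `STOPPERS`, and `negation_spans` are hypothetical names used only for illustration.

```python
# Toy approximation only: hypothetical marker and stopper lists,
# word-level (not entity-level) and forward-only span expansion.
MARKERS = {"no", "not", "never", "neither", "nor", "nothing"}
STOPPERS = {"but", "or"}


def negation_spans(words):
    """Return (start, end) word-index pairs; end is exclusive."""
    spans = []
    start = None
    for i, word in enumerate(words):
        token = word.lower().strip(".,;")
        if start is None:
            if token in MARKERS:
                start = i          # a marker opens a span
        elif token in STOPPERS:
            spans.append((start, i))  # a stopper closes the span
            start = None
    if start is not None:
        spans.append((start, len(words)))  # span runs to sentence end
    return spans


words = "I did not like the soup but the bread was fine".split()
for begin, end in negation_spans(words):
    print(" ".join(words[begin:end]))
```

In this sketch the stopper “but” ends the span, so “the bread was fine” stays outside the negated part.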

Japanese supports the negation attribute at the entity level, but because of the fundamentally different definition of paths in Japanese, path expansion is not supported. Therefore, negation for Japanese does not necessarily expand to all affected entities at the path level.

Using Negation Attributes

Negation analysis information can be used with the following methods:

You can specify the negation attribute ID using the $$$IKATTNEGATION macro, defined in the %IKPublic #Include file.

Negation Attribute Structure

Negation is implemented in NLP as an attribute. That is, sources, sentences, or paths that contain negation have the negation attribute. This attribute is a %List structure with the following elements:

  • Element 2 is the word “negation”

  • Element 3 is the entity position that contains the first negation marker. A negation marker can be part of a relation or a concept. For example, in “The White Rabbit usually hasn’t any time.” the negation marker is in entity 3, the relation “usually hasn’t”. In “The White Rabbit usually has no time.” the negation marker is in entity 4, the concept “no time”. Note that for this position count, non-relevant words (such as “the” and “a”) are counted as separate entities.

  • Element 4 is the scope of the negation as a count of entities. Negation scope is counted from the first entity containing a negation marker to the last entity containing a negation marker. For example, “The man is neither fat nor thin” has a negation scope of 3 entities: “is neither/fat/nor”.

  • Element 5 shows the position of the negation marker within the entity as a bit map. A “1” indicates a word that is a negation marker; a “0” indicates a word that is not a negation marker. A negation marker consisting of two adjacent words, such as “is not”, is indicated as “11”. Entity mapping stops when the negation marker has been indicated. For example, the relation “is often not” is “001”, while the relation “often is not” is “011”, and the relation “is not often” is “11”.
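The way Element 3 (position) and Element 4 (scope) relate to each other can be sketched as follows. This is a minimal illustration in Python, not NLP code: it assumes the sentence has already been split into entities and that each entity is represented by a boolean saying whether it contains a negation marker. The function name `negation_position_and_scope` is hypothetical.

```python
def negation_position_and_scope(has_marker):
    """has_marker: one boolean per entity, in sentence order.

    Returns (position, scope): the 1-based position of the first entity
    containing a negation marker, and the entity count from that entity
    through the last entity containing a marker. Returns None when no
    entity contains a marker.
    """
    marked = [pos for pos, flag in enumerate(has_marker, start=1) if flag]
    if not marked:
        return None
    return marked[0], marked[-1] - marked[0] + 1


# "The / White Rabbit / usually hasn't / any time"
print(negation_position_and_scope([False, False, True, False]))
# "The / White Rabbit / usually has / no time"
print(negation_position_and_scope([False, False, False, True]))
```

The sketch treats a sentence as containing a single negation; a sentence with several separate negations would carry more than one attribute.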

Negation Bit Map

Element 5 is the negation bit map. It indicates where the negation markers are in the negation scope. When the negation scope is 1, this is a simple bit map. When the negation scope is greater than one, this is a series of bit maps separated by spaces, one bit map for each entity within the negation scope.

Within the negation scope, if an entity contains a negation marker the negation marker and each word preceding it is indicated by either a 1 (negation marker word) or a 0 (word preceding the negation marker). If an entity within the negation scope does not contain a negation marker, the whole entity is represented by a single 0. Note that non-relevant words, such as “a” and “the”, are considered to be separate entities. Some examples of negation bit mapping are shown in the following table:

Negation Bit Map | Sentence Text (with / entity dividers)
01 0 1           | Bartleby / is neither / busy / nor idle.
01 0 1           | Bartleby / is neither / sixty-five years old / nor retired.
1 0 1            | Bartleby / is / no idler / and certainly is / no loafer.
11 0 0 01        | Bartleby / is not / my / favorite fictional character / but neither is / he / my / least favorite.
1 0 0 01         | Bartleby / isn’t / my / favorite fictional character / but neither is / he / my / least favorite.
001 0 0011       | Bartleby / is either not / trying very hard / or he is not / succeeding.
11 0 0 0011      | Bartleby / is not / a / wholly realistic character / and yet is not / wholly unbelievable.
1 0001           | Bartleby / never works, / but he is never / wholly idle.
1 001            | Bartleby / does / nothing / and yet never is / he / idle.

The largest entity bit map is 8 bits. In rare cases a negation marker can be more than eight words from the beginning of its entity. If the negation marker is a two-word marker at positions 8 and 9, the second “1” is omitted (“00000001”); if the negation marker is at position 9 or greater, no bit map is returned. In the following examples the negation marker is in the second entity, a relation containing many words (due to the semantic ambiguity of the word “in”): “They start when you get in and are not finished when you leave.” maps as “00000011”; “They start when you get in and they are not finished when you leave.” maps as “00000001” (second word of the negation marker not mapped); “They often start when you get in and they are not finished when you leave.” returns no bit map.
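The per-entity encoding rules, including the 8-bit limit, can be sketched as follows. This is a Python illustration of the rules as described here, not the engine's implementation. It assumes each word of an entity is already known to be a marker word or not, and it assumes mapping stops after the last marker word in the entity. The names `MAX_BITS` and `entity_bitmap` are hypothetical.

```python
MAX_BITS = 8  # largest entity bit map, per the text above


def entity_bitmap(marker_flags):
    """marker_flags: one boolean per word of an entity in the negation scope.

    Returns the bit map string for that entity, "0" for an entity with no
    marker, or "" when the marker starts beyond the eighth word.
    """
    if True not in marker_flags:
        return "0"
    first = marker_flags.index(True)
    if first >= MAX_BITS:
        return ""  # marker at position 9 or greater: no bit map
    last = len(marker_flags) - 1 - marker_flags[::-1].index(True)
    bits = "".join("1" if flag else "0" for flag in marker_flags[: last + 1])
    return bits[:MAX_BITS]  # a marker word at position 9 is dropped


print(entity_bitmap([False, False, True]))         # "is often not"
print(entity_bitmap([False] * 7 + [True, True]))   # marker at words 8 and 9
```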

You can determine if a negation bit map has been omitted by comparing the Element 4 scope of negation entity count with the Element 5 number of blank-separated bit maps. If these two counts do not match, one or more negation entity bit maps are missing.
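The comparison of Element 4 with Element 5 can be sketched as follows. This Python illustration assumes the two values have already been extracted from the attribute list as an integer and a string; the function names are hypothetical.

```python
def missing_bitmap_count(scope, bitmap):
    """Compare the Element 4 entity count with the number of
    blank-separated bit maps in Element 5."""
    return scope - len(bitmap.split())


def marker_word_positions(bitmap):
    """For each entity bit map, list the 1-based word positions set to 1."""
    return [
        [pos for pos, bit in enumerate(entity, start=1) if bit == "1"]
        for entity in bitmap.split()
    ]


print(missing_bitmap_count(3, "01 0 1"))      # counts match
print(missing_bitmap_count(1, ""))            # one bit map was omitted
print(marker_word_positions("001 0 0011"))
```

A non-zero result from `missing_bitmap_count` indicates that at least one entity in the scope exceeded the 8-bit limit.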

Negation and Dictionary Matching

NLP recognizes negated entities when matching against a dictionary. It calculates the number of entities that are part of a negation and stores this number as part of the match-level information (as returned by methods such as GetMatchesBySource() or as the NegatedEntityCount property of %iKnow.Objects.DictionaryMatch). This allows you to create code that interprets matching results by considering negation content, for example by comparing negated entities to the total number of entities matched.

For further details, refer to the Smart Matching: Using a Dictionary chapter of this manual.
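One way to compare negated entities with the total number of entities matched is sketched below in Python. The two counts are assumed to have been read from the match-level information already; the function names, the 0.5 threshold, and the three labels are hypothetical choices for illustration and are not part of NLP.

```python
def negated_share(negated_entity_count, matched_entity_count):
    """Fraction of the matched entities that are part of a negation."""
    if matched_entity_count <= 0:
        return 0.0
    return negated_entity_count / matched_entity_count


def classify_match(negated_entity_count, matched_entity_count, threshold=0.5):
    """Label a dictionary match by how much of it is negated.

    The threshold and labels are illustrative assumptions.
    """
    share = negated_share(negated_entity_count, matched_entity_count)
    if share == 0:
        return "affirmative"
    return "negated" if share >= threshold else "partially negated"


print(classify_match(0, 3))
print(classify_match(3, 3))
print(classify_match(1, 4))
```

An application might, for instance, discard matches labeled “negated” and flag “partially negated” matches for review.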

Negation Examples

Refer to A Note on Program Examples for details on the coding and data used in the examples in this book.

The following example uses %iKnow.Queries.SourceAPI.GetAttributes() to search each source in a domain for paths and sentences that have the negation attribute. It displays the PathId or SentenceId, the start position and the span of each negation. To limit %iKnow.Queries.SourceAPI.GetAttributes() to paths, specify $$$IKATTLVLPATH rather than $$$IKATTLVLANY:

#Include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.GetErrorText(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeCause FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeCause")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.GetErrorText(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.GetErrorText(stat) QUIT }
GetSourcesAndAttributes
   SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
   DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.srcs,domId,1,numSrcD)
   SET i=1
   WHILE $DATA(srcs(i)) {
      SET srcId = $LISTGET(srcs(i),1)
      SET i=i+1
      DO ##class(%iKnow.Queries.SourceAPI).GetAttributes(.att,domId,srcId,1,10,"",$$$IKATTLVLANY)
      SET j=1
      WHILE $DATA(att(j)) {
          IF $LISTGET(att(j),1)=1 {
            SET type=$LISTGET(att(j),2)
            SET level=$LISTGET(att(j),3)
            SET targId=$LISTGET(att(j),4)
            SET start=$LISTGET(att(j),5)
            SET span=$LISTGET(att(j),6)
               IF level=1 {WRITE "source ",srcId," ",type," path ",targId," start at ",start," span ",span,!}
               ELSEIF level=2 {WRITE "source ",srcId," ",type," sentence ",targId," start at ",start," span ",span,!!}
               ELSE {WRITE "unexpected attribute level",! }
         }
     SET j=j+1
     }
    }

The following example uses %iKnow.Queries.SentenceAPI.GetAttributes() to find those sentences in each source in a domain that have the negation attribute. It displays the sentence ID of each sentence that has this attribute and the entity position that contains the negation marker. It then displays the text of these sentences.

#Include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.GetErrorText(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeCause FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeCause")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.GetErrorText(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.GetErrorText(stat) QUIT }
GetSourcesAndSentences
   SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
   DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.srcs,domId,1,numSrcD)
   SET i=1
   WHILE $DATA(srcs(i)) {
      SET srcId = $LISTGET(srcs(i),1)
      SET i=i+1
      SET st = ##class(%iKnow.Queries.SentenceAPI).GetBySource(.sent,domId,srcId)
      SET j=1
      WHILE $DATA(sent(j)) {
         SET sentId=$LISTGET(sent(j),1)
         SET text=$LISTGET(sent(j),2)
         SET j=j+1
CheckSentencesForNegation
         SET atstat=##class(%iKnow.Queries.SentenceAPI).GetAttributes(.att,domId,sentId)
         SET k=1
            WHILE $DATA(att(k)) {
             WRITE "sentence ",sentId," has attribute=",$LISTGET(att(k),2)
             WRITE ", marker at entity position=",$LISTGET(att(k),3),!
             /* Format for display */
             WRITE sentId,": "
             SET x=1
             /* Each output line holds 61 characters; round up so a final partial line is also printed */
             SET totlines=($LENGTH(text)+60)\61
               FOR L=1:1:totlines {
               WRITE $EXTRACT(text,x,x+60),!
               SET x=x+61 }
             WRITE "END OF SENTENCE ",sentId,!!
         SET k=k+1 }
    }
  }

Adding Negation Terms

You can specify negation for other specific words or phrases using the NLP UserDictionary. Using the AddNegationTerm() method, you can add a list of negation terms to a UserDictionary. When source texts are loaded into a domain, all appearances of these terms are flagged with the negation marker.

Negation Special Cases

The following are a few peculiarities of negation in English:

  • No.: The word “No.” (with capital letter and period, quoted or not quoted) in English is treated as an abbreviation. It is not treated as negation and is not treated as the end of a sentence. Lowercase “no.” is treated as negation and as a sentence ending.

  • Nor: The word “Nor” at the beginning of a sentence is not marked as negation. Within the body of a sentence the word “nor” is marked as negation.

  • No-one: The hyphenated word “no-one” is treated as a negation marker. Other hyphenated forms (for example, “no-where”) are not.

  • False positives: Because formal negation depends on words, not context, occasional false positives are inevitable. For example, the sentences “There was no answer” and “The answer was no” are both flagged as negation.

Because negation operates on sentence units, it is important to know what NLP does (and does not) consider a sentence. For details on how NLP identifies a sentence, refer to the Logical Text Units Identified by NLP section of the “Conceptual Overview” chapter.

Measurement

Documents commonly contain structured data elements that express a quantity. These can include counts, lengths, weights, currency amounts, medication dosages, and other quantified expressions. These data elements commonly consist of a numeric expression (specified as either numbers or words) and either an associated concept term (23 widgets) for a count, or some form of unit for a measurement. NLP annotates this combination of a number and a unit of measure as a measurement marker term at the word level.

To handle fractional numeric values, a leading period is included as part of the measurement number.

In addition to annotating the number and unit, InterSystems NLP uses attribute expansion rules to identify the other concepts “involved” in the measurement. This annotated sequence of concepts captures what is being measured, rather than just the measurement itself.

These expanded attributes can be used to:

  • Extract all of the measurable facts in a document by highlighting them or displaying them in a list.

  • Narrow the display of a specific concept to only those that are associated with a measurement.

Note:

The Measurement attribute is currently only supported for English.

Generally, any concept associated with a number is flagged with the measurement attribute. The exceptions are described in “Time, Duration, and Frequency”.

Using Measurement Attributes

Measurement attributes can be used with the following methods:

You can specify the measurement attribute using the $$$IKATTTYPEPROPS macro, defined in the %IKInclude file.

Time, Duration, and Frequency

Documents may also contain structured data that expresses time, duration, or frequency. These are annotated as separate attributes, commonly consisting of an attribute term as part of a concept. These attributes are identified based on marker terms identified in the language. They may or may not include a specific number.

A number, specified either with numerals or with words, is almost always treated as a measurement attribute. However, a time attribute can contain a numeric, and a frequency attribute can contain an ordinal number.

The following are some of the guidelines in the English language model governing numbers:

  • Numerics: A number with no associated term could be a measurement or a year. Numbers from 1900 through 2039 are assumed to be years and are assigned the time attribute (1923, 2008 applicants). Numbers outside this range (1776) are not considered to be years, unless the word “year” is specified (the year 1776). Isolated numbers outside this range (1776) are assigned no attribute. Numbers outside this range with an associated term are assumed to be measurements (1776 applicants). Two-digit numbers with a leading apostrophe (for example Winter of '89) are assumed to be years and are assigned the time attribute. Numbers with numeric or currency punctuation (1,973, -1973, 1973.0, or $1973), and numbers expressed in words (nineteen seventy-three) are assumed to be measurements. A valid time numeric (12:34:33) is assigned the time attribute.

  • Ordinal numbers: An ordinal in a concept with other words takes the frequency attribute when spelled out (the fourth attempt), the measurement attribute when specified as a number (the 4th attempt). Spelled-out ordinals beyond “tenth” do not take a frequency attribute. An ordinal by itself in a concept does not take an attribute (a fifth of scotch, came in third).

    A spelled out ordinal (first through tenth) following another number takes the measurement attribute as a fraction (one third, two fifths), with exception of “one second” which takes the duration attribute.

    An ordinal of any size with a month name takes the time attribute (sixteenth of October, October 16th, October sixteenth), except May, which is ambiguous in English and therefore doesn’t take an attribute.
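The Numerics guidelines above can be read as a small decision procedure. The following Python sketch restates them for numeral tokens only; it is an illustration of the documented rules, not the English language model itself. The function name `classify_numeric` and its two context flags (`has_term`, `preceded_by_year`) are hypothetical, and spelled-out numbers and ordinals are outside its scope.

```python
import re


def classify_numeric(token, has_term=False, preceded_by_year=False):
    """Return "time", "measurement", or None for a numeral token.

    has_term: the number has an associated term ("1776 applicants").
    preceded_by_year: the word "year" is specified ("the year 1776").
    """
    # A valid clock-style numeric such as 12:34:33 is a time.
    if re.fullmatch(r"[0-9]{1,2}:[0-9]{2}(:[0-9]{2})?", token):
        return "time"
    # A two-digit number with a leading apostrophe ('89) is read as a year.
    if re.fullmatch(r"['’][0-9]{2}", token):
        return "time"
    # Numeric or currency punctuation (1,973  -1973  1973.0  $1973).
    if not re.fullmatch(r"[0-9]+", token):
        return "measurement"
    if 1900 <= int(token) <= 2039:
        return "time"
    if preceded_by_year:
        return "time"
    # Outside the year range: measurement with a term, otherwise no attribute.
    return "measurement" if has_term else None


for tok, term in [("1923", False), ("1776", False), ("1776", True), ("$1973", False)]:
    print(tok, term, classify_numeric(tok, has_term=term))
```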

Note:

The Time, Duration, and Frequency attributes are currently only fully supported for English. Partial support is provided for Dutch and Czech.

Sentiment

A sentiment attribute flags a sentence as having either a positive or negative sentiment. Sentiment terms are highly dependent on the kind of texts being analyzed. For example, in a customer perception survey context the following terms might be flagged with a sentiment attribute:

  • The words “avoid”, “terrible”, “difficult”, “hated” convey a negative sentiment.

  • The words “attractive”, “simple”, “self-evident”, “useful”, “improved” convey a positive sentiment.

Because sentiment terms are often specific to the nature of the source texts, NLP does not automatically identify sentiment terms. You can flag individual words as having a positive sentiment or a negative sentiment attribute. By default, no words have a sentiment attribute. You can specify a sentiment attribute for specific words using the NLP UserDictionary. Using the AddPositiveSentimentTerm() and AddNegativeSentimentTerm() methods, you can add a list of sentiment terms to a UserDictionary. When source texts are loaded into a domain, each appearance of these terms and the part of the sentence affected by it is flagged with the specified positive or negative sentiment marker.

For example, if “hated” is specified as having a negative sentiment attribute, and “amazing” is specified as having a positive sentiment attribute, when NLP applies them to the sentence:

I hated the rain outside, but the running shoes were amazing.

Negative sentiment would affect “rain” and positive sentiment would affect “running shoes”.

When a positive or negative sentiment attribute appears in a negated part of a sentence, the sense of the sentiment is reversed. For example, if the word “good” is flagged as a positive sentiment, the sentence “The coffee was good” is a positive sentiment, but the sentence “The coffee was not good” is a negative sentiment.
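The reversal rule can be stated as a small function. This Python sketch assumes the application already knows the sentiment assigned to a term and whether that term falls inside a negated part of the sentence; the name `effective_sentiment` and the string labels are hypothetical.

```python
def effective_sentiment(term_sentiment, in_negated_span):
    """Return the sentiment after accounting for negation.

    term_sentiment: "positive" or "negative", as assigned to the term.
    in_negated_span: True when the term appears in a negated part
    of the sentence.
    """
    if term_sentiment not in ("positive", "negative"):
        raise ValueError("term_sentiment must be 'positive' or 'negative'")
    if not in_negated_span:
        return term_sentiment
    return "negative" if term_sentiment == "positive" else "positive"


# "The coffee was good" versus "The coffee was not good"
print(effective_sentiment("positive", False))
print(effective_sentiment("positive", True))
```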

Sentiment attributes are not currently supported for Japanese.

Using Sentiment Attributes

Sentiment Analysis information can be used with the following methods:

You can specify a sentiment attribute ID using either the $$$IKATTSENPOSITIVE or $$$IKATTSENNEGATIVE macro, defined in the %IKPublic #Include file.