Using iKnow
Dominance and Proximity
[Back] [Next]
   
Server:docs1
Instance:LATEST
User:UnknownUser
 
-
Go to:
Search:    

The iKnow semantics package consists of two classes: %iKnow.Semantics.DominanceAPI and %iKnow.Semantics.ProximityAPI. These classes with their parameters and methods are described in the InterSystems Class Reference.

This chapter describes:
Semantic Dominance
Semantic dominance is the overall importance of an entity within a source. iKnow determines semantic dominance by performing the following tests and obtaining statistical results:
Note:
Japanese uses a different, language-specific set of statistics to calculate the semantic dominance of an entity. See iKnow Japanese.
iKnow generates these values when the sources are loaded as part of iKnow indexing. It combines these values to produce a dominance score for each entity. The higher the dominance score, the more dominant the entity is within the source.
For example, for the concept “cardiovascular surgery” to be semantically dominant in a document, a statistical analysis would be performed. It determines that this concept appears 20 times in the source. A dominant concept is not the same as a top concept. In this source the concepts “doctor” (60 times), “surgery” (50 times), “operating room” (40 times), and “surgical procedure” (30 times) are far more common. However, the component words of “cardiovascular surgery” appear over twice as many times as the concept itself: “cardiovascular” (50 times) and “surgery” (80 times), which lends support to this being a dominant concept. In contrast, the concept “operating room” appears 40 times, but its component words appear barely more often than the concept: “operating” 60 times and “room” only 45 times. This indicates that this source is much more concerned with cardiovascular matters and surgery than it is with rooms.
iKnow gives these frequency counts greater or lesser weight based on the number of words in the original entity and whether that entity is a concept or a relation. (There is a much smaller number of commonly-occurring relations than concepts.)
However, to determine how dominant a concept really is, iKnow has to compare it to the total number of concepts in the source. If 5% of the concepts in the source contain the words “cardiovascular” and “surgery”, and these words do not combine in other concepts nearly as frequently as they do together, we know that these words not only appear frequently in the source, but that the source does not have a wide range of subject matter. If, however, the source contains a nearly equal occurrence of the word “surgery” in concepts with “hand”, “kidney” and “brain” and the word “cardiovascular” appears nearly as often with the words “exercise” and “diet” it is apparent that the source contains a wide range of subject matter. The concept “cardiovascular surgery” and its component words may appear more frequently than others, but may not significantly dominate the subject matter of the source.
By performing these statistical calculations, iKnow can determine the dominant concepts in a source — the subjects that are of greatest interest to you. iKnow performs this analysis without using an external reference corpus (such as pre-existing table of the relative frequency of words in a “typical” medical text). iKnow determines dominance using only the contents of the actual source text, and thus can be used on sources on any topic without any prior knowledge of the subject matter.
Dominance in Context
iKnow calculates the dominance of entities (concepts and relations) within a source and assigns each an integer value. It assigns the most dominant concept(s) in a source a dominance value of 1000. You can use the Indexing Results tool to list the dominance values for concepts in a single source.
iKnow calculates the dominance of CRCs within a source. The algorithm uses the dominance values of the entities within the CRC. CRC dominance values are intended only for comparison with other CRC dominance values; CRC dominance values should not be compared with entity dominance values. You can use the Indexing Results tool to list the dominance values for CRCs in a single source.
iKnow calculates the dominance of concepts across all loaded sources as a weighted average. These concept dominance scores are fractional numbers, with the largest possible number being 1000. You can use the Knowledge Portal tool Dominant Concepts option to list the dominance scores for concepts in all loaded sources. You can also use the %iKnow.Semantics.DominanceAPI GetTop() method to list the dominance scores for concepts in all loaded sources.
Concepts of Semantic Dominance
The following are the key elements of semantic dominance:
Semantic Dominance Examples
This chapter describes and provides examples for the following semantic dominance queries:
Dominance Profile Counts
The following example uses the GetProfileCountByDomain() method to return the total count of unique values for an entity type in the specified domain. The available entity types are: 0=concept ($$$SDCONCEPT or $$$SDENTITY); 1=relation ($$$SDRELATION); 2=CRC ($$$SDCRC); 4=aggregate ($$$SDAGGREGATE, the default) which is the total of all concepts, relations, and CRCs.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
ProfileCounts
  WRITE ##class(%iKnow.Semantics.DominanceAPI).GetProfileCountByDomain(domId,$$$SDCONCEPT)," total concepts",!
  WRITE ##class(%iKnow.Semantics.DominanceAPI).GetProfileCountByDomain(domId,$$$SDRELATION)," total relations",!
  WRITE ##class(%iKnow.Semantics.DominanceAPI).GetProfileCountByDomain(domId,$$$SDCRC)," total CRCs",!
  WRITE ##class(%iKnow.Semantics.DominanceAPI).GetProfileCountByDomain(domId)," aggregate total",!
 
Concepts with Top Dominance Scores in the Domain
The GetTop() method returns the top concepts (or relations) by dominance scores for all loaded sources.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCount
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!!
DominantConcepts
  DO ##class(%iKnow.Semantics.DominanceAPI).GetTop(.profresult,domId,1,50)
  WRITE "Top Concepts in Domain by Dominance Score",!
  SET j=1
  WHILE $DATA(profresult(j),list) {
     WRITE $LISTGET(list,2)
     WRITE ": ",$LISTGET(list,3),!
     SET j=j+1 }
  WRITE !,"Printed ",j-1," dominant concepts"
 
CRC Dominance Scores in the Domain
The GetProfileByDomain() method returns the dominance scores for CRCs in a domain. For each entity returns the following:
The following example uses the GetProfileByDomain() method to return the dominant profile entities, in this case CRCs. For each CRC it returns the value, the source Id, the type, and the dominance score. Because they are all CRCs, the type is always 2:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
DominanceProfile
  DO ##class(%iKnow.Semantics.DominanceAPI).GetProfileByDomain(.profresult,domId,1,20,$$$SDCRC)
  SET j=1
  WHILE $DATA(profresult(j),list) {
      WRITE " ",$LISTTOSTRING($LISTGET(list,2))
      WRITE " [Id:",$LISTGET(list,1)
      WRITE " type:",$LISTGET(list,3)
      WRITE " Dominance:",$LISTGET(list,4),"]",!
      SET j=j+1 }
  WRITE !,"Printed ",j-1," dominant profile CRCs"
 
Dominance Score for a Specified Entity
iKnow uses the GetDomainValue() method to return the dominance value for a specified entity. You specify the entity by its entity Id (a unique integer), and specify the entity type by a numeric code. The default entity type is 0 (concept).
A single set of unique entity Ids is used for concepts and relations; therefore, no concept has the same entity Id as a relation. A separate set of unique entity Ids is used for CRCs; therefore, a CRC may have the same entity Id as a concept or a relation. This numbering is entirely coincidental; there is no connection between the entities.
The following example takes the top 12 entities and determines the dominance score for each entity. As one can see from the results of this example, the top (most frequently occurring) entities do not necessarily correspond to the entities with the highest dominance scores:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
TopEntitiesDominanceScores
  DO ##class(%iKnow.Queries.EntityAPI).GetTop(.result,domId,1,12)
  SET i=1
  WHILE $DATA(result(i)) {
       SET topstr=$LISTTOSTRING(result(i),",",1)
       SET topid=$PIECE(topstr,",",1)
       SET val=$PIECE(topstr,",",2)
         SET spc=25-$LENGTH(val)
       WRITE val
       WRITE $JUSTIFY("top=",spc),i
       WRITE " dominance="
       WRITE ##class(%iKnow.Semantics.DominanceAPI).GetDomainValue(domId,topid,0),!
       SET i=i+1 }
  WRITE "Top ",i-1," entities and their dominance scores"
 
Overlap Between Sources
The following example uses the GetOverlap() method to return the overlap score for each entity. This score is the count of overlapping occurrences in all of sources in the domain for that entity. Note that before you can invoke GetOverlap(), you must first invoke BuildOverlap(). When displaying the results, type=0 (concepts) for all of the returned values in this example and is therefore omitted from the display; the $JUSTIFY function is used in this example to align display of the overlap counts:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
Overlap
  DO ##class(%iKnow.Semantics.DominanceAPI).BuildOverlap(domId)
  DO ##class(%iKnow.Semantics.DominanceAPI).GetOverlap(.result,domId,1,17)
DisplayOverlapResults
  SET i=1
  WHILE $DATA(result(i)) {
       SET datastr=$LISTTOSTRING(result(i),",",1)
          SET spc=40-$LENGTH(datastr)
       WRITE $PIECE(datastr,",",1)," "
       WRITE $LISTTOSTRING($PIECE(datastr,",",2))
       /* WRITE $PIECE(datastr,",",3)," " */
          WRITE $JUSTIFY(" overlap=",spc)
       WRITE $PIECE(datastr,",",4),!
       SET i=i+1 }
 
Typical Sources
The following example lists the ten most typical sources with their dominance scores. In order to retrieve typical sources you must issue the GetOverlap() method, followed by the FindMostTypicalSources() and GetTypicalSources() methods.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQueries
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
DominanceTypicalSources
  SET dstat = ##class(%iKnow.Semantics.DominanceAPI).BuildOverlap(domId)
  IF dstat '=1 {WRITE "BuildOverlap() error"  QUIT}
  SET dstat = ##class(%iKnow.Semantics.DominanceAPI).FindMostTypicalSources(domId)
  IF dstat '=1 {WRITE "FindMostTypicalSources() error"  QUIT}
  SET dstat = ##class(%iKnow.Semantics.DominanceAPI).GetTypicalSources(.srcresult,domId,1,10)
  IF dstat '=1 {WRITE "GetTypicalSources() error"  QUIT}
  SET k=1
  WHILE $DATA(srcresult(k)) {
        WRITE $LISTTOSTRING(srcresult(k)),!
        SET k=k+1 }
 
Breaking Sources
Breaking sources are those sources that differ significantly from the other sources in the domain (or filtered subset of the domain). In order to retrieve breaking sources you must issue the GetOverlap() method, followed by the FindBreakingSources() and GetBreakingSources() methods.
In the following program, after identifying each breaking source, the program uses GetBySource() to return the dominant entities for each breaking source. For each entity it returns:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQueries
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
DominanceBreakingSources
  SET dstat = ##class(%iKnow.Semantics.DominanceAPI).BuildOverlap(domId)
  IF dstat '=1 {WRITE "BuildOverlap() error"  QUIT}
  SET dstat = ##class(%iKnow.Semantics.DominanceAPI).FindBreakingSources(domId)
  IF dstat '=1 {WRITE "FindBreakingSources() error"  QUIT}
  DO ##class(%iKnow.Semantics.DominanceAPI).GetBreakingSources(.profresult,domId,1,10)
  SET j=1,k=1
  WHILE $DATA(profresult(j),srclist) {
    SET src = $LISTGET(srclist)
    WRITE !,"Source id: ",src,!
    DO ##class(%iKnow.Semantics.DominanceAPI).GetBySource(.srcresult,domId,src,1,10)
    WHILE $DATA(srcresult(k),list) {
          WRITE "id:",$LISTGET(list)
          /* WRITE "  values:",$LISTTOSTRING($LISTGET(list,2)) */
          WRITE "  type:",$LISTGET(list,3),"  dominance:",$LISTGET(list,4),!
          SET k=k+1 }
    SET k=1
    SET j=j+1 }
  WRITE !!,"Printed ",j-1," breaking sources"
 
Sources By Correlation
GetSourcesByCorrelation() returns those sources that correlate to a list of entities, calculating the a percentage of correlation between the source’s entities and the list entities. A correlation percentage of 1 (100%) means that there are as many correlations in the source as there are listed entities; it does not mean that every listed entity appears in that source. For this reason, it is possible to return a correlation percentage greater that 1. Sources with 0% correlation are not listed.
In order to retrieve sources by correlation you must issue the GetOverlap() method, followed by the GetSourcesByCorrelation() method. You can supply a filter to GetOverlap() to limit the set of sources to be tested for correlation.
The following example tests sources by comparing them to a list of concepts that refer to helicopters. It returns the source ID, the external Id, and the percentage of correlation:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
SourcesByCorrelation
  DO ##class(%iKnow.Semantics.DominanceAPI).BuildOverlap(domId)
  SET heliterms=$LB(796,1048,1706,2484,2571,2637,2642,2653,3284,3987,4037,4038,4054,4957)
  DO ##class(%iKnow.Semantics.DominanceAPI).GetSourcesByCorrelation(.result,domId,heliterms)
DisplaySourcesByCorrelation
  SET i=1
  WHILE $DATA(result(i)) {
       WRITE $LISTTOSTRING(result(i),",",1),!
       SET i=i+1 }
 
Semantic Proximity
Semantic proximity is a calculation of the semantic “distance” between two entities within a sentence. The higher the proximity integer, the closer the entities.
As a demonstration of this semantic distance, given the sentence:
“The giraffe walked with long legs to the base of the tree, then stretched his long neck
 up to reach the lowest leaves.”
the proximity of the concept “giraffe” might be as follows: long legs=64, base=42, tree=32, long neck=25, lowest leaves=21.
Semantic proximity is calculated for each entity in each sentence, then these generated proximity scores are added together producing an overall proximity score for each entity for the entire set of source texts. For example, given the sentences:
“The giraffe walked with long legs to the base of the tree, then stretched his long neck
 up to reach the lowest leaves. Having eaten, the giraffe bent his long legs and stretched
 his long neck down to drink from the pool.”
the proximity of the concept “giraffe” might be as follows: long legs=128, long neck=67, base=42, tree=32, pool=32, lowest leaves=21.
Entity proximity is commutative; this means that the proximity of entity1 to entity2 is the same as the proximity of entity2 to entity1. iKnow does not calculate a semantic proximity of an entity to itself. For example, the sentence “The boy told a boy about another boy.” would not generate any proximity scores, but the sentence “The boy told a younger boy about another small boy.” generates the proximity scores younger boy=64, small boy=42. If the same entity appears multiple times in a sentence, the proximity score is additive. For example, the proximity for the concept “girl” in the sentence “The girl told the boy about another boy.” is boy=106, the total the two proximity scores 64 and 42.
Japanese Semantic Proximity
iKnow semantic analysis of Japanese uses an algorithm to create Entity Vectors. An entity vector is an ordering of entities in the sentence that follow a predefined logical sequence. When iKnow converts a Japanese sentence into an entity vector it commonly rearranges the order of entities. Semantic proximity for Japanese uses the entity vector entity order, not the original sentence entity order.
Proximity Examples
The following example uses the GetProfileForEntity() method to return the proximity of the concept “student pilot” to other concepts in sentences in all of the sources in the domain.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
GetEntityID
    SET entId=##class(%iKnow.Queries.EntityAPI).GetId(domId,"student pilot")
ProximityForEntity
    DO ##class(%iKnow.Semantics.ProximityAPI).GetProfileForEntity(.eresult,domId,entId,1,20)
       SET k=1
       WHILE $DATA(eresult(k)) {
          SET item=$LISTTOSTRING(eresult(k))
          WRITE $PIECE(item,",",1)," ^ "
          WRITE $PIECE(item,",",2)," ^ "
          WRITE $PIECE(item,",",3),!
          SET k=k+1 }
    WRITE !,"all done"
 
The following example uses the GetClustersBySource() method to list the concepts with the greatest proximity for each source. Each concept is listed by entity Id, value, and proximity score.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
QueryBySource
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20)
  SET j=1,k=1
  WHILE $DATA(result(j),srclist) {
    SET src = $LISTGET(srclist)
    WRITE !,"Source id: ",src,!
    DO ##class(%iKnow.Semantics.ProximityAPI).GetClustersBySource(.srcresult,domId,src,1,20)
       WHILE $DATA(srcresult(k)) {
          SET item=$LISTTOSTRING(srcresult(k))
          WRITE $PIECE(item,",",1)," ^ "
          WRITE $PIECE(item,",",2)," ^ "
          WRITE $PIECE(item,",",3),!
          SET k=k+1 }
    SET k=1
    SET j=j+1 }
  WRITE !!,"Printed ",j-1," sources"