Using iKnow
iKnow Queries

The iKnow semantic analysis engine supplies a large number of query APIs that return text entities and statistics about those entities. For example, the %iKnow.Queries.CrcAPI.GetTop() method returns the most frequently occurring CRCs in a specified domain, and the %iKnow.Queries.CrcAPI.GetCountBySource() method returns the total number of unique CRCs that appear in the specified sources.

Types of Queries
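There are three types of queries provided. They are distinguished by their class name suffixes: API (ObjectScript methods), QAPI (SQL-callable equivalents), and WSAPI (web service equivalents).
For each of these types, iKnow provides queries for text units such as entities, CRCs, sentences, and sources.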
Queries Described in this Chapter
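This chapter describes and provides examples of many commonly used iKnow queries, including queries that count sources, sentences, and entities, list top entities, list and count CRCs, and list similar and related entities.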
Note that the query examples in this chapter use the default Configuration. Queries that you write may require a specified Configuration to establish the language environment.
Query Method Parameters
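Many query methods share the same leading parameters, as the examples in this chapter show: a result variable passed by reference (for methods that return a list), the domain ID, and the page and pagesize values that control how many records are returned.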
Counting Sources and Sentences
To count the number of sources loaded, you can use the GetCountByDomain() method of the %iKnow.Queries.SourceAPI class.
To count the sentences in all of the sources loaded, you can use the GetCountByDomain() method of the %iKnow.Queries.SentenceAPI class. To count the sentences in a single source, you can use the GetCountBySource() method.
The following example uses data loaded from .txt files to demonstrate these sentence count methods. The default Configuration is used:
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
      { SET domoref=##class(%iKnow.Domain).Open(dname)
        GOTO DeleteOldData }
  ELSE 
     { SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET mylister=##class(%iKnow.Source.File.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
  SET stat=myloader.SetLister(mylister)
  SET install=$SYSTEM.Util.InstallDirectory()
  SET dirpath=install_"mgr\Temp\iknow\mytextfiles"
  SET stat=myloader.ProcessList(dirpath,$LB("txt"),0,"")
  IF stat '= 1 { WRITE "Loader error ",$System.Status.DisplayError(stat)
                     QUIT }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
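  // Fetch up to 20 sources (page 1, pagesize 20); each result(i) is a list
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20)
  SET i=1
  WHILE $DATA(result(i)) {
     // Element 2 is the external ID; keep the part after the lister prefix
     SET extId = $LISTGET(result(i),2)
     SET fullref = $PIECE(extId,":",3,4)
     // Keep the last backslash-delimited piece, which is the file name
     SET fname = $PIECE(fullref,"\",$LENGTH(extId,"\"))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,result(i))
     WRITE fname," has ",numSentS," sentences",!
     SET i=i+1 }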
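The loop above derives a display name from each source's external ID using $PIECE and $LENGTH. The following Python sketch reproduces only that string logic, so it can be tried outside Caché. The piece() and source_name() helpers are hypothetical stand-ins written for this illustration, and both external ID values are assumed shapes rather than values taken from a real domain:

```python
from typing import Optional


def piece(text: str, delim: str, start: int = 1, end: Optional[int] = None) -> str:
    """Approximate ObjectScript $PIECE: 1-based, inclusive range of pieces."""
    if end is None:
        end = start
    parts = text.split(delim)
    return delim.join(parts[start - 1:end])


def source_name(ext_id: str) -> str:
    """Mirror the three SET commands that derive fname in the loop above."""
    fullref = piece(ext_id, ":", 3, 4)   # drop the leading lister prefix
    last = len(ext_id.split("\\"))       # like $LENGTH(extId,"\"): number of pieces
    return piece(fullref, "\\", last)    # keep only the final piece


if __name__ == "__main__":
    # Both external IDs are assumed shapes, used only for illustration.
    print(source_name(":FILE:C:\\docs\\report.txt"))
    print(source_name(":SQL:Accident:7"))
```

When the external ID contains no backslash, the piece count is 1 and the whole reference after the lister prefix is kept, which is why the same loop can also be reused for sources that are not files.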
The following example uses data loaded from a field of the Aviation.Event SQL table to demonstrate these sentence count methods. In this example only a sample of 10 data records (TOP 10) is loaded:
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 10 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceQueries
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceQAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,20)
  SET i=1
  WHILE $DATA(result(i)) {
     SET extId = $LISTGET(result(i),2)
     SET fullref = $PIECE(extId,":",3,4)
     SET fname = $PIECE(fullref,"\",$LENGTH(extId,"\"))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,result(i))
     WRITE fname," has ",numSentS," sentences",!
     SET i=i+1 }
 
For details on what iKnow considers a sentence, refer to the Logical Text Units Identified by iKnow section of the “Conceptual Overview” chapter.
Counting Entities
To count the number of sources that contain one or more occurrences of a specified entity, you can use the GetCountByEntities() method of the %iKnow.Queries.SourceAPI class. In this method you can specify a list of one or more entities to search for in the loaded sources.
Note that here, and throughout iKnow, the concept of “entity” differs significantly from the familiar notion of a search term. For example, the entity “dog” does not occur in the sentence “The quick brown fox jumped over the lazy dog.” The entity “lazy dog” does occur in this sentence. An entity can be a concept or a relation; you could, for example, count the number of sources that contain the entity “is” or the entity “jumped over”. However, in these examples and in most real-world cases, iKnow matches concepts or concepts associated by a relation.
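The difference between a search term and an entity can be sketched in a few lines of Python. This is not iKnow code: the entity list below is written by hand to match the description above (the "quick brown fox" entry is an assumption), and both helper functions are hypothetical:

```python
SENTENCE = "The quick brown fox jumped over the lazy dog."

# Hand-written stand-in for the entities an engine might identify in SENTENCE.
# "lazy dog" and "jumped over" come from the text above; "quick brown fox" is assumed.
ENTITIES = ["quick brown fox", "jumped over", "lazy dog"]


def contains_term(text: str, term: str) -> bool:
    """Plain substring search, as a search box would do it."""
    return term.lower() in text.lower()


def contains_entity(entities: list, candidate: str) -> bool:
    """Whole-entity match: the candidate must equal an indexed entity."""
    return candidate.lower() in {e.lower() for e in entities}


if __name__ == "__main__":
    print(contains_term(SENTENCE, "dog"))         # substring search finds "dog"
    print(contains_entity(ENTITIES, "dog"))       # but "dog" alone is not an entity
    print(contains_entity(ENTITIES, "lazy dog"))  # the full concept is
```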
The following example demonstrates these query count methods:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
SingleEntityCounts
  SET ent=$LB("NTSB","National Transportation Safety Board",
    "NTSB investigator-in-charge","NTSB oversight","NTSB's Materials Laboratory",
    "FAA","Federal Aviation Administration","FAA inspector")
  SET entcnt=$LISTLENGTH(ent)
  SET ptr=0
  FOR x=1:1:entcnt {
   SET stat=$LISTNEXT(ent,ptr,val)
   WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByEntities(domId,val)," sources contain ",val,!
   }
   WRITE "end of listing"
 
Listing Top Entities
iKnow has three query methods you can use to return the “top” entities in the source documents of a domain: GetTop(), GetTopTFIDF(), and GetTopBM25().
All three of these methods return top Concepts by default, but can be used to return top Relations. All three of these methods can apply a filter to limit the scope of sources used.
The GetTop() method ignores entities of less than three characters. The GetTopTFIDF() and GetTopBM25() methods can return 1-character and 2-character entities.
GetTop(): Most-Frequently-Occurring Entities
An iKnow query can return the most frequently occurring entities in the source documents in descending order of frequency or spread. Each entity is returned as a separate record in Caché list format.
In the examples in this section, each entity record is read as a list of entity ID, entity value, frequency, and spread, in that order.
The following query returns the most frequent (top) entities in the sources loaded by this program. By default these are Concept entities. It sets the page (1) and pagesize (50) parameters to specify how many entities to return. It returns (at most) the top 50 entities. It uses the domain default sorttype, which is in descending order by frequency:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
TopEntitiesQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetTop(.result,domId,1,50)
  SET i=1
  WHILE $DATA(result(i)) {
       SET outstr = $LISTTOSTRING(result(i),",",1)
         SET entity = $PIECE(outstr,",",2)
         SET freq = $PIECE(outstr,",",3)
         SET spread = $PIECE(outstr,",",4)
       WRITE "[",entity,"] appears ",freq," times in ",spread," sources",!
       SET i=i+1 }
  WRITE "Printed the top ",i-1," entities"
 
The following GetTop() method returns the top entities by spread:
  DO ##class(%iKnow.Queries.EntityAPI).GetTop(.result,domId,1,50,,,$$$SORTBYSPREAD)
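To make the difference between the two sort orders concrete, the following Python sketch counts frequency (total occurrences) and spread (number of sources containing the entity) over a tiny hand-made data set, then pages through the ranked list. It is a simplified model under stated assumptions, not the iKnow implementation: the sample data, the top_entities() function, and the alphabetical tie-break are all invented for illustration:

```python
from collections import Counter

# Each inner list holds the entities found in one hypothetical source.
SAMPLE_SOURCES = [
    ["airplane", "airplane", "airplane", "pilot"],
    ["pilot", "engine"],
    ["pilot", "airplane"],
]


def top_entities(sources, page=1, pagesize=10, sort_by="frequency"):
    """Return (entity, frequency, spread) tuples for one page of the ranking."""
    frequency = Counter()
    spread = Counter()
    for entities in sources:
        frequency.update(entities)     # every occurrence counts
        spread.update(set(entities))   # each source counts at most once
    key = frequency if sort_by == "frequency" else spread
    # Descending by the chosen count; ties broken alphabetically (an assumption).
    ranked = sorted(frequency, key=lambda e: (-key[e], e))
    start = (page - 1) * pagesize
    return [(e, frequency[e], spread[e]) for e in ranked[start:start + pagesize]]


if __name__ == "__main__":
    print(top_entities(SAMPLE_SOURCES, 1, 2))            # ranked by frequency
    print(top_entities(SAMPLE_SOURCES, 1, 2, "spread"))  # ranked by spread
```

In this sample "airplane" leads by frequency because it is repeated within one source, while "pilot" leads by spread because it appears in every source.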
GetTopTFIDF() and GetTopBM25()
These two methods return a list of top entities in descending order by a calculated score. By default these are Concept entities. Because the two methods use different algorithms to assign a score to an entity, their lists of “top” entities may differ significantly. For example, the following table shows the relative order of four entities in the Aviation.Event database when analyzed using different methods:
                 “airplane”         “helicopter”   “flight instructor”   “student pilot”
GetTop()         1st                12th           17th                  43rd
GetTopTFIDF()    (not in listing)   1st            4th                   22nd
GetTopBM25()     (not in listing)   3rd            2nd                   1st
The top 5 entities in the Aviation.Event database returned by GetTop() are: “airplane”, “pilot”, “engine”, “flight”, and “accident”. All of these entities occur at least once in more than half of the sources. While these are frequently-occurring entities, they are of little value in determining the contents of specific sources. An entity that occurs in more than half of the sources is given a negative IDF value. For this reason, none of these entities appear in the GetTopTFIDF() and GetTopBM25() listings.
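The sign change at the halfway point can be checked numerically. This chapter does not give the formula iKnow uses, so the Python sketch below assumes the common probabilistic IDF form, log((N - n + 0.5) / (n + 0.5)), where N is the number of sources and n is the number of sources containing the entity. That form has the property described above; whether it is the exact formula used internally is an assumption:

```python
import math


def probabilistic_idf(total_sources: int, sources_with_entity: int) -> float:
    """Probabilistic inverse document frequency (assumed form, see lead-in)."""
    n = sources_with_entity
    big_n = total_sources
    return math.log((big_n - n + 0.5) / (n + 0.5))


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the Aviation.Event data.
    for n in (10, 50, 60):
        print(n, "of 100 sources ->", probabilistic_idf(100, n))
```

With this form the value is positive for an entity found in fewer than half of the sources, zero at exactly half, and negative above half, so very widespread entities drop out of a ranking that multiplies by IDF.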
The following example lists the top 50 entities using GetTopTFIDF():
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
TopEntitiesQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetTopTFIDF(.result,domId,1,50)
  SET i=1
  WHILE $DATA(result(i)) {
       SET outstr = $LISTTOSTRING(result(i),",",1)
         SET entity = $PIECE(outstr,",",2)
         SET score = $PIECE(outstr,",",3)
       WRITE "[",entity,"] has a TFIDF score of ",score,!
       SET i=i+1 }
  WRITE "Printed the top ",i-1," entities"
 
The following example lists the top 50 entities using GetTopBM25():
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
TopEntitiesQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetTopBM25(.result,domId,1,50)
  SET i=1
  WHILE $DATA(result(i)) {
       SET outstr = $LISTTOSTRING(result(i),",",1)
         SET entity = $PIECE(outstr,",",2)
         SET score = $PIECE(outstr,",",3)
       WRITE "[",entity,"] has a BM25 score of ",score,!
       SET i=i+1 }
  WRITE "Printed the top ",i-1," entities"
 
CRC Queries
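An iKnow query that returns a CRC (Concept-Relation-Concept sequence) returns it as a list. In the examples in this section, the elements are read as CRC ID, master concept, relation, slave concept, frequency, and spread, in that order.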
Listing CRCs that Contain Entities
One common use of CRCs is to specify an entity (usually a Concept) and return the CRCs that contain that entity. This provides the various contexts in which an entity appears in a source (or sources). Because iKnow normalizes all text to lowercase letters, you must specify these matching entities in lowercase.
The following query returns all of the CRCs that contain the specified Concepts (“left wing”, “right wing”, “wings”, “leading edge”, and “trailing edge”) as either the master concept or the slave concept of a CRC. Note that the GetByEntities() method pagesize argument has been set to 25 to return more CRCs; it defaults to 10.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
CRCQuery
  SET myconcepts=$LB("left wing","right wing","wings","leading edge","trailing edge")
  DO ##class(%iKnow.Queries.CrcAPI).GetByEntities(.result,domId,myconcepts,1,25)
  SET i=1
  WHILE $DATA(result(i)) {
     SET mycrcs=$LISTTOSTRING(result(i),",",1)
     WRITE "[",$PIECE(mycrcs,",",2,4),"]"
     WRITE "  appears ",$PIECE(mycrcs,",",5)," times in "
     WRITE $PIECE(mycrcs,",",6)," sources",!
     SET i=i+1 }
  WRITE !,"End of listing"
 
Counting Sources that Contain a CRC
The following program example returns the count of sources that contain the specified CRCs. To specify CRCs to the GetCountByCrcs() method, you must specify each CRC as a %List (using $LB), and then group these CRCs together as a %List. This is shown in the following example:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
CRCCount
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  SET mycrcs=$LB($LB("leading edge","of","wing"),$LB("leading edge","of","right wing"),
             $LB("leading edge","of","left wing"),$LB("leading edges","of","wings"),
             $LB("leading edges","of","both wings"))
  SET numSrc=##class(%iKnow.Queries.SourceAPI).GetCountByCrcs(domId,mycrcs)
  WRITE "From ",numSrcD," indexed sources there are ",!
  WRITE numSrc," sources containing one or more of the following CRCs:",!
  FOR i=1:1:$LISTLENGTH(mycrcs) {
      WRITE $LISTTOSTRING($LIST(mycrcs,i)," "),!
  }
 
The GetCountByCrcs() method returns the count of sources that contain any of the specified CRCs.
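The "any of the specified CRCs" behavior, and the list-of-lists shape of the argument, can be modeled in Python as follows. The data and the count_sources_with_any_crc() function are hypothetical; tuples stand in for the inner $LB() lists:

```python
# Hypothetical index: source ID -> CRCs (master, relation, slave) found in that source.
SAMPLE_SOURCE_CRCS = {
    1: [("leading edge", "of", "wing"), ("pilot", "reported", "loss")],
    2: [("leading edge", "of", "left wing"), ("leading edge", "of", "wing")],
    3: [("airplane", "struck", "tree")],
}

# Outer list of inner triples, mirroring $LB($LB(...),$LB(...)).
WANTED_CRCS = [
    ("leading edge", "of", "wing"),
    ("leading edge", "of", "left wing"),
]


def count_sources_with_any_crc(source_crcs, wanted):
    """Count sources containing at least one wanted CRC; each source counts once."""
    wanted_set = set(wanted)
    return sum(1 for crcs in source_crcs.values() if wanted_set & set(crcs))


if __name__ == "__main__":
    print(count_sources_with_any_crc(SAMPLE_SOURCE_CRCS, WANTED_CRCS))
```

Source 2 contains two of the wanted CRCs but still contributes only one to the count.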
Listing Sources or Sentences that Fulfill a CRC Mask
You can use a CRC mask to specify an entity value for a specific CRC position. Each CRC has three positions: master, relation, and slave. With a CRC mask you can specify either an entity value or a wildcard for each position. A CRC mask enables you to list sources or sentences that contain CRCs that match one or more positional values. Because it specifies both position and entity value, the GetByCrcMask() partial CRC match is a more restrictive match than GetByEntities(), but a less restrictive match than GetByCrcs().
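Positional matching with wildcards can be sketched in Python as follows. This models only the matching rule described above; the WILDCARD value, the sample sentences, and both functions are assumptions made for illustration and are not the iKnow implementation:

```python
WILDCARD = None  # stand-in for the $$$WILDCARD macro

# Hypothetical index: sentence ID -> CRCs (master, relation, slave) in that sentence.
SAMPLE_SENTENCES = {
    101: [("student pilot", "lost", "directional control")],
    102: [("flight instructor", "told", "student pilot")],
    103: [("airplane", "struck", "fence"), ("student pilot", "was", "uninjured")],
}


def matches_mask(crc, mask):
    """True if every non-wildcard mask position equals the CRC value there."""
    return all(want is WILDCARD or want == have for have, want in zip(crc, mask))


def sentences_matching(sentence_crcs, mask):
    """IDs of sentences that contain at least one CRC fulfilling the mask."""
    return [sid for sid, crcs in sentence_crcs.items()
            if any(matches_mask(crc, mask) for crc in crcs)]


if __name__ == "__main__":
    print(sentences_matching(SAMPLE_SENTENCES, ("student pilot", WILDCARD, WILDCARD)))
    print(sentences_matching(SAMPLE_SENTENCES, (WILDCARD, WILDCARD, "student pilot")))
```

A match on the entity alone would return all three sentences, and a full CRC match would return at most one, which places the mask between the two in restrictiveness.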
The following example uses a CRC mask that matches the entity “student pilot” in master position, while using wildcards to permit any value in the CRC relation and slave positions. The GetByCrcMask() method matches this mask against every sentence in each source, and returns the sentence Id and the sentence text of those sentences that contain a CRC with “student pilot” in the master position.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
CRCMaskSentencesBySource
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,100)
  SET i=1
  WHILE $DATA(result(i)) {
     SET srcId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     SET srcname = $PIECE($PIECE(extId,":",3,4),"\",$LENGTH(extId,"\"))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,result(i))
     WRITE numSentS," sentences in ",srcname,!
     SET stat=##class(%iKnow.Queries.SentenceAPI).GetByCrcMask(.sentresult,domId,"student pilot",
              $$$WILDCARD,$$$WILDCARD,srcId)
     SET i=i+1
     FOR j=1:1:20 {
         IF $DATA(sentresult(j)) {
         SET sent = $LISTTOSTRING(sentresult(j),",",1)
         SET sentId = $PIECE(sent,",",3)
         WRITE "The SentenceId is ",sentId," in source ",srcname,":",!
         WRITE "  ",##class(%iKnow.Queries.SentenceAPI).GetValue(domId,sentId),!
         }
         ELSE { WRITE "Listed ",j-1," sentences that match the CRC mask",!!
         QUIT }
     }
  }
 
Listing Similar Entities
You can list the unique entities that are similar to a specified string. An entity is similar if one of the following applies:
Similarity returns each unique entity (Master Concept or Slave Concept) with integer counts of its frequency and spread, in descending sort order of these integer counts. Similarity does not match Relations. As is true throughout iKnow, matching ignores letter case; all entities are returned in lowercase letters. Similarity does not use stemming logic; “cat” returns both “cats” and “category”.
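The "cat" example above suggests a begins-with comparison rather than stemming. The Python sketch below assumes one plausible rule: an entity is similar when the entity itself, or a word within it, begins with the search string, compared without regard to case. That rule is an assumption chosen to agree with the example; it is not taken from the iKnow implementation, and similar_entities() is a hypothetical helper:

```python
# Hand-made entity list, used only for illustration.
SAMPLE_ENTITIES = ["cats", "category", "dog", "bobcat", "black cat"]


def similar_entities(entities, text):
    """Entities in which some word begins with text (case-insensitive)."""
    needle = " " + text.lower()
    # Prefixing a space lets one substring test cover "starts the entity"
    # and "starts a later word in the entity".
    return [e for e in entities if needle in " " + e.lower()]


if __name__ == "__main__":
    print(similar_entities(SAMPLE_ENTITIES, "cat"))
```

Under this assumed rule "cats" and "category" both match because no stemming is applied, while "bobcat" does not, since "cat" is not at the start of a word.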
The following example lists the entities that are similar to the string “student pilot”:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
SimilarEntityQuery
  WRITE "Entities similar to 'Student Pilot':",!
  DO ##class(%iKnow.Queries.EntityAPI).GetSimilar(.simresult,domId,"student pilot",1,50)
  SET j=1
  WHILE $DATA(simresult(j)) {
       SET outstr = $LISTTOSTRING(simresult(j),",",1)
         SET entity = $PIECE(outstr,",",2)
         SET freq = $PIECE(outstr,",",3)
         SET spread = $PIECE(outstr,",",4)
       WRITE "(",entity,")  appears ",freq," times in ",spread," sources",!
      SET j=j+1 }
 
The default domain parameter setting governing entity similarity is EnableNgrams, a boolean value.
Parts and N-grams
The GetSimilar() and GetSimilarCounts() methods have a mode parameter that specifies where to search for similarity. There are two available values: one matches on entity parts, and the other matches on n-grams.
Listing Related Entities
An entity is related to another entity if both occur in a CRC. By default, the related entity can be either a master concept or a slave concept. (Refer to “Limiting by Position” (below) to override this default.)
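The definition above can be expressed as a short Python sketch: scan the CRCs, and whenever the entity fills one concept position, collect the concept in the other position. The sample CRCs and the related_entities() function are hypothetical and only model the default behavior (master or slave, no relations):

```python
# Hypothetical CRCs as (master, relation, slave) triples.
SAMPLE_CRCS = [
    ("student pilot", "lost", "control"),
    ("flight instructor", "told", "student pilot"),
    ("airplane", "struck", "fence"),
]


def related_entities(crcs, entity):
    """Concepts that share a CRC with entity, in either concept position."""
    related = set()
    for master, _relation, slave in crcs:
        if master == entity:
            related.add(slave)    # entity is the master, so the slave is related
        elif slave == entity:
            related.add(master)   # entity is the slave, so the master is related
    return sorted(related)


if __name__ == "__main__":
    print(related_entities(SAMPLE_CRCS, "student pilot"))
```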
The following example shows how iKnow returns related entities. It first determines how many CRCs contain the entity “student pilot” and lists these CRCs. (In this small example, you can simply read all the CRCs to see what is related to “student pilot”; in a much larger collection of sources this would not be practical.) The program example then lists all of the entities that are related to “student pilot” as either slave or master (you can confirm these relations by matching these entities against the CRCs listed earlier):
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
ContainCRCQuery
  SET crccount = ##class(%iKnow.Queries.CrcAPI).GetCountByEntities(domId,$LB("student pilot"))
        WRITE crccount," CRCs contain 'student pilot'",!
  DO ##class(%iKnow.Queries.CrcAPI).GetByEntities(.result,domId,$LB("student pilot"),1,crccount)
  SET i=1
  WHILE $DATA(result(i)) {
    WRITE $LISTTOSTRING(result(i),",",1),!
    SET i=i+1 }
  SET relcount = ##class(%iKnow.Queries.EntityAPI).GetRelatedCount(domId,$LB("student pilot"))
  WRITE !,relcount," entities are related to 'student pilot':",!
RelatedEntityQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetRelated(.rresult,domId,$LB("student pilot"),1,relcount)
  SET j=1
  WHILE $DATA(rresult(j)) {
      WRITE $LISTTOSTRING(rresult(j),",",1),!
    SET j=j+1 }
 
Limiting by Position
The position of an entity can be Master Concept, Relation, or Slave Concept. By default, the GetRelated() method returns all related concepts regardless of position and does not return relations. You can change this default by specifying a macro constant for the 8th parameter (positiontomatch). The available constants are as follows:
Constant                     Value   Meaning
$$$USEPOSM                   1       Master Concepts
$$$USEPOSR                   2       Relations
$$$USEPOSMR                  3       Master Concepts and Relations
$$$USEPOSS                   4       Slave Concepts
$$$USEPOSMS (the default)    5       Master Concepts and Slave Concepts
$$$USEPOSRS                  6       Relations and Slave Concepts
$$$USEPOSALL                 7       Master Concepts, Relations, and Slave Concepts
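The numeric values in this table are consistent with a bit-flag encoding in which Master is 1, Relation is 2, and Slave is 4, and each combined constant is the sum of its parts. That reading is inferred from the listed values rather than stated in this chapter. A Python sketch of the arithmetic, using a hypothetical Position type:

```python
from enum import IntFlag


class Position(IntFlag):
    """Hypothetical model of the three CRC positions as bit flags."""
    MASTER = 1
    RELATION = 2
    SLAVE = 4


def positions_in(value: int):
    """Names of the positions selected by a combined value from the table."""
    return [p.name for p in (Position.MASTER, Position.RELATION, Position.SLAVE)
            if value & p]


if __name__ == "__main__":
    for value in range(1, 8):
        print(value, positions_in(value))
```

For example, the default value 5 decomposes into MASTER (1) plus SLAVE (4), which matches the "Master Concepts and Slave Concepts" row.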
The following example separates the related master concepts and the related slave concepts. (Note that $$$USEPOSM means that the supplied string is the master concept in the CRC, and the related entities are the slave concepts.)
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
ContainCRCQuery
  SET crccount = ##class(%iKnow.Queries.CrcAPI).GetCountByEntities(domId,$LB("student pilot"))
        WRITE crccount," CRCs contain 'student pilot'",!
  DO ##class(%iKnow.Queries.CrcAPI).GetByEntities(.result,domId,$LB("student pilot"),1,crccount)
  SET i=1
  WHILE $DATA(result(i)) {
    WRITE $LISTTOSTRING(result(i),",",1),!
    SET i=i+1 }
  SET relcount = ##class(%iKnow.Queries.EntityAPI).GetRelatedCount(domId,$LB("student pilot"))
  WRITE !,relcount," entities are related to 'student pilot':",!

ListRelatedMastersQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetRelated(.mresult,domId,$LB("student pilot"),1,relcount,"","",$$$USEPOSM)
   WRITE !,"The following have 'student pilot' as a master:",!
   SET j=1
   WHILE $DATA(mresult(j)) {
      WRITE $LISTTOSTRING(mresult(j),",",1),!
      SET j=j+1 }
ListRelatedSlavesQuery
  DO ##class(%iKnow.Queries.EntityAPI).GetRelated(.sresult,domId,$LB("student pilot"),1,relcount,"","",$$$USEPOSS)
   WRITE !,"The following have 'student pilot' as a slave:",!
   SET k=1
   WHILE $DATA(sresult(k)) {
      WRITE $LISTTOSTRING(sresult(k),",",1),!
      SET k=k+1 }
 
Counting Paths
The following example shows the count of paths and the count of sentences for each of 50 sources. A source commonly contains more paths than sentences. However, some sources may contain more sentences than paths.
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 50 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  WRITE ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)," total sources",!!
PathCountBySource
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,50)
  SET i=1
  WHILE $DATA(result(i)) {
     SET srcId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     SET fullref = $PIECE(extId,":",3,4)
     SET fname = $PIECE(fullref,"\",$LENGTH(extId,"\"))
     SET numPathS = ##class(%iKnow.Queries.PathAPI).GetCountBySource(domId,$LB(srcId))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,$LB(srcId))
     WRITE numPathS," paths and ",numSentS," sentences in ",fname,!
     SET i=i+1 }
 
Listing Similar Sources
The iKnow semantic analysis engine can list which sources are similar to a specified source. Similarity between sources is determined by the number of entities that appear in both sources (the overlap), and the percentage of the source contents that contain overlap.
The GetSimilar() method can calculate similarity of sources to a specified source. Because of the potentially large number of similar sources, this method is commonly used with a filter to limit the set of sources considered. GetSimilar() can use your choice of two algorithms, each of which takes an algorithm parameter:
For each similar source, iKnow returns a list of elements with the following format:
srcId,extId,percentageMatched,percentageNew,nbOfEntsInRefSrc,nbOfEntsInCommon,nbOfEntsInSimSrc,score
srcId The source ID, an integer assigned by iKnow.
extId The external ID for the source, a string value.
percentageMatched The percentage of the contents of the source that is the same as the match source.
percentageNew The percentage of the contents of the source that is new. New contents are those that do not match with the match source.
nbOfEntsInRefSrc The number of unique entities in the source being referenced (matched against this source).
nbOfEntsInCommon The number of unique entities that are found in both sources.
nbOfEntsInSimSrc The number of unique entities in this source.
score The similarity score, expressed as a fractional number. An identical source would have a similarity score of 1.
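Because each result is a list, individual elements can be read by position with $LISTGET rather than by converting the list to a delimited string, which avoids problems if an external ID happens to contain the delimiter character. The following is a minimal sketch: the row is built by hand with made-up values that only mimic the format described above, and the variable names are illustrative:

```objectscript
  // Hypothetical result row in the documented eight-element format (values are made up)
  SET row=$LISTBUILD(42,":SQL:Accident:42",.45,.55,120,54,98,.38)
  SET simId=$LISTGET(row,1)       // srcId
  SET pctMatched=$LISTGET(row,3)  // percentageMatched
  SET common=$LISTGET(row,6)      // nbOfEntsInCommon
  SET score=$LISTGET(row,8)       // score
  IF score > .33 {
     WRITE "source ",simId,": score ",score
     WRITE " (",common," entities in common, ",pctMatched," matched)",! }
```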
The following example demonstrates the listing of similar sources. It first limits the set of test sources to those that may describe an engine failure incident, by using GetByEntities() to select for a list of appropriate entities. It then uses GetSimilar() to find sources similar to these test sources, which may indicate a pattern of similar incidents. GetSimilar() takes the default similarity algorithm ($$$SIMSRCSIMPLE) and its default algorithm parameter (“ent”). The program displays only those similar sources with a high similarity score (>.33). The similarity display omits the source external IDs:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET totsrc = ##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE totsrc," total sources",!
SimilarSourcesQuery
  SET engineents = $LB("engine","engine failure","engine power","loss of power","carburetor","crankshaft","piston")
  DO ##class(%iKnow.Queries.SourceAPI).GetByEntities(.result,domId,engineents,1,totsrc)
  SET i=1
  WHILE $DATA(result(i)) {
      SET src = $LISTTOSTRING(result(i),",",1)
      SET srcId = $PIECE(src,",",1)
      WRITE "Source ",srcId," contains an engine incident",!
      DO ##class(%iKnow.Queries.SourceAPI).GetSimilar(.sim,domId,srcId,1,50,"",$$$SIMSRCSIMPLE,$LB("ent"))
      SET j=1
      WHILE $DATA(sim(j)) {
          SET simlist=$LISTTOSTRING(sim(j))
          IF $PIECE(simlist,",",8) > .33 {
              WRITE "   similar to source ",$PIECE(simlist,",",1),": "
              WRITE $PIECE(simlist,",",3,8),! }
          SET j=j+1 }
  SET i=i+1 }
 
Summarizing a Source
The iKnow semantic analysis engine can summarize a source text by returning the most relevant sentences. It returns a user-specified number of sentences in the original sentence order, selecting those sentences that have the highest similarity to the overall content of the source text. iKnow determines relevance by calculating an internal relevancy score for each sentence. Sentences that contain concepts that appear many times in the source text are more likely to be included in the summary than those that contain concepts that only appear once in the source text. iKnow considers the overall frequency of each concept, the similarity of each concept to the most frequent concepts in the source, and other factors.
Summarizing a source is only available if the Summarize property was set to 1 in the Configuration when loading the source. The default Configuration specifies Summarize=1.
The accuracy of a summary therefore depends on two factors:
iKnow provides three summary methods:
For details on what iKnow considers a sentence, refer to the Logical Text Units Identified by iKnow section of the “Conceptual Overview” chapter.
The following example goes through the source texts in a domain until it finds one that contains more than 100 sentences. It then uses GetSummary() to summarize that source to half of its original sentences:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceTotals
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId)
SentenceCounts
  FOR i=1:1:numSrcD {
     SET srcId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     SET fullref = $PIECE(extId,":",3,4)
     SET fname = $PIECE(fullref,"\",$LENGTH(extId,"\"))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,$LB(srcId))
       IF numSentS > 100 {WRITE fname," has ",numSentS," sentences",!
                          GOTO SummarizeASource }
     }
     QUIT
SummarizeASource
   SET sumlen=$NUMBER(numSentS/2,0)
   WRITE "total sentences=",numSentS," summary=",sumlen," sentences",!!
   DO ##class(%iKnow.Queries.SourceAPI).GetSummary(.sumresult,domId,srcId,sumlen)
   FOR j=1:1:sumlen { WRITE "[S",j,"]: ",$LISTGET(sumresult(j),2),! }
   WRITE !,"END OF ",fname," SUMMARY",!!
   QUIT
 
Note that $NUMBER is used to ensure that the specified summary sentence count is an integer. $LISTGET is used to omit the sentence ID and return just the sentence text.
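The FOR loop in the preceding example assumes that the result array contains exactly sumlen entries. As a more defensive sketch, the array can instead be traversed with $ORDER, which stops when no further subscripts exist. The array contents below are made-up stand-ins for the (sentence ID, sentence text) lists described above, not actual iKnow output:

```objectscript
  // Stand-in for the array returned by GetSummary(); IDs and text are made up
  SET sumresult(1)=$LISTBUILD(101,"The pilot reported a loss of engine power.")
  SET sumresult(2)=$LISTBUILD(107,"The airplane landed in a field.")
  // Walk the array by subscript; $ORDER returns "" after the last entry
  SET j=""
  FOR {
     SET j=$ORDER(sumresult(j))
     QUIT:j=""
     WRITE "[S",j,"]: ",$LISTGET(sumresult(j),2),!
  }
```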
The following example uses GetSummaryDirect() to return the same summary as a single concatenated string. It then uses $EXTRACT to divide the string into 38-character lines for display purposes:
#Include %IKPublic
  ZNSPACE "Samples"
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).Exists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).Open(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceSentenceTotals
  SET numSrcD=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
  SET numSentD=##class(%iKnow.Queries.SentenceAPI).GetCountByDomain(domId)
  WRITE "These sources contain ",numSentD," sentences",!!
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId)
SentenceCounts
  FOR i=1:1:numSrcD {
     SET srcId = $LISTGET(result(i),1)
     SET extId = $LISTGET(result(i),2)
     SET fullref = $PIECE(extId,":",3,4)
     SET fname = $PIECE(fullref,"\",$LENGTH(extId,"\"))
     SET numSentS = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(domId,$LB(srcId))
       IF numSentS > 100 {WRITE fname," has ",numSentS," sentences",!
                          GOTO SummarizeASource }
     }
     QUIT
SummarizeASource
   SET sumlen=$NUMBER(numSentS/2,0)
   WRITE "total sentences=",numSentS," summary=",sumlen," sentences",!!
   SET summary = ##class(%iKnow.Queries.SourceAPI).GetSummaryDirect(domId,srcId,sumlen)
FormatSummaryDisplay
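   // Print the summary in 38-character lines; the line count is rounded up
   // so that a final partial line is not dropped
   SET x=1
   SET totlines=($LENGTH(summary)+37)\38
   FOR i=1:1:totlines {
      WRITE $EXTRACT(summary,x,x+37),!
      SET x=x+38 }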
   WRITE !,"END OF ",fname," SUMMARY"
 
Custom Summaries
iKnow permits you to generate custom summaries for sources by specifying a summaryConfig parameter string. Custom summaries are provided for those who want to tune the content of iKnow-generated summaries to their specific needs. Custom summaries allow you to absolutely include sentences in the summary, preferentially include them, or absolutely exclude them. You can, for example, include or exclude standard components of sources that always appear at the same location, such as a title, byline, copyright, abstract, or summary. You can also absolutely or preferentially include or exclude sentences that contain a specified word.
The source summarization operation first gives each sentence a numeric summary weight, and then creates the summary by selecting the appropriate number of sentences with the highest weights. You can influence this ranking by specifying a summaryConfig parameter to the summary method.
The summaryConfig parameter value is a string consisting of one or more specifications. Each specification consists of three elements separated by vertical bars. For example, "s|2|false". You can concatenate multiple specifications using a vertical bar. For example, "s|1|true|s|2|false". The summaryConfig parameter default is the empty string.
You can configure the summary to select sentences according to the following:
You can specify multiple summary customizations by concatenation. For example: "s|1|true|s|2|false|w|surgery|3|w|hypnosis|false" (always include the first sentence, never include the second sentence, increase the summary weight of all sentences containing the word “surgery”, and exclude all sentences containing the word “hypnosis”).
Thus the user can give more or less importance to specific words and/or sentences. The weight of sentences affected by more than one of the specifications in the summaryConfig will be resolved by the Custom Summaries algorithm. This algorithm also applies when there is a conflict between specifications that apply to the same sentence:
The options for custom summaries can be set by means of the summaryConfig parameter in the %iKnow.Queries.SourceAPI.GetSummary() and %iKnow.Queries.SourceAPI.GetSummaryForText() methods.
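To make the structure of a summaryConfig string concrete, the following minimal sketch splits the concatenated example string shown earlier in this section into its three-element specifications using $PIECE. It only illustrates the layout of the string; it is not a description of how iKnow itself parses the parameter, and the variable names are illustrative:

```objectscript
  // Example string taken from the text above
  SET cfg="s|1|true|s|2|false|w|surgery|3|w|hypnosis|false"
  SET n=$LENGTH(cfg,"|")
  // Step through the string three elements at a time, one specification per line
  FOR p=1:3:n {
     WRITE $PIECE(cfg,"|",p),"  ",$PIECE(cfg,"|",p+1),"  ",$PIECE(cfg,"|",p+2),!
  }
```

Each displayed line corresponds to one specification: two sentence ("s") specifications followed by two word ("w") specifications.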
Querying a Subset of the Sources
iKnow provides filters that allow you to include or exclude sources from a query. You can include or exclude sources based on:
iKnow supports the combining of multiple filters through logical AND and logical OR operators. For further details, refer to the Filtering chapter of this manual.