
Skiplists

Important:

InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.

A skiplist is a list of entities that you do not want a query to return. For example, if your source texts include greetings and salutations, you might want to put “many thanks”, “best regards”, and other stock phrases with no real information content in your skiplist. A skiplist might also be used to suppress top concepts that are too general and widespread to be of interest when analyzing query results.

Note:

When displaying results of a query that applied a skiplist, it is important to note that use of a skiplist silently changes the query results by suppressing some information. Make sure the skiplists used are relevant for the data contents, for the user looking at the query results, and for the context in which query results are displayed.

Creating a Skiplist

You can define a skiplist that is assigned to a specific domain, or define a skiplist that is domain-independent (cross-domain) and can be used by any domain in the current namespace. You can define a skiplist in two ways:

  • Using the InterSystems IRIS Domain Architect. You can use this interface to define a skiplist within a domain, add and delete skiplist entities, delete a skiplist, or list all skiplists defined for a domain. This interface supports populating a skiplist by specifying entities as strings.

  • Using the %iKnow.Utils.MaintenanceAPI class methods to define, populate, and maintain skiplists. This class allows you to create both domain-specific and cross-domain skiplists, and provides methods for populating a skiplist either by specifying entities as strings or by specifying entities by entity ID. This chapter shows the use of several of the skiplist methods of %iKnow.Utils.MaintenanceAPI.

You can use the CopySkipLists() method of the %iKnow.Utils.CopyUtils class to copy all of the skiplists defined in one domain to another domain.

The Domain Explorer and the Basic Portal user interfaces support the use of skiplists.

The following example creates a domain-specific skiplist and populates it with elements. It then lists all skiplists defined for the domain and all of the elements in this skiplist. Finally, it deletes the skiplist.

DomainCreateOrOpen
  SET domn="mydomainwithsl"
  IF (##class(%iKnow.Domain).NameIndexExists(domn))
     { SET domo=##class(%iKnow.Domain).NameIndexOpen(domn)
       SET domId=domo.Id }
  ELSE { SET domo=##class(%iKnow.Domain).%New(domn)
         DO domo.%Save()
         SET domId=domo.Id }
CreateSkipList
  SET slname="AviationSkipList"
  SET slId=##class(%iKnow.Utils.MaintenanceAPI).CreateSkipList(domId,slname,
       "Aviation non-mechanical terms skiplist")
PopulateSkipList
  SET skip=$LB("aircraft","airplane","flight","accident","event","incident","pilot",
                "student pilot","flight instructor","runway","accident site","ground","visibility","faa")
  // walk the list with $LISTNEXT; the loop ends when no elements remain
  SET ptr=0,count=0
  WHILE $LISTNEXT(skip,ptr,val) {
    SET stat=##class(%iKnow.Utils.MaintenanceAPI).AddStringToSkipList(domId,slId,val)
    IF stat=1 { SET count=count+1 }
    ELSE { WRITE "AddStringToSkipList failed for ",val,! }
  }
  WRITE count," entities in skiplist",!!
ListSkipList
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).GetSkipLists(.sl,domId,0)
  SET i=1
  WHILE $DATA(sl(i)) {
       WRITE $LISTTOSTRING(sl(i),",",1),!
       SET i=i+1 }
  WRITE "Printed the ",i-1," skiplists",!
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).GetSkipListElements(.sle,domId,slId)
  IF stat'=1 { WRITE "GetSkipListElements failed",! QUIT }
  SET j=1
  WHILE $DATA(sle(j)) {
       WRITE $LISTTOSTRING(sle(j),",",1),!
       SET j=j+1 }
  WRITE "Printed the ",j-1," skiplist elements",!
CleanUp
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).DropSkipList(domId,slId)
  IF stat=1 {WRITE "Skiplist deleted",!}
  ELSE {WRITE "DropSkipList failed" QUIT }

The CreateSkipList() method allows you to specify both a name and a description for your skiplist. A skiplist name can be any valid string of any length; skiplist names are case-sensitive. The name you assign to a skiplist must be unique: for a domain-specific skiplist it must be unique within the domain; for a cross-domain skiplist it must be unique within the namespace. Specifying a duplicate skiplist name generates ERROR #8091. The skiplist description is optional; it can be a string of any length.
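One way to avoid ERROR #8091 when a routine may run more than once is to look up the skiplist by name before creating it. The following is a minimal sketch, not an official pattern: the CreateOrReuseSkipList label is hypothetical, it assumes domId already holds a valid domain ID, and it assumes that element 2 of each row returned by GetSkipLists() is the skiplist name (element 1 being the skiplist ID, as the later example that tests for negative IDs relies on).

```objectscript
CreateOrReuseSkipList
  // Hypothetical example: reuse an existing domain-specific skiplist named slname
  // instead of triggering ERROR #8091 with a duplicate CreateSkipList() call.
  // ASSUMPTION: element 2 of each GetSkipLists() row is the skiplist name.
  SET slname="AviationSkipList",slId=""
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).GetSkipLists(.sl,domId,0)
  IF stat'=1 { DO $System.Status.DisplayError(stat) QUIT }
  SET i=1
  WHILE $DATA(sl(i)) {
    // skiplist names are case-sensitive, so an exact = comparison is used
    IF $LIST(sl(i),2)=slname { SET slId=$LIST(sl(i),1) QUIT }
    SET i=i+1 }
  IF slId="" {
    // no skiplist with this name yet, so it is safe to create one
    SET slId=##class(%iKnow.Utils.MaintenanceAPI).CreateSkipList(domId,slname,
         "Aviation non-mechanical terms skiplist")
    WRITE "Created skiplist ",slname," with ID ",slId,! }
  ELSE { WRITE "Reusing skiplist ",slname," with ID ",slId,! }
```

Because the argumentless QUIT inside the IF block terminates the enclosing WHILE loop, the search stops at the first matching name.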

Skiplists and Domains

Each skiplist you create can either be specific to a domain, or can be cross-domain (domain-independent) and usable by any domain in the current namespace:

  • A domain-specific skiplist is assigned to a domain by specifying a domain ID in the CreateSkipList() method. This method returns a skiplist ID as a sequential positive integer. Query methods that use this skiplist reference it by this skiplist ID. A domain-specific skiplist can support stemming.

  • A cross-domain skiplist is not assigned to a domain. Instead, you specify a domain ID of 0 in the CreateSkipList() method. This method returns a skiplist ID as a sequential positive integer. Query methods that use this skiplist reference it by a negative skiplist ID; for example, the skiplist identified by skiplist ID 8 is referenced by the skiplist ID value -8.

To populate a domain-specific skiplist, you can use either AddEntityToSkipList() or AddStringToSkipList(). To populate a cross-domain skiplist, you can only use AddStringToSkipList().

For a cross-domain skiplist, GetSkipListElements() returns the empty string as the entUniId value of each element.
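The following sketch populates a domain-specific skiplist by entity ID rather than by string. It assumes that domId and slId identify an existing domain and a domain-specific skiplist, that sources mentioning "pilot" have already been loaded into the domain, that %iKnow.Queries.EntityAPI GetId() returns the entity's unique ID (or the empty string if the entity is not in the domain), and that AddEntityToSkipList() takes the domain ID, skiplist ID, and entity ID in that order. The AddByEntityId label is hypothetical; check the class reference for the exact signatures.

```objectscript
AddByEntityId
  // Hypothetical example: add an already-indexed entity to a domain-specific skiplist.
  // ASSUMPTION: GetId() returns "" when the entity value is not found in the domain.
  SET entId=##class(%iKnow.Queries.EntityAPI).GetId(domId,"pilot")
  IF entId="" { WRITE "Entity not found in domain ",domId,! QUIT }
  // ASSUMPTION: argument order is (domain ID, skiplist ID, entity ID)
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).AddEntityToSkipList(domId,slId,entId)
  IF stat'=1 { DO $System.Status.DisplayError(stat) QUIT }
  WRITE "Added entity ",entId," to skiplist ",slId,!
```

This approach only works for domain-specific skiplists; a cross-domain skiplist must be populated with AddStringToSkipList().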

The following example creates and populates two skiplists: a domain-specific skiplist (AviationTermsSkipList) and a cross-domain skiplist (JobTitleSkipList). The GetSkipLists() method returns both skiplists because its pIncludeCrossDomain boolean argument is set to 1. Note that GetSkipLists() returns the skiplist ID for the cross-domain skiplist as a negative integer.

DomainCreateOrOpen
  SET domn="mydomainwithsl"
  IF (##class(%iKnow.Domain).NameIndexExists(domn))
     { SET domo=##class(%iKnow.Domain).NameIndexOpen(domn)
       SET domId=domo.Id }
  ELSE { SET domo=##class(%iKnow.Domain).%New(domn)
         DO domo.%Save()
         SET domId=domo.Id }
CreateSkipList1
  SET slname="AviationTermsSkipList"
  SET slId=##class(%iKnow.Utils.MaintenanceAPI).CreateSkipList(domId,slname,
       "Common aviation terms skiplist")
PopulateSkipList1
  SET skip=$LB("aircraft","airplane","flight","accident","event","incident","airport","runway")
  // walk the list with $LISTNEXT; the loop ends when no elements remain
  SET ptr=0
  WHILE $LISTNEXT(skip,ptr,val) {
    SET stat=##class(%iKnow.Utils.MaintenanceAPI).AddStringToSkipList(domId,slId,val)
    IF stat'=1 { WRITE "Could not add ",val,! }
  }
  WRITE "Skiplist ",slname," populated",!
CreateSkipList2
  SET sl2name="JobTitleSkipList"
  SET sl2Id=##class(%iKnow.Utils.MaintenanceAPI).CreateSkipList(0,sl2name,
       "Aviation personnel skiplist")
PopulateSkipList2
  SET jobskip=$LB("pilot","copilot","student pilot","flight instructor","passenger")
  // a cross-domain skiplist is populated with a domain ID of 0
  SET ptr=0
  WHILE $LISTNEXT(jobskip,ptr,val) {
    SET stat=##class(%iKnow.Utils.MaintenanceAPI).AddStringToSkipList(0,sl2Id,val)
    IF stat'=1 { WRITE "Could not add ",val,! }
  }
  WRITE "Skiplist ",sl2name," populated",!!
ListSkipLists
  SET pIncludeCrossDomain=1
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).GetSkipLists(.sl,domId,pIncludeCrossDomain)
  SET i=1
  WHILE $DATA(sl(i)) {
     IF $LIST(sl(i),1)<0 {
        WRITE "cross-domain:",!,$LISTTOSTRING(sl(i),",",1),! }
     ELSE { WRITE "domain-specific:",!,$LISTTOSTRING(sl(i),",",1),! }
  SET i=i+1 }
  WRITE "Printed the ",i-1," skiplists",!!
CleanUp
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).DropSkipList(domId,slId)
  IF stat=1 {WRITE "domain skiplist deleted",!}
  ELSE {WRITE "first DropSkipList failed",! }
  SET stat=##class(%iKnow.Utils.MaintenanceAPI).DropSkipList(0,sl2Id)
  IF stat=1 {WRITE "cross-domain skiplist deleted",!}
  ELSE {WRITE "second DropSkipList failed",! }

Queries that Support Skiplists

The following query methods provide a parameter to specify skiplists. You can specify multiple skiplists to any of these methods by specifying the skiplist IDs as elements of a %List structure, using the $LISTBUILD function. You specify a domain-specific skiplist as a positive integer skiplist ID value; you specify a cross-domain skiplist as a negative integer skiplist ID value.
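For example, the following sketch passes one domain-specific and one cross-domain skiplist to GetTop() in a single call, using the same argument layout as the GetTop() call in the Skiplist Query Example. It assumes that slId holds a domain-specific skiplist ID and that sl2Id holds the positive ID that CreateSkipList() returned for a cross-domain skiplist, which is negated when it is placed in the list. The MultipleSkipLists label is hypothetical.

```objectscript
MultipleSkipLists
  // ASSUMPTION: slId is a domain-specific skiplist ID and sl2Id is the ID
  // returned by CreateSkipList(0,...) for a cross-domain skiplist.
  // Cross-domain skiplists are referenced by a negative skiplist ID.
  SET skiplists=$LB(slId,-sl2Id)
  SET stat=##class(%iKnow.Queries.EntityAPI).GetTop(.result,domId,1,20,"",0,0,0,0,skiplists)
  IF stat'=1 { DO $System.Status.DisplayError(stat) QUIT }
  SET i=1
  WHILE $DATA(result(i)) {
    // element 2 of each result row is the entity value
    WRITE $LIST(result(i),2),!
    SET i=i+1 }
```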

Entity Queries:

Sentence Queries:

Source Queries:

Skiplist Query Example

Refer to A Note on Program Examples for details on the coding and data used in the examples in this book.

The following example suppresses non-mechanical aviation terms that are too general to be of interest. It uses CreateSkipList() to create a skiplist, uses AddStringToSkipList() to add entities to the skiplist, and then supplies the skiplist to the GetTop() method:

#include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error: " DO $System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
CreateSkipList1
  SET slname="AviationTermsSkipList"
  SET slId=##class(%iKnow.Utils.MaintenanceAPI).CreateSkipList(domId,slname,
       "Common aviation terms skiplist")
PopulateSkipList
  SET skip=$LB("aircraft","airplane","flight","accident","event","incident","pilot","airport",
                "student pilot","flight instructor","runway","accident site","ground","visibility","faa")
  // walk the list with $LISTNEXT; the loop ends when no elements remain
  SET ptr=0
  WHILE $LISTNEXT(skip,ptr,val) {
    SET stat=##class(%iKnow.Utils.MaintenanceAPI).AddStringToSkipList(domId,slId,val)
    IF stat'=1 { WRITE "Could not add ",val,! }
  }
  WRITE "Skiplist ",slname," populated",!

QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: " DO $System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: " DO $System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
  WRITE "The domain contains ",numSrcD," sources",!
TopEntitiesQuery
  SET stat=##class(%iKnow.Queries.EntityAPI).GetTop(.result,domId,1,20,"",0,0,0,0,$LB(slId))
  IF stat '= 1 {WRITE "GetTop failed: " DO $System.Status.DisplayError(stat) QUIT }
  WRITE "NOTE: the ",slname," skiplist",!,
        "has been applied to this list of top entities",!
  SET i=1
  WHILE $DATA(result(i)) {
       // each row is a %List; elements 2-4 hold the entity value, frequency, and spread
       SET entity=$LIST(result(i),2)
       SET freq=$LIST(result(i),3)
       SET spread=$LIST(result(i),4)
       WRITE "[",entity,"] appears ",freq," times in ",spread," sources",!
       SET i=i+1 }
  WRITE "Printed the top ",i-1," entities",!