Class Reference
Cache for UNIX 2018.1.1

class %iKnow.Queries.SentenceAPI extends %iKnow.Queries.AbstractAPI

Main Query API class to retrieve sentence information.

Inventory

Parameters  Properties  Methods  Queries  Indices  ForeignKeys  Triggers
11                      30


Summary

Methods
%AddToSaveSet %ClassIsLatestVersion %ClassName %ConstructClone
%DispatchClassMethod %DispatchGetModified %DispatchGetProperty %DispatchMethod
%DispatchSetModified %DispatchSetMultidimProperty %DispatchSetProperty %Extends
%GetParameter %IsA %IsModified %New
%NormalizeObject %ObjectModified %OriginalNamespace %PackageName
%RemoveFromSaveSet %SerializeObject %SetModified %ValidateObject
GetAttributes GetByCrcIds GetByCrcMask GetByCrcs
GetByEntities GetByEntityIds GetByPathIds GetBySource
GetCountByCrcIds GetCountByCrcMask GetCountByCrcs GetCountByDomain
GetCountByEntities GetCountByEntityIds GetCountByPathIds GetCountBySource
GetHighlighted GetLanguage GetNewBySource GetPartLiteral
GetParts GetPosition GetSourceId GetValue


Parameters

• parameter GetAttributesRT = "attTypeId:%Integer,attType:%String,start:%Integer,span:%Integer,wordPositions:%String,properties:%String,level:%Integer";
• parameter GetByCrcIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetByCrcMaskRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetByCrcsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetByEntitiesRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetByEntityIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetByPathIdsRT = "srcId:%Integer,externalId:%String,sentId:%Integer,sentenceValue:%String";
• parameter GetBySourceRT = "sentId:%Integer,sentenceValue:%String,sentenceIsTruncated:%Boolean";
• parameter GetNewBySourceRT = "sentId:%Integer,sentenceValue:%String,score:%Numeric";
• parameter GetPartsRT = "entOccId:%Integer,entUniId:%Integer,literal:%String,role:%Integer,stemUniId:%Integer";

Methods

• classmethod GetAttributes(ByRef pResult, pDomainId As %Integer, pSentId As %Integer, vSrcId As %Integer = 0, pIncludePathAttributes As %Boolean = 0) as %Status

Returns all attributes for a given sentence. By default, only entity-level attributes are returned, with the wordPositions result column indicating which words within the affected entities are actually attributed. With pIncludePathAttributes, path-level attributes (such as implied negation) can also be returned, but these will have no values for the wordPositions column. Also note that the start and span columns for path-level results refer to positions within those paths and not to entity positions within the sentence. See also GetAttributes in %iKnow.Queries.PathAPI and GetOccurrenceAttributes in %iKnow.Queries.EntityAPI.

Any named attribute properties are also included through sub-nodes (not available through SQL or SOAP):

pResult(rowNumber, propertyName) = propertyValue

The returned wordPositions apply to the entities starting from start up to offset and only extend to the last attributed word position (there might be more words within the entity).
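
As a rough sketch of how this result might be consumed, the snippet below calls GetAttributes for one sentence (including path-level attributes) and prints each row together with its named property sub-nodes. The variables tDomainId and tSentId are placeholders, and reading each row as a $list in the column order of GetAttributesRT is an assumption, not something stated above.

```objectscript
 // tDomainId and tSentId are placeholders supplied by the caller
 set tSC = ##class(%iKnow.Queries.SentenceAPI).GetAttributes(.tResult, tDomainId, tSentId, 0, 1)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 set tRow = ""
 for {
     set tRow = $order(tResult(tRow), 1, tData)
     quit:tRow=""
     // assumed column order (from GetAttributesRT): attTypeId, attType, start, span, ...
     write !, "attribute ", $listget(tData, 2), " start=", $listget(tData, 3), " span=", $listget(tData, 4)
     // named attribute properties: tResult(rowNumber, propertyName) = propertyValue
     set tProp = ""
     for {
         set tProp = $order(tResult(tRow, tProp), 1, tValue)
         quit:tProp=""
         write !, "   ", tProp, " = ", tValue
     }
 }
```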

• classmethod GetByCrcIds(ByRef result, domainid As %Integer, crcidlist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = $$$UNION) as %Status

Retrieves all sentences containing the given CRC ids, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual Entity IDs.

See also GetByEntities for a description of the parameters.

• classmethod GetByCrcMask(ByRef result, domainid As %Integer, master As %String = $$$WILDCARD, relation As %String = $$$WILDCARD, slave As %String = $$$WILDCARD, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = $$$UNION, pActualFormOnly As %Boolean = 0) as %Status

Retrieves all sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

See also GetByEntities for a description of the parameters.

• classmethod GetByCrcs(ByRef result, domainid As %Integer, crclist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = $$$UNION) as %Status

Retrieves all sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

See also GetByEntities for a description of the parameters.

• classmethod GetByEntities(ByRef result, domainid As %Integer, entitylist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = $$$UNION, pActualFormOnly As %Boolean = 0) as %Status

This method will retrieve all sentences containing any (if setop = $$$UNION) or all (if setop = $$$INTERSECT) of the entities supplied through entitylist, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entitylist will be returned. Use pActualFormOnly=1 to retrieve only those sentences containing the actual forms in entitylist. This argument is ignored if stemming is not enabled.
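
The call below is a minimal sketch of an intersect query, returning the first page of sentences that contain all of the listed entities. The variable tDomainId and the entity strings are illustrative, the macros are assumed to be available through the %IKPublic include file, and the row layout is assumed to follow the column order of GetByEntitiesRT.

```objectscript
#include %IKPublic
 // $$$UNION and $$$INTERSECT are assumed to be defined in %IKPublic
 // tDomainId is a placeholder; the entity strings are illustrative only
 set tEntities = $listbuild("newspaper", "article")
 // sentences containing ALL listed entities, page 1, 10 rows per page
 set tSC = ##class(%iKnow.Queries.SentenceAPI).GetByEntities(.tResult, tDomainId, tEntities, "", 1, 10, $$$INTERSECT)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 set tRow = ""
 for {
     set tRow = $order(tResult(tRow), 1, tData)
     quit:tRow=""
     // assumed column order (from GetByEntitiesRT): srcId, externalId, sentId, sentenceValue
     write !, $listget(tData, 3), ": ", $listget(tData, 4)
 }
```

Passing $$$UNION (the default) instead would return sentences containing any of the listed entities.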

• classmethod GetByEntityIds(ByRef result, domainid As %Integer, entityidlist As %List, filter As %iKnow.Filters.Filter = "", page As %Integer = 1, pagesize As %Integer = 10, setop As %Integer = $$$UNION, pActualFormOnly As %Boolean = 0) as %Status

Retrieves all sentences containing the given entity IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.

See also GetByEntities for a description of the parameters.

• classmethod GetByPathIds(ByRef result, domainid As %Integer, pathidlist As %List, sourceidlist As %List, page As %Integer = 1, pagesize As %Integer = 10) as %Status

Retrieves all sentences containing the given path IDs.

See also GetByEntities for a description of the parameters.

• classmethod GetBySource(ByRef result, domainid As %Integer, sourceid As %Integer, page As %Integer = 1, pagesize As %Integer = 10) as %Status

Returns the sentences for the given source. A negative source ID is interpreted as a Virtual Source.

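One possible way to walk through all sentences of a source page by page is sketched below. It uses GetCountBySource to bound the number of pages. The variables tDomainId and tSourceId are placeholders, and the row layout is assumed to follow the column order of GetBySourceRT.

```objectscript
 // tDomainId and tSourceId are placeholders supplied by the caller
 set tPageSize = 50
 set tCount = ##class(%iKnow.Queries.SentenceAPI).GetCountBySource(tDomainId, $listbuild(tSourceId), .tSC)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 // number of pages needed, rounded up (\ is integer division)
 set tPages = (tCount + tPageSize - 1) \ tPageSize
 for tPage = 1:1:tPages {
     kill tResult
     set tSC = ##class(%iKnow.Queries.SentenceAPI).GetBySource(.tResult, tDomainId, tSourceId, tPage, tPageSize)
     quit:$system.Status.IsError(tSC)
     set tRow = ""
     for {
         set tRow = $order(tResult(tRow), 1, tData)
         quit:tRow=""
         // assumed column order (from GetBySourceRT): sentId, sentenceValue, sentenceIsTruncated
         write !, $listget(tData, 1), ": ", $listget(tData, 2)
     }
 }
```
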
• classmethod GetCountByCrcIds(domainid As %Integer, crcidlist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = $$$UNION, Output sc As %Status = $$$OK) as %Integer

Retrieves the number of sentences containing the given CRC ids, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual Entity IDs.

See also GetByEntities for a description of the parameters.

• classmethod GetCountByCrcMask(domainid As %Integer, master As %String = $$$WILDCARD, relation As %String = $$$WILDCARD, slave As %String = $$$WILDCARD, filter As %iKnow.Filters.Filter = "", setop As %Integer = $$$UNION, Output sc As %Status = $$$OK, pActualFormOnly As %Boolean = 0) as %Integer

Retrieves the number of sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

See also GetByEntities for a description of the parameters.

• classmethod GetCountByCrcs(domainid As %Integer, crclist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = $$$UNION, Output sc As %Status = $$$OK) as %Integer

Retrieves the number of sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

See also GetByEntities for a description of the parameters.

• classmethod GetCountByDomain(domainid As %Integer, filter As %iKnow.Filters.Filter = "", Output sc As %Status = $$$OK) as %Integer

Returns the total number of sentences for a given domain, optionally filtered to those sources satisfying a %iKnow.Filters.Filter object passed in through filter.

• classmethod GetCountByEntities(domainid As %Integer, entitylist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = $$$UNION, Output sc As %Status = $$$OK, pActualFormOnly As %Boolean = 0) as %Integer

Retrieves the number of sentences containing the given entities, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.

See also GetByEntities for a description of the parameters.
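
As an illustration, the sketch below compares the union and intersect counts for the same entity list. The variable tDomainId and the entity strings are placeholders, and the macros are assumed to be available through the %IKPublic include file.

```objectscript
#include %IKPublic
 // tDomainId is a placeholder; the entity strings are illustrative only
 set tEntities = $listbuild("newspaper", "article")
 set tAny = ##class(%iKnow.Queries.SentenceAPI).GetCountByEntities(tDomainId, tEntities, "", $$$UNION, .tSC)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 set tAll = ##class(%iKnow.Queries.SentenceAPI).GetCountByEntities(tDomainId, tEntities, "", $$$INTERSECT, .tSC)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 write !, "sentences with any entity: ", tAny
 write !, "sentences with all entities: ", tAll
```

Because the count is the return value, the status comes back through the sc output argument and should be checked separately.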

• classmethod GetCountByEntityIds(domainid As %Integer, entityidlist As %List, filter As %iKnow.Filters.Filter = "", setop As %Integer = $$$UNION, Output sc As %Status = $$$OK, pActualFormOnly As %Boolean = 0) as %Integer

Retrieves the number of sentences containing the given entity IDs. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.

See also GetByEntities for a description of the parameters.

If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entityidlist will be counted. Use pActualFormOnly=1 to count only those sentences containing the actual forms in entityidlist. This argument is ignored if stemming is not enabled.

• classmethod GetCountByPathIds(domainid As %Integer, pathidlist As %List, sourceidlist As %List, Output sc As %Status = $$$OK) as %Integer

Retrieves the number of sentences containing the given path IDs.

See also GetByEntities for a description of the parameters.

• classmethod GetCountBySource(domainid As %Integer, sourceidlist As %List, Output sc As %Status = $$$OK) as %Integer

Returns the total number of sentences for the given sources. Negative Source IDs are interpreted as referring to Virtual Sources.

• classmethod GetHighlighted(pDomainId As %Integer, pSentenceId As %Integer, ByRef pHighlight="", vSrcId As %Integer = 0, Output pFullSentence="", Output pSC As %Status = $$$OK, pEscapeHTML As %Boolean = 1) as %String

Highlighting

This is a flexible method to highlight specific elements within a sentence using user-supplied markup passed in through the pHighlight argument (by reference) in a multidimensional form:

 set pHighlight("FLAG") = "markup"
 set pHighlight("FLAG", id) = "markup"

The first option highlights any element of the type identified by "FLAG". The second option refines this to a particular instance, identified by id, and overrides any definition at the generic "FLAG" level.

Note: unless explicitly stated otherwise, all highlighting is based on the entity level.

Markup options

Any single (opening) HTML tag can be specified on the value side of pHighlight and will automatically be wrapped around every entity. The closing tag will be automatically derived from the opening tag supplied through pHighlight.

HTML markup supplied this way supports a basic means of annotating with metadata about the particular thing being highlighted. Any occurrences of "$$$ID" in the HTML tag will be substituted with the relevant identifier of what's being highlighted, such as entity IDs for entity markup, CRC IDs for CRC markup or match IDs for dictionary matching markup. Most entity-level markup also supports the $$$LITERAL tag to replace with the original text string for that entity.
For example, the following highlight spec would add links to an info page that takes entity IDs as a URL parameter:

 set tHighlight("ROLE", "concept") = ""

Note that in some cases, such as dictionary matches, there may be multiple IDs associated with the same highlighted entity. These will be provided as a comma-separated list replacing the $$$ID placeholder.

As an alternative to HTML markup, you can also supply two-character strings that will be used to wrap entities that need highlighting. For example, this array will put square brackets around all concepts and curly braces around relationships:

 set tHighlight("ROLE", "concept") = "[]"
 set tHighlight("ROLE", "relation") = "{}"
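
A minimal sketch of passing such an array to GetHighlighted for an indexed sentence follows. The variables tDomainId and tSentId are placeholders for an existing domain and sentence ID.

```objectscript
 // tDomainId and tSentId are placeholders supplied by the caller
 kill tHighlight
 set tHighlight("ROLE", "concept") = "[]"
 set tHighlight("ROLE", "relation") = "{}"
 // the highlight spec is passed by reference; vSrcId = 0 means a regular (non-virtual) source
 set tText = ##class(%iKnow.Queries.SentenceAPI).GetHighlighted(tDomainId, tSentId, .tHighlight, 0, .tFull, .tSC)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 write !, tText
```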

Highlighting specific entities, CRCs and paths

To highlight all occurrences of a particular entity, stem, CRC, CC or path, use the corresponding flag. For entities, you can also supply the string value (except when the string value is an integer number itself).

 set tHighlight("ENTITY", 123) = ""
 set tHighlight("ENTITY", "snow storm") = ""
 set tHighlight("STEM", 234) = ""
 set tHighlight("CRC", 345) = ""
 set tHighlight("PATH", 456) = ""

Highlighting based on role

The "ROLE" flag can be used to mark concepts, relations and non-relevants, either by using the corresponding integer code (i.e. $$$ENTTYPECONCEPT) or a simple string value. Note that in some cases, some words inside a relationship entity may be marked as non-relevant. These will be highlighted at the word level (only if there is a specific highlighting spec for non-relevants) and are an exception to the general rule that all highlighting happens at the entity level.

 set tHighlight("ROLE", "concept") = "<c>"
 set tHighlight("ROLE", "relation") = "<r>"
 set tHighlight("ROLE", "non-relevant") = "()"
 set tString = "The newspaper published the article and it sold very well."
 write $system.iKnow.Highlight(tString, .tHighlight)

The above example would print:

(The) <c>newspaper</c> <r>published</r> (the) <c>article</c> <r>and (it) sold very well</r>.

Highlighting based on attributes

Attributes can be highlighted at two levels. Using the regular "ATTRIBUTE" flag will highlight all entities affected by the attribute specified by attribute ID (such as $$$IKATTNEGATION). However, some attributes support more fine-grained annotation at the word level, marking those words that actually caused the attribute to apply to an entity or part of a path. These can be highlighted individually through the "ATTRIBUTEWORDS" flag and are an exception to the general rule that highlighting happens per-entity.

 set tHighlight("ATTRIBUTE", $$$IKATTNEGATION) = ""
 set tHighlight("ATTRIBUTEWORDS", $$$IKATTNEGATION) = ""
 set tString = "The landlord doesn't accept late payments, but makes exceptions for students."
 write $system.iKnow.Highlight(tString, .tHighlight)

The above example would display as:

The landlord doesn't accept late payments, but makes exceptions for students.

Highlighting based on matching results

Dictionary matches can be highlighted using the "MATCH" flag, optionally restricted to a particular dictionary ID. To refine to a particular dictionary item, use the "MATCHITEM" flag. Highlighting can further be refined to distinguish based on full or partial matches using the "FULL" and "PARTIAL" flags as an additional subscript. Please note this is a refinement and the parent node (ID-specific or generic) should contain a value:

Additional information about the matches themselves is available through the metadata rewrite mechanism: $$$TERM, $$$TERMID, $$$ITEM, $$$ITEMID, $$$ITEMURI, $$$DICT, $$$DICTID. Note that the regular $$$ID markers will be replaced with dictionary match IDs, not the IDs of the Dictionary or Dictionary Items.

Highlighting based on character position

If external tooling provided annotations based on character positions, use the "CHARS" flag to highlight those annotations by providing the start and end positions as second and third subscripts of the highlight spec array. This will highlight the entities "covering" these start and end positions, starting with the entity which includes the character at the designated start position and ending with the entity including the character at the designated end position.

The above example will annotate the entire entities "instant Project X party" and "Haren".

Note that the iKnow indexing engine may in certain cases modify the input text while processing it. Character-position-based information from external sources that worked from the original text may therefore no longer point to the expected positions. The two most important cases where this can happen are when User Dictionaries are used to rewrite the input explicitly and when duplicate whitespace is normalized by the engine. To work around this issue, present the output of the iKnow engine (as retrieved through GetValue) to these external tools, so that the same normalizations are applied.

In cases where the externally provided character positions span more than a single sentence, you can pass an offset as the data element of the main "CHARS" node to mark the character position that corresponds to the start of this sentence. This should be easier than recalculating all character positions and allows you to reuse the entire array for successive calls to GetHighlighted.
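
Based on the description in this section, a "CHARS" highlight spec with an offset could be set up roughly as follows. All character positions are hypothetical, as are tDomainId and tSentId, and the exact offset convention should be verified against your own data.

```objectscript
 // hypothetical external annotation covering characters 120 to 135 of the full text
 kill tHighlight
 set tHighlight("CHARS", 120, 135) = "[]"
 // hypothetical: this sentence starts at character position 101 of that text
 set tHighlight("CHARS") = 101
 write !, ##class(%iKnow.Queries.SentenceAPI).GetHighlighted(tDomainId, tSentId, .tHighlight)
```

Keeping the positions relative to the full text means the same array can be reused for the next sentence by changing only the top-level "CHARS" node.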

Style precedence

For the purpose of HTML styling precedence, this is the order in which tags are wrapped around entities, from innermost to outermost:

  1. ATTRIBUTEWORDS (wrapped around individual words)
  2. ATTRIBUTE - ID-specific (attribute type ID)
  3. ATTRIBUTE - generic
  4. ENTITY - ID-specific
  5. STEM - ID-specific
  6. CRC - ID-specific
  7. CC - ID-specific
  8. MATCHITEM - ID-specific (dictionary item ID)
  9. MATCH - ID-specific (dictionary ID)
  10. MATCHITEM - generic
  11. MATCH - generic
  12. PATH - ID-specific
  13. ROLE - ID-specific (role)
  14. CHARS

• classmethod GetLanguage(domainid As %Integer, sentenceid As %Integer, Output confidence As %Numeric = "", vSrcId As %Integer = 0) as %String

Retrieves the language of the given sentence, as derived by the Automatic Language Identification algorithm or, if ALI was disabled, the language specified when indexing this sentence.

The confidence level is returned as well through an output parameter. If the confidence level is 0, this means ALI was not used and the language was defined by the user loading the source.

If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
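
For instance, a caller could separate detected from user-specified languages along these lines. The variables tDomainId and tSentId are placeholders.

```objectscript
 // tDomainId and tSentId are placeholders supplied by the caller
 set tLang = ##class(%iKnow.Queries.SentenceAPI).GetLanguage(tDomainId, tSentId, .tConfidence)
 // the unary + coerces an empty confidence value to 0
 if +tConfidence = 0 {
     write !, "Language ", tLang, " was specified at load time (ALI not used)"
 } else {
     write !, "Language ", tLang, " was detected with confidence ", tConfidence
 }
```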

• classmethod GetNewBySource(ByRef result, domainid As %Integer, sourceid As %Integer, length As %Integer = 5, filter As %iKnow.Filters.Filter = "", algorithm As %String = $$$NEWENTSIMPLE, algorithmParams As %List = "", newEntitiesWindow As %Integer = 100, blackListIds As %List = "") as %Status

Retrieves the sentences with the most significant concepts compared to the rest of the domain (or optionally a subset thereof as filtered through filter). This array of sentences is based on results of the GetNewBySource query in %iKnow.Queries.EntityAPI, using the supplied algorithm and parameter values. The scores of the first [newEntitiesWindow] concepts are aggregated across sentences to produce the result of this query.

Please refer to the documentation of the GetNewBySource query in %iKnow.Queries.EntityAPI for more details on the parameters and available algorithms.

• classmethod GetPartLiteral(domainId As %Integer, sentenceId As %Integer, position As %Integer, vSrcId As %Integer = 0) as %String

Returns the literal of the entity or nonrelevant at the specified position.

• classmethod GetParts(ByRef result, domainid As %Integer, sentenceid As %Integer, includeCRCMarkers As %Boolean = 0, includePathMarkers As %Boolean = 0, vSrcId As %Integer = 0) as %Status

Returns the elements (concepts, relations and nonrelevants) that make up the sentence, optionally including markers for the beginning and end of any CRCs or Paths in the sentence. This information can be used to display the sentence value (see also GetValue) and/or highlight specific elements of interest.

Output structure:
result(pos) = $lb(entOccId, entUniId, entity, role)
when includeCRCMarkers = 1, adds
result(pos, [CRCMASTER | CRCRELATION | CRCSLAVE]) = $lb(crcOccId, crcUniId)
when includePathMarkers = 1, adds
result(pos, [PATHBEGIN | PATHEND]) = $lb(pathId)

Note: the subscript levels for CRC and Path markers are not available in the QAPI and WSAPI versions of this query.

If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
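
The output structure above can be traversed with a simple $order loop, as in this sketch. The variables tDomainId and tSentId are placeholders, and no CRC or Path markers are requested.

```objectscript
 // tDomainId and tSentId are placeholders supplied by the caller
 set tSC = ##class(%iKnow.Queries.SentenceAPI).GetParts(.tParts, tDomainId, tSentId)
 if $system.Status.IsError(tSC) { do $system.Status.DisplayError(tSC) quit }
 set tPos = ""
 for {
     set tPos = $order(tParts(tPos), 1, tData)
     quit:tPos=""
     // per the output structure above: $lb(entOccId, entUniId, entity, role)
     write !, tPos, ": ", $listget(tData, 3), " (role ", $listget(tData, 4), ")"
 }
```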

• classmethod GetPosition(domainId As %Integer, sentenceId As %Integer, vSrcId As %Integer = 0) as %Integer

Returns the position within the source this sentence occurs at (1-based).

• classmethod GetSourceId(domainId As %Integer, sentenceId As %Integer) as %Integer

Returns the source ID in which the supplied sentence ID occurs.

• classmethod GetValue(domainid As %Integer, sentenceid As %Integer, Output fullSentence As %Boolean = 1, vSrcId As %Integer = 0) as %String

This method rebuilds a sentence based on the literals and entities it is composed of.

The returned string is the first part of the sentence, up to the maximum string length. The output parameter fullSentence is an array containing all the parts in the right order. Its top-level node holds a %Boolean value indicating whether the returned string is the full sentence (1) or whether (0) the caller has to read the array to obtain the full sentence.

If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
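
A cautious way to use the output argument is sketched here. The variables tDomainId and tSentId are placeholders. Because the subscript layout of the array is not described above, the sketch only dumps it with zwrite instead of assuming a structure.

```objectscript
 // tDomainId and tSentId are placeholders supplied by the caller
 set tText = ##class(%iKnow.Queries.SentenceAPI).GetValue(tDomainId, tSentId, .tFull)
 write !, tText
 if '$get(tFull, 1) {
     // the returned string was cut off; inspect the array holding all the parts
     zwrite tFull
 }
```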



Copyright © 1997-2019, InterSystems Corporation