%iKnow.Queries.SentenceAPI
class %iKnow.Queries.SentenceAPI extends %iKnow.Queries.AbstractAPI
Main Query API class to retrieve sentence information.
Method Inventory
- GetAttributes()
- GetByCrcIds()
- GetByCrcMask()
- GetByCrcs()
- GetByEntities()
- GetByEntityIds()
- GetByPathIds()
- GetBySource()
- GetCountByCrcIds()
- GetCountByCrcMask()
- GetCountByCrcs()
- GetCountByDomain()
- GetCountByEntities()
- GetCountByEntityIds()
- GetCountByPathIds()
- GetCountBySource()
- GetHighlighted()
- GetLanguage()
- GetNewBySource()
- GetPartLiteral()
- GetParts()
- GetPosition()
- GetSourceId()
- GetValue()
Methods
GetAttributes()
Returns all attributes for a given sentence. By default, only entity-level attributes are returned, with the wordPositions result column indicating which words within the affected entities are actually attributed. Set pIncludePathAttributes to also return path-level attributes (such as implied negation); these have no values for the wordPositions column. Note also that the start and span columns for path-level results refer to positions within those paths, not to entity positions within the sentence. See also GetAttributes() in %iKnow.Queries.PathAPI and GetOccurrenceAttributes() in %iKnow.Queries.EntityAPI.
Any named attribute properties are also included through sub-nodes (not available through SQL or SOAP):
pResult(rowNumber, propertyName) = propertyValue
The returned wordPositions apply to the entities starting from start up to offset and only extend to the last attributed word position (there might be more words within the entity).
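The sub-node layout described above can be walked with nested $order loops. The following is a minimal sketch only: the class name, the seeded row, and the property name "exampleProperty" are hypothetical, and the array is filled by hand rather than by a real GetAttributes() call.

```objectscript
/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.SentenceAttributes Extends %RegisteredObject
{

/// Prints every named attribute property stored as a sub-node of pResult,
/// following the layout pResult(rowNumber, propertyName) = propertyValue.
ClassMethod PrintProperties(ByRef pResult)
{
    set tRow = ""
    for {
        set tRow = $order(pResult(tRow))
        quit:tRow=""
        set tProp = ""
        for {
            set tProp = $order(pResult(tRow, tProp))
            quit:tProp=""
            write !, "row ", tRow, ": ", tProp, " = ", $get(pResult(tRow, tProp))
        }
    }
}

/// Seeds a made-up result array so the traversal can run without a domain.
ClassMethod Demo()
{
    set tResult(1) = ""                       // row data omitted in this sketch
    set tResult(1, "exampleProperty") = "x"   // hypothetical property name and value
    do ..PrintProperties(.tResult)
}

}
```

In real use, the array passed to PrintProperties() would be the one populated by GetAttributes().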
GetByCrcIds()
Retrieves all sentences containing the given CRC IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual CRC IDs.
See also GetByEntities() for a description of the parameters.
GetByCrcMask()
Retrieves all sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities() for a description of the parameters.
GetByCrcs()
Retrieves all sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities() for a description of the parameters.
GetByEntities()
Retrieves all sentences containing any (if setop = $$$UNION) or all (if setop = $$$INTERSECT) of the entities supplied through entitylist, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entitylist will be returned. Use pActualFormOnly=1 to retrieve only those sentences containing the actual forms in entitylist. This argument is ignored if stemming is not enabled.
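As a rough illustration of the difference between the two set operations, the sketch below runs the same entity list through GetByEntities() twice. The demo class, the entity strings, and the %IKPublic include for the macros are assumptions. So is the argument order (result, domain ID, entity list, filter, page, page size, setop), which is not spelled out in this section and should be checked against the method signature.

```objectscript
Include %IKPublic

/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.SentenceQueries Extends %RegisteredObject
{

/// pDomainId must identify an existing domain; the entity strings are made up.
ClassMethod ShowSentences(pDomainId As %Integer) As %Status
{
    set tEntities = $listbuild("snow storm", "airport")

    // Assumed argument order: result, domain ID, entity list, filter,
    // page, page size, set operation.
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetByEntities(.tAny, pDomainId, tEntities, "", 1, 10, $$$UNION)
    quit:$$$ISERR(tSC) tSC
    set tSC = ##class(%iKnow.Queries.SentenceAPI).GetByEntities(.tAll, pDomainId, tEntities, "", 1, 10, $$$INTERSECT)
    quit:$$$ISERR(tSC) tSC

    write !, "Sentences containing any of the entities:", !
    zwrite tAny
    write !, "Sentences containing all of the entities:", !
    zwrite tAll
    quit $$$OK
}

}
```

With $$$INTERSECT the second result should never hold more sentences than the first, which makes the pair a quick sanity check on a test domain.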
GetByEntityIds()
Retrieves all sentences containing the given entity IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.
See also GetByEntities() for a description of the parameters.
GetByPathIds()
Retrieves all sentences containing the given path IDs.
See also GetByEntities() for a description of the parameters.
GetCountByCrcIds()
Retrieves the number of sentences containing the given CRC IDs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer. In this case, crcidlist is expected to contain virtual CRC IDs.
See also GetByEntities() for a description of the parameters.
GetCountByCrcMask()
Retrieves the number of sentences containing a CRC satisfying the given CRC Mask, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities() for a description of the parameters.
GetCountByCrcs()
Retrieves the number of sentences containing the given CRCs, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities() for a description of the parameters.
GetCountByDomain()
Returns the total number of sentences for a given domain, optionally filtered to those sources satisfying a %iKnow.Filters.Filter object passed in through filter.
GetCountByEntities()
Retrieves the number of sentences containing the given entities, optionally limited to all sentences in records satisfying filter. For querying Virtual Sources, set filter to a single, negative integer.
See also GetByEntities() for a description of the parameters.
GetCountByEntityIds()
Retrieves the number of sentences containing the given entity IDs. For querying Virtual Sources, set filter to a single, negative integer. In this case, entityidlist is expected to contain virtual Entity IDs.
See also GetByEntities() for a description of the parameters.
If stemming is enabled for this domain through $$$IKPSTEMMING, sentences containing any actual form of the entities in entityidlist will be counted. Use pActualFormOnly=1 to count only those sentences containing the actual forms in entityidlist. This argument is ignored if stemming is not enabled.
GetCountByPathIds()
Retrieves the number of sentences containing the given path IDs.
See also GetByEntities() for a description of the parameters.
GetCountBySource()
Returns the total number of sentences for the given sources. Negative Source IDs are interpreted as referring to Virtual Sources.
GetHighlighted()
Highlighting
This is a flexible method to highlight specific elements within a sentence using user-supplied markup passed in through the pHighlight argument (by reference) in a multidimensional form:
set pHighlight("FLAG") = "markup"
set pHighlight("FLAG", id) = "markup"
The first option highlights any element of the type identified by "FLAG". The second option refines this to a particular instance, identified by id, and overrides any definition at the generic "FLAG" level.
Note: unless explicitly stated otherwise, all highlighting is based on the entity level.
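To make the generic versus ID-specific distinction concrete, the sketch below defines both levels for the "ATTRIBUTE" flag and passes the array to $system.iKnow.Highlight(), as the other examples in this section do. The demo class, the sample sentence, and the %IKPublic include (assumed to supply $$$IKATTNEGATION) are illustrative assumptions. No output is shown because it depends on how the engine analyzes the text.

```objectscript
Include %IKPublic

/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.HighlightOverride Extends %RegisteredObject
{

ClassMethod Run()
{
    // generic node: markup for entities affected by any attribute
    set tHighlight("ATTRIBUTE") = "<i>"
    // ID-specific node: markup for entities affected by negation
    set tHighlight("ATTRIBUTE", $$$IKATTNEGATION) = "<b>"

    set tString = "The landlord doesn't accept late payments."
    write $system.iKnow.Highlight(tString, .tHighlight)
}

}
```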
Markup options
Any single (opening) HTML tag can be specified on the value side of pHighlight and will automatically be wrapped around every entity. The closing tag is derived automatically from the opening tag supplied through pHighlight.
HTML markup supplied this way supports a basic means of annotating with metadata about the particular thing being highlighted. Any occurrence of "$$$ID" in the HTML tag is substituted with the relevant identifier of what is being highlighted, such as entity IDs for entity markup, CRC IDs for CRC markup or match IDs for dictionary matching markup. Most entity-level markup also supports the $$$LITERAL placeholder, which is replaced with the original text string for that entity.
For example, the following highlight spec would add links to an info page that takes entity IDs as a URL parameter:
set tHighlight("ROLE", "concept") = "<a href=""Example.MyEntityViewer.cls?entity=$$$ID"">"
Note that in some cases, such as dictionary matches, there may be multiple IDs associated with the same highlighted entity. These will be provided as a comma-separated list replacing the $$$ID placeholder.
As an alternative to HTML markup, you can also supply two-character strings that will be used to wrap entities that need highlighting. For example, this array will put square brackets around all concepts and curly braces around relationships:
set tHighlight("ROLE", "concept") = "[]"
set tHighlight("ROLE", "relation") = "{}"
Highlighting specific entities, CRCs and paths
To highlight all occurrences of a particular entity, stem, CRC, CC or path, use the corresponding flag. For entities, you can also supply the string value (except when the string value is an integer number itself).
set tHighlight("ENTITY", 123) = "<b>"
set tHighlight("ENTITY", "snow storm") = "<b>"
set tHighlight("STEM", 234) = "<strong title=""$$$LITERAL"">"
set tHighlight("CRC", 345) = "<u>"
set tHighlight("PATH", 456) = "<span style='border: 1px solid blue;'>"
Highlighting based on role
The "ROLE" flag can be used to mark concepts, relations and non-relevants, either by using the corresponding integer code (e.g. $$$ENTTYPECONCEPT) or a simple string value. Note that in some cases, some words inside a relationship entity may be marked as non-relevant. These will be highlighted at the word level (only if there is a specific highlighting spec for non-relevants) and are an exception to the general rule that all highlighting happens at the entity level.
set tHighlight("ROLE", "concept") = "<c>"
set tHighlight("ROLE", "relation") = "<r>"
set tHighlight("ROLE", "non-relevant") = "()"
set tString = "The newspaper published the article and it sold very well."
write $system.iKnow.Highlight(tString, .tHighlight)
The above example would print:
(The) <c>newspaper</c> <r>published</r> (the) <c>article</c> <r>and (it) sold very well</r>.
Highlighting based on attributes
Attributes can be highlighted at two levels. Using the regular "ATTRIBUTE" flag will highlight all entities affected by the attribute specified by attribute ID (such as $$$IKATTNEGATION). However, some attributes support more fine-grained annotation at the word level, marking those words that actually caused the attribute to apply to an entity or part of a path. These can be highlighted individually through the "ATTRIBUTEWORDS" flag and are an exception to the general rule that highlighting happens per-entity.
set tHighlight("ATTRIBUTE", $$$IKATTNEGATION) = "<span style='color: red;'>"
set tHighlight("ATTRIBUTEWORDS", $$$IKATTNEGATION) = "<u>"
set tString = "The landlord doesn't accept late payments, but makes exceptions for students."
write $system.iKnow.Highlight(tString, .tHighlight)
The above example would display the sentence below with the entities affected by negation in red and the words that caused the negation underlined:
The landlord doesn't accept late payments, but makes exceptions for students.
Highlighting based on matching results
Dictionary matches can be highlighted using the "MATCH" flag, optionally restricted to a particular dictionary ID. To refine to a particular dictionary item, use the "MATCHITEM" flag. Highlighting can further be refined to distinguish between full and partial matches using the "FULL" and "PARTIAL" flags as an additional subscript. Please note this is a refinement and the parent node (ID-specific or generic) should contain a value.
Additional information about the matches themselves is available through the metadata rewrite mechanism: $$$TERM, $$$TERMID, $$$ITEM, $$$ITEMID, $$$ITEMURI, $$$DICT, $$$DICTID. Note that the regular $$$ID markers will be replaced with dictionary match IDs, not the IDs of the Dictionary or Dictionary Items.
set tHighlight("MATCH") = "<a href=""Example.MoreInfo.zen?uri=$$$ITEMURI"" style='border: 1px solid Tomato;' >"
set tHighlight("MATCH", "FULL") = "<a href=""Example.MoreInfo.zen?uri=$$$ITEMURI"" style='background-color: Tomato'>"
set tHighlight("MATCHITEM", 123) = "<a href=""Example.MoreInfo.zen?uri=$$$ITEMURI"" style='border: 1px solid Lime;'>"
Highlighting based on character position
If external tooling provided annotations based on character positions, use the "CHARS" flag to highlight those annotations by providing the start and end positions as second and third subscripts of the highlight spec array. This will highlight the entities "covering" these start and end positions, starting with the entity which includes the character at the designated start position and ending with the entity including the character at the designated end position.
set tHighlight("CHARS", 13, 21) = "<a href=""www.imdb.com/title/tt1636826/"">"
set tHighlight("CHARS", 71, 75) = "<a href=""http://www.haren.nl/"">"
set tString = "The instant Project X party was not well-received by the community of Haren in the Netherlands."
write $system.iKnow.Highlight(tString, .tHighlight)
The above example will annotate the entire entities "instant Project X party" and "Haren".
Note that in certain cases the iKnow indexing engine may modify the input text while processing it, so character positions supplied by external sources that worked from the original text may no longer point to the expected positions. The two most important cases are when User Dictionaries are used to rewrite the input explicitly and when duplicate whitespace is normalized by the engine. To work around this issue, present the output of the iKnow engine (as retrieved through GetValue()) to these external tools so that the same normalizations are applied.
In cases where the externally provided character positions span more than a single sentence, you can pass an offset as the data element of the main "CHARS" node to mark the character position that corresponds to the start of this sentence. This should be easier than recalculating all character positions and allows you to reuse the entire array for successive calls to GetHighlighted().
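The reuse pattern described above might look like the following sketch, in which only the top-level "CHARS" node changes between calls. The demo class, the pSentences input structure, the annotation positions, and the argument order assumed for GetHighlighted() (domain ID, sentence ID, highlight array) are all assumptions for illustration.

```objectscript
/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.CharHighlight Extends %RegisteredObject
{

/// pSentences(sentenceId) = character position in the external text that
/// corresponds to the start of that sentence (hypothetical input structure).
ClassMethod HighlightAll(pDomainId As %Integer, ByRef pSentences)
{
    // annotations expressed as positions in the full external text (made-up values)
    set tHighlight("CHARS", 13, 21) = "<b>"
    set tHighlight("CHARS", 171, 175) = "<u>"

    set tSentId = ""
    for {
        set tSentId = $order(pSentences(tSentId))
        quit:tSentId=""
        // only the offset changes between calls; the annotations are reused as-is
        set tHighlight("CHARS") = pSentences(tSentId)
        // Assumed argument order: domain ID, sentence ID, highlight array (by reference)
        write !, ##class(%iKnow.Queries.SentenceAPI).GetHighlighted(pDomainId, tSentId, .tHighlight)
    }
}

}
```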
Style precedence
For the purpose of HTML styling precedence, this is the order in which tags are wrapped around entities, from innermost to outermost:
- ATTRIBUTEWORDS (wrapped around individual words)
- ATTRIBUTE - ID-specific (attribute type ID)
- ATTRIBUTE - generic
- ENTITY - ID-specific
- STEM - ID-specific
- CRC - ID-specific
- CC - ID-specific
- MATCHITEM - ID-specific (dictionary item ID)
- MATCH - ID-specific (dictionary ID)
- MATCHITEM - generic
- MATCH - generic
- PATH - ID-specific
- ROLE - ID-specific (role)
- CHARS
GetLanguage()
Retrieves the language of the given sentence, as derived by the Automatic Language Identification algorithm or, if ALI was disabled, the language specified when indexing this sentence.
The confidence level is returned as well through an output parameter. If the confidence level is 0, this means ALI was not used and the language was defined by the user loading the source.
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
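A caller can use the confidence value to tell the two situations apart, as in the sketch below. The demo class is hypothetical, and the argument order (domain ID, sentence ID, confidence as output) is an assumption to be checked against the method signature.

```objectscript
/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.SentenceLanguage Extends %RegisteredObject
{

ClassMethod ShowLanguage(pDomainId As %Integer, pSentenceId As %Integer)
{
    // Assumed argument order: domain ID, sentence ID, confidence (output)
    set tLanguage = ##class(%iKnow.Queries.SentenceAPI).GetLanguage(pDomainId, pSentenceId, .tConfidence)

    if $get(tConfidence, 0) = 0 {
        // confidence 0: ALI was not used, language came from the loader
        write !, tLanguage, " (specified when the source was loaded)"
    } else {
        write !, tLanguage, " (identified by ALI, confidence ", tConfidence, ")"
    }
}

}
```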
GetNewBySource()
Retrieves the sentences with the most significant concepts compared to the rest of the domain (or optionally a subset thereof as filtered through filter). This array of sentences is based on results of the GetNewBySource query in %iKnow.Queries.EntityAPI, using the supplied algorithm and parameter values. The scores of the first [newEntitiesWindow] concepts are aggregated across sentences to produce the result of this query.
Please refer to the documentation of the GetNewBySource query in %iKnow.Queries.EntityAPI for more details on the parameters and available algorithms.
GetParts()
Returns the elements (concepts, relations and nonrelevants) that make up the sentence, optionally including markers for the beginning and end of any CRCs or Paths in the sentence. This information can be used to display the sentence value (see also GetValue()) and/or highlight specific elements of interest.
Output structure:
result(pos) = $lb(entOccId, entUniId, entity, role)
when includeCRCMarkers = 1, adds
result(pos, [CRCMASTER | CRCRELATION | CRCSLAVE]) = $lb(crcOccId, crcUniId)
when includePathMarkers = 1, adds
result(pos, [PATHBEGIN | PATHEND]) = $lb(pathId)
Note: the subscript levels for CRC and Path markers are not available in the QAPI and WSAPI versions of this query.
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
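The output structure of GetParts() can be read with a pair of $order loops, one over positions and one over any marker sub-nodes. The sketch below seeds a small array by hand so that it runs without a domain. The demo class, all IDs, the role strings, and the use of literal strings such as "CRCMASTER" as marker subscripts are assumptions; a real result may use different values.

```objectscript
/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.SentenceParts Extends %RegisteredObject
{

/// Prints each part as "entity (role)" followed by any markers at that position.
ClassMethod PrintParts(ByRef pResult)
{
    set tPos = ""
    for {
        set tPos = $order(pResult(tPos))
        quit:tPos=""
        set tPart = $get(pResult(tPos))
        // list items 3 and 4 hold the entity string and its role
        write !, tPos, ": ", $listget(tPart, 3), " (", $listget(tPart, 4), ")"

        // marker sub-nodes are only present when markers were requested
        set tMarker = ""
        for {
            set tMarker = $order(pResult(tPos, tMarker))
            quit:tMarker=""
            write " [", tMarker, ": ", $listtostring($get(pResult(tPos, tMarker))), "]"
        }
    }
}

/// Seeds made-up data in the documented shape and prints it.
ClassMethod Demo()
{
    set tParts(1) = $listbuild(11, 101, "newspaper", "concept")
    set tParts(1, "CRCMASTER") = $listbuild(21, 201)
    set tParts(1, "PATHBEGIN") = $listbuild(31)
    set tParts(2) = $listbuild(12, 102, "published", "relation")
    set tParts(2, "CRCRELATION") = $listbuild(21, 201)
    set tParts(3) = $listbuild(13, 103, "article", "concept")
    set tParts(3, "CRCSLAVE") = $listbuild(21, 201)
    set tParts(3, "PATHEND") = $listbuild(31)
    do ..PrintParts(.tParts)
}

}
```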
GetValue()
This method rebuilds a sentence based on the literals and entities it is composed of.
The returned string is the first part of the sentence, up to the maximum string length. The output parameter fullSentence is an array containing all the parts in the right order, with a %Boolean value at the top level indicating whether the returned string is the full sentence (1) or whether the array must be consulted to obtain the full sentence (0).
If a Virtual Source ID is specified, the sentence ID is treated as a virtual one, in the context of the supplied vSrcId.
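One way to handle both cases is to write the returned string when the top-level flag is 1 and otherwise write the parts one by one, which avoids building a string beyond the maximum length. This is a sketch under assumptions: the demo class is hypothetical, the argument order (domain ID, sentence ID, fullSentence as output) is assumed, and the parts are assumed to be stored as string values on first-level sub-nodes whose subscripts sort in sentence order.

```objectscript
/// Hypothetical demo class for illustration; not part of the %iKnow package.
Class Demo.SentenceValue Extends %RegisteredObject
{

/// Writes the complete sentence text to the current device.
ClassMethod WriteSentence(pDomainId As %Integer, pSentenceId As %Integer)
{
    // Assumed argument order: domain ID, sentence ID, fullSentence (output)
    set tText = ##class(%iKnow.Queries.SentenceAPI).GetValue(pDomainId, pSentenceId, .tFull)

    if $get(tFull, 1) {
        // top-level flag is 1: the returned string is already the full sentence
        write tText
        quit
    }

    // flag is 0: emit the parts in subscript order instead of concatenating them
    set tPart = ""
    for {
        set tPart = $order(tFull(tPart))
        quit:tPart=""
        write $get(tFull(tPart))
    }
}

}
```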
Inherited Members
Inherited Methods
- %AddToSaveSet()
- %ClassIsLatestVersion()
- %ClassName()
- %ConstructClone()
- %DispatchClassMethod()
- %DispatchGetModified()
- %DispatchGetProperty()
- %DispatchMethod()
- %DispatchSetModified()
- %DispatchSetMultidimProperty()
- %DispatchSetProperty()
- %Extends()
- %GetParameter()
- %IsA()
- %IsModified()
- %New()
- %NormalizeObject()
- %ObjectModified()
- %OriginalNamespace()
- %PackageName()
- %RemoveFromSaveSet()
- %SerializeObject()
- %SetModified()
- %ValidateObject()