%iFind.Index.Basic
index class %iFind.Index.Basic extends %Library.FunctionalIndex, %Library.CacheEmbedded
This index class provides text search capabilities to perform word-level searches through text in the %String or %Stream properties being indexed, for persistent classes with regular Caché storage.
Defining an iFind index
An iFind index can be defined in the class as follows:
Class ThePackage.MyClass Extends %Persistent
{
    Property MyStringProperty As %String;
    Index MyBasicIndex On (MyStringProperty) As %iFind.Index.Basic;
}
A number of parameters can be configured in order to refine the indexing behavior, such as whether to support case-sensitive search (LOWER), which language to use when indexing the text (LANGUAGE) or whether to enable stemming or decompounding (INDEXOPTION).
Querying an iFind index
Classes with an iFind index can subsequently be queried in SQL using the following syntax:
SELECT * FROM ThePackage.MyClass WHERE %ID %FIND search_index(MyBasicIndex, 'interesting')
This will return all the records containing the word "interesting". The following table lists a few sample search strings illustrating more advanced iFind search syntax.
Search string | What will be retrieved |
---|---|
structure | All records containing the word "structure" |
logical structure | All records containing both the words "logical" and "structure" (implicit AND) |
logical structure* | Same, but with anything that starts with "structure" (wildcard search) |
(logical structure) | All records containing the word "structure" right behind "logical" (positional search) |
(logical ? structure) | All records containing the words "logical" and "structure" with exactly one word in between (positional search) |
(logical [0-5] structure) | Positional again, but with up to 5 words between |
It's also possible to use AND, OR and NOT, as well as parentheses, to combine words into more complex search strings, rather than relying on the implicit AND that is the default behavior for multi-word searches:
Search string | What will be retrieved |
---|---|
Fixed | All records containing the word "fixed" |
Fixed and stored | All records containing "fixed" and "stored" |
Fixed and not stored | All records containing "fixed" but not "stored" |
Fixed and not (stored procedure) | All records containing "fixed" but not the positional string "stored procedure" |
fixed and ((stored procedure) or (default parameters)) | All records containing "fixed" and either "stored procedure" or "default parameters" |
Fixed and \not | All records containing the words "fixed" and "not" |
Fixed \and \not | All records containing "fixed", "and" and "not" |
not generated | All records not containing "generated" |
\not generated | Implicit AND of "not" and "generated" |
(\not generated) | Positional search for the word sequence "not generated" |
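The escaping rule shown in the last rows of the table (a backslash turns an operator back into an ordinary word) can be illustrated with a small helper. This is only a sketch in Python for building search strings on the client side; the function name `escape_operators` is hypothetical and not part of iFind.

```python
# Words the search syntax above treats as operators (matched case-insensitively here).
RESERVED = {"and", "or", "not"}

def escape_operators(text: str) -> str:
    """Prefix AND/OR/NOT with a backslash so they are searched as plain words.

    Hypothetical helper: it only rewrites whitespace-separated tokens and
    leaves parentheses, wildcards and already-escaped words untouched.
    """
    tokens = text.split()
    return " ".join("\\" + t if t.lower() in RESERVED else t for t in tokens)

print(escape_operators("fixed and not"))  # fixed \and \not
```

Applied to user input that should be searched literally, this yields the `Fixed \and \not` form from the table instead of an operator expression.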
Besides the name of the iFind index and the search string, the search_index() function supports two more optional parameters:
search_index(index_name, search_string [, search_option [, search_language]])
The search_option defines whether to search for exact occurrences of the words in the search string (search_option=0), which is the default, or to look for words that correspond to the same "normalized" form, based on a particular transformation. For example, stemming will normalize conjugated words to their base form and allow you to search for any conjugated form that corresponds to the same base form. Similarly, decompounding will normalize words even further by splitting compound words into the atomic words they consist of (see also %iKnow.Stemming.DecompoundUtils). The following values can be used for search_option:
- search_option = 0 will perform a regular search, without any transformations. This is the default.
- search_option = 1 is a shorthand for stemmed search, using the default stemmer for the current language (see also %iKnow.Stemmer), optionally overridden by the STEMMINGCONFIG parameter
- search_option = 2 is a shorthand for decompounded search, relying on the same settings as stemming described in the previous bullet
- search_option = 3 is a shorthand for fuzzy search, which will match any word that has at most 2 characters differing from the search term. (Note that this is not a transformation, strictly speaking).
- search_option = '3:n' will also perform fuzzy search, but the maximum edit distance (different characters) is now n
- search_option = 4 will perform a regular expression search
- search_option = '*' is a shorthand for searching based on all the transformations defined for this index
- search_option = 'string' will perform the transformation identified by string (see also TRANSFORMATIONSPEC)
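To make the fuzzy options (3 and '3:n') more concrete, the following Python sketch checks whether two words are within a given edit distance. It assumes the classic Levenshtein distance (insertions, deletions and substitutions each cost 1); the text above only speaks of "edit distance", so the exact metric iFind uses may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed metric, see lead-in)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_match(word: str, term: str, max_distance: int = 2) -> bool:
    """True if word is within max_distance edits of term (default 2, as for option 3)."""
    return edit_distance(word, term) <= max_distance

print(fuzzy_match("strcture", "structure"))      # one missing letter
print(fuzzy_match("kitten", "sitting"))          # three edits apart
print(fuzzy_match("kitten", "sitting", 3))       # like search_option = '3:3'
```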
The search_language argument restricts the results to records in a particular language, in cases where the indexed property contains text in multiple languages (LANGUAGE = "*"). This language is also passed on to any word transformation method that applies when search_option != 0.
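Because both extra arguments are positional, a search_language can only be supplied together with a search_option. The sketch below, in Python, composes the text of a search_index() call under that rule. The helper names and the language value 'en' are illustrative assumptions, not part of iFind.

```python
from typing import Optional, Union

def sql_literal(value: Union[int, str]) -> str:
    """Render an int as-is and a string as a single-quoted SQL literal."""
    if isinstance(value, int):
        return str(value)
    return "'" + value.replace("'", "''") + "'"

def search_index_call(index_name: str,
                      search_string: str,
                      search_option: Optional[Union[int, str]] = None,
                      search_language: Optional[str] = None) -> str:
    """Build 'search_index(...)' text (hypothetical helper for illustration)."""
    if search_language is not None and search_option is None:
        search_option = 0  # fall back to the documented default option
    args = [index_name, sql_literal(search_string)]
    if search_option is not None:
        args.append(sql_literal(search_option))
    if search_language is not None:
        args.append(sql_literal(search_language))
    return "search_index(" + ", ".join(args) + ")"

print(search_index_call("MyBasicIndex", "interesting"))
print(search_index_call("MyBasicIndex", "logical structure", 1, "en"))
print(search_index_call("MyBasicIndex", "structure", "3:1"))
```

The resulting text goes after `WHERE %ID %FIND` in a query. In application code, passing the search string as a bound parameter is generally preferable to concatenating user input into SQL.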
Method Inventory
- DeleteIndex()
- Embedded()
- Find()
- Highlight()
- InsertIndex()
- Normalize()
- PurgeIndex()
- Rank()
- SortBeginIndex()
- SortEndIndex()
- StripCharacters()
- StrippedEntityId()
- StrippedWordId()
- UpdateIndex()
Parameters
When generating SQL projections of iFind index data using the IFINDMAPPINGS parameter, this parameter overrides the naming of those classes, using this parameter's value instead of the default [class_name]_[index_name] prefix. The projections will still be created in the [package_name]_[class_name] package.
When this parameter is set to 1, additional SQL projections will be created upon compiling the class. These are accessible as read-only tables in a package named [package_name]_[class_name] and have names starting with [class_name]_[index_name] (which can be overridden through IFINDADVANCEDSQLPREFIX).
By default, the following mappings are generated for an %iFind.Index.Basic index:
- [class_name]_[index_name]_WordRec: stores which words appear in each record in this index. See also %iFind.Index.AbstractWordRec.
- [class_name]_[index_name]_WordSpread: stores the total number of records in which each word appears in this index. See also %iFind.Index.AbstractWordSpread.
- [class_name]_[index_name]_WordPos: stores which word occurs at which position in a record, so it can be joined to the AttributePos table. See also %iFind.Index.AbstractWordPos.
Additional classes will be generated automatically, based on your index class and parameters. See the class reference for subclasses for more details.
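The naming scheme for the three default projections can be summarized in a short Python sketch. It assumes a single-level package name and that a custom IFINDADVANCEDSQLPREFIX value is still followed by an underscore and the table suffix; neither detail is confirmed above, and the function name is hypothetical.

```python
from typing import List, Optional

# Suffixes of the default mappings listed above.
DEFAULT_SUFFIXES = ("WordRec", "WordSpread", "WordPos")

def projection_tables(package: str, class_name: str, index_name: str,
                      prefix: Optional[str] = None) -> List[str]:
    """Return the qualified names of the default projection tables.

    prefix stands in for an IFINDADVANCEDSQLPREFIX value (assumed to replace
    the '[class_name]_[index_name]' part only).
    """
    schema = f"{package}_{class_name}"
    base = prefix if prefix else f"{class_name}_{index_name}"
    return [f"{schema}.{base}_{suffix}" for suffix in DEFAULT_SUFFIXES]

for name in projection_tables("ThePackage", "MyClass", "MyBasicIndex"):
    print(name)
```

For the class defined at the top of this page, the first entry would be `ThePackage_MyClass.MyClass_MyBasicIndex_WordRec`.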
- 0 = Do not store compounds or stems
- 1 = Store word-level stems
- 2 = Store word-level compounds and stems
See also %iKnow.Stemmer and %iKnow.Stemming.DecompoundUtils for more details on stemming or decompounding, or TRANSFORMATIONSPEC for advanced options to use custom transformations.
This parameter controls which characters are retained at the start and end of a word when calculating the "stripped" version of a word that will be indexed along with the original word as it appeared in the text.
The %iFind.Rank.Abstract implementation to use for ranking search results using the auto-generated rank SQL procedure "[package name].[class name]_[index name]Rank"
This parameter can be used to override the default stemming implementation when INDEXOPTION > 0. To do so, set this parameter to a saved %iKnow.Stemming.Configuration instance. This parameter has no effect if INDEXOPTION = 0.
This parameter is for advanced use only and empty by default.
This parameter defines the word transformation(s) to apply to input text, such as stemming, decompounding and other operations for "normalizing" words, so searches can scan these normalized forms rather than the literal forms.
This parameter cannot be set in conjunction with the INDEXOPTION and/or STEMMINGCONFIG parameters, which are shorthands for configuring stemming and decompounding options and overriding the default configurations for those.
This parameter also allows using custom transformations by specifying the name of a class that inherits from %iFind.Transformation.Abstract, optionally followed by a colon and a string that will be passed on to the Transform method of the transformation class if it accepts any parameters.
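The "class name, optionally followed by a colon and a string" form of a custom transformation entry can be split as in the Python sketch below. It handles a single entry only (how several transformations are combined in one value is not described here), and the class name used is a made-up example.

```python
from typing import Optional, Tuple

def split_transformation(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'ClassName[:argument]' into (class name, argument or None).

    Only the first colon separates the two parts, so the argument string
    may itself contain colons.
    """
    class_name, sep, argument = entry.partition(":")
    return class_name, (argument if sep else None)

# 'MyPackage.MyTransformation' is a hypothetical subclass of
# %iFind.Transformation.Abstract.
print(split_transformation("MyPackage.MyTransformation"))
print(split_transformation("MyPackage.MyTransformation:a:b"))
```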
This parameter controls which user dictionary should be used to rewrite or annotate text before it is processed by the iKnow engine. See also the section on User Dictionaries in the iKnow documentation.
This parameter is for advanced use only and empty by default.
Methods
SELECT * FROM MyPackage.Table WHERE %ID %FIND search_index(index_name, pSearch [, pOption [, pLanguage]])
This SQL procedure returns the text indexed by pRecordID, in which all matches of the supplied pSearchString are highlighted using pTags.
SELECT %ID, Title, SomePackage.TheTable_MyIndexHighlight(%ID, 'cocktail* OR (hammock AND NOT bees)') FROM SomePackage.TheTable WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)')
pTags is a comma-separated list of tags to use for highlighting. If only a single one is supplied, it will be used to highlight all matches of search terms. If a second one is supplied, it will be used for all terms in a NOT node of the search tree (such as 'bees' in the above example), while the first will be used for all other terms.
pLimit can be used to limit the text to a maximum number of hits rather than returning the entire, highlighted text. pSearchOption can be used as in other iFind search operations, for example to also mark fuzzy matches or stem matches.
This SQL procedure returns the score expressing how well the record identified by pRecordID matches pSearchString, based on the ranking algorithm defined by RANKERCLASS.
SELECT %ID, Title, FullText, SomePackage.TheTable_MyIndexRank(%ID, 'cocktail* OR (hammock AND NOT bees)') FROM SomePackage.TheTable WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)') ORDER BY 4 DESC
pSearchOption can be used as in other iFind search operations, for example to also accept fuzzy matches or stem matches when calculating the rank score.
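The generated procedure names used in the two examples above follow the pattern "[package name].[class name]_[index name]" plus a suffix. A small Python sketch of that pattern follows; the "Rank" suffix is stated for RANKERCLASS, while "Highlight" is inferred from the example query, and the function name is hypothetical.

```python
def procedure_name(package: str, class_name: str, index_name: str, kind: str) -> str:
    """Compose the name of an auto-generated iFind SQL procedure.

    kind is 'Rank' or 'Highlight'; any other value is rejected because no
    other suffix appears in the examples above.
    """
    if kind not in ("Rank", "Highlight"):
        raise ValueError(f"unsupported procedure kind: {kind}")
    return f"{package}.{class_name}_{index_name}{kind}"

print(procedure_name("SomePackage", "TheTable", "MyIndex", "Rank"))
print(procedure_name("SomePackage", "TheTable", "MyIndex", "Highlight"))
```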