%iFind.Index.Basic
index class %iFind.Index.Basic extends %iFind.Index.Minimal
This index class provides text search capabilities to perform word-level searches through text in the %String or %Stream properties being indexed, for persistent classes using default storage.
Defining an iFind index
An iFind index can be defined in the class as follows:
    Class ThePackage.MyClass Extends %Persistent
    {
        Property MyStringProperty As %String;

        Index MyBasicIndex On (MyStringProperty) As %iFind.Index.Basic;
    }
A number of parameters can be configured in order to refine the indexing behavior, such as whether to support case-sensitive search (LOWER), which language to use when indexing the text (LANGUAGE) or whether to enable stemming or decompounding (INDEXOPTION).
Querying an iFind index
Classes with an iFind index can subsequently be queried in SQL using the following syntax:
SELECT * FROM ThePackage.MyClass WHERE %ID %FIND search_index(MyBasicIndex, 'interesting')
This will return all the records containing the word "interesting". The following table lists a few sample search strings illustrating more advanced iFind search syntax.
Search string | What will be retrieved |
---|---|
structure | All records containing the word "structure" |
logical structure | All records containing both the words "logical" and "structure" (implicit AND) |
logical structure* | Same, but with anything that starts with "structure" (wildcard search) |
"logical structure" | All records containing the word "structure" right behind "logical" (positional search) |
"logical ? structure" | All records containing the words "logical" and "structure" with exactly one word in between (positional search) |
"logical [0-5] structure" | Positional again, but with up to 5 words between |
[logical, structure, 5] | All records containing the words "logical" and "structure", with up to 5 words between them |
[logical structure, diagram, 3-6] | All records containing the phrase "logical structure" and the word "diagram", with between 3 and 6 words between them |
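As a minimal sketch, the search strings from the table are passed as the second argument of search_index(). The queries below reuse the ThePackage.MyClass table and MyBasicIndex index from the definition example; each statement is meant to be run separately. Double quotes that belong to the search string can be written directly inside the single-quoted SQL literal.

```sql
-- Wildcard search: "logical" plus any word starting with "structure"
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'logical structure*')

-- Positional search: "structure" immediately after "logical"
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, '"logical structure"')

-- Positional search with up to 5 words in between
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, '"logical [0-5] structure"')
```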
Beyond the implicit AND, which is the default behavior for multi-word searches, you can use AND, OR and NOT, as well as parentheses, to combine words into more complex search strings:
Search string | What will be retrieved |
---|---|
Fixed | All records containing the word "fixed" |
Fixed and stored | All records containing "fixed" and "stored" |
Fixed and not stored | All records containing "fixed" but not "stored" |
Fixed and not "stored procedure" | All records containing "fixed" but not the positional string "stored procedure" |
fixed and ("stored procedure" or "default parameters") | All records containing "fixed" and either "stored procedure" or "default parameters" |
Fixed and \not | All records containing the words "fixed" and "not" |
Fixed \and \not | All records containing "fixed", "and" and "not" |
not generated | All records not containing "generated" |
\not generated | Implicit AND of "not" and "generated" |
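The boolean search strings above are used in SQL in the same way as the simple ones. The following sketch again assumes the ThePackage.MyClass table and MyBasicIndex index from the definition example, with each statement run separately.

```sql
-- Records containing "fixed" and at least one of the two phrases
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'fixed and ("stored procedure" or "default parameters")')

-- The backslash escapes the operator, so this searches for the literal word "not"
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'fixed and \not')
```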
Besides the name of the iFind index and the search string, the search_index() function supports two more optional parameters:
search_index(index_name, search_string [, search_option [, search_language]])
The search_option argument defines whether to search for exact occurrences of the words in the search string (search_option = 0, the default) or to look for words that correspond to the same "normalized" form, based on a particular transformation. For example, stemming normalizes conjugated words to their base form, so that you can search for any conjugated form that corresponds to the same base form. Similarly, decompounding normalizes words even further by splitting compound words into the atomic words they consist of (see also %iKnow.Stemming.DecompoundUtils). The following values can be used for search_option:
- search_option = 0 will perform a regular search, without any transformations. This is the default.
- search_option = 1 is a shorthand for stemmed search, using the default stemmer for the current language (see also %iKnow.Stemmer), optionally overridden by the STEMMINGCONFIG index parameter
- search_option = 2 is a shorthand for decompounded search, relying on the same settings as stemming described in the previous bullet
- search_option = 3 is a shorthand for fuzzy search, which will match any word that differs from the search term by at most 2 characters. (Strictly speaking, this is not a transformation.)
- search_option = '3:n' will also perform fuzzy search, but with a maximum edit distance (number of differing characters) of n
- search_option = 4 will perform a regular expression search
- search_option = '*' is a shorthand for searching based on all the transformations defined for this index
- search_option = 'string' will perform the transformation identified by string (see also TRANSFORMATIONSPEC)
The search_language argument restricts the results to records in a particular language, for cases where the indexed property contains text in multiple languages (LANGUAGE = "*"). This language is also passed on to any word transformation method that is applied when search_option != 0.
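The optional arguments can be illustrated with a few hedged examples. They assume the ThePackage.MyClass table and MyBasicIndex index from the definition example, and that the index was defined with the parameters the option relies on (for instance stemming enabled through INDEXOPTION for search_option = 1, and LANGUAGE = "*" for the language filter). The language code 'en' is an assumption for illustration. Each statement is meant to be run separately.

```sql
-- Stemmed search: also matches other forms that share the stem of "structure"
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'structure', 1)

-- Fuzzy search with a maximum edit distance of 1 instead of the default 2
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'structure', '3:1')

-- Stemmed search restricted to records in one language (assumed code 'en')
SELECT %ID, MyStringProperty
FROM ThePackage.MyClass
WHERE %ID %FIND search_index(MyBasicIndex, 'structure', 1, 'en')
```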
If the IFINDMAPPINGS index parameter is set to 1, the following additional SQL projections will be generated:
- [class_name]_[index_name]_WordSpread: stores, for each word, the total number of records in which it appears in this index. See also %iFind.Index.AbstractWordSpread.
- [class_name]_[index_name]_WordPos: stores which word occurs at which position in a record, so it can be joined to the AttributePos table. See also %iFind.Index.AbstractWordPos.
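As a rough sketch, the projections can be queried like any other table. The table name below is derived from the [class_name]_[index_name]_WordSpread pattern for the ThePackage.MyClass and MyBasicIndex example, assuming the index was defined with IFINDMAPPINGS = 1 and that the projection lives in the same schema as the class. The column names are not listed in this section, so the query selects all columns.

```sql
-- Inspect a few rows of the word spread projection for MyBasicIndex
SELECT TOP 10 *
FROM ThePackage.MyClass_MyBasicIndex_WordSpread
```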
Parameters
The IGNOREPUNCTUATION parameter controls whether leading and trailing punctuation is ignored entirely for this index and its associated searches. If set to 0, the index also tracks words with adjacent punctuation, so that you can search specifically for those occurrences.
The KEEPCHARS parameter controls which characters are retained at the start and end of a word when calculating the "stripped" version of the word, which is indexed along with the original word as it appeared in the text.
This parameter only applies if IGNOREPUNCTUATION is set to false (the default for Basic, Semantic and Analytic indices).
The RANKERCLASS parameter specifies the %iFind.Rank.Abstract implementation to use for ranking search results through the auto-generated rank SQL procedure "[package name].[class name]_[index name]Rank".
Methods
The auto-generated SQL procedure "[package name].[class name]_[index name]Highlight" returns the text indexed by pRecordID, in which all matches of the supplied pSearchString are highlighted using pTags.
SELECT %ID, Title, SomePackage.TheTable_MyIndexHighlight(%ID, 'cocktail* OR (hammock AND NOT bees)') FROM SomePackage.TheTable WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)')
pTags is a comma-separated list of tags to use for highlighting. If only a single one is supplied, it will be used to highlight all matches of search terms. If a second one is supplied, it will be used for all terms in a NOT node of the search tree (such as 'bees' in the above example), while the first will be used for all other terms.
pLimit can be used to limit the text to a maximum number of hits rather than returning the entire, highlighted text. pSearchOption can be used as in other iFind search operations, for example to also mark fuzzy matches or stem matches.
The auto-generated SQL procedure "[package name].[class name]_[index name]Rank" returns a score expressing how well the record identified by pRecordID matches pSearchString, based on the ranking algorithm defined by RANKERCLASS.
SELECT %ID, Title, FullText, SomePackage.TheTable_MyIndexRank(%ID, 'cocktail* OR (hammock AND NOT bees)') FROM SomePackage.TheTable WHERE %ID %FIND search_index(MyIndex, 'cocktail* OR (hammock AND NOT bees)') ORDER BY 4 DESC
pSearchOption can be used as in other iFind search operations, for example to also accept fuzzy matches or stem matches when calculating the rank score.