Caché SQL Reference
%CONTAINS
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

Matches a value to one or more phrases using word-aware matching.
Synopsis
scalar-expression %CONTAINS(word[,word...])
Arguments
scalar-expression A scalar expression (most commonly a data column) whose values are being compared with one or more word strings. Must be of data type %Text.
word An alphabetic string or comma-separated list of alphabetic strings to match with values in scalar-expression. A word should be a complete word, or a phrase consisting of several complete words. The word or phrase should be delimited with single quotes.
Description
The %CONTAINS predicate allows you to select those data values that match the string or strings specified in word. This comparison operation is word-aware; it is not a simple string match operation. If you specify more than one word, scalar-expression must contain all of the specified word strings. Word strings may be presented in any order. If word does not match any of the scalar expression data values, %CONTAINS returns the null string.
%CONTAINS is a collection predicate. It can only be used in the WHERE clause of a SELECT statement. %CONTAINS cannot be used as a predicate that selects fields for a JOIN operation.
%CONTAINS can be used on a %Text string or a character stream field.
To use %CONTAINS on a string, change the %String property to %Text, and set LANGUAGECLASS and MAXLEN property parameters. For example:
Property MySentences As %Text(LANGUAGECLASS = "%Text.English",MAXLEN = 1000);
Specifying a MAXLEN value (in bytes) is required for %Text properties.
To use %CONTAINS to search a character stream field, the stream field must be defined as type %Stream.GlobalCharacterSearchable. For example:
Property MyTextStream As %Stream.GlobalCharacterSearchable(LANGUAGECLASS = "%Text.English");
The %CONTAINS predicate is one of the few predicates that can be used on a stream field in a WHERE clause.
The available languages are English, French, German, Italian, Japanese, Portuguese, and Spanish. See the %Text package class documentation (in %SYS) in the InterSystems Class Reference for further details.
The system generates an SQLCODE -309 error if scalar-expression is neither data type %Text nor %Stream.GlobalCharacterSearchable.
The system generates an SQLCODE -472 error if scalar-expression is not a collection-valued field (property).
Word-Aware Matching
A word should always be a complete word or sequence of words. When the word argument(s) are single words, Caché uses language analysis rules, including punctuation analysis and stemming rules, to match only the specified word. For example, the word argument “set” would match the word “set” or “Set” (not case-sensitive), and also stem forms such as “sets” and “setting”. However, it would not match words such as “setscrew,” “settle,” or “Seth,” even though these words contain the specified string.
Stemming rules provide for matching between any two forms of the word stem. Stemming is performed on both the search term(s) and the searched text. For example, you can specify %CONTAINS('jumping') and match the word “jumps” in the text.
This word-aware matching is fundamentally different from the character-by-character string matching performed by the SQL Contains operator ([).
Multiple-Word Phrases
The %CONTAINS keyword operator can search for multiple-word word strings, such as 'United States of America'. How these searches are conducted depends upon the setting of the NGRAMLEN property for the class. By default, NGRAMLEN=1, which means that %CONTAINS treats a word argument as a single word, regardless of how many words it actually contains.
If NGRAMLEN is equal or greater than the number of words in a word phrase, the phrase is treated as a word-aware test. That is, comparisons are not case-sensitive, and comparison is word-by-word. Thus 'School of Art' would match 'School of Art', or 'school of art', but not 'School of Arthur Murray'.
If NGRAMLEN is less than the number of words in a word phrase, Caché consults the dictionary of the LANGUAGECLASS to select the least-frequently-occurring term of length NGRAMLEN (or less). For example, when NGRAMLEN=2 you may still use predicates such as myDoc %CONTAINS('big black dog'). In this case, the least-frequently-occurring term of length NGRAMLEN might be 'big black'. Caché then retrieves the myDoc property (typically from the master map), and the document is re-parsed to determine if the post-stemming representation of the query pattern appeared in the post-stemming representation of myDoc. This re-parsing is expensive. In the case of the Japanese language, which does not require case conversion, some of the post-processing done by %CONTAINS is unnecessary overhead.
Contains Analysis
%CONTAINS matching is governed by the class parameters of the %Text.Text system class, found in the %SYS namespace. These parameters allow you to specify, among other things, whether comparison is to be case-sensitive or not case-sensitive, and the treatment of numbers, punctuation characters, and multi-word phrases.
Caché can use specific language analysis rules, including common word analysis (“noise word” lists) and stemming rules, to determine similarity. The available languages are English, French, German, Italian, Japanese, Portuguese, and Spanish.
For a much more detailed treatment of %CONTAINS and %Text, refer to the %Text package class documentation in the InterSystems Class Reference.
%CONTAINS and %CONTAINSTERM
The %CONTAINS and %CONTAINSTERM predicates perform the same word-aware comparison for their supplied word arguments. They differ in their requirements for multiple-word phrases:
The %CONTAINSTERM operator gives superior performance for certain types of comparisons, especially very large search sets. %CONTAINSTERM is preferable when performing comparisons in Japanese.
A %CONTAINS comparison is case-sensitive, unless the CASEINSENSITIVE class property is specified. %CONTAINSTERM comparisons use the collation type defined for the scalar-expression. By default, string data type fields are defined with SQLUPPER collation, which is not case-sensitive. The “Collation” chapter of Using Caché SQL provides details on defining the string collation default for the current namespace and specifying a non-default field collation type when defining a field/property. Because %CONTAINSTERM uses the field collation type, the CASEINSENSITIVE class property is not applicable to the %CONTAINSTERM comparison. Collation can optionally be set to %EXACT for case-sensitive operations.
Use of %CONTAINS in a search of text represented as a stream field requires that the stemmed, noiseword-filtered text be converted to a string. If such a stream is longer than the maximum length of a string, then applications should use %CONTAINSTERM instead of %CONTAINS, as there is no limitation on size of a stream when %CONTAINSTERM is in use. For information on the maximum length of a string, see the section Support for Long String Operations in the chapter Server Configuration Options in the Caché Programming Orientation Guide.
For further details, refer to the %CONTAINSTERM predicate.
Collection Predicates
%CONTAINS is a collection predicate. It can be used in most contexts where a predicate condition can be specified, as described in the Overview of Predicates page of this manual. It is subject to the following restrictions:
iKnow and iFind
The Caché iKnow text analysis tool and iFind text search tool also provide word-aware analysis. These facilities are entirely separate from %Text classes. They provide a substantially different and significantly more sophisticated form of textual analysis.
Examples
The following Embedded SQL example performs a %CONTAINS comparison with a literal phrase:
  &sql(SELECT name,stats
       INTO :badname, :badstat
       FROM Sample.Employee
       WHERE status %CONTAINS('an invalid value'))
The following Embedded SQL example uses a host variable to perform a %CONTAINS comparison:
  SET text="invalid"
  &sql(SELECT name,stats
       INTO :badname, :badstat
       FROM Sample.Employee
       WHERE status %CONTAINS(:text))
Other Equivalence Comparisons
You can perform other types of equivalence comparisons by using string comparison operators. These include the following:
For further details on these and other comparison conditional predicates, refer to the WHERE clause.
See Also