Domain Parameters
InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.
This appendix lists the available domain parameters. Domain parameter names are case-sensitive. Each domain parameter has a %IKPublic macro equivalent (for example, $$$IKPFULLMATCHONLY). The recommended programming practice is to specify a domain parameter by its macro equivalent, not its parameter name. For information on setting domain parameters, refer to “Defining an NLP Domain”.
A domain parameter can be defined for a specific domain, or can be defined “system-wide,” which means defined for all current and future domains in the current namespace.
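The relationship between domain-level and system-wide settings can be pictured with a small in-memory model. The sketch below is Python, not the InterSystems IRIS API. It assumes a precedence that this page does not spell out: a value set on the domain overrides a system-wide value, which in turn overrides the built-in default. The function `resolve_parameter`, the `BUILT_IN_DEFAULTS` table, and the DOMAIN/SYSTEM/DEFAULT level labels are all hypothetical. Parameter names are case-sensitive, so the model looks them up with exact string keys.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical model of domain parameter lookup; NOT the InterSystems IRIS API.
# Assumed precedence: domain value > system-wide (namespace) value > built-in default.
BUILT_IN_DEFAULTS: Dict[str, Any] = {
    "EnableNgrams": 0,
    "SortField": 0,
    "QUERY:MinTopConceptLength": 3,
}


def resolve_parameter(
    name: str,
    domain_params: Dict[str, Any],
    system_params: Dict[str, Any],
) -> Tuple[Optional[Any], str]:
    """Return (value, level), where level is "DOMAIN", "SYSTEM", or "DEFAULT"."""
    # Exact dict lookups keep parameter names case-sensitive.
    if name in domain_params:
        return domain_params[name], "DOMAIN"
    if name in system_params:
        return system_params[name], "SYSTEM"
    # Unknown names (including wrong-case spellings) fall through to None.
    return BUILT_IN_DEFAULTS.get(name), "DEFAULT"


if __name__ == "__main__":
    system = {"SortField": 1}
    print(resolve_parameter("SortField", {}, system))                 # (1, 'SYSTEM')
    print(resolve_parameter("SortField", {"SortField": 0}, system))   # (0, 'DOMAIN')
    print(resolve_parameter("sortfield", {"SortField": 0}, system))   # (None, 'DEFAULT')
```

In the model, a domain with no SortField setting of its own inherits the system-wide value. A domain that sets SortField itself keeps its own value. The lowercase lookup `"sortfield"` matches nothing, which reflects the case-sensitivity of parameter names.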
Domain parameters are divided into two groups, Basic and Advanced. Basic parameters are useful for customizing NLP default behavior. Advanced parameters significantly change NLP behavior and performance, and should be used with caution.
Basic Domain Parameters
Parameter | Description |
---|---|
DefaultConfig | $$$IKPDEFAULTCONFIG: A string specifying the name of the NLP Configuration used for the domain. By default, no Configuration is assigned to a domain, which is indicated by the parameter returning the string DEFAULT. You can specify the name of any Configuration defined in the current namespace. The Configuration must already exist when you assign it to this parameter; if it does not exist, this parameter remains unchanged. The DefaultConfig value can be overridden by SetConfig() or ProcessBatch(). |
EnableNgrams | $$$IKPENABLENGRAMS: A boolean parameter. If set to 1, NLP generates n-grams for the domain; n-grams are used for Similarity entity matching within words. If set to 0, NLP matches only parts, whole words, or the beginning letters of words. The default is 0. At the domain level, you can only change the EnableNgrams setting for an empty domain (a domain that does not yet contain NLP data). At the system-wide level, you can only change the EnableNgrams setting if no domain in the current namespace contains text data. n-Gram matching greatly increases the size of the data stored by NLP and can have a significant performance impact, so enable it only when required. You should not use n-gram matching with most languages; however, matching operations on German text often require it. |
IgnoreDuplicateExtIds | $$$IKPIGNOREDUPLICATEEXTIDS: A boolean parameter. If set to 1, NLP does not log an error when a source is loaded that has the same external ID as an already-loaded source. 1 is the recommended setting when reloading sources from a previously loaded location in order to pick up the sources added since the last load. The default is 0. |
IgnoreEmptyBatch | $$$IKPIGNOREEMPTYBATCH: A boolean parameter. If set to 0, the Loader issues a $$$IKNothingToProcess error when a batch load specifies no sources. If set to 1, the Loader does not generate this error. The default is 0. |
MAT:DefaultProfile | $$$IKPMATDEFAULTPROFILE: The name of a user-defined matching profile to establish as the default for the domain. The matching profile must exist when you assign it to this parameter. If the matching profile is defined namespace-wide (not specific to a domain), prefix its name with a zero and a colon (0:), for example "0:NoDomainProfile". If you do not set this parameter, the NLP default matching profile is used as the domain default. For further details, refer to Defining a Matching Profile in the “Smart Matching: Using a Dictionary” chapter. You can override the domain default matching profile by specifying a custom matching profile in the MatchSource() or MatchSources() methods. |
MAT:SkipRelations | $$$IKPMATSKIPRELATIONS: A boolean parameter for dictionary matching. A value of 1 (the default) specifies that only concepts, not relations, will be matched during entity matching. (Relations are matched during CRC and path matching operations.) Skipping relation entity matching can significantly improve performance. A value of 0 performs relation entity matching. You should only set this parameter to 0 if your dictionary includes single-entity terms that are relations. |
SortField | $$$IKPSORTFIELD: A boolean parameter. Every NLP entity has two integer counts associated with it: frequency (the number of occurrences of the entity in all sources) and spread (the number of sources in which the entity occurs). If set to 0, the NLP default is to sort by frequency ($$$SORTBYFREQUENCY). If set to 1, the NLP default is to sort by spread ($$$SORTBYSPREAD). The default is 0. At the domain level, you can only change the SortField setting for an empty domain (a domain that does not yet contain NLP data). At the system-wide level, you can only change the SortField setting if no domain in the current namespace contains text data. Alternatively, you can set the sort order for a domain by passing a second parameter (sortField) to the Create() or GetOrCreateId() method of the %iKnow.Domain class: 0 sorts by frequency (the default) and 1 sorts by spread. Change the default sort order only when required. In certain individual queries, you can also specify the sort order ($$$SORTBYFREQUENCY or $$$SORTBYSPREAD) or use the current domain default ($$$SORTBYDOMAINDEFAULT). |
Status | $$$IKPSTATUS: A boolean parameter. If set to 1, NLP displays detailed status information on the progress of the source loading process. If set to 0, NLP does not display this information. The default is 0. |
Stemming | $$$IKPSTEMMING: A boolean parameter. If set to 1, NLP activates stemming on the domain. If set to 0, NLP does not perform stemming. The default is 0. |
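The SortField choice between frequency and spread can be illustrated with a hypothetical Python model, not an NLP API. The model counts entities over a few sources, each given here as a plain list of entity strings. The names `entity_counts`, `top_entities`, `SORT_BY_FREQUENCY`, and `SORT_BY_SPREAD` are illustrative only. The alphabetical tie-break is an assumption of this sketch, not documented NLP behavior.

```python
from collections import Counter
from typing import Dict, List, Tuple

SORT_BY_FREQUENCY = 0  # models SortField = 0 (the default)
SORT_BY_SPREAD = 1     # models SortField = 1


def entity_counts(sources: List[List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each entity to (frequency, spread).

    frequency: total occurrences across all sources.
    spread: number of distinct sources in which the entity occurs.
    """
    frequency: Counter = Counter()
    spread: Counter = Counter()
    for entities in sources:
        frequency.update(entities)
        spread.update(set(entities))  # each source counts at most once
    return {entity: (frequency[entity], spread[entity]) for entity in frequency}


def top_entities(sources: List[List[str]], sort_field: int = SORT_BY_FREQUENCY) -> List[str]:
    """Rank entities by frequency or spread, highest first."""
    counts = entity_counts(sources)
    index = 0 if sort_field == SORT_BY_FREQUENCY else 1
    # Descending on the chosen count; ties broken alphabetically (an assumption).
    return sorted(counts, key=lambda entity: (-counts[entity][index], entity))


if __name__ == "__main__":
    sources = [["fever"] * 4 + ["patient"], ["patient", "cough"], ["patient", "cough"]]
    print(top_entities(sources, SORT_BY_FREQUENCY))  # ['fever', 'patient', 'cough']
    print(top_entities(sources, SORT_BY_SPREAD))     # ['patient', 'cough', 'fever']
```

Sorting by frequency ranks "fever" first, because it occurs four times, all in a single source. Sorting by spread ranks "patient" first, because it appears in all three sources, and moves "fever" to last.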
Advanced Domain Parameters
Parameter | Description |
---|---|
EntityLevelMatchOnly | $$$IKPENTITYLEVELMATCHONLY: A boolean parameter that is used to limit the types of match operations performed when matching against a dictionary. By default, NLP matches entities, CRCs, paths, and sentences. You can set this parameter to 1 to limit matching to entities only. Note that this may result in a much larger number of match results. The default is 0. Changing this parameter affects any subsequent match operations for this domain. Therefore, sources already matched before you changed this parameter must be explicitly re-matched to reflect this change. |
FullMatchOnly | $$$IKPFULLMATCHONLY: A boolean parameter that is used to limit the types of match operations performed when matching against a dictionary. You can set this parameter to restrict matching to exact matches only. When this option is set (1), partial matches and disordered matches are ignored. The default is 0. Changing this parameter affects any subsequent match operations for this domain. Therefore, sources already matched before you changed this parameter must be explicitly re-matched to reflect this change. |
LanguageFieldName | $$$IKPLANGUAGEFIELDNAME: A string that specifies the name of a metadata field. When set to an existing metadata field that was populated during source loading, NLP uses that metadata field’s value (if set) as the language to be used when processing the corresponding source. This option overrides automatic language identification. The metadata field value must be a two-letter ISO language code (see $$$IKLANGUAGES) for a language that has been specified in the current configuration object. |
MAT:StandardizedForm | $$$IKPMATSTANDARDIZEDFORM: To enable standardized-form matching for a domain, set this domain parameter to the desired standardization function, as follows: SET stat=domain.SetParameter("MAT:StandardizedForm","%Text"). Standardized-form matching lets a single dictionary term match different forms of the same word in source text, such as singular and plural forms or different verb forms, by using the standardized form of the dictionary term. Any dictionary terms created in such a domain result in "standardized" dictionary elements, which at match time are compared to the standardized forms of the entities in the sources. For the correct standardization algorithm to be used, it is important that dictionary terms are created with the correct language annotation (a parameter of the CreateDictionary() methods) and that sources are assigned the proper language, either by selecting the appropriate language model in a Configuration or by using Automatic Language Identification. |
MetadataAPI | $$$IKPMETADATAAPI: A string that specifies the metadata API class to use; this class extends %iKnow.Queries.MetadataI. The default is %iKnow.Queries.MetadataAPI. You can only change the MetadataAPI setting for an empty domain (a domain that does not yet contain NLP data). |
QUERY:MinTopConceptLength | $$$IKPMINTOPCONCEPTLENGTH: An integer that specifies the minimum length, in characters, of a concept that a GetTop() query can return. This parameter is used to filter meaninglessly short concepts out of the GetTop() result. The default is 3, meaning that GetTop() returns only concepts that are 3 or more characters long. This character count includes spaces between words and punctuation symbols in a concept. |
SimpleExtIds | $$$IKPSIMPLEEXTIDS: A boolean parameter that is used to specify the format of external IDs for sources. If set to 0, NLP stores the full reference as the external ID. If set to 1, NLP stores the local reference as the external ID. The default is 0. You can only change the SimpleExtIds setting for an empty domain (a domain that does not yet contain NLP data). |
SkipExtIdCheck | $$$IKPSKIPEXTIDCHECK: A boolean parameter that specifies whether to check for duplicate external IDs. If set to 1, NLP skips checking whether a duplicate external ID already exists when loading sources. If set to 0, NLP checks for duplicate external IDs. The default is 0. |
UseEntityVectorsJP | $$$IKPUSEENTITYVECTORSJP: A boolean parameter that specifies whether a domain uses the entity vector algorithm to parse and store Paths when a sentence is in Japanese. By default, use of entity vectors for Japanese text is enabled. This is the preferred behavior in nearly all cases, as explained in Paths, in the Conceptual Overview. If UseEntityVectorsJP is set to 0, NLP analyzes Japanese sentences using the same algorithm that it uses to identify Paths in Western-language text. At the domain level, you can only change the UseEntityVectorsJP parameter for an empty domain (a domain that does not yet contain NLP data). At the system-wide level, you can only change the UseEntityVectorsJP parameter if no domain in the current namespace contains text data. |
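The QUERY:MinTopConceptLength rule, including the fact that spaces and punctuation count toward the minimum, can be sketched as a simple filter. This is a hypothetical Python model of the documented behavior, not the way GetTop() is implemented. `filter_top_concepts` is an illustrative name.

```python
from typing import Iterable, List

DEFAULT_MIN_TOP_CONCEPT_LENGTH = 3  # documented default of QUERY:MinTopConceptLength


def filter_top_concepts(
    concepts: Iterable[str], min_length: int = DEFAULT_MIN_TOP_CONCEPT_LENGTH
) -> List[str]:
    """Keep concepts whose length is at least min_length characters.

    len() counts every character, so spaces between words and punctuation
    count toward the minimum, as the parameter description states.
    """
    return [concept for concept in concepts if len(concept) >= min_length]


if __name__ == "__main__":
    print(filter_top_concepts(["ok", "a b", "x.", "fever", "IV"]))  # ['a b', 'fever']
```

With the default of 3, the filter keeps "a b", because the space counts as a character, and "fever". It drops the two-character concepts "ok", "x.", and "IV".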