iKnow Architect
Caché provides the iKnow Architect as an interactive interface for creating and populating iKnow domains and performing analysis on the indexed data. iKnow Architect is accessed using the Caché Management Portal.
It consists of three tools:
-
Architect: for creating an iKnow domain and populating it with source text data.
-
Knowledge Portal: for analyzing the data in an iKnow domain by looking at specific entities.
-
Indexing Results: for displaying how iKnow analyzed the text data in a source, using highlighting to show different types of entities.
All functionality provided through the iKnow Architect is also available by using ObjectScript to invoke iKnow class methods and properties.
Accessing iKnow Architect
The starting point for accessing the iKnow Architect is the Management Portal System Explorer option. From there you select the iKnow option.
All iKnow domains exist within a specific namespace. Therefore, you must specify which namespace you wish to use by selecting the Switch option at the top of any Management Portal interface page. This displays the list of available namespaces, from which you can make your selection.
A namespace must be iKnow-enabled before it can be used. Selecting an iKnow-enabled namespace displays the iKnow Domain Architect option.
If selecting an iKnow-enabled namespace does not display the Domain Architect option, you do not have a valid iKnow license. Check the Licensed to field in the Management Portal header, then review or activate your license key.
Enabling a Namespace
A namespace must be iKnow-enabled before it can be used with iKnow Architect.
-
If no namespaces are iKnow-enabled, the iKnow option displays the greyed out (disabled) message “No iKnow-enabled namespaces found”.
-
If the current namespace is not iKnow-enabled, the iKnow option displays a list of iKnow-enabled namespaces. You can select one of these displayed namespaces, and then select it from the Namespace Chooser window.
To enable a namespace for iKnow from the Management Portal, select System Administration, Security, Applications, Web Applications (System, Security Management, Web Applications). This displays a list of web applications; the third column indicates whether a listed item is a namespace (“Yes”) or not. Select the desired namespace name from the list. This displays the Edit Web Application page. Select the Enabled check box for iKnow. Click the Save button.
You cannot enable the %SYS namespace. This is because you cannot create iKnow domains in the %SYS namespace.
You can set your Management Portal default namespace. From the Management Portal select System Administration, Security, Users (System, Security Management, Users). Select the name of the desired user. This allows you to edit the user definition. From the General tab, select a Startup Namespace from the drop-down list. Click Save.
Creating a Domain
From the iKnow Architect press the New button to define a domain. You specify the following domain values (in the specified order):
-
Domain name: The name you assign to a domain must be unique for the current namespace (not just unique within its package class). A domain name may be of any length and contain any typeable characters, including spaces (the % character is valid, but should be avoided). Domain names are not case-sensitive. However, because iKnow Architect uses the domain name to generate a default domain definition class name, it is recommended that you follow class naming conventions when naming a domain, unless there are compelling reasons to do otherwise.
-
Definition class name: the domain definition package name and class name, separated by a period. If you first specified the domain name, clicking on the Definition class name generates default names for the domain definition package and class. The package name defaults to User. The class name defaults to the domain name, stripped of non-alphanumeric characters. You can accept or modify this default.
The package name and the class name can contain only alphanumeric characters, and are case-sensitive. Specifying a package name that differs from an existing package name only in lettercase results in an error. Within a package, specifying a class name that differs from an existing class name only in lettercase results in an error.
-
Allow Custom Updates: optionally select this box if you wish to enable adding data or dictionaries to this domain manually; the default is to not allow custom updates.
Click the Finish button to create the domain. This displays the Model Elements selection screen.
You must Save and Compile a newly created domain before exiting that domain.
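The default Definition class name described above can be illustrated with a short sketch. This is not the Architect's own code: the function name `default_definition_class` is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and raising an error for a name with no alphanumeric characters is an assumption made only so the sketch is complete.

```python
import re


def default_definition_class(domain_name: str, package: str = "User") -> str:
    """Return a default "Package.Class" name derived from a domain name.

    Hypothetical helper: mirrors the documented default (package "User",
    class name = domain name stripped of non-alphanumeric characters).
    """
    # Assumption: only ASCII letters and digits count as alphanumeric.
    class_name = re.sub(r"[^A-Za-z0-9]", "", domain_name)
    if not class_name:
        # Assumption: the documentation does not say what happens here.
        raise ValueError("domain name contains no alphanumeric characters")
    return f"{package}.{class_name}"


print(default_definition_class("My Crash-Report Domain"))
# prints: User.MyCrashReportDomain
```

Because the derived class name drops spaces and punctuation, two different domain names can map to the same default class name, which is one reason to review the generated default before clicking Finish.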
Both domain names and domain definitions must be unique within the namespace. If a duplication occurs, iKnow Architect performs the following operations:
Duplicate domain: if you create two domain names in the namespace that are the same but have different domain definitions, the iKnow Architect will appear to create both domains. However, when attempting to compile the domains, compilation will fail for the second domain name.
Duplicate domain definitions: if you create two domains that have different names but the same domain definition, the iKnow Architect overwrites the first domain with the second domain. This is a delete and replace operation, not a rename. iKnow Architect issues no message when performing this overwrite.
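Because the Architect does not warn about either kind of duplication at creation time, it can help to check a proposed domain against the existing ones yourself. The sketch below only models the two rules stated above. The function `check_new_domain`, its return strings, and the in-memory list of existing domains are all hypothetical; it does not query a real namespace.

```python
from typing import List, Tuple


def check_new_domain(existing: List[Tuple[str, str]], name: str,
                     definition_class: str) -> str:
    """Classify a proposed (domain name, definition class) pair.

    existing: (domain name, definition class) pairs already in the namespace.
    Domain names are compared case-insensitively, as documented.
    """
    for other_name, other_class in existing:
        same_name = other_name.lower() == name.lower()
        same_class = other_class == definition_class
        if same_name and not same_class:
            return "name-conflict"   # compilation of the second domain fails
        if same_class and not same_name:
            return "overwrite"       # first domain is silently replaced
    return "ok"


existing = [("Crash Reports", "User.CrashReports")]
print(check_new_domain(existing, "crash reports", "User.Other"))
print(check_new_domain(existing, "Incidents", "User.CrashReports"))
print(check_new_domain(existing, "Incidents", "User.Incidents"))
```

The three calls print `name-conflict`, `overwrite`, and `ok` respectively.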
For other ways to create a domain, refer to iKnow Domains. Note that iKnow Architect is the only domain creation interface that allows you to define a domain definition package name and class name.
Opening a Domain
Creating a domain using the Management Portal interface immediately opens the domain, so you can begin managing the new domain right away.
To manage an existing domain, click the Open button to list all existing domains in the namespace. This display lists the packages that contain domains. Select a package to display its domains. Select an existing domain. This displays the Model Elements selection screen.
Changing the Domain Name and Check Boxes
Creating or opening a domain displays the Model Elements window. If you click on the domain name in this window, the Details tab displays the Domain Name field, the Domain Tables Package field, and the Allow Custom Updates and Disabled check boxes. You can modify these characteristics of the domain. Changing the Domain Name does not change the Definition class name.
Checking the Allow Custom Updates check box allows the manual loading of data sources and dictionaries into this domain using interfaces other than iKnow Architect.
Checking the Disabled check box prevents the loading of all data (source data, metadata, dictionary matching data) during the Build operation. Each of these types of data also has its own Disabled check box that allows you to disable loading of each type of data separately.
You must Save and Compile a renamed domain before exiting that domain.
Deleting a Domain
To delete the current domain, click the Delete button. This displays the Drop domain data window. You can either delete just the domain contents or delete the domain definition. Click Drop domain & definition class to delete the domain and its associated class definition, including the specifications of data sources, blacklists, and other model elements.
Model Elements
After creating a domain, or opening an existing domain, you can define model elements for the domain. To add or modify model elements, click on the expansion triangle next to one of the headings. Initially, no expansion occurs. Once you have defined some model elements, clicking the expansion triangle shows the model elements you have defined.
To add a model element, click the heading. Then click the Add button shown in the Details tab on the right side. Specify the name and values. The model element is automatically generated when you leave the Details area. Model elements are listed in the order of their creation, with the most-recently-created element at the top of the list; modifying a model element does not change its position in the list.
To modify a model element, expand the heading, then click a defined model element. The current values are shown in the Details tab on the right side. Modify the name and/or values as desired. The model element is automatically re-generated when you leave the Details area.
Once you have created model elements, clicking on the Expand All button (or one of the expansion triangles) displays these defined values. The Element Type column shows the type of each model element. Clicking on the red “X” deletes that model element.
The Save button saves all changes. The Domain Architect page heading is followed by an asterisk (*) if there are unsaved changes.
The Undo button reverses the most recent unsaved change. You can click Undo repeatedly to reverse unsaved changes in the reverse order that they were made. Once changes are saved, this button disappears.
The following Model Elements are provided:
Domain Settings
This model element allows you to modify the characteristics of the domain. All Domain Settings are optional and take default values. Domain Settings provides the following options:
-
Languages: select one or more languages that you wish iKnow to identify in the text data. If you check more than one language, automatic language identification is activated. This increases the processing required for texts. Therefore, you should not select multiple languages unless there is a real likelihood that texts in each of the selected languages will be part of the data set. The default language is English.
-
Add Parameter: this button allows you to specify a domain parameter value. Specify the domain parameter name and the new value. Domain parameter names are case-sensitive. For example, Name=SortField, Value=1. No validation is performed. All unspecified domain parameters take their default values. To view the parameters that you have added, expand the Domain Settings heading.
-
Maximum Concept Length: the largest number of words that should be indexed as a concept. This option is provided to prevent a long sequence of words from being indexed as a concept. The default (0) uses the language-specific default for the maximum number of words. This default should be used unless there are compelling reasons to modify it.
-
Manage User Dictionary: this button displays a “Manage User Dictionary” box that allows you to add one or more strings to the user dictionary. Each entry either specifies a string that will be rewritten to a new string, or specifies a string to which you assign an attribute label from a drop-down list.
Metadata Fields
Add Metadata: this button allows you to specify a source metadata field. For each metadata field you specify the field name, the data type (String, Number, or Date), the supported operators, and the storage type. After creating a domain, you can optionally specify one or more metadata fields that you can use as criteria for filtering sources. A metadata field is data associated with an iKnow data source that is not itself iKnow indexed data. For example, the date and time that a text source was loaded is a metadata field for that source. Metadata fields must be defined before loading text data sources into a domain.
Case Sensitive check box: By default, a metadata field is not case-sensitive; you can select this check box to make it case-sensitive.
Disabled check box: You can select the Disabled check box to disable all metadata fields, or you can select the Disabled check box displayed with an individual metadata field to disable just that metadata field. A disabled field is not loaded during the Build operation.
The metadata fields that you specify here appear in the Data Locations Add data from table and Add data from query details under the title “Metadata mappings”.
Data Locations
Specifies the source for adding data. Options are Add data from table, Add data from query, Add data from files, Add RSS data, and Add data from global.
-
The Drop existing data before build check box allows you to specify whether source text data already indexed in this iKnow domain should be deleted before adding the source text data specified here. To use this check box to drop data, data loading must not be disabled. To drop existing data without loading new data, use the Delete button Drop domain contents only option.
-
The Disabled check box allows you to disable source indexing; disabled source data is not loaded during the Build operation. If data loading is disabled, the Drop existing data before build check box is ignored.
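The interaction between these two check boxes can be summarized as a small decision function. This is a sketch of the documented behavior only; the function name `data_location_actions` and the action labels are hypothetical and do not correspond to any iKnow API.

```python
from typing import List


def data_location_actions(disabled: bool, drop_before_build: bool) -> List[str]:
    """Return the data-location steps a Build would perform (illustrative)."""
    if disabled:
        # Disabled data locations are not loaded, and the
        # "Drop existing data before build" check box is ignored.
        return []
    actions = []
    if drop_before_build:
        actions.append("drop existing data")
    actions.append("load sources")
    return actions


print(data_location_actions(disabled=False, drop_before_build=True))
print(data_location_actions(disabled=True, drop_before_build=True))
```

The first call prints `['drop existing data', 'load sources']`; the second prints `[]`, which is why dropping data without loading new data requires the Delete button's Drop domain contents only option instead.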
A Build operation for a large number of texts may take some time. If you have already loaded the data locations and wish to add or modify metadata or a matching dictionary you can click the Data Locations Disabled check box to index these model elements without reloading the data locations.
After specifying data locations, you must Save and Compile the domain, then select the Build button to build the data indices.
Add Data from Table
This option allows you to specify data stored in an existing SQL table in the current namespace. It provides the following fields:
-
Name: you can either specify a name or take the default name for the extracted result set table. Follows SQL table naming conventions. The default name is Table_1 (with the integer incrementing for each additional extracted result set table you define).
-
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
-
Schema: from this drop-down list select an existing schema in the current namespace.
-
Table Name: from this drop-down list select an existing table in the selected schema.
-
ID Field: from this drop-down list select a field from the selected table to serve as the ID field (primary record identifier). An ID field must contain unique, non-null values.
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
-
Group Field: an SQL select-item expression that retrieves a secondary record identifier from the selected table. This field defaults to the initial ID Field selection.
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
-
Data Field: from this drop-down list select a field from the selected table to serve as the data field. The data field contains the text data loaded for iKnow indexing.
Selecting –custom– from the drop-down list allows you to input a field name; for example, a hidden RowId field or a field that does not (yet) exist. Field names are not case-sensitive. Selecting –custom– also displays the Show Default Options button. This button selects the first non-hidden field in the table from the drop-down list and also allows you to return to the drop-down list of fields.
-
Where Clause: you can optionally specify an SQL WHERE clause to limit which records are included in the result set table. Do not include the WHERE keyword.
If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to specify a metadata field for this table. From the drop-down list you can select a field from the selected table, select – not mapped –, or select – custom –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.
If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.
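To see how the Add data from table fields relate to one another, it can help to picture them as parts of a single SELECT statement. The sketch below assembles such a statement as a string. It is an illustration under assumptions, not the statement the Architect actually generates: the function `build_table_query` is hypothetical, and so are the `Sample.Reports` table and its field names.

```python
def build_table_query(schema: str, table: str, id_field: str,
                      group_field: str, data_field: str,
                      where_clause: str = "") -> str:
    """Combine the Add data from table fields into one SELECT (illustrative)."""
    where = where_clause.strip()
    # The Where Clause field must not include the WHERE keyword itself.
    if where and where.split(None, 1)[0].upper() == "WHERE":
        raise ValueError("omit the WHERE keyword from the where clause")
    sql = f"SELECT {id_field}, {group_field}, {data_field} FROM {schema}.{table}"
    if where:
        sql += f" WHERE {where}"
    return sql


# Hypothetical table and fields; Group Field defaults to the ID Field.
print(build_table_query("Sample", "Reports", "ID", "ID",
                        "NarrativeText", "ReportYear > 2010"))
# prints: SELECT ID, ID, NarrativeText FROM Sample.Reports WHERE ReportYear > 2010
```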
Add Data from Query
Add data from query is similar to Add data from table, but allows you to specify a fully-formed SQL query against an existing table (or tables). It provides the following fields:
-
Name: you can either specify a name or take the default name for the extracted result set table. Follows SQL table naming conventions. The default name is Query_1 (with the integer incrementing for each additional extracted result set table you define).
-
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
-
SQL: the query text, a Caché SQL SELECT statement. Defining a query allows you to select fields from more than one table by using JOIN syntax. When specifying more than one table, assign column aliases to selected fields. Defining a query also allows you to specify an expression field that you can use as the Group field.
The following field selection drop-down lists display the selected fields. They do not display table alias prefixes. If the field has a column alias, this alias is listed rather than the field name.
-
ID Field: from this drop-down list select a field from the selected table to serve as the ID field. An ID field must contain unique, non-null values.
-
Group Field: from this drop-down list select a select-item expression (such as an SQL function expression) from the query to serve as a secondary record identifier (group field). For example, YEAR(EventDate).
-
Data Field: from this drop-down list select a field from the selected table to serve as the data field. The data field contains the text data loaded for iKnow indexing.
If you have defined one or more Metadata Fields for this domain, the Metadata mapping option allows you to select either – not mapped – or – custom – for each defined metadata field. The default is – not mapped –. If you select – custom – the Architect displays an empty field in which you can specify the custom mapping.
If you have not defined any Metadata Fields for this domain, the Metadata mapping option provides a Declare Metadata button that directs you to the Add Metadata domain option.
The Model Elements window Element Type column displays a truncated form of the query you defined; the query is truncated after the first table name in the FROM clause. The full query is shown in the Details window.
Add Data from File
This option allows you to specify data stored in files. It provides the following fields:
-
Name: you can either specify a name or take the default name for the extracted data file. The default name is File_1 (with the integer incrementing for each additional extracted data file you define).
-
Path: the complete directory path to the directory containing the desired files. The Path syntax is filesystem dependent; on a Windows system it might look like the following: C:\temp\iKnowSources\
-
Extensions: the file extension, such as txt or xml. Do not include the dot prefix when specifying the file extension. Specify multiple extensions as a comma-separated list with no dots and no spaces; for example, txt,xml. If specified, only files with the specified extensions are included in the resulting extracted data. If the Extensions field is left blank (the default) all files are included, regardless of their extensions.
-
Filter Condition: a condition used to restrict which files are to be included in the resulting extracted data.
-
Recursive: a check box indicating whether to select files recursively. When checked, data can be extracted from the files in the specified directory and files in all of its subdirectories, and their sub-subdirectories, etc. When not checked, data can be extracted only from files in the specified directory. The default is non-recursive (check box not checked).
-
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
-
Encoding: a drop-down list of the types of character set encoding to use to process the files.
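The way the Path, Extensions, and Recursive fields combine to select files can be sketched as follows. This is a rough model of the documented rules, not the iKnow file lister itself. The function `select_files` is hypothetical, extensions are compared exactly as typed (an assumption), and the Filter Condition and Encoding fields are not modeled.

```python
from pathlib import Path
from typing import List


def select_files(path: str, extensions: str = "",
                 recursive: bool = False) -> List[Path]:
    """Return the files that the Path/Extensions/Recursive settings select."""
    # Extensions: comma-separated, no dots, no spaces; blank means all files.
    wanted = {ext for ext in extensions.split(",") if ext}
    root = Path(path)
    # Recursive: include files in all subdirectories as well.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and (not wanted or p.suffix.lstrip(".") in wanted)
    )
```

For example, `select_files(r"C:\temp\iKnowSources", "txt,xml")` would return only the .txt and .xml files directly inside that directory, while adding `recursive=True` would also descend into its subdirectories.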
Add RSS Data
This option allows you to specify data from an RSS stream feed. It provides the following fields:
-
Name: you can either specify a name or take the default name for the extracted data. The default name is RSS_1 (with the integer incrementing for each additional RSS source you define).
-
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
-
Server Name: the name of the host server on which the URL is found.
-
URL: the navigation path within the server address to the actual RSS feed.
-
Text Elements: a comma-separated list of text elements to load from the RSS feed. For example title,description. Leave blank for defaults.
Add Data from Global
This option allows you to specify data from a Caché global. It provides the following fields:
-
Name: you can either specify a name or take the default name for the extracted data. The default name is Global_1 (with the integer incrementing for each additional global source you define).
-
Batch Mode: a check box indicating whether or not to load source text data in batch mode.
-
Global Reference: The global from which you wish to extract the source data.
-
Begin Subscript: the first global subscript in a range of subscripts to include.
-
End Subscript: the last global subscript in a range of subscripts to include.
-
Filter Condition: a condition used to restrict which files are to be included in the resulting extracted data.
Blacklists
Define blacklists: After creating a domain, you can optionally create one or more blacklists for that domain. A blacklist is a list of terms (words or phrases) that you do not want a query to return. Thus a blacklist allows you to perform iKnow operations that ignore specific terms in data sources loaded in the domain.
-
Name: specify the name of a new blacklist, or take the default name. Blacklist names are not case-sensitive. Specifying a duplicate blacklist name results in a compile error. The default name is Blacklist_1 (with the integer incrementing for each additional blacklist you define).
-
Entries: specify terms to include in the blacklist, one term per line. Terms should be in lower case. Duplicate terms are permitted. You can copy/paste terms from one blacklist to another. You can include blank lines to separate groups of terms. A line return at the end of your list of terms is optional; blank lines are not counted as entries.
If you add, modify, or delete a blacklist, you must Save and Compile the domain for this change to take effect.
Because defining blacklists has no effect on how data is loaded into a domain, changes to blacklists do not require re-building the domain.
The blacklists defined here are compiled, then supplied to the Knowledge Portal, which allows you to specify none, one, or multiple blacklists when performing analysis of source text data loaded into the domain. A blacklist is applied to some (but not all) Knowledge Portal analytics.
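The rules for the Entries field (one term per line, blank lines not counted, duplicates permitted) can be illustrated with a small parser. This is a sketch, not Architect code. The function `parse_blacklist_entries` is hypothetical, and trimming surrounding whitespace and converting terms to lower case are assumptions that simply follow the recommendation that terms be in lower case.

```python
from typing import List


def parse_blacklist_entries(text: str) -> List[str]:
    """Turn the contents of an Entries box into a list of blacklist terms."""
    entries = []
    for line in text.splitlines():
        term = line.strip()          # assumption: surrounding spaces ignored
        if term:                     # blank lines are not counted as entries
            entries.append(term.lower())  # assumption: normalize to lower case
    return entries                   # duplicates are kept, as permitted


print(parse_blacklist_entries("flight\n\nPilot\nflight\n"))
# prints: ['flight', 'pilot', 'flight']
```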
Matching
The Matching option provides the Add Dictionary option to define a dictionary and specify its items and terms.
The Matching option provides four check box options, as follows:
-
Disabled: You can select the Disabled check box to disable building of all dictionaries, or you can select the Disabled check box displayed with an individual dictionary to disable the building of that dictionary. Selecting Disabled check boxes allows you to build only those dictionaries that you have changed. The default is off.
-
DropBeforeBuild: default on
-
AutoExecute: default on
-
IgnoreDictionaryErrors: default on
Add Dictionary
The Add Dictionary button displays the dictionary definition options: dictionary name (with a supplied default), an optional description, the dictionary language selected from a drop-down list of iKnow supported languages, and the disabled check box. The default name is Dictionary_1 (with the integer incrementing for each additional dictionary you define).
The Add Item button displays the item definition options: item name (with a supplied default), a uri name (with a supplied default), the item language selected from a drop-down list of iKnow supported languages, and the disabled check box. To define more items, select the dictionary name. Items are listed in order of creation, with the most recent at the top of the list. Within each item you can define one or more terms. The default name is Item_1, the default uri name is uri:1 (with the integer incrementing for each additional item you define for this dictionary).
The Add Term button displays the term definition options: a string specifying the term, the term language selected from a drop-down list of iKnow supported languages, and the disabled check box. To define more terms, select the item name. Terms are listed in order of creation, with the most recent at the top of the list.
Save, Compile, and Build
You must save, compile, and build a domain (in that order) using the buttons provided. You must save and compile a domain after adding, modifying, or deleting any Model Elements.
The Save button saves the current domain definition. Architect greys out (disables) the Save button if no domain definition is open. Architect does not issue an error if you save a domain definition without changing it.
The Compile button compiles the current domain definition. It compiles all of the classes and routines that comprise the domain definition. If you have not saved changes that you made to the domain definition, the compile operation prompts you to save the domain definition before compiling.
The Build button loads the specified sources into the current domain. If you have made changes to the Data Locations, Metadata Fields, or Matching dictionaries, you must build the domain. The Build Domain window displays progress messages such as the following:
13:50:48: Loading data...
13:51:49: Finished loading 3 sources
13:51:49: Creating dictionaries and profiles...
13:51:49: Finished creating 1 dictionaries, 1 items, 3 terms and 0 formats
13:51:49: Matching sources...
13:51:50: Finished matching sources
13:51:50: Successfully built domain 'mydomain'
The build operation can be time-consuming. If a Disabled check box is checked for a model element, the Build operation does not load the corresponding sources. Selecting Disabled check boxes allows you to build only those model elements that you have changed.
Knowledge Portal
The Tools tab provides the Knowledge Portal button. Once you have specified Data Locations and populated the domain with this data using the Build button, you can select Knowledge Portal to display iKnow analysis of the data. This displays the Knowledge Portal as a separate browser tab.
The Knowledge Portal is a Zen page query display interface with broad application. It shows a wealth of information about the source text data indexed in a domain. It initially displays a list of either the top (most-frequently-occurring) concepts, or the dominant (highest dominance) concepts. You can toggle between these two lists.
If you select an entity, the Knowledge Portal provides analysis of similar entities and related concepts, and analysis of the appearance of the specified entity in larger text units (sources, paths, and CRCs). This provides a contextual at-a-glance view of what's in your data.
The Knowledge Portal provides generic filters that support selecting subsets of the sources in a domain based on metadata criteria. This interface provides a sample of how iKnow Smart Indexing can be used to quickly overview and navigate a large set of documents.
Selecting a Domain
By default, the Knowledge Portal displays analysis of the domain that was current in iKnow Architect when you invoked the Knowledge Portal.
To select another domain:
-
Select the Gear icon at the upper right of the Knowledge Portal. This displays the Settings box.
-
The Settings box contains the Switch domain drop-down list. Select a domain from this list.
The number at the top right of the Knowledge Portal is the number of sources loaded in the selected domain that are available for data analysis. This number can be limited by applying filters.
Listing All Concepts
The Knowledge Portal initially provides concept analysis of the data sources loaded in the domain. There are two ways to list concepts, by frequency or by dominance. You can toggle between these two by selecting the frequency or dominance button:
-
Top Concepts: selecting the frequency button lists all concepts in the sources in descending order of frequency. If multiple concepts have the same frequency, the concepts are listed in descending collation order. Each concept is listed with its frequency (total number of occurrences in all sources) and spread (number of sources containing that concept). To view frequency counts for a single source, use the Indexing Results tool.
-
Dominant Concepts: selecting the dominance button lists all concepts in the sources in descending order of dominance score. If multiple concepts have the same dominance score, the concepts are listed in descending collation order. The dominance score is calculated by taking the dominance values for each source and using an averaging algorithm to determine the dominance of a concept across all loaded sources. Dominance values in a single source are integer values, with the most dominant concept given a dominance of 1000. To view dominance values for a single source, use the Indexing Results tool.
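The difference between frequency and spread in the Top Concepts listing can be made concrete with a short sketch. The code below does not call iKnow. The function `top_concepts` and the sample data are hypothetical, and "descending collation order" is approximated by plain descending string comparison. The dominance score is not modeled, because its averaging algorithm is not specified here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_concepts(sources: Dict[int, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (concept, frequency, spread) rows, most frequent first.

    sources maps a source ID to the concept occurrences found in that source.
    """
    frequency = Counter()   # total occurrences across all sources
    spread = Counter()      # number of sources containing the concept
    for concepts in sources.values():
        frequency.update(concepts)
        spread.update(set(concepts))
    rows = [(c, frequency[c], spread[c]) for c in frequency]
    # Descending frequency; ties broken in descending string order.
    return sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)


sources = {
    1: ["pilot", "airplane", "pilot"],
    2: ["airplane", "runway", "engine"],
    3: ["pilot"],
}
for row in top_concepts(sources):
    print(row)
```

This prints `('pilot', 3, 2)`, `('airplane', 2, 2)`, `('runway', 1, 1)`, and `('engine', 1, 1)`: "pilot" occurs three times but in only two sources, so its frequency exceeds its spread.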
Analyzing a Specified Entity
There are two ways to display analysis of a specific entity:
-
Select a concept from either the Top Concepts or Dominant Concepts listings.
-
In the entry field in the top left corner you can type the first few characters (minimum of 2, not case-sensitive) of a word found in an entity, and the Knowledge Portal displays a drop-down list of all of the existing entities that contain a word beginning with those characters. Select an entity from this drop-down list, then press the Explore! button. You can use this option to display Relations or Concepts; both types of Entities are shown in the drop-down list.
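The matching rule for the entry field (at least two characters, not case-sensitive, matching the beginning of any word in an entity) can be sketched as below. This is only a model of the documented behavior. The function `suggest_entities` and the sample entities are hypothetical, and splitting entities into words on whitespace is an assumption.

```python
from typing import List


def suggest_entities(typed: str, entities: List[str]) -> List[str]:
    """Return the entities containing a word that begins with the typed text."""
    prefix = typed.strip().lower()
    if len(prefix) < 2:
        return []                      # minimum of 2 characters
    return [
        entity for entity in entities
        if any(word.lower().startswith(prefix) for word in entity.split())
    ]


entities = ["flight plan", "left fuel tank", "fly", "was flown by"]
print(suggest_entities("FL", entities))
# prints: ['flight plan', 'fly', 'was flown by']
```

Note that "left fuel tank" is not suggested: it contains the letters "f" and "l", but none of its words begins with "fl".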
Selecting an entity displays two kinds of analysis of that entity: associated entities and specified entity in context.
Associated Entities
Selecting an entity displays the following listings:
-
Similar Entities: a list of concepts and relations that are similar to the specified entity, with the frequency (total number of occurrences in all sources) and spread (number of sources containing that concept) of each concept or relation. The first similar entity listed is always the specified entity itself. For a concept, this first listed entity is the same as the Top Concepts listing for that concept.
-
Related Concepts: selecting the related button displays a list of concepts that are related to the specified concept, with the frequency (total number of occurrences in all sources) and spread (number of sources containing that concept) of each concept. A related concept is a concept that appears in a CRC with the specified concept.
-
Proximity Profile: selecting the proximity button displays the Proximity Profile table. This lists concepts associated by proximity to the specified concept, with a proximity score for each concept.
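The definition of a related concept (a concept that appears in a CRC with the specified concept) can be illustrated with a sketch over hand-written CRC tuples. The function `related_concepts` and the sample data are hypothetical. The count returned is the number of shared CRC occurrences, which is an assumption for illustration and not necessarily the frequency figure the Knowledge Portal displays.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# A CRC as (concept, relation, concept); either concept may be absent (None).
CRC = Tuple[Optional[str], str, Optional[str]]


def related_concepts(concept: str, crcs: Iterable[CRC]) -> Counter:
    """Count the concepts that share a CRC with the given concept."""
    related = Counter()
    for first, _relation, second in crcs:
        pair = {first, second}
        if concept in pair:
            # Ignore the concept itself and missing (CR or RC) positions.
            for other in pair - {concept, None}:
                related[other] += 1
    return related


crcs = [
    ("pilot", "flew", "airplane"),
    ("airplane", "landed on", "runway"),
    ("pilot", "reported", None),
    ("pilot", "flew", "airplane"),
]
print(related_concepts("pilot", crcs))
# prints: Counter({'airplane': 2})
```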
Selecting an entity from the Similar Entities, Related Concepts, or Proximity Profile listings changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.
Entity in Context
Selecting an entity also displays the following listings of that entity in context:
-
Sources: a list of source texts containing the specified entity (shown highlighted in green), along with the internal source ID (an integer) and external source ID. Sources are listed in descending order by internal source ID. The source text displays all sentences in the source that contain the entity. Intervening sentences that do not contain the entity are not displayed, but are indicated by an ellipsis (...). Note that a leading ellipsis is not shown when the first displayed sentence is not the first sentence in the source, and a trailing ellipsis is always shown after the final displayed sentence, even when that sentence is actually the last sentence in the source.
Red text indicates negation, with the entities within the scope of the negation attribute in red letters. Negation scope is not necessarily the same as the corresponding path, sentence, or CRC.
Selecting the Eye icon or clicking anywhere in the listing for a source displays the full text of the source. Each occurrence of the specified entity is highlighted and each negation scope text is shown in red letters in the full text. (The % option must be set to 100% to display all occurrences of the specified entity in this full text box.)
Selecting the Arrow icon displays the Indexing Results tool.
-
Paths: a list of paths containing the specified entity. Paths are listed in descending order by ID. Note that because path IDs are assigned on a per-source basis, the same path text may be listed multiple times with different path IDs.
The elements of the path are highlighted by type:
-
Green: the specified entity (either a Concept or a Relation).
-
Blue: a Concept.
-
White: a Relation.
-
Light Blue: a Path-relevant Word.
Negation scope text is displayed in red letters.
Selecting a path element changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.
Selecting the Eye icon displays the full text of the source with the specified entity highlighted in green.
Selecting the Arrow icon displays the Indexing Results tool.
-
CRCs: a list of Concept-Relation-Concept (CRC) sequences that contain the specified entity, with the frequency (total number of occurrences of that CRC in all sources) and spread (number of sources containing that CRC). Note that many CRCs contain only one concept: CR or RC. The entity type highlighting is the same as for Paths, except that Path-relevant Words are not part of CRCs and are therefore not displayed.
Selecting a CRC element changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.
Selecting the Eye icon displays the Sources with selected CRCs box, listing each source that contains an instance of the CRC. The CRC is highlighted in green in the context of its sentence, and flagged with the Source ID of the source. A source ID listing can contain multiple sentences containing the specified CRC; intervening sentences that do not contain the CRC are indicated by ellipsis. From the Sources with selected CRCs box you can select the Eye icon for a source containing the CRC to display the full text of the source with the specified entity (not the CRC) highlighted in green.
If Japanese is the only language supported for the domain, the Knowledge Portal display differs as follows: the Related Concepts and CRCs listings are not shown. An Entity Vectors listing is substituted for the Paths listing.
Full Text Box
The Eye icon displays the full text of a selected source. This text box is identified by the external ID of the source (for example, :SQL:1171:1171).
The source text is tagged as follows:
-
The specified entity is highlighted in green.
-
Red text indicates negation, with the entities within the scope of the negation attribute in red letters.
This full text box provides the following option buttons:
-
metadata: displays the metadata for the source. All sources are provided with a DateIndexed metadata field. This date stamp is represented as a UTC date and time in the Display format for your locale. It is truncated to whole seconds. To return to the source text, press the metadata button again.
-
highlight: performs no action.
-
indexing: displays the source text highlighted to indicate the types of entities, as follows:
-
Green: the specified entity (either a Concept or a Relation).
-
Blue: a Concept.
-
White: a Relation.
-
Light Blue: a Path-relevant Word.
-
Unmarked: a Non-relevant word.
Negation scope text is displayed in red letters.
-
dictionaries: performs no action.
-
%: summarizes the source text. The default percentage is 100% (full text). Specifying an integer less than 100 and then pressing the % button summarizes the source text by reducing it to (roughly) the specified size. It does this by eliminating sentences that have a low relevancy score when compared to the other sentences in the source. Summarization does not necessarily retain sentences that contain the specified entity.
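The effect of the % option can be sketched as follows. This is a minimal illustration, not the iKnow summarization algorithm: the relevancy scores are hypothetical inputs, and treating the percentage as a share of the sentence count (rather than of text size) is an assumption.

```python
import math

def summarize(sentences, scores, percent):
    """Keep roughly `percent` of the sentences, dropping the lowest scoring.

    scores: one hypothetical relevancy score per sentence (assumed input).
    The kept sentences are returned in their original order.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    keep = max(1, math.ceil(len(sentences) * percent / 100))
    # Rank sentence positions from highest to lowest relevancy score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore source order
    return [sentences[i] for i in kept]

sentences = [
    "The flight departed late.",
    "The pilot was not injured.",
    "Weather was clear.",
    "The aircraft struck a fence on landing.",
]
scores = [0.9, 0.1, 0.2, 0.7]  # hypothetical relevancy scores
summary = summarize(sentences, scores, 50)
print(summary)
```

With these sample scores, a 50% summary drops the only sentence that mentions "pilot", which shows why a summary may omit the entity you selected.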
Limiting the Sources to Analyze
You can limit the scope of your data analysis by using filters. A filter includes data sources loaded in the domain in the analysis, or excludes them from it. By default, the Knowledge Portal analyzes all data sources loaded in the domain.
-
The Filter icon (funnel) button at the top right of the Knowledge Portal applies a filter, which includes or excludes sources from analysis based on the criteria you specify. You can specify several types of filters, and can apply more than one filter. Multiple filters can be associated with AND, OR, NOT AND, or NOT OR logic.
To add a filter, select the filter type from the drop-down list, specify the filter criteria, then select the add button, then the Apply button. When adding multiple filters, you select the AND/OR logic option associating the filters after the add button and before the Apply button.
When one or more filters are in effect, the Filter icon displays in green.
The number to the left of the Filter icon indicates the number of sources included after applying the filters. If no filters are applied, this number is the total number of sources in the domain.
-
To remove a single filter, select the Filter icon, then select the black X next to the filter description, then select the Apply button. To remove all filters, select the Filter icon, then the Clear button, then the Apply button.
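The way two filters combine can be sketched with set operations. This is an illustration only, not the Knowledge Portal implementation. It assumes that NOT AND and NOT OR mean the negation of the combined result; the function name and sample IDs are hypothetical.

```python
def combine_filters(all_sources, first, second, logic):
    """Return the source IDs that remain after combining two filters.

    all_sources: every source ID loaded in the domain.
    first, second: the source IDs selected by each filter.
    logic: "AND", "OR", "NOT AND", or "NOT OR".
    Assumption: the NOT variants negate the combined result.
    """
    all_sources, first, second = set(all_sources), set(first), set(second)
    if logic == "AND":
        return first & second
    if logic == "OR":
        return first | second
    if logic == "NOT AND":
        return all_sources - (first & second)
    if logic == "NOT OR":
        return all_sources - (first | second)
    raise ValueError("unknown logic: " + logic)

domain = {1, 2, 3, 4, 5, 6}   # hypothetical source IDs in the domain
filter_a = {1, 2, 3}
filter_b = {3, 4}
for logic in ("AND", "OR", "NOT AND", "NOT OR"):
    selected = combine_filters(domain, filter_a, filter_b, logic)
    print(logic, sorted(selected), len(selected))
```

The last value printed on each line corresponds to the source count shown to the left of the Filter icon.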
The following filter types are supported:
-
Metadata: used to select sources for inclusion by their metadata values. By default, all sources have DateIndexed metadata. To apply DateIndexed metadata, select this field, select an operator, and select a date value by clicking on the calendar icon, then selecting the desired day.
-
Source IDs: used to select sources for inclusion by source ID. You can specify a single source ID or a comma-separated list of source IDs.
-
Source ID Range: used to select sources for inclusion by source ID. You specify a range of source IDs by providing the from and to values. The range is inclusive of these values.
-
External IDs: used to select sources for inclusion by their external IDs (for example, :SQL:1171:1171). You can specify a single ID or a comma-separated list of IDs. External source IDs are listed in the Sources listing.
-
SQL: used to select sources for inclusion by specifying an SQL query.
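The Source IDs and Source ID Range criteria can be sketched as follows. The helper names and sample values are hypothetical and are not part of iKnow. The sketch shows only how a comma-separated list and an inclusive range each resolve to a set of source IDs.

```python
def parse_source_ids(text):
    """Turn a comma-separated list such as "3, 7,12" into a set of integers."""
    return {int(part) for part in text.split(",") if part.strip()}

def source_id_range(start, end):
    """Return every source ID from start to end, inclusive of both values."""
    return set(range(start, end + 1))

loaded = set(range(1, 11))  # hypothetical domain with source IDs 1 to 10

by_list = loaded & parse_source_ids("3, 7,12")   # ID 12 is not loaded
by_range = loaded & source_id_range(5, 8)
print(sorted(by_list), sorted(by_range))
```

An ID that is not loaded in the domain, such as 12 in this sample, selects nothing.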
Indexing Results
You can access the Indexing Results tool in two ways:
-
From the Caché Management Portal System Explorer iKnow option. All iKnow domains exist within a specific namespace. Therefore, you must specify which namespace you wish to use from the list of available namespaces. A namespace must be iKnow-enabled before it can be used. Selecting an iKnow-enabled namespace displays the iKnow Indexing Results option.
-
From the iKnow Architect Tools tab Indexing Results button. Once you have specified Data Locations and populated the domain with this data using the Build button, you can select Indexing Results to display how iKnow has indexed the data. This displays the Indexing Results window as a separate browser tab.
At the top left of the Indexing Results window is a drop-down list that shows the sources loaded into the domain. (The domain is shown in the drop-down list at the top right.) Select a data source from the drop-down list, then press the manual input button.
This displays three listings: Indexed Sentences, Concepts, and CRCs.
Indexed Sentences
The sentences in the source are listed in order, one sentence per line, with iKnow indexing indicated by highlighting. The sentence text is highlighted as follows:
-
Yellow: a concept.
-
Underlined: a relation.
-
Italic: a non-relevant word.
-
Red: a negation attribute phrase. The negation word is enclosed in a red box; multi-word negation terms (such as “was not”) are shown with each word enclosed in a red box. The concepts and relations included in the negation phrase are shown with their appropriate highlighting (yellow highlighting or underlining), with the text of the phrase shown in red. Non-relevant words within the phrase are not shown in red.
Concepts and CRCs
The Indexing Results tool displays two listings, one of all concepts in the source and one of all CRCs in the source:
-
Concepts in the source in descending order.
-
CRCs in the source highlighted (as above) to indicate concepts and relations, in descending order. Note that the CRCs listings do not indicate negation attributes, and do not include non-relevant words.
The sort by buttons at the top of the window allow you to toggle the Concepts and CRCs listings to display either frequency counts or dominance values in descending order.
In the Concepts listing, the most dominant concept(s) are given a dominance of 1000. Less dominant concepts are given smaller integer values, with larger sources tending to have lower least-dominant values. For example, a source containing 25 concepts might have a dominance range between 1000 and 83; a source containing 300 concepts might have a dominance range between 1000 and 2.
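The sort by toggle can be pictured as re-sorting the same rows on a different column. The rows below are hypothetical sample values, not output from iKnow, and the function is an illustrative sketch only.

```python
# Hypothetical rows: (concept, frequency, dominance).
concepts = [
    ("runway", 9, 620),
    ("pilot", 7, 1000),
    ("engine", 2, 83),
]

def sort_concepts(rows, by):
    """Sort rows in descending order by "frequency" or "dominance"."""
    column = {"frequency": 1, "dominance": 2}[by]
    return sorted(rows, key=lambda row: row[column], reverse=True)

by_frequency = [row[0] for row in sort_concepts(concepts, "frequency")]
by_dominance = [row[0] for row in sort_concepts(concepts, "dominance")]
print(by_frequency)
print(by_dominance)
```

In this sample, the most frequent concept is not the most dominant one, so the two sort orders differ.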
If Japanese is the only language supported for the domain, the Indexing Results display substitutes a single Entities listing for the Concepts and CRCs listings.