
Domain Explorer

Important:

InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.

The Domain Explorer enables you to analyze the data in an NLP domain by looking at specific entities. This user interface is part of the Domain Architect.

All functionality provided via this tool is also available by using ObjectScript to invoke NLP class methods and properties.

Introduction

There are two ways to access the Domain Explorer:

  • From the Management Portal Analytics option select the Text Analytics option. This displays the Domain Explorer option. When you select this option it prompts you to select an existing domain from a drop-down list.

  • From the Management Portal Analytics option select the Text Analytics option. Access the Domain Architect and create or access a domain. Once you have specified Data Locations and populated the domain with this data using the Build button, you can select Domain Explorer from the Tools tab. This displays the Domain Explorer as a separate browser tab with the current domain selected.

The Domain Explorer is a display interface with broad application. It shows a wealth of information about the source text data indexed in a domain. It initially displays a list of either the top (most-frequently-occurring) concepts, or the dominant (highest dominance) concepts. You can toggle between these two lists.

If you select an entity, the Domain Explorer provides analysis of similar entities and related concepts, and analysis of the appearance of the specified entity in larger text units (sources, paths, and CRCs). This provides a contextual at-a-glance view of what's in your data.

The Domain Explorer provides generic filters that support selecting subsets of the sources in a domain based on metadata criteria. This interface provides a sample of how NLP Smart Indexing can be used to quickly overview and navigate a large set of documents.

Domain Explorer Settings

By default, the Domain Explorer displays analysis of the domain that was current in Domain Architect or the domain you selected when you invoked the Domain Explorer.

To select another domain:

  1. Select the Gear icon at the upper right of the Domain Explorer. This displays the Settings box.

  2. The Settings box contains the Switch domain drop-down list. Select a domain from this list. By default, this list includes the domains defined in the current namespace. If you select the Include other namespaces check box, the drop-down list includes domains defined in all namespaces.

To apply skiplists:

  1. Select the Sunglasses icon at the upper right of the Domain Explorer. If the domain has no defined skiplists, this icon does not appear.

  2. The Skiplists box contains check boxes for each defined skiplist. Select one or more, then click the Apply button.

To use stemming:

  1. Select the Gear icon at the upper right of the Domain Explorer. This displays the Settings box.

  2. If the domain is configured for stemming, the Settings box also contains the Use stems instead of entities and Show representation form for stems check boxes. If Use stems instead of entities is checked, the Domain Explorer performs stemming analysis and changes the Domain Explorer headings as follows: Top Concepts/Dominant Concepts becomes Top Stems/Dominant Stems, Similar Entities becomes Similar Stems, Related Concepts disappears, leaving Proximity Profile, and the CRCs tab disappears. If Show representation form for stems is checked, each stem is displayed as a representative word; if not checked, the stem itself is displayed. Both boxes are checked by default.

The number at the top right of the Domain Explorer is the number of sources loaded in the selected domain that are available for data analysis. This number can be limited by applying filters.

Listing All Concepts

The Domain Explorer initially provides concept analysis of the data sources loaded in the domain. There are two ways to list concepts, by frequency or by dominance. You can toggle between these two by selecting the frequency or dominance button:

  • Top Concepts: selecting the frequency button lists all concepts in the sources in descending order of frequency. If multiple concepts have the same frequency, the concepts are listed in descending collation order. Each concept is listed with its frequency (total number of occurrences in all sources) and spread (number of sources containing that concept). To view frequency counts for a single source, use the Indexing Results tool.

  • Dominant Concepts: selecting the dominance button lists all concepts in the sources in descending order of dominance score. If multiple concepts have the same dominance score, the concepts are listed in descending collation order. The dominance score is calculated by taking the dominance values for each source and using an averaging algorithm to determine the dominance of a concept across all loaded sources. Dominance values in a single source are integer values, with the most dominant concept given a dominance of 1000. To view dominance values for a single source, use the Indexing Results tool.
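
The frequency and spread columns of the Top Concepts listing can be illustrated with a minimal Python sketch. It is not the NLP implementation: the function name `top_concepts` and the input layout (a dictionary from source ID to already-extracted concept occurrences) are assumptions made for illustration, and plain descending string order stands in for the collation order. The dominance averaging algorithm is not specified above, so it is not modeled here.

```python
from collections import Counter


def top_concepts(sources):
    """Return (concept, frequency, spread) rows, most frequent first.

    `sources` maps a source ID to the list of concept occurrences found in
    that source (hypothetical, pre-extracted input).
    """
    frequency = Counter()  # total occurrences across all sources
    spread = Counter()     # number of sources containing the concept
    for concepts in sources.values():
        frequency.update(concepts)
        spread.update(set(concepts))  # count each source at most once
    rows = [(c, frequency[c], spread[c]) for c in frequency]
    # Descending frequency; ties fall back to descending string order,
    # a stand-in for the collation order.
    return sorted(rows, key=lambda row: (row[1], row[0]), reverse=True)


sources = {
    1: ["pilot", "runway", "pilot"],
    2: ["runway", "airplane"],
    3: ["pilot"],
}
print(top_concepts(sources))
# [('pilot', 3, 2), ('runway', 2, 2), ('airplane', 1, 1)]
```

In this toy data, "pilot" occurs three times but in only two sources, which is why its frequency and spread differ.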

Analyzing a Specified Entity

There are two ways to display analysis of a specific entity:

  • Select a concept from either the Top Concepts or Dominant Concepts listings.

  • In the entry field in the top left corner you can type the first few characters (minimum of 2, not case-sensitive) of a word found in an entity, and the Domain Explorer displays a drop-down list of all of the existing entities that contain a word beginning with those characters. Select an entity from this drop-down list, then press the Explore! button. You can use this option to display Relations or Concepts; both types of Entities are shown in the drop-down list.
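
The matching rule for the entry field (at least two characters, not case-sensitive, matched against the start of any word in an entity) can be sketched as follows. The function name `suggest_entities` and the whitespace-based word splitting are assumptions for illustration; the Domain Explorer's own lookup may differ in detail.

```python
def suggest_entities(typed, entities):
    """Return the entities that contain a word starting with `typed`."""
    prefix = typed.strip().lower()
    if len(prefix) < 2:  # fewer than 2 characters: no suggestions
        return []
    return [
        entity
        for entity in entities
        if any(word.startswith(prefix) for word in entity.lower().split())
    ]


entities = ["flight plan", "student pilot", "is flying over", "runway"]
print(suggest_entities("FL", entities))  # ['flight plan', 'is flying over']
print(suggest_entities("f", entities))   # []
```

The example list mixes a concept ("flight plan") and a relation ("is flying over"), since both kinds of entity appear in the drop-down list.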

Selecting an entity displays two kinds of analysis of that entity: associated entities and specified entity in context.

Associated Entities

Selecting an entity displays the following listings:

  • Similar Entities: a list of concepts and relations that are similar to the specified entity, with the frequency (total number of occurrences in all sources) and spread (number of sources containing that concept) of each concept or relation. The first similar entity listed is always the specified entity itself. For a concept, this first listed entity is the same as the Top Concepts listing for that concept.

  • Related Concepts: selecting the related button displays a list of concepts that are related to the specified concept, with the frequency (total number of occurrences in all sources) and spread (number of sources containing that concept) of each concept. A related concept is a concept that appears in a CRC with the specified concept.

  • Proximity Profile: selecting the proximity button displays the Proximity Profile table. This lists concepts associated by proximity to the specified concept, with a proximity score for each concept.

Selecting an entity from the Similar Entities, Related Concepts, or Proximity Profile listings changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.
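
The definition of a related concept (a concept that appears in a CRC with the specified concept) can be sketched in a few lines of Python. The tuple layout `(head, relation, tail)`, the use of `None` for a missing concept, and the function name `related_concepts` are assumptions for illustration. The counts returned here are co-occurrence counts within CRCs, not the frequency and spread values shown in the listing.

```python
from collections import Counter


def related_concepts(concept, crcs):
    """Count concepts that share a CRC with `concept`.

    `crcs` is a list of (head, relation, tail) tuples; head or tail may be
    None for sequences that contain only one concept.
    """
    related = Counter()
    for head, _relation, tail in crcs:
        if concept not in (head, tail):
            continue
        other = tail if head == concept else head
        if other is not None and other != concept:
            related[other] += 1
    return related


crcs = [
    ("pilot", "landed", "airplane"),
    ("pilot", "contacted", "tower"),
    (None, "near", "runway"),
    ("student", "is", "pilot"),
    ("pilot", "landed", "airplane"),
]
print(related_concepts("pilot", crcs))
# Counter({'airplane': 2, 'tower': 1, 'student': 1})
```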

Entity in Context

Selecting an entity also displays the following listings of that entity in context:

  • Sources: a list of source texts containing the specified entity (shown highlighted in green), along with the internal source ID (an integer) and external source ID. Sources are listed in descending order by internal source ID. The source text displays all sentences in the source that contain the entity. Intervening sentences that do not contain the entity are not displayed, but are indicated by an ellipsis (...). A leading ellipsis is not shown when the first displayed sentence is not the first sentence in the source. A trailing ellipsis is always shown after the final displayed sentence, even when that sentence is the last sentence in the source.

    Red text indicates negation, with the entities within the scope of the negation attribute in red letters. Negation scope is not necessarily the same as the corresponding path, sentence, or CRC.

    Selecting the Eye icon or clicking anywhere in the listing for a source displays the full text of the source. Each occurrence of the specified entity is highlighted and each negation scope text is shown in red letters in the full text. (The % option must be set to 100% to display all occurrences of the specified entity in this full text box.)

    Selecting the Arrow icon displays the Indexing Results tool.

  • Paths: a list of paths containing the specified entity. Paths are listed in descending order by ID. Note that because path IDs are assigned on a per-source basis, the same path text may be listed multiple times with different path IDs.

    The entities and attributes of the path are color coded and highlighted as described in the Indexing Results tool description of Indexed Sentences, with the addition of the Explore! entity appearing in yellow-orange.

    Selecting a path element changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.

    Selecting the Eye icon displays the full text of the source with the specified entity highlighted in green.

    Selecting the Arrow icon displays the Indexing Results tool.

  • CRCs: a list of Concept-Relation-Concept (CRC) sequences that contain the specified entity, with the frequency (total number of occurrences of that CRC in all sources) and spread (number of sources containing that CRC). Note that many CRCs contain only one concept: CR or RC. The entity type highlighting is the same as for Paths, except that Path-relevant Words are not part of CRCs and are therefore not displayed. Attributes are not highlighted in the CRCs listing.

    Selecting a CRC element changes all listings to analysis of that entity. It does not change the Top Concepts and Dominant Concepts listings.

    Selecting the Eye icon displays the Sources with selected CRCs box, listing each source that contains an instance of the CRC. The CRC is highlighted in green in the context of its sentence, and flagged with the Source ID of the source. A source ID listing can contain multiple sentences containing the specified CRC; intervening sentences that do not contain the CRC are indicated by ellipsis. From the Sources with selected CRCs box you can select the Eye icon for a source containing the CRC to display the full text of the source with the specified entity (not the CRC) highlighted in green.
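
The ellipsis behavior described for the Sources listing (no leading ellipsis, one ellipsis for each run of skipped sentences, and a trailing ellipsis in every case) can be sketched as follows. This is an illustration only: the function name `excerpt`, the pre-split sentence list, and the simple case-insensitive substring match are assumptions, not how NLP locates entities.

```python
def excerpt(sentences, entity):
    """Join the sentences that mention `entity`, marking gaps with '...'."""
    needle = entity.lower()
    hits = [i for i, s in enumerate(sentences) if needle in s.lower()]
    if not hits:
        return ""
    parts = []
    previous = None
    for i in hits:
        if previous is not None and i - previous > 1:
            parts.append("...")  # skipped sentences between two hits
        parts.append(sentences[i])
        previous = i
    parts.append("...")  # trailing ellipsis is always shown
    return " ".join(parts)


sentences = [
    "The weather was clear.",
    "The pilot taxied out.",
    "Traffic was light.",
    "The pilot took off.",
    "The pilot climbed.",
]
print(excerpt(sentences, "pilot"))
# The pilot taxied out. ... The pilot took off. The pilot climbed. ...
```

The first sentence is skipped without a leading ellipsis, and the output ends with an ellipsis even though the last displayed sentence is the last sentence of the source.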

Note:

If Japanese is the only language supported for the domain, the Domain Explorer display differs as follows: the Related Concepts and CRCs listings are not shown. An Entity Vectors listing is substituted for the Paths listing.

Full Text Box

The Eye icon displays the full text of a selected source. This text box is identified by the external ID of the source. For example, :SQL:1171:1171.

The source text is tagged as follows:

  • The specified entity is highlighted in green.

  • Red text indicates negation, with the entities within the scope of the negation attribute in red letters.

This full text box provides the following option buttons:

  • metadata: displays the metadata for the source. All sources are provided with a DateIndexed metadata field. This date stamp is represented as a UTC date and time in the Display format for your locale. It is truncated to whole seconds. To return to the source text, press the metadata button again.

  • highlight: performs no action.

  • indexing: displays the source text highlighted to indicate the types of entities, as follows:

    • Green: the specified entity (either a Concept or a Relation).

    • Blue: a Concept.

    • White: a Relation.

    • Light Blue: a Path-relevant Word.

    • Unmarked: a Non-relevant word.

    Negation scope text is displayed in red letters.

  • dictionaries: performs no action.

  • %: summarizes the source text. The default percentage is 100% (full text). Specifying an integer less than 100 and then pressing the % button summarizes the source text by reducing it to (roughly) the specified size, eliminating sentences that have a low relevancy score compared to the other sentences in the source. Summarization does not necessarily retain sentences that contain the specified entity.
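
The idea behind the % option can be sketched as score-based sentence selection. This is a minimal illustration under stated assumptions: the relevancy scores are supplied as hypothetical inputs (how NLP computes them is not described here), size is measured as a sentence count, and the function name `summarize` is invented for the example.

```python
def summarize(scored_sentences, percent):
    """Keep the highest-scoring sentences, in their original order.

    `scored_sentences` is a list of (sentence, score) pairs; the scores are
    hypothetical inputs, not the product's relevancy scores.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    total = len(scored_sentences)
    keep = max(1, round(total * percent / 100))  # sentences to retain
    ranked = sorted(range(total), key=lambda i: scored_sentences[i][1], reverse=True)
    kept = sorted(ranked[:keep])  # restore document order
    return [scored_sentences[i][0] for i in kept]


scored = [
    ("The pilot filed a report.", 0.2),
    ("Fuel was low.", 0.9),
    ("Traffic was light.", 0.5),
    ("The runway was closed.", 0.7),
]
print(summarize(scored, 50))
# ['Fuel was low.', 'The runway was closed.']
```

In this example the only sentence mentioning "pilot" has the lowest score and is dropped, which mirrors the caveat that summarization does not necessarily retain sentences containing the specified entity.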

Limiting the Sources to Analyze

You can limit the scope of your data analysis by using filters. A filter includes or excludes data sources that are loaded in the domain from analysis. By default, the Domain Explorer analyzes all data sources loaded in the domain.

  • The Filter icon (funnel) button at the top right of the Domain Explorer applies a filter, which includes or excludes sources from analysis based on the criteria you specify. You can specify several types of filters, and can apply more than one filter. Multiple filters can be associated with AND, OR, NOT AND, or NOT OR logic.

    To add a filter, select the filter type from the drop-down list, specify the filter criteria, then select the add button, then the Apply button. When adding multiple filters, you select the AND/OR logic option associating the filters after the add button and before the Apply button.

    When one or more filters are in effect, the Filter icon displays in green.

    The number to the left of the Filter icon indicates the number of sources included after applying the filters. If no filters are applied, this number is the total number of sources in the domain.

  • To remove a single filter, select the Filter icon, then select the black X next to the filter description, then select the Apply button. To remove all filters, select the Filter icon, then the Clear button, then the Apply button.

    The following filter types are supported:

    • Metadata: used to exclude sources by their metadata values. By default, all sources have DateIndexed metadata. To apply DateIndexed metadata, select this field, select an operator, and select a date value by clicking on the calendar icon, then selecting the desired day.

    • Source IDs: used to select sources for inclusion by source ID. You can specify a single source ID or a comma-separated list of source IDs.

    • Source ID Range: used to select sources for inclusion by source ID. You can specify a range of source IDs by specifying the from and to range values. The range is inclusive of these values.

    • External IDs: used to select sources for inclusion by their external IDs. For example, :SQL:1171:1171. You can specify a single ID or a comma-separated list of IDs. External source IDs are listed in the Sources listing.

    • SQL: used to select sources for inclusion by specifying an SQL query.
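
Combining filters can be pictured as set operations over source IDs. The sketch below is an illustration, not the Domain Explorer's filter code: the function names `source_id_range` and `combine` are invented, each filter is modeled as the set of source IDs it selects, and NOT AND and NOT OR are assumed to mean the negation of the combined AND or OR result.

```python
def source_id_range(start, end):
    """Source IDs from start to end, inclusive of both bounds."""
    return set(range(start, end + 1))


def combine(filters, logic, all_sources):
    """Combine sets of source IDs with AND, OR, NOT AND, or NOT OR."""
    negate = logic.startswith("NOT ")
    operator = logic[4:] if negate else logic
    if operator == "AND":
        selected = set(all_sources).intersection(*filters)
    elif operator == "OR":
        selected = set().union(*filters) & set(all_sources)
    else:
        raise ValueError(f"unsupported logic: {logic}")
    # Assumption: NOT applies to the combined result as a whole.
    return set(all_sources) - selected if negate else selected


all_sources = set(range(1, 11))   # a toy domain with source IDs 1..10
by_range = source_id_range(3, 6)  # Source ID Range filter: {3, 4, 5, 6}
by_ids = {5, 6, 9}                # Source IDs filter

print(sorted(combine([by_range, by_ids], "AND", all_sources)))     # [5, 6]
print(sorted(combine([by_range, by_ids], "OR", all_sources)))      # [3, 4, 5, 6, 9]
print(sorted(combine([by_range, by_ids], "NOT OR", all_sources)))  # [1, 2, 7, 8, 10]
```

The size of the resulting set corresponds to the number shown to the left of the Filter icon once the filters are applied.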
