About This Book
This book describes how to use the NLP semantic analysis engine to access and analyze unstructured data on InterSystems IRIS® data platform. Commonly, unstructured data consists of a large number of text sources, such as a collection of newspaper articles or a collection of doctors’ notes. You load this text data into NLP, and then use NLP to retrieve meaningful information. NLP operates on data loaded from source texts; it does not modify source texts. NLP can perform semantic analysis on texts in Czech (cs), Dutch (nl), English (en), French (fr), German (de), Japanese (ja), Portuguese (pt), Russian (ru), Spanish (es), Swedish (sv), and Ukrainian (uk).
The book addresses a number of topics:
A Conceptual Overview, which describes the NLP approach to unstructured data and NLP architecture. It describes both what NLP is and what NLP is not, so that users can determine whether NLP is the best fit for their text access application.
NLP Implementation, which describes the implementation of NLP software in the ObjectScript environment and describes data source considerations. Data sources can be text files, SQL records, globals, RSS feeds, or any other type of text source.
Domain Architect, which describes the Management Portal interface for creating NLP components and populating them with data.
REST Interface, which describes the REST API for performing many NLP operations.
The operations performed through these interfaces are described in greater detail in the following chapters, each of which is specific to one type of operation.
NLP Queries describes how to write programs that can be run against data loaded into NLP. Attributes allow NLP queries to recognize terms flagged as Negation or Sentiment; these flagged terms affect how a query interprets a path or sentence. Dominance and Proximity provide more sophisticated calculations for comparing text elements, and Custom Metrics gives the user the means to extend and customize the calculations used for comparing text elements in queries.
Filtering Sources describes how to use various filters to limit the scope of queries to a subset of the data sources loaded into NLP. Text Categorization describes how to automatically assign data sources to categories based on an analysis of the contents of the source. Once assigned, this category metadata value can be used to filter sources or when querying sources.
The User Interfaces chapter describes several sample graphical interfaces for retrieving information. These interfaces use the queries, filters, and dictionaries described in the prior chapters.
NLP Tools describes the Terminal NLP Shell interface for displaying NLP components and data. This is a useful tool, but provides no additional NLP functionality. This chapter also describes the Data Upgrade utility for use on NLP domains created under an earlier version of NLP.
The KPIs and Dashboards chapter describes how to use queries as data sources for KPIs (key performance indicators) and how to display these KPIs on dashboards.
The Web Services chapter describes using NLP with Internet data.
The final two chapters describe advanced topics. Customizing NLP describes how to create additional NLP text processing facilities to supplement those supplied with NLP. Language Identification describes how to work with source texts in more than one language and with texts in languages that need special processing.
The Domain Parameters appendix provides a comprehensive list of available domain parameters. You can set these parameters to customize a domain, or set them for all domains systemwide.
For a detailed outline, see the Table of Contents.
SQL Search, an SQL facility for performing text search operations, uses many of the features of NLP.
Also see “Using Unstructured Data in Cubes” in Advanced Modeling for InterSystems Business Intelligence.