Because NLP typically handles large amounts of text data, keep the following InterSystems IRIS Data Platform™ performance considerations in mind when loading source text:
Before starting a batch load of a significant number of sources, stop database journaling. Once the batch load completes, make sure to restart journaling. Refer to the Journaling chapter of the Data Integrity Guide for information on stopping and restarting journaling.
Before starting a batch load of a significant number of sources (or a small number of very large sources), set the global buffer pool to a size large enough to handle this operation. NLP indexing creates a large number of temporary globals. If the global buffer pool is not large enough to handle these temporary globals in memory, they are written to disk. These disk I/O operations can significantly affect NLP performance. Refer to Memory and Startup Settings in the Configuring InterSystems IRIS chapter of the System Administration Guide.
NLP indexing requires substantially more disk space than the source texts themselves occupy. The approximate space requirements for temporary and permanent globals are described in the Globals and Space Requirements section of the Implementation chapter of this manual.
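The advice above to restart journaling once the batch load completes is easy to miss when a load fails partway through. The following Python sketch shows one way to guarantee the restart by wrapping the load in a context manager. It is illustrative only: stop_journaling and start_journaling are hypothetical placeholders for whatever mechanism your installation uses to stop and restart journaling (see the Journaling chapter), and they are not InterSystems IRIS API calls.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def journaling_paused(stop_journaling: Callable[[], None],
                      start_journaling: Callable[[], None]) -> Iterator[None]:
    """Pause journaling for the duration of a batch load.

    Both callables are hypothetical placeholders supplied by the caller;
    this sketch does not know how journaling is actually controlled.
    """
    stop_journaling()
    try:
        yield
    finally:
        # Runs whether the load succeeded or raised, so a failed batch
        # never leaves journaling disabled.
        start_journaling()


if __name__ == "__main__":
    events = []
    with journaling_paused(lambda: events.append("stop"),
                           lambda: events.append("start")):
        events.append("load")  # stand-in for the real batch load
    print(events)
```

In practice, the body of the with block would call the batch load, and the two callables would invoke the site-specific procedure for stopping and restarting journaling.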