Performance Considerations when Loading Texts
Because iKnow typically handles large amounts of text data, the following performance considerations should be heeded when loading source text:
- Before starting a batch load of a significant number of sources, stop database journaling. Once the batch load completes, be sure to restart journaling. Refer to the “Journaling” chapter of the Caché Data Integrity Guide for information on stopping and restarting journaling.
- Before starting a batch load of a significant number of sources (or a small number of very large sources), set the global buffer pool to a size large enough to handle this operation. iKnow indexing creates a large number of temporary globals. If the global buffer pool is not large enough to hold these temporary globals in memory, they are written to disk, and the resulting disk I/O can significantly degrade iKnow performance. Refer to “Memory and Startup Settings” in the “Configuring Caché” chapter of the Caché System Administration Guide.
- iKnow indexing requires substantially more disk space than the source texts themselves occupy. The approximate space requirements for temporary and permanent globals are described in the “Globals and Space Requirements” section of the “Implementation” chapter of this manual.
- Do not configure more language support than your sources require. Your iKnow Configuration should specify only those languages that are actually found in your sources. If all of your sources are in one language, do not specify automatic language identification. Unless n-grams are required for the language, do not set the EnableNgrams domain parameter.
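
As an illustration of the last point, the following minimal sketch creates an iKnow Configuration for sources that are all in English. It assumes the %iKnow.Configuration constructor takes the configuration name, an automatic language identification flag, and a list of languages, in that order. The name "EnglishOnly" is hypothetical. Check the class reference for your Caché version before relying on this.

```objectscript
   // Sketch only: "EnglishOnly" is an illustrative configuration name.
   // Second argument 0 = automatic language identification off,
   // which is appropriate when every source is in the same language.
   // Third argument lists only the languages actually found in the sources.
   SET cfg=##class(%iKnow.Configuration).%New("EnglishOnly",0,$LISTBUILD("en"))
   SET sc=cfg.%Save()
   IF $SYSTEM.Status.IsError(sc) { DO $SYSTEM.Status.DisplayError(sc) }
   // The EnableNgrams domain parameter is deliberately left unset here.
```

Listing a single language and leaving automatic language identification off means iKnow does not have to load additional language models or test each sentence against several languages during indexing.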