Unicode

InterSystems IRIS supports the Unicode international character set. In InterSystems IRIS, Unicode characters are 16-bit characters, also known as wide characters. To confirm that an InterSystems IRIS installation supports Unicode, check the $ZVERSION special variable (which reports Build nnnU) or call the $SYSTEM.Version.IsUnicode() method.

For most purposes, InterSystems IRIS supports only the Unicode Basic Multilingual Plane (hex 0000 through FFFF), which contains the most commonly used international characters. Internally, InterSystems IRIS uses the UCS-2 encoding, which, for the Basic Multilingual Plane, is the same as UTF-16. You can work with characters outside the Basic Multilingual Plane by using $WCHAR, $WISWIDE, and related functions.
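The following Python sketch (Python stands in here; it is not InterSystems IRIS code) illustrates the encoding point above. A character in the Basic Multilingual Plane occupies a single 16-bit code unit whose value equals its code point, which is why UCS-2 and UTF-16 agree there. A character above FFFF needs two code units (a UTF-16 surrogate pair). The helper names `is_bmp` and `utf16_code_units` are hypothetical.

```python
# Illustrative Python sketch, NOT InterSystems IRIS code: checks whether a
# string stays inside the Basic Multilingual Plane and shows that, for BMP
# characters, the single UTF-16 code unit equals the code point (the UCS-2 view).

BMP_MAX = 0xFFFF  # last code point in the Basic Multilingual Plane


def is_bmp(s: str) -> bool:
    """Return True if every character in s is in the range U+0000..U+FFFF."""
    return all(ord(ch) <= BMP_MAX for ch in s)


def utf16_code_units(s: str) -> list:
    """Split s into its 16-bit UTF-16 (big-endian) code units."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


lam = "\u03bb"        # Greek small letter lambda, decimal 955
emoji = "\U0001F600"  # a character outside the BMP

print(is_bmp(lam), utf16_code_units(lam))  # True [955]
print(is_bmp(emoji), [hex(u) for u in utf16_code_units(emoji)])
# False ['0xd83d', '0xde00']  (a surrogate pair, which plain UCS-2 cannot represent)
```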

In memory, InterSystems IRIS encodes Unicode strings using 16 bits (two bytes) per character, the size of one UTF-16 code unit. When saving a Unicode string to a global, however, if every character has a numeric value of 255 or lower, InterSystems IRIS stores the string using 8 bits (one byte) per character. If the string contains any character with a numeric value greater than 255, InterSystems IRIS applies a compression algorithm to reduce the amount of storage the string uses.
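As a rough model of the storage rule, assuming only what the paragraph above states, the hypothetical Python helpers below choose between one and two bytes per character. They deliberately do not reproduce the compression InterSystems IRIS applies to strings containing characters above 255, whose details are not described here.

```python
# Illustrative Python model of the global-storage rule described above.
# It is NOT how InterSystems IRIS is implemented, and it does not model the
# compression that IRIS applies to strings containing characters above 255.

def stored_bytes_per_char(s: str) -> int:
    """Return 1 if every character fits in 8 bits (code value <= 255), else 2."""
    return 1 if all(ord(ch) <= 255 for ch in s) else 2


def uncompressed_size(s: str) -> int:
    """Bytes needed before any compression: 1 or 2 bytes per BMP character."""
    if stored_bytes_per_char(s) == 1:
        return len(s.encode("latin-1"))  # exactly one byte per character
    return len(s.encode("utf-16-be"))    # two bytes per BMP character


print(uncompressed_size("caf\u00e9"))        # 4 ("café": é is 233, so 8-bit storage)
print(uncompressed_size("\u03b1\u03b2\u03b3"))  # 6 (Greek letters exceed 255)
```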

For conversion between Unicode and UTF-8, and to other character encodings, refer to the $ZCONVERT function. You can use ZZDUMP to display the hexadecimal encoding of a string. You can use $CHAR to specify a character (or string of characters) by its decimal (base 10) encoding. You can use $ZHEX to convert a hexadecimal number to a decimal number, or a decimal number to a hexadecimal number.
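For readers who know Python, the snippet below shows rough analogues of the operations just mentioned, using only the Python standard library. These are not InterSystems IRIS calls, and the hex listing is a simple per-character view rather than ZZDUMP's actual output format.

```python
# Python analogues (NOT InterSystems IRIS calls) of the operations named above.

lam = chr(955)                # like $CHAR(955): Greek small letter lambda

utf8 = lam.encode("utf-8")    # Unicode -> UTF-8, the kind of conversion $ZCONVERT performs
back = utf8.decode("utf-8")   # UTF-8 -> Unicode

# Per-character hex codes, similar in spirit to (not the format of) ZZDUMP output
hex_dump = " ".join(f"{ord(ch):04X}" for ch in "a" + lam)

to_hex = format(955, "X")     # decimal -> hexadecimal (one direction of $ZHEX)
to_dec = int("3BB", 16)       # hexadecimal -> decimal (the other direction)

print(utf8)                                   # b'\xce\xbb'
print(back == lam, hex_dump, to_hex, to_dec)  # True 0061 03BB 3BB 955
```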

Letters in Unicode

In InterSystems IRIS, some kinds of names can contain Unicode letters and others cannot. A Unicode letter is an alphabetic character with a decimal character code greater than 255. For example, the Greek lowercase letter lambda, $CHAR(955), is a Unicode letter.
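The definition of a Unicode letter can be expressed as a one-line predicate. In this hypothetical Python sketch, `str.isalpha()` is an assumed stand-in for "alphabetic character"; the character classification InterSystems IRIS uses may differ for edge cases.

```python
# Hypothetical helper modeling the definition above: a "Unicode letter" is an
# alphabetic character whose code value is above 255. Python's str.isalpha()
# is an assumed stand-in for "alphabetic"; IRIS's own classification may differ.

def is_unicode_letter(ch: str) -> bool:
    return len(ch) == 1 and ord(ch) > 255 and ch.isalpha()


print(is_unicode_letter(chr(955)))  # True: Greek small lambda
print(is_unicode_letter("\u00e9"))  # False: é is alphabetic but its code is only 233
print(is_unicode_letter("\u2603"))  # False: snowman is above 255 but is not a letter
```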

Unicode letter characters are permitted throughout InterSystems IRIS, with the following exceptions:

  • Variable names: local variable names can contain Unicode letters. However, global variable names and process-private global names cannot contain Unicode letters. Subscripts for variables of all types can be specified with Unicode characters.

  • Administrator user names and passwords used for database encryption cannot contain Unicode characters.
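The variable-name exception can be sketched as a check on name prefixes. This hypothetical Python function assumes that global names begin with `^` and process-private global names with `^||`, and it reuses the assumed `isalpha()` test for Unicode letters. It checks only this one rule, not full ObjectScript name syntax, and it ignores subscripts, which may contain Unicode characters.

```python
# Hypothetical check for the variable-name rule above; NOT a full ObjectScript
# name validator. Assumptions: global names start with "^", process-private
# global names start with "^||" (so "^" catches both), and a Unicode letter
# is an alphabetic character above 255.

def violates_unicode_name_rule(name: str) -> bool:
    """True if name is a global or process-private global containing a Unicode letter."""
    is_global = name.startswith("^")  # covers ^name and ^||name
    has_unicode_letter = any(ord(ch) > 255 and ch.isalpha() for ch in name)
    return is_global and has_unicode_letter


print(violates_unicode_name_rule("\u03bbTotal"))   # False: local variable, allowed
print(violates_unicode_name_rule("^\u03bbTotal"))  # True: global name, not allowed
print(violates_unicode_name_rule("^||\u03bbTmp"))  # True: process-private global
print(violates_unicode_name_rule("^Data"))         # False: ASCII global name
```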

The locale is not taken into account when dealing with Unicode characters. That is, if an identifier consisting of Unicode characters is valid in one locale, it is valid in every locale. The exceptions listed above still apply.

Note:

The Japanese locale does not support accented Latin letter characters in InterSystems IRIS names. Under the Japanese locale, names can contain, in addition to Japanese characters, the Latin letters A-Z and a-z (65–90 and 97–122) and the Greek capital letters (913–929 and 931–937).
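The Latin and Greek ranges in the note can be encoded as a small table of code-point intervals. This hypothetical Python sketch checks only those ranges; it does not attempt to model which Japanese characters the locale accepts.

```python
# Hypothetical sketch of the non-Japanese letters the note above allows in
# names under the Japanese locale. Japanese characters themselves are not
# modeled; only the Latin and Greek ranges quoted in the note are.

ALLOWED_RANGES = [
    (65, 90),    # A-Z
    (97, 122),   # a-z
    (913, 929),  # Greek capital Alpha..Rho
    (931, 937),  # Greek capital Sigma..Omega (930 is unassigned)
]


def allowed_latin_or_greek(ch: str) -> bool:
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ALLOWED_RANGES)


print(allowed_latin_or_greek("A"))       # True
print(allowed_latin_or_greek("\u00e9"))  # False: accented Latin letters are excluded
print(allowed_latin_or_greek("\u03a9"))  # True: Greek capital Omega (937)
print(allowed_latin_or_greek("\u03c9"))  # False: Greek small omega (969)
```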

List Compression

The ListFormat setting controls whether Unicode strings are compressed when stored in a $LIST-encoded string. By default, they are not compressed. InterSystems IRIS handles the compressed format automatically, but do not pass compressed lists to external clients, such as Java or C#, without verifying that those clients support the compressed format.

The per-process behavior can be controlled using the ListFormat() method of the %SYSTEM.Process class.

The system-wide default behavior can be established by setting the ListFormat property of the Config.Miscellaneous class, or in the InterSystems IRIS Management Portal: from System Administration, select Configuration, Additional Settings, Compatibility.