8-bit and Unicode String Handling
InterSystems Callin functions that operate on strings have both 8-bit and Unicode versions. These functions use a suffix character to indicate the type of string that they handle:
- Names with an “A” suffix or no suffix at all (for example, IrisEvalA or IrisPopExStr) are versions for 8-bit character strings.
- Names with a “W” suffix (for example, IrisEvalW or IrisPopExStrW) are versions for Unicode character strings on platforms that use 2–byte Unicode characters.
- Names with an “H” suffix (for example, IrisEvalH or IrisPopExStrH) are versions for Unicode character strings on platforms that use 4–byte Unicode characters.
For best performance, use the kind of string native to your installed version of InterSystems IRIS.
8-bit String Data Types
InterSystems IRIS supports the following data types that use local 8-bit string encoding: IRIS_ASTR, a counted string of 8-bit characters, and IRIS_ASTRP, a pointer to an IRIS_ASTR.
The type definition for these is:
#define IRIS_MAXSTRLEN 32767
typedef struct {
    unsigned short len;
    Callin_char_t  str[IRIS_MAXSTRLEN];
} IRIS_ASTR, *IRIS_ASTRP;
An IRIS_ASTR structure (the type that an IRIS_ASTRP points to) contains two elements:
- len: An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, the caller sets this element to the maximum allowable length for the str element; upon return, it is replaced by the actual length of str.
- str: An input or output string.
IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, nor does the program have to allocate that much space.
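As a rough illustration of the len and str conventions described above, the following C sketch fills a counted 8-bit string for use as input and prepares one for use as output. It mirrors the structure locally so that it compiles without iris-callin.h. The definition of Callin_char_t as unsigned char and the helper names astr_set and astr_prepare_output are assumptions made for this example; they are not part of the Callin API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of the typedef shown above, so the sketch builds without
   iris-callin.h. ASSUMPTION: Callin_char_t is an unsigned 8-bit type. */
#define IRIS_MAXSTRLEN 32767
typedef unsigned char Callin_char_t;
typedef struct {
    unsigned short len;
    Callin_char_t  str[IRIS_MAXSTRLEN];
} IRIS_ASTR, *IRIS_ASTRP;

/* Hypothetical helper: copy a NUL-terminated C string into a counted
   string for use as an input argument.
   Returns 0 on success, -1 if the text exceeds IRIS_MAXSTRLEN. */
static int astr_set(IRIS_ASTRP dst, const char *text)
{
    size_t n;

    assert(dst != NULL && text != NULL);
    n = strlen(text);
    if (n > IRIS_MAXSTRLEN)
        return -1;
    memcpy(dst->str, text, n);       /* the length lives in len, not in a terminator */
    dst->len = (unsigned short)n;    /* input: actual length of str */
    return 0;
}

/* Hypothetical helper: before passing the structure as an output
   argument, set len to the maximum length the callee may write. */
static void astr_prepare_output(IRIS_ASTRP dst)
{
    assert(dst != NULL);
    dst->len = IRIS_MAXSTRLEN;       /* output: capacity on entry */
}
```

In a real program you would include iris-callin.h instead of the local mirror, call astr_set before passing the structure to an “A” function, and read len again after any call that uses the structure as output.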
2–byte Unicode Data Types
InterSystems IRIS supports the following Unicode-related data types on platforms that use 2–byte Unicode characters: IRISWSTR, a counted string of 2-byte Unicode characters, and IRISWSTRP, a pointer to an IRISWSTR.
The type definition for these is:
typedef struct {
    unsigned short len;
    unsigned short str[IRIS_MAXSTRLEN];
} IRISWSTR, *IRISWSTRP;
An IRISWSTR structure (the type that an IRISWSTRP points to) contains two elements:
- len: An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, the caller sets this element to the maximum allowable length for the str element; upon return, it is replaced by the actual length of str.
- str: An input or output string.
IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, nor does the program have to allocate that much space.
On Unicode-enabled versions of InterSystems IRIS, there is also the data type IRIS_WSTRING, which represents the native string type on 2–byte platforms. IrisType returns this type. A call to IrisConvert can also specify IRIS_WSTRING as the data type for the return value; if this type is requested, the result is passed back as a counted Unicode string in an IRISWSTR buffer.
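Once a result has been passed back in an IRISWSTR buffer, len holds the number of 2-byte units in str. The sketch below shows one way to read such a counted string into an ordinary NUL-terminated buffer for logging. The structure is mirrored locally so the code compiles without iris-callin.h, and wstr_to_ascii is a hypothetical helper written for this example, not a Callin function; it replaces anything outside 7-bit ASCII with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the typedef shown above (normally from iris-callin.h). */
#define IRIS_MAXSTRLEN 32767
typedef struct {
    unsigned short len;
    unsigned short str[IRIS_MAXSTRLEN];
} IRISWSTR, *IRISWSTRP;

/* Hypothetical helper: copy up to src->len 2-byte units into a
   NUL-terminated char buffer, truncating to fit out_size.
   Units outside 7-bit ASCII are replaced with '?'.
   Returns the number of characters written, excluding the NUL. */
static size_t wstr_to_ascii(const IRISWSTR *src, char *out, size_t out_size)
{
    size_t i, n;

    assert(src != NULL && out != NULL && out_size > 0);
    n = src->len;                    /* actual length set on return */
    if (n > out_size - 1)
        n = out_size - 1;            /* leave room for the terminator */
    for (i = 0; i < n; i++) {
        unsigned short u = src->str[i];
        out[i] = (u < 0x80) ? (char)u : '?';
    }
    out[n] = '\0';
    return n;
}
```

The lossy '?' substitution is only a convenience for diagnostics; code that needs the full text should keep working with the 2-byte units directly.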
4–byte Unicode Data Types
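InterSystems IRIS supports the following Unicode-related data types on platforms that use 4–byte Unicode characters: IRISHSTR, a counted string of 4-byte (wchar_t) Unicode characters, and IRISHSTRP, a pointer to an IRISHSTR.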
The type definition for these is:
typedef struct {
    unsigned int len;
    wchar_t      str[IRIS_MAXSTRLEN];
} IRISHSTR, *IRISHSTRP;
An IRISHSTR structure (the type that an IRISHSTRP points to) contains two elements:
- len: An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, the caller sets this element to the maximum allowable length for the str element; upon return, it is replaced by the actual length of str.
- str: An input or output string.
IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, nor does the program have to allocate that much space.
On Unicode-enabled versions of InterSystems IRIS, there is also the data type IRIS_HSTRING, which represents the native string type on 4–byte platforms. IrisType returns this type. A call to IrisConvert can also specify IRIS_HSTRING as the data type for the return value; if this type is requested, the result is passed back as a counted Unicode string in an IRISHSTR buffer.
Because Unicode-enabled InterSystems IRIS uses only 2-byte characters, 4-byte strings are converted to UTF-16 on the way into InterSystems IRIS, and from UTF-16 back to 4-byte Unicode on the way out. The $W family of functions (for example, $WASCII() and $WCHAR()) can be used in InterSystems IRIS code to work with these strings.
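To make the conversion concrete, the following sketch shows the standard UTF-16 encoding rule for a single 4-byte code point: values below 0x10000 map to one 2-byte unit, and larger values map to a surrogate pair. This is a generic illustration of UTF-16, not InterSystems code; InterSystems IRIS performs the conversion itself, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Writes 1 or 2 units to out and returns how many were written.
   Returns 0 if cp is not a valid scalar value (a surrogate, or
   greater than 0x10FFFF). */
static size_t utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                    /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}

/* Decode a surrogate pair back to the original code point. */
static uint32_t utf16_pair_to_utf32(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10) + ((uint32_t)lo - 0xDC00u);
}
```

A practical consequence is that a character outside the Basic Multilingual Plane occupies one element of a 4-byte string but two 2-byte units inside InterSystems IRIS, so the two lengths can differ.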
System-neutral Symbol Definitions
The allowed inputs and outputs of some functions vary depending on whether they are running on an 8-bit system or a Unicode system. For many of the “A” (ASCII) functions, the arguments are defined as accepting an IRISSTR, IRIS_STR, IRISSTRP, or IRIS_STRP type. These symbol definitions (without the “A”, “W”, or “H”) are conditionally associated with either the 8-bit or the Unicode names, depending on whether the symbol IRIS_UNICODE or IRIS_WCHART is defined at compile time. This way, you can write source code with neutral symbols that works with either local 8-bit or Unicode encodings.
The following excerpt from iris-callin.h illustrates the concept:
#if defined(IRIS_UNICODE) /* Unicode character strings */
#define IRISSTR IRISWSTR
#define IRIS_STR IRISWSTR
#define IRISSTRP IRISWSTRP
#define IRIS_STRP IRISWSTRP
#define IRIS_STRING IRIS_WSTRING
#elif defined(IRIS_WCHART) /* wchar_t character strings */
#define IRISSTR IRISHSTR
#define IRIS_STR IRISHSTR
#define IRISSTRP IRISHSTRP
#define IRIS_STRP IRISHSTRP
#define IRIS_STRING IRIS_HSTRING
#else /* 8-bit character strings */
#define IRISSTR IRIS_ASTR
#define IRIS_STR IRIS_ASTR
#define IRISSTRP IRIS_ASTRP
#define IRIS_STRP IRIS_ASTRP
#define IRIS_STRING IRIS_ASTRING
#endif
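The sketch below shows how source code written against the neutral names might look. It mirrors the three structures and a subset of the conditional definitions locally so that it compiles without iris-callin.h, and it assumes that Callin_char_t is an unsigned 8-bit type. The helper neutral_set_ascii is hypothetical. Compiling with -DIRIS_UNICODE or -DIRIS_WCHART selects the 2-byte or 4-byte structure without any change to the function body.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, wchar_t */
#include <string.h>

/* Local mirrors of the typedefs from the earlier sections. */
#define IRIS_MAXSTRLEN 32767
typedef struct { unsigned short len; unsigned char  str[IRIS_MAXSTRLEN]; } IRIS_ASTR, *IRIS_ASTRP;
typedef struct { unsigned short len; unsigned short str[IRIS_MAXSTRLEN]; } IRISWSTR,  *IRISWSTRP;
typedef struct { unsigned int   len; wchar_t        str[IRIS_MAXSTRLEN]; } IRISHSTR,  *IRISHSTRP;

/* Subset of the conditional mapping shown in the excerpt above. */
#if defined(IRIS_UNICODE)
#define IRIS_STR  IRISWSTR
#define IRIS_STRP IRISWSTRP
#elif defined(IRIS_WCHART)
#define IRIS_STR  IRISHSTR
#define IRIS_STRP IRISHSTRP
#else
#define IRIS_STR  IRIS_ASTR
#define IRIS_STRP IRIS_ASTRP
#endif

/* Hypothetical helper: store an ASCII C string in whichever counted
   string type IRIS_STR resolves to. Each byte is widened to the
   element type of str, so the same body works for all three builds.
   Returns 0 on success, -1 if the text exceeds IRIS_MAXSTRLEN. */
static int neutral_set_ascii(IRIS_STRP dst, const char *text)
{
    size_t i, n;

    assert(dst != NULL && text != NULL);
    n = strlen(text);
    if (n > IRIS_MAXSTRLEN)
        return -1;
    for (i = 0; i < n; i++)
        dst->str[i] = (unsigned char)text[i];
    dst->len = (unsigned short)n;    /* n <= 32767, so it fits every len type */
    return 0;
}
```

With neither symbol defined, IRIS_STR resolves to IRIS_ASTR and each element of str is one byte wide.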