8-bit and Unicode String Handling

InterSystems Callin functions that operate on strings have both 8-bit and Unicode versions. These functions use a suffix character to indicate the type of string that they handle:

  • Names with an “A” suffix or no suffix at all (for example, IrisEvalA or IrisPopExStr) are versions for 8-bit character strings.

  • Names with a “W” suffix (for example, IrisEvalW or IrisPopExStrW) are versions for Unicode character strings on platforms that use 2-byte Unicode characters.

  • Names with an “H” suffix (for example, IrisEvalH or IrisPopExStrH) are versions for Unicode character strings on platforms that use 4-byte Unicode characters.

For best performance, use the kind of string native to your installed version of InterSystems IRIS.

8-bit String Data Types

InterSystems IRIS supports the following data types that use local 8-bit string encoding:

  • IRIS_ASTR — Counted string of 8-bit characters

  • IRIS_ASTRP — Pointer to an 8-bit counted string

The type definition for these is:

#define IRIS_MAXSTRLEN 32767
typedef struct {
   unsigned short  len;
   Callin_char_t   str[IRIS_MAXSTRLEN];
} IRIS_ASTR, *IRIS_ASTRP;

The IRIS_ASTR structure (which an IRIS_ASTRP points to) contains two elements:

  • len — An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, this element specifies the maximum allowable length for the str element; upon return, this is replaced by the actual length of str.

  • str — An input or output string.

IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, and the program does not have to allocate that much space.
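The two roles of len described above can be sketched as follows. This is a minimal illustration, not code from the Callin library: the struct is a local stand-in for the definition shown earlier so that the sketch compiles without iris-callin.h, Callin_char_t is assumed here to be unsigned char, and fill_astr and prepare_astr_for_output are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins mirroring the type definition shown above.
   Assumption: Callin_char_t is unsigned char; check iris-callin.h. */
#define IRIS_MAXSTRLEN 32767
typedef unsigned char Callin_char_t;
typedef struct {
    unsigned short len;
    Callin_char_t  str[IRIS_MAXSTRLEN];
} IRIS_ASTR, *IRIS_ASTRP;

/* Hypothetical helper: load a C string into a counted string for use as input.
   Returns 0 on success, -1 if the text is longer than IRIS_MAXSTRLEN. */
static int fill_astr(IRIS_ASTRP dst, const char *text)
{
    size_t n = strlen(text);
    if (n > (size_t)IRIS_MAXSTRLEN)
        return -1;
    memcpy(dst->str, text, n);     /* counted string: no terminating NUL is stored */
    dst->len = (unsigned short)n;  /* input: len is the actual length */
    return 0;
}

/* Hypothetical helper: prepare a counted string to receive output.
   capacity is the largest number of characters the caller will accept. */
static void prepare_astr_for_output(IRIS_ASTRP dst, unsigned short capacity)
{
    if (capacity > IRIS_MAXSTRLEN)
        capacity = IRIS_MAXSTRLEN;
    dst->len = capacity;           /* output: len is the maximum allowable length */
}
```

After a Callin function fills such a buffer, len would hold the actual length of the returned string, so it should be reset before the buffer is reused for another output call.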

2-byte Unicode Data Types

InterSystems IRIS supports the following Unicode-related data types on platforms that use 2-byte Unicode characters:

  • IRISWSTR — Unicode counted string

  • IRISWSTRP — Pointer to Unicode counted string

The type definition for these is:

typedef struct {
   unsigned short len;
   unsigned short str[IRIS_MAXSTRLEN];
} IRISWSTR, *IRISWSTRP;

The IRISWSTR structure (which an IRISWSTRP points to) contains two elements:

  • len — An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, this element specifies the maximum allowable length for the str element; upon return, this is replaced by the actual length of str.

  • str — An input or output string.

IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, and the program does not have to allocate that much space.

On Unicode-enabled versions of InterSystems IRIS, there is also the data type IRIS_WSTRING, which represents the native string type on 2-byte platforms. IrisType returns this type. Also, IrisConvert can specify IRIS_WSTRING as the data type for the return value; if this type is requested, the result is passed back as a counted Unicode string in an IRISWSTR buffer.

4-byte Unicode Data Types

InterSystems IRIS supports the following Unicode-related data types on platforms that use 4-byte Unicode characters:

  • IRISHSTR — Extended Unicode counted string

  • IRISHSTRP — Pointer to Extended Unicode counted string

The type definition for these is:

typedef struct {
   unsigned int len;
   wchar_t str[IRIS_MAXSTRLEN];
} IRISHSTR, *IRISHSTRP;

The IRISHSTR structure (which an IRISHSTRP points to) contains two elements:

  • len — An integer. When used as input, this element specifies the actual length of the string whose value is supplied in the str element. When used as output, this element specifies the maximum allowable length for the str element; upon return, this is replaced by the actual length of str.

  • str — An input or output string.

IRIS_MAXSTRLEN is the maximum length of a string that is accepted or returned. A parameter string need not be IRIS_MAXSTRLEN characters long, and the program does not have to allocate that much space.

On Unicode-enabled versions of InterSystems IRIS, there is also the data type IRIS_HSTRING, which represents the native string type on 4-byte platforms. IrisType returns this type. Also, IrisConvert can specify IRIS_HSTRING as the data type for the return value; if this type is requested, the result is passed back as a counted Unicode string in an IRISHSTR buffer.

Because Unicode-enabled InterSystems IRIS uses only 2-byte characters, 4-byte Unicode strings are converted to UTF-16 on their way into InterSystems IRIS, and from UTF-16 back to 4-byte Unicode on their way out. In InterSystems IRIS code, use the $W family of functions (for example, $WASCII() and $WCHAR()) to work with these strings.
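To illustrate what the conversion to UTF-16 implies, the sketch below encodes a single 4-byte code point as one or two 16-bit units using the standard UTF-16 rules. It is not the conversion code that InterSystems IRIS itself uses; utf32_to_utf16 is a hypothetical name chosen for this example.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Writes 1 or 2 units into out and returns that count, or returns 0
   if cp is not a valid Unicode scalar value. */
static int utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;                                /* out of range or a lone surrogate */
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;                   /* fits in a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000u;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u | (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00u | (cp & 0x3FFu)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

A code point above U+FFFF therefore occupies two 2-byte characters once it is inside InterSystems IRIS, which is why surrogate-aware functions such as $WASCII() and $WCHAR() are the appropriate tools for such strings.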

System-neutral Symbol Definitions

The allowed inputs and outputs of some functions vary depending on whether they are running on an 8-bit system or a Unicode system. For many of the “A” (ASCII) functions, the arguments are defined as accepting an IRISSTR, IRIS_STR, IRISSTRP, or IRIS_STRP type. These symbol definitions (without the “A”, “W”, or “H”) can conditionally be associated with either the 8-bit or Unicode names, depending on whether the symbols IRIS_UNICODE and IRIS_WCHART are defined at compile time. This way, you can write source code with neutral symbols that works with either local 8-bit or Unicode encodings.

The following excerpt from iris-callin.h illustrates the concept:

#if defined(IRIS_UNICODE) /* Unicode character strings */
#define   IRISSTR      IRISWSTR
#define   IRIS_STR     IRISWSTR
#define   IRISSTRP     IRISWSTRP
#define   IRIS_STRP    IRISWSTRP
#define   IRIS_STRING  IRIS_WSTRING

#elif defined(IRIS_WCHART)  /* wchar_t character strings */
#define   IRISSTR      IRISHSTR
#define   IRIS_STR     IRISHSTR
#define   IRISSTRP     IRISHSTRP
#define   IRIS_STRP    IRISHSTRP
#define   IRIS_STRING  IRIS_HSTRING

#else                  /* 8-bit character strings */
#define   IRISSTR      IRIS_ASTR
#define   IRIS_STR     IRIS_ASTR
#define   IRISSTRP     IRIS_ASTRP
#define   IRIS_STRP    IRIS_ASTRP
#define   IRIS_STRING  IRIS_ASTRING
#endif
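As a rough sketch of how code written against the neutral names adapts to each build, the example below repeats a reduced form of the mapping with local stand-in types so that it compiles without iris-callin.h. The helper names neutral_set_ascii and neutral_char_size are hypothetical, and the stand-in for the 8-bit character type (unsigned char) is an assumption.

```c
#include <stddef.h>
#include <wchar.h>

/* Local stand-ins for the three counted-string types described above. */
#define IRIS_MAXSTRLEN 32767
typedef struct { unsigned short len; unsigned char  str[IRIS_MAXSTRLEN]; } IRIS_ASTR, *IRIS_ASTRP;
typedef struct { unsigned short len; unsigned short str[IRIS_MAXSTRLEN]; } IRISWSTR,  *IRISWSTRP;
typedef struct { unsigned int   len; wchar_t        str[IRIS_MAXSTRLEN]; } IRISHSTR,  *IRISHSTRP;

/* Reduced version of the conditional mapping in the excerpt. */
#if defined(IRIS_UNICODE)
#define IRIS_STR   IRISWSTR
#define IRIS_STRP  IRISWSTRP
#elif defined(IRIS_WCHART)
#define IRIS_STR   IRISHSTR
#define IRIS_STRP  IRISHSTRP
#else
#define IRIS_STR   IRIS_ASTR
#define IRIS_STRP  IRIS_ASTRP
#endif

/* Hypothetical helper that uses only the neutral names: copies plain ASCII
   text into whichever counted-string type the build selected. */
static size_t neutral_set_ascii(IRIS_STRP dst, const char *text)
{
    dst->len = 0;
    while (text[dst->len] != '\0' && dst->len < IRIS_MAXSTRLEN) {
        dst->str[dst->len] = (unsigned char)text[dst->len];
        dst->len++;
    }
    return (size_t)dst->len;
}

/* Size in bytes of one character in the selected string type. */
static size_t neutral_char_size(void)
{
    return sizeof(((IRIS_STR *)0)->str[0]);
}
```

Compiling the same source with -DIRIS_UNICODE or -DIRIS_WCHART switches the helper to the 2-byte or wchar_t layout without any change to the function bodies.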