$WASCII (ObjectScript)
Synopsis
$WASCII(expression,position)
$WA(expression,position)
Arguments
| Argument | Description |
|---|---|
| expression | An expression that evaluates to the character, or string of characters, to be converted. |
| position | Optional. The position of a character within a character string, counting from 1. The default is 1. |
Description
$WASCII returns the character code value for a single character specified in expression. $WASCII recognizes a surrogate pair as a single character. The returned value is a positive integer.
The expression argument may evaluate to a single character or to a string of characters. If expression evaluates to a string of characters, you can include the optional position argument to indicate which character you want to convert. The position counts a surrogate pair as a single character. You can use the $WISWIDE function to determine if a string contains a surrogate pair.
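As a minimal sketch of the position argument, the following uses a plain string with no surrogate pairs. The out-of-range result is an assumption based on the documented behavior of $ASCII, which $WASCII is described as matching apart from surrogate pair handling:

```objectscript
 SET str="ABC"
 // position defaults to 1, so this converts "A" (65)
 WRITE !,$WASCII(str)
 // explicit position 2 converts "B" (66)
 WRITE !,$WASCII(str,2)
 // position past the end of the string; assumed to return -1, as $ASCII does
 WRITE !,$WASCII(str,4)
```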
A surrogate pair is a pair of 16-bit InterSystems IRIS character elements that together encode a single Unicode character. Surrogate pairs are used to represent certain ideographs that are used in Chinese, Japanese kanji, and Korean hanja. (Most commonly used Chinese, kanji, and hanja characters are represented by standard 16-bit Unicode encodings.) Surrogate pairs provide InterSystems IRIS support for the Japanese JIS X0213:2004 (JIS2004) encoding standard and the Chinese GB18030 encoding standard.
A surrogate pair consists of a high-order 16-bit character element in the hexadecimal range D800 through DBFF, and a low-order 16-bit character element in the hexadecimal range DC00 through DFFF.
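The relationship between the two elements and the single character code can be sketched with the standard UTF-16 decoding arithmetic: subtract the base of each range, combine the two 10-bit values, and add hexadecimal 10000. The variable names below are illustrative, and the comparison assumes that $WASCII returns the Unicode code point for the pair:

```objectscript
 SET hi=$ZHEX("D806")   // high-order element as a decimal integer
 SET lo=$ZHEX("DC06")   // low-order element as a decimal integer
 // ObjectScript evaluates strictly left to right, so each step is parenthesized
 SET code=((hi-$ZHEX("D800"))*1024)+(lo-$ZHEX("DC00"))+65536
 WRITE !,code," = code point computed by hand"
 // on a Unicode system, this is expected to print the same value
 WRITE !,$WASCII($CHAR(hi)_$CHAR(lo))," = $WASCII value"
```

For the elements D806 and DC06, the arithmetic gives (6 × 1024) + 6 + 65536 = 71686.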
The $WASCII function recognizes a surrogate pair as a single character. The $ASCII function treats a surrogate pair as two characters. In all other respects, $WASCII and $ASCII are functionally identical. However, because $ASCII is generally faster than $WASCII, $ASCII is preferable in all cases where a surrogate pair is not likely to be encountered. For further details on character-to-numeric code conversion, refer to the $ASCII function.
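One possible way to apply this guidance is to test the string with $WISWIDE first and call $WASCII only when a surrogate pair is present. This is a sketch rather than a recommended pattern; whether the extra test pays off depends on the data, and the variable names are illustrative:

```objectscript
 SET str="ABC"
 IF $WISWIDE(str) {
   // the string contains a surrogate pair, so count pairs as one character
   SET code=$WASCII(str,2)
 } ELSE {
   // no surrogate pairs, so the two functions agree and $ASCII suffices
   SET code=$ASCII(str,2)
 }
 WRITE !,code
```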
Examples
The following example shows $WASCII returning the Unicode value for a surrogate pair:
SET hipart=$CHAR($ZHEX("D806"))
SET lopart=$CHAR($ZHEX("DC06"))
WRITE !,$ASCII(hipart)," = high-order value"
WRITE !,$ASCII(lopart)," = low-order value"
SET spair=hipart_lopart /* surrogate pair */
SET xpair=hipart_hipart /* NOT a surrogate pair */
WRITE !,$WASCII(spair)," = surrogate pair value"
WRITE !,$WASCII(xpair)," = Not a surrogate pair"
The following example compares $WASCII and $ASCII return values for a surrogate pair:
SET hipart=$CHAR($ZHEX("D806"))
SET lopart=$CHAR($ZHEX("DC06"))
WRITE !,$ASCII(hipart)," = high-order value"
WRITE !,$ASCII(lopart)," = low-order value"
SET spair=hipart_lopart /* surrogate pair */
WRITE !,$ASCII(spair)," = $ASCII value for surrogate pair"
WRITE !,$WASCII(spair)," = $WASCII value for surrogate pair"
The following example shows how surrogate pairs affect position counting. It returns both the $WASCII and $ASCII values for each position. $WASCII counts a surrogate pair as one position; $ASCII counts a surrogate pair as two positions:
SET hipart=$CHAR($ZHEX("D806"))
SET lopart=$CHAR($ZHEX("DC06"))
WRITE !,$ASCII(hipart)," = high-order value"
WRITE !,$ASCII(lopart)," = low-order value",!
SET str="AB"_lopart_hipart_lopart_"CD"_hipart_lopart_"EF"
FOR x=1:1:11 {
  WRITE !,"position ",x," $WASCII ",$WASCII(str,x)," $ASCII ",$ASCII(str,x)
}
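The loop above runs to 11 because that is the length of the string in 16-bit elements. As a hedged alternative, the loop bound can be taken from $WLENGTH, which counts a surrogate pair as one character, so that $WASCII is called once per character and never past the end of the string:

```objectscript
 SET hipart=$CHAR($ZHEX("D806"))
 SET lopart=$CHAR($ZHEX("DC06"))
 SET str="AB"_lopart_hipart_lopart_"CD"_hipart_lopart_"EF"
 // $LENGTH counts 16-bit elements; $WLENGTH counts each surrogate pair once
 WRITE !,"$LENGTH = ",$LENGTH(str),"  $WLENGTH = ",$WLENGTH(str)
 FOR x=1:1:$WLENGTH(str) {
   WRITE !,"character ",x," code ",$WASCII(str,x)
 }
```

The lone low-order element after "AB" is not preceded by a high-order element, so it is counted as a character in its own right.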