
$WASCII (ObjectScript)

Returns the numeric code corresponding to a character, recognizing surrogate pairs.

Synopsis

$WASCII(expression,position)
$WA(expression,position)

Arguments

Argument     Description
expression   An expression that evaluates to the character to be converted, or to a string containing that character.
position     Optional. The position of a character within a character string, counting from 1. The default is 1.

Description

$WASCII returns the character code value for a single character specified in expression. $WASCII recognizes a surrogate pair as a single character. The returned value is a positive integer.

The expression argument may evaluate to a single character or to a string of characters. If expression evaluates to a string of characters, you can include the optional position argument to indicate which character you want to convert. The position counts a surrogate pair as a single character. You can use the $WISWIDE function to determine if a string contains a surrogate pair.
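As a minimal sketch of the position argument, the following builds a three-character string with a surrogate pair in the middle and reads individual positions from it. The variable names and the sample string are illustrative only, and the sketch assumes a Unicode instance of InterSystems IRIS:

```objectscript
  /* Illustrative string: "A", one surrogate pair, "B" */
  SET spair=$CHAR($ZHEX("D806"))_$CHAR($ZHEX("DC06"))
  SET str="A"_spair_"B"
  /* $WISWIDE reports whether the string contains a surrogate pair */
  IF $WISWIDE(str) {
    WRITE !,"str contains a surrogate pair"
  }
  /* $WASCII counts the pair as position 2, so "B" is at position 3 */
  WRITE !,$WASCII(str,2)," = $WASCII at position 2 (the surrogate pair)"
  WRITE !,$WASCII(str,3)," = $WASCII at position 3 (the letter B)"
  /* $ASCII counts 16-bit elements, so its position 3 is the low-order element */
  WRITE !,$ASCII(str,3)," = $ASCII at position 3 (low-order element)"
```

Checking with $WISWIDE first is one way to decide whether $WASCII is needed at all, or whether the faster $ASCII is sufficient for a given string.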

A surrogate pair is a pair of 16-bit InterSystems IRIS character elements that together encode a single Unicode character. Surrogate pairs are used to represent certain ideographs used in Chinese, Japanese kanji, and Korean hanja. (Most commonly used Chinese, kanji, and hanja characters are represented by standard 16-bit Unicode encodings.) Surrogate pairs provide InterSystems IRIS support for the Japanese JIS X0213:2004 (JIS2004) encoding standard and the Chinese GB18030 encoding standard.

A surrogate pair consists of a high-order 16-bit character element in the hexadecimal range D800 through DBFF, followed by a low-order 16-bit character element in the hexadecimal range DC00 through DFFF.
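To make the ranges concrete, the following sketch combines a high-order and a low-order element by hand using the standard UTF-16 decoding formula: subtract D800 (55296) from the high-order element, multiply by 400 hex (1024), add the low-order element minus DC00 (56320), then add 10000 hex (65536). For D806 and DC06 this gives 6*1024 + 6 + 65536 = 71686 (hex 11806). That $WASCII applies this same formula is an assumption based on the Unicode standard, not something stated on this page, so the sketch prints both values for comparison:

```objectscript
  SET hi=$ZHEX("D806")  /* high-order element as a decimal integer */
  SET lo=$ZHEX("DC06")  /* low-order element as a decimal integer */
  /* ObjectScript evaluates operators strictly left to right, */
  /* so the parentheses are required */
  SET code=((hi-55296)*1024)+(lo-56320)+65536
  WRITE !,code," = code point computed by hand"
  WRITE !,$WASCII($CHAR(hi)_$CHAR(lo))," = $WASCII value for the same pair"
```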

The $WASCII function recognizes a surrogate pair as a single character. The $ASCII function treats a surrogate pair as two characters. In all other respects, $WASCII and $ASCII are functionally identical. However, because $ASCII is generally faster than $WASCII, $ASCII is preferable wherever a surrogate pair is unlikely to be encountered. For further details on character to numeric code conversion, refer to the $ASCII function.

Examples

The following example shows $WASCII returning the Unicode value for a surrogate pair:

  SET hipart=$CHAR($ZHEX("D806"))
  SET lopart=$CHAR($ZHEX("DC06"))
  WRITE !,$ASCII(hipart)," = high-order value"
  WRITE !,$ASCII(lopart)," = low-order value"
  SET spair=hipart_lopart /* surrogate pair */
  SET xpair=hipart_hipart /* NOT a surrogate pair */
  WRITE !,$WASCII(spair)," = surrogate pair value"
  WRITE !,$WASCII(xpair)," = Not a surrogate pair"

The following example compares $WASCII and $ASCII return values for a surrogate pair:

  SET hipart=$CHAR($ZHEX("D806"))
  SET lopart=$CHAR($ZHEX("DC06"))
  WRITE !,$ASCII(hipart)," = high-order value"
  WRITE !,$ASCII(lopart)," = low-order value"
  SET spair=hipart_lopart /* surrogate pair */
  WRITE !,$ASCII(spair)," = $ASCII value for surrogate pair"
  WRITE !,$WASCII(spair)," = $WASCII value for surrogate pair"

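The following example shows how surrogate pairs affect position counting. It returns both the $WASCII and $ASCII values for each position. $WASCII counts a surrogate pair as one position; $ASCII counts a surrogate pair as two positions. The string contains 11 16-bit character elements, including two surrogate pairs and one unpaired low-order element (immediately after "AB"), so $WASCII counts only 9 characters: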

  SET hipart=$CHAR($ZHEX("D806"))
  SET lopart=$CHAR($ZHEX("DC06"))
  WRITE !,$ASCII(hipart)," = high-order value"
  WRITE !,$ASCII(lopart)," = low-order value",!
  SET str="AB"_lopart_hipart_lopart_"CD"_hipart_lopart_"EF"
  FOR x=1:1:11 {
    WRITE !,"position ",x," $WASCII ",$WASCII(str,x)," $ASCII ",$ASCII(str,x)
  }

