Caché ObjectScript Reference
$WEXTRACT

Extracts a substring from a character string by position, or replaces a substring by position, recognizing surrogate pairs.
Synopsis
$WEXTRACT(string,from,to)
$WE(string,from,to)

SET $WEXTRACT(string,from,to)=value
SET $WE(string,from,to)=value
Parameters
string
The target string in which substrings are identified. Specify string as an expression that evaluates to a quoted string or a numeric value. In SET $WEXTRACT syntax, string must be a variable or a multidimensional property.
from
Optional — The starting position within the target string. Characters are counted from 1. A surrogate pair is counted as a single character. Permitted values are n (a positive integer specifying the start position as a character count from the beginning of string), * (specifying the last character in string), and *-n (offset integer count of characters backwards from end of string). SET $WEXTRACT syntax also supports *+n (offset integer count of characters to append beyond the end of string). If not specified, the default is 1. Different values are used for the two-parameter form $WEXTRACT(string,from), and the three-parameter form $WEXTRACT(string,from,to):
Without to: Specifies a single character. To count from the beginning of string, specify an expression that evaluates to a positive integer (counting from 1); a zero (0) or negative number returns the empty string. To count from the end of string, specify * or *-n. If from is omitted, it defaults to 1.
With to: Specifies the start of a range of characters. To count from the beginning of string, specify an expression that evaluates to a positive integer (counting from 1). A zero (0) or negative number evaluates as 1. To count from the end of string, specify * or *-n.
to
Optional — Specifies the end position (inclusive) for a range of characters. Must be used with from. Permitted values are n (a positive integer equal to or larger than from that specifies the end position as a character count from the beginning of string), * (specifying the last character in string), and *-n (offset integer count of characters backwards from end of string). A surrogate pair is counted as a single character. You can specify a to value that is beyond the end of the string.
SET $WEXTRACT syntax also supports *+n (offset integer count of the end of a range of characters to append beyond the end of string).
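The following is a minimal sketch of the from and to values described above, using a plain string with no surrogate pairs. The variable name x and the sample string are illustrative only; the values in the comments follow from the parameter rules stated above:
  SET x="ABCDEFGH"
  WRITE !,$WEXTRACT(x,*)      // last character: H
  WRITE !,$WEXTRACT(x,*-2)    // two characters back from the end: F
  WRITE !,$WEXTRACT(x,*-2,*)  // range ending at the last character: FGH
  WRITE !,$WEXTRACT(x,0,2)    // with to, a from of 0 is treated as 1: AB
  WRITE !,$WEXTRACT(x,6,20)   // to may lie beyond the end of the string: FGH
  WRITE !,$WEXTRACT(x,0)      // without to, a from of 0 returns the empty string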
Description
$WEXTRACT identifies a substring within string by position, either counting characters from the beginning of string or counting characters by offset from the end of string. A substring can be a single character or a range of characters. $WEXTRACT recognizes a surrogate pair as a single character.
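$WEXTRACT can be used in two ways:
To return a substring from string. This uses the $WEXTRACT(string,from,to) syntax.
To replace a substring within string. This uses the SET $WEXTRACT(string,from,to)=value syntax.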
$WEXTRACT and $EXTRACT are functionally identical, except for the handling of surrogate pairs.
Surrogate Pairs
The $WEXTRACT from and to parameters count a surrogate pair as a single character. You can use the $WISWIDE function to determine if a string contains a surrogate pair.
A surrogate pair is a pair of 16-bit Unicode characters that together encode a single ideographic character. Surrogate pairs are used to represent certain ideographs which are used in Chinese, Japanese kanji, and Korean hanja. (Most commonly-used Chinese, kanji, and hanja characters are represented by standard 16-bit Unicode encodings, not surrogate pairs.) Surrogate pairs provide Caché support for the Japanese JIS X0213:2004 (JIS2004) encoding standard and the Chinese GB18030 encoding standard.
A surrogate pair consists of a high-order Unicode character in the hexadecimal range D800 through DBFF, and a low-order Unicode character in the hexadecimal range DC00 through DFFF.
The $WEXTRACT function treats a surrogate pair as a single character. The $EXTRACT function treats a surrogate pair as two characters. If a string contains no surrogate pairs, either $WEXTRACT or $EXTRACT can be used, and both return the same value. However, because $EXTRACT is generally faster than $WEXTRACT, $EXTRACT is preferable in all cases where a surrogate pair is not likely to be encountered. For further details on extracting a substring, refer to the $EXTRACT function.
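As a rough illustration of why the distinction matters, the following sketch builds a string containing one surrogate pair and compares how it is counted. It assumes a Unicode installation and uses the $WLENGTH function, which is not described on this page, alongside $WISWIDE and $LENGTH; the variable names are illustrative:
  IF $SYSTEM.Version.IsUnicode()  {
    SET spair=$CHAR($ZHEX("D806"))_$CHAR($ZHEX("DC06")) /* surrogate pair */
    SET x="AB"_spair_"CD"
    WRITE !,"Contains a surrogate pair? ",$WISWIDE(x)
    WRITE !,"$LENGTH count:  ",$LENGTH(x)   // pair counted as two characters
    WRITE !,"$WLENGTH count: ",$WLENGTH(x)  // pair counted as one character
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
If $WISWIDE returns 0 for the strings you are handling, $EXTRACT gives the same results as $WEXTRACT and is the faster choice.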
Returning a Substring
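$WEXTRACT returns a substring by character position from string. The nature of this substring extraction depends on the parameters used:
$WEXTRACT(string) returns the first character of string.
$WEXTRACT(string,from) returns the single character at the position specified by from.
$WEXTRACT(string,from,to) returns the range of characters from position from through position to (inclusive).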
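A short sketch of the one-, two-, and three-parameter forms, using an illustrative plain string with no surrogate pairs (so $EXTRACT would return the same values):
  SET x="ABCDEFGH"
  WRITE !,$WEXTRACT(x)      // from defaults to 1: A
  WRITE !,$WEXTRACT(x,4)    // single character at position 4: D
  WRITE !,$WEXTRACT(x,2,4)  // inclusive range, positions 2 through 4: BCD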
Replacing a Substring
You can use $WEXTRACT with the SET command to replace a specified character or range of characters with another value. You can also use it to append characters to the end of a string. SET $WEXTRACT counts a surrogate pair as a single character.
When $WEXTRACT is used with SET on the left-hand side of the equals sign, string can be any valid variable name. If the variable does not exist, SET $WEXTRACT defines it. The string parameter can also be a multidimensional property reference; it cannot be a non-multidimensional object property. Attempting to use SET $WEXTRACT on a non-multidimensional object property results in an <OBJECT DISPATCH> error.
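The following sketch shows SET $WEXTRACT replacing a range, replacing a surrogate pair, and appending. The variable names and values are illustrative; the comments give the results expected from the rules described above, and the surrogate pair portion assumes a Unicode installation:
  SET x="ABCDEFGH"
  SET $WEXTRACT(x,3,5)="xyz"  // replace positions 3 through 5: ABxyzFGH
  SET $WEXTRACT(x,*)="!"      // replace the last character: ABxyzFG!
  SET $WEXTRACT(x,*+1)="Z"    // append one character past the end: ABxyzFG!Z
  WRITE !,x
  IF $SYSTEM.Version.IsUnicode()  {
    SET spair=$CHAR($ZHEX("D806"))_$CHAR($ZHEX("DC06")) /* surrogate pair */
    SET y="ABC"_spair_"DEF"
    SET $WEXTRACT(y,4)="-"    // the whole pair is replaced: ABC-DEF
    WRITE !,y
  }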
You cannot use SET (a,b,c,...)=value syntax with $WEXTRACT (or $EXTRACT, $PIECE, or $LIST) on the left of the equals sign, if the function uses relative offset syntax: * representing the end of a string and *-n or *+n representing relative offset from the end of the string. You must instead use SET a=value,b=value,c=value,... syntax.
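As a sketch of this restriction, the rejected form appears only in a comment, followed by the equivalent comma-separated form. The variable names are illustrative:
  SET a="AAAA",b="BBBB"
  // Not permitted with relative offsets:
  //   SET ($WEXTRACT(a,*),$WEXTRACT(b,*))="z"
  // Use separate assignments instead:
  SET $WEXTRACT(a,*)="z",$WEXTRACT(b,*)="z"
  WRITE !,a,!,b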
For further details on replacing a substring, refer to the $EXTRACT function.
Examples
The following example shows the two-parameter form of $WEXTRACT returning the Unicode value for a surrogate pair:
  IF $SYSTEM.Version.IsUnicode()  {
    SET hipart=$CHAR($ZHEX("D806"))
    SET lopart=$CHAR($ZHEX("DC06"))
    SET spair=hipart_lopart /* surrogate pair */
    SET x="ABC"_spair_"DEFGHIJK"
    WRITE !,"$EXTRACT character "
    ZZDUMP $EXTRACT(x,4)
    WRITE !,"$WEXTRACT character "
    ZZDUMP $WEXTRACT(x,4)
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
 
The following example shows the three-parameter form of $WEXTRACT including a surrogate pair in a substring range:
  IF $SYSTEM.Version.IsUnicode()  {
    SET hipart=$CHAR($ZHEX("D806"))
    SET lopart=$CHAR($ZHEX("DC06"))
    SET spair=hipart_lopart /* surrogate pair */
    SET x="ABC"_spair_"DEFGHIJK"
    WRITE !,"$EXTRACT two characters "
    ZZDUMP $EXTRACT(x,3,4)
    WRITE !,"$WEXTRACT two characters "
    ZZDUMP $WEXTRACT(x,3,4)
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
 
See Also