Caché ObjectScript Reference
$ZCONVERT
String conversion function.
Synopsis
$ZCONVERT(string,mode,trantable,handle)
$ZCVT(string,mode,trantable,handle)
$ZCONVERT converts a string from one form to another. The nature of the conversion depends on the parameters you use.
$ZCONVERT Returns a Converted String
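The values you can use for mode are as follows:
- U: uppercase translation
- L: lowercase translation
- T: titlecase translation
- W: word translation
- S: sentence translation
- I: input encoding translation
- O: output encoding translation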
If mode is a null string or any value other than the valid characters, you receive a <FUNCTION> error.
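The error described above can be trapped like any other system error. The following is a minimal sketch, assuming that "X" is not a valid mode code and that the exception object exposes the error name in its Name property:

```objectscript
  TRY {
    // "X" is assumed to be an invalid mode code, so this call should fail
    WRITE $ZCONVERT("Hello","X"),!
  }
  CATCH ex {
    // ex.Name holds the name of the system error that was raised
    WRITE "Caught error: ",ex.Name,!
  }
```

Trapping the error this way is mainly useful when the mode value comes from user input or configuration rather than from a literal.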
You can convert letters in strings to all uppercase letters or all lowercase letters. Conversion works on Unicode letters as well as ASCII letters. The following example converts the Greek alphabet from lowercase to uppercase:
  IF $SYSTEM.Version.IsUnicode() {
    FOR i=945:1:969 {WRITE $ZCONVERT($CHAR(i),"U")}
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
However, a small number of letters only have a lowercase letter form. For example, the German eszett ($CHAR(223)) is only defined as a lowercase letter. Attempting to convert it to an uppercase letter results in the same lowercase letter:
  IF $ZCONVERT($CHAR(223),"U")=$ZCONVERT($CHAR(223),"L") {
    WRITE "uppercase and lowercase letter are the same"
  }
  ELSE {WRITE "uppercase and lowercase are different"}
For this reason, when converting alphanumeric strings to a single letter case it is always preferable to convert to lowercase.
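One common use of this advice is a case-insensitive comparison. The following sketch converts both operands to lowercase before comparing them; the variable names and sample values are hypothetical:

```objectscript
  // Hypothetical sample values; $CHAR(223) is the eszett, which has no uppercase form
  SET a="STRA"_$CHAR(223)_"E"
  SET b="stra"_$CHAR(223)_"e"
  IF $ZCONVERT(a,"L")=$ZCONVERT(b,"L") {
    WRITE "strings match (case-insensitive)",!
  }
  ELSE {
    WRITE "strings differ",!
  }
```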
You can perform similar letter case translations using the $TRANSLATE function, as shown in the following example:
WRITE $TRANSLATE(text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz")
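The two approaches differ for letters outside the translation lists. In the following sketch, $TRANSLATE is given only the 26 unaccented Latin letters, so it should leave the accented letter $CHAR(201) unchanged, while $ZCONVERT is expected to convert it as well. The sample string is an assumption chosen for illustration:

```objectscript
  SET text="CACH"_$CHAR(201)
  // Only the characters listed in the second argument are replaced
  WRITE $TRANSLATE(text,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz"),!
  // Converts every letter that has a lowercase form
  WRITE $ZCONVERT(text,"L"),!
```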
Word and Sentence Translation
W and S modes determine whether a non-blank character is the first character of a word or the first character of a sentence, and if that character is a letter, translate it to uppercase. All other letters are translated to lowercase. Case translation works on letters in any alphabet, as shown in the following example which converts Greek letters ($CHAR(945) is lowercase alpha; $CHAR(913) is uppercase alpha):
  IF $SYSTEM.Version.IsUnicode() {
    SET greek=$CHAR(945,946,947,913,914,915)
    WRITE $ZCONVERT(greek,"W")
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
However, the rules determining what constitutes a word or sentence are locale dependent. The following example uses the Spanish inverted exclamation point $CHAR(161). The default (English) locale does not recognize this character as beginning a sentence or word. In this example, all letters in spanish are translated to lowercase:
SET spanish=$CHAR(161)_"ola MuNdO! "_$CHAR(161)_"olA!"
SET english="hElLo wOrLd! heLLo!"
WRITE !,$ZCONVERT(english,"S")
WRITE !,$ZCONVERT(spanish,"S")
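To see the difference between the two modes, it can help to apply both to the same string. The following sketch assumes the default (English) locale. Under the rules described above, W mode should capitalize the first letter of every word and S mode only the first letter of each sentence:

```objectscript
  SET text="hElLo wOrLd! heLLo!"
  WRITE "W mode: ",$ZCONVERT(text,"W"),!
  WRITE "S mode: ",$ZCONVERT(text,"S"),!
```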
Titlecase (T) mode converts every letter in the string to its titlecase form. Titlecase does not selectively uppercase letters based on their position in a word or string. Titlecase is the case that a letter is represented in when it is the first character of a word in a title. For standard Latin letters, the titlecase form is the same as the uppercase form.
Some languages (for example, Croatian) represent particular letters by two letter glyphs. For example, lj is a single letter in the Croatian alphabet. This letter has three forms: lowercase lj, uppercase LJ, and titlecase Lj. $ZCONVERT titlecase translation is used for this type of letter conversion.
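The following sketch prints the character codes produced by the three case conversions of the Croatian lj letter. It assumes a Unicode installation and relies on the Unicode code points for this letter (455 for LJ, 456 for Lj, 457 for lj). Whether each mapping is applied depends on the installation, so the sketch prints the codes rather than assuming them:

```objectscript
  IF $SYSTEM.Version.IsUnicode() {
    SET lj=$CHAR(457)   // lowercase lj as a single character
    WRITE "lowercase code: ",$ASCII($ZCONVERT(lj,"L")),!
    WRITE "uppercase code: ",$ASCII($ZCONVERT(lj,"U")),!
    WRITE "titlecase code: ",$ASCII($ZCONVERT(lj,"T")),!
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
```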
Three-Parameter Form: Encoding Translation
$ZCONVERT(string,mode,trantable) performs either an input encoding translation or an output encoding translation on string. In the three-argument form, the mode values you can use are either "I" or "O". You must specify the mode value. For I translations, the string may be a hexadecimal string, such as %4B (the letter K); hexadecimal strings are not case-sensitive.
You can use ZZDUMP to display the hexadecimal encoding for a string of characters. You can use $CHAR to specify a character (or string of characters) by its decimal (base 10) encoding; you can use $ZHEX to convert a hexadecimal number to a decimal number, or a decimal number to a hexadecimal number. If the translated value is a non-printing character, Caché displays it as a null string. If the target device cannot represent a translated character, Caché substitutes a question mark (?) character for the non-displayable character.
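The following sketch ties these pieces together for the %4B example. It assumes the URL translation table is used for the input translation. The two $ZHEX calls show both conversion directions, which depend on whether the argument is a string or an integer:

```objectscript
  WRITE $ZCONVERT("%4B","I","URL"),!   // input translation of the hexadecimal escape %4B
  WRITE $ZCONVERT("%4b","I","URL"),!   // same escape written with a lowercase hex digit
  WRITE $ZHEX("4B"),!                  // string argument: hexadecimal to decimal
  WRITE $ZHEX(75),!                    // integer argument: decimal to hexadecimal
  WRITE $CHAR(75),!                    // character with decimal code 75
  ZZDUMP "K"                           // hexadecimal dump of the letter K
```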
The trantable value can be any of the following:
- An integer value specifying a process I/O translation object. Available values are 0 through 3 (0 represents the current process I/O translation object).
- An uppercase string value identifying a Caché-supplied I/O translation table. Available translation tables include:
  - RAW, which performs no translation for 8-bit characters or 16-bit Latin-1 characters (Unicode characters in which the high-order byte has the value 00). RAW translation should not be used for Unicode systems using non-Latin-1 locales, such as rusw.
  - SAME, which performs no translation on 8-bit systems, and translates 8-bit characters to the corresponding Unicode character on Unicode systems.
  - HTML, which adds (output mode) or removes (input mode) HTML escape characters to a string.
  - JS (or JSML), which uses a supplied JavaScript translation table to convert to the format for Zen component pages. For output translations see the table below. For input translations, \0, \000, \x00, and \u0000 are all valid escape sequences for NULL.
  - JSON (or JSONML), which uses a supplied translation table to convert to JSON format. For output translations see the table below. For input translations, \0, \000, \x00, and \u0000 are all valid escape sequences for NULL.
  - URL, which adds (output mode) or removes (input mode) URL parameter escape characters to a string. Characters higher than $CHAR(255) are represented in Unicode hexadecimal notation: $CHAR(256) = %u0100.
  - UTF8 (UTF-8 encoding), which converts (output mode) 16-bit Unicode characters to a series of 8-bit characters. An ASCII 16-bit Unicode character translates to a single 8-bit character; for example, hex 0041 (the letter A) translates to the 8-bit character hex 41. A non-ASCII Unicode character is converted to two or three 8-bit characters. Unicode hex 0080 through 07FF convert to two 8-bit characters; these include the Latin-1 Supplement and Latin Extended characters and the Greek, Cyrillic, Hebrew, and Arabic alphabets. Unicode hex 0800 through FFFF convert to three 8-bit characters; these comprise the rest of the Unicode Basic Multilingual Plane. Thus, the ASCII characters $CHAR(0) through $CHAR(127) are the same in RAW and UTF8 mode; characters $CHAR(128) and above are converted. Input mode reverses this conversion. Refer to Unicode in Using ObjectScript for further details.
  - XML, which adds (output mode) or removes (input mode) XML escape characters to a string.
- A string value specifying an I/O translation table defined by an NLS locale, for example, Latin2 or CP1252. For a list of locale translation tables, refer to the XLTTables property of %SYS.NLS.Locale, as shown in the following example:
    SET nlsoref=##class(%SYS.NLS.Locale).%New()
    WRITE $LISTTOSTRING($PROPERTY(nlsoref,"XLTTables"),"^")
- A string value specifying a user-defined I/O translation table. A named table can be defined in a locale and points to one or two translation tables. Use a named table to define a specific system-to/from-device encoding.
- An empty string ("") specifying the use of the default process I/O translation table. (For equivalent functionality, see the $$GetPDefIO^%NLS() function of the %NLS utility.)
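The UTF8 length rules described above can be checked with a short sketch. It assumes a Unicode installation and uses three sample characters chosen for illustration, one from each range: $CHAR(65) (ASCII), $CHAR(233) (hex 00E9), and $CHAR(8364) (hex 20AC). By those rules they should produce one, two, and three 8-bit characters respectively:

```objectscript
  IF $SYSTEM.Version.IsUnicode() {
    // One sample code from each range: ASCII, hex 0080-07FF, hex 0800-FFFF
    FOR code=65,233,8364 {
      SET utf8=$ZCONVERT($CHAR(code),"O","UTF8")
      WRITE "code ",code," -> ",$LENGTH(utf8)," 8-bit character(s)",!
    }
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
```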
The following is a table of Output mode escape characters:
A URL or URI can only contain certain 8-bit ASCII characters. All other characters must be represented by an escape sequence beginning with %. If you wish to convert a string containing Unicode characters to a URL or URI, you must first convert your local representation to an 8-bit intermediate representation, using UTF8 encoding. You then convert the UTF8 results to URL encoding. To convert a URL back to its original Unicode string, you perform the reverse operation. This is shown in the following example:
  IF $SYSTEM.Version.IsUnicode() {
    SET ustring="US$ to "_$CHAR(8364)_" échange"
    WRITE "initial string is: ",ustring,!
    // Convert Unicode to URL: UTF8 output translation first, then URL
    SET utfo = $ZCONVERT(ustring,"O","UTF8")
    SET urlo = $ZCONVERT(utfo,"O","URL")
    WRITE "UNICODE to URL conversion: ",urlo,!
    // Convert URL to Unicode: reverse the two steps
    SET urli = $ZCONVERT(urlo,"I","URL")
    SET utfi = $ZCONVERT(urli,"I","UTF8")
    WRITE "URL to UNICODE conversion: ",utfi
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
Four-Parameter Form: Input/Output String
The handle parameter is a local variable that $ZCONVERT reads at the beginning of execution and writes when it completes execution. It is used to hold information between consecutive invocations of the $ZCONVERT function. It can be used for two purposes: concatenating a string to the beginning of string, and converting extremely long strings.
SET handle="the "
WRITE $ZCVT("quick brown fox","O","URL",handle),!
/* the%20quick%20brown%20fox */
WRITE $ZCVT("quick brown fox","O","URL",handle),!
/* quick%20brown%20fox */
Note that $ZCONVERT resets handle when it completes execution. In the previous example, it resets handle to the empty string.
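The reset can be observed directly by inspecting the variable after the call. The following is a minimal sketch based on the behavior described above, using the same sample values:

```objectscript
  SET handle="the "
  SET out=$ZCVT("quick brown fox","O","URL",handle)
  WRITE "converted: ",out,!
  // Per the description above, handle should now be empty
  IF handle="" {WRITE "handle is now the empty string",!}
  ELSE {WRITE "handle still contains: ",handle,!}
```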
This handle parameter may be used for input conversions. Specifying a handle is useful when dealing with multibyte character sequences when working with partial sets of characters, such as a stream read. In these cases, $ZCONVERT uses the handle parameter to hold a partial character sequence that may be the leading bytes of a multibyte sequence. If there are input characters left in the buffer at the end of a $ZCONVERT which do not make a complete translation unit, these leftover characters are returned in the handle. At the beginning of the next $ZCONVERT, if the handle contains data, these leftover characters are prepended to the normal input data. This is particularly valuable for use in UTF8 conversions, as shown in the following example:
  SET handle=""
  WHILE 'stream.AtEnd() {
    WRITE $ZCONVERT(stream.Read(20000),"I","UTF8",handle)
  }
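The same mechanism can be illustrated without a stream by splitting a UTF8 string in the middle of a multibyte sequence. The following sketch assumes a Unicode installation. The chunk boundaries and variable names are hypothetical, chosen so that the first chunk ends with an incomplete three-character sequence:

```objectscript
  IF $SYSTEM.Version.IsUnicode() {
    // Encode A, euro sign, B as UTF8: 1 + 3 + 1 = five 8-bit characters
    SET original="A"_$CHAR(8364)_"B"
    SET utf8=$ZCONVERT(original,"O","UTF8")
    // Split inside the euro sign's three-character sequence
    SET chunk1=$EXTRACT(utf8,1,2)
    SET chunk2=$EXTRACT(utf8,3,$LENGTH(utf8))
    SET handle=""
    // The leftover leading character of the sequence is carried over in handle
    SET result=$ZCONVERT(chunk1,"I","UTF8",handle)
    SET result=result_$ZCONVERT(chunk2,"I","UTF8",handle)
    IF result=original {WRITE "round trip succeeded",!}
    ELSE {WRITE "round trip failed",!}
  }
  ELSE {WRITE "This example requires a Unicode installation of Caché"}
```

Without the handle argument, each chunk would be translated in isolation and the split sequence could not be reassembled.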
To convert an extremely long string, it may be necessary to perform more than one string conversion by invoking $ZCONVERT multiple times. $ZCONVERT provides the optional handle parameter to hold the remaining unconverted portion of string. If you specify a handle parameter, it is updated by each invocation of $ZCONVERT. When the string conversion completes, $ZCONVERT sets handle to the empty string.
  SET handle=""
  SET out = $ZCVT(hugestring,"O","HTML",handle)
  IF handle '= "" {
    SET out2 = $ZCVT(handle,"O","HTML",handle)
    WRITE "Converted string is: ",out,out2
  }
  ELSE {
    WRITE "Converted string is: ",out
  }
The following example returns "HELLO":
WRITE $ZCONVERT("Hello","U")
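The following example returns "hello":
WRITE $ZCONVERT("Hello","L")
The following example returns "HELLO" (in titlecase mode, standard Latin letters take their uppercase form):
WRITE $ZCONVERT("Hello","T")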
The following example uses the concatenate operator (_) to append and case-convert an accented character:
WRITE "CACH"_$CHAR(201),!, $ZCVT("CACH"_$CHAR(201),"L")
The following example converts the angle brackets in the string to HTML escape characters for output, returning &lt;TAG&gt;:
WRITE $ZCVT("<TAG>","O","HTML")
Note that how these angle brackets display depends on the output device: an HTML-rendering environment such as a browser displays the escape sequences as angle brackets, whereas the Terminal prompt displays the escape sequences literally.
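An output translation can be reversed with the corresponding input translation. The following sketch escapes a sample string for HTML and then restores it; the sample string and variable names are assumptions chosen for illustration:

```objectscript
  SET raw="<b>R&D</b>"
  // Output mode adds HTML escape sequences
  SET escaped=$ZCONVERT(raw,"O","HTML")
  WRITE escaped,!
  // Input mode removes them again
  SET restored=$ZCONVERT(escaped,"I","HTML")
  IF restored=raw {WRITE "input translation restored the original string",!}
  ELSE {WRITE "restored string differs: ",restored,!}
```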
The following example shows how $ZCONVERT substitutes a ? character for a translated character it cannot display. Both the UTF8 and the current process I/O translation object (trantable 0) conversions in this example display $CHAR(63), which is the actual ? character. UTF8 cannot display translated characters above $CHAR(127). Translation table 0 cannot display translated characters above $CHAR(255):
  FOR i=1:1:300 {
    IF $ZCONVERT($CHAR(i),"I","UTF8") '= "?" { CONTINUE }
    ELSE {WRITE "UTF8 ",i,"=",$ZCONVERT($CHAR(i),"I","UTF8")}
    IF $ZCONVERT($CHAR(i),"I",0)="?" {
      WRITE " trantable 0 ",i,"=",$ZCONVERT($CHAR(i),"I",0),!
    }
    ELSE {WRITE !}
  }