Collation specifies how values are ordered and compared, and is part of both Caché SQL and Caché Objects. There are two fundamental collations: numeric and string.
Numeric collation orders numbers based on the complete number in the following order: null, then negative numbers from largest to smallest magnitude, zero, then positive numbers from smallest to largest. This creates a sequence such as the following: -210, -185, -54, -34, -.02, 0, 1, 2, 10, 17, 100, 120.
String collation orders strings by collating on each sequential character. This creates an order such as the following: null, A, AA, AAA, AAB, AB, B. For numbers, this creates an order such as the following: -.02, -185, -210, -34, -54, 0, 1, 10, 100, 120, 17, 2.
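The difference between the two fundamental collations can be illustrated with a minimal Python sketch. This is not Caché code: it assumes that numeric collation behaves like sorting by numeric value and that string collation behaves like a character-by-character comparison of code points. Null handling is not modeled.

```python
# Illustrative model only (not Caché code): the same values ordered
# numerically and as strings.
values = ["-210", "-185", "-54", "-34", "-.02", "0",
          "1", "2", "10", "17", "100", "120"]

# Numeric collation: compare the complete number.
numeric_order = sorted(values, key=float)

# String collation: compare one character at a time (by code point here).
string_order = sorted(values)

print(numeric_order)
print(string_order)
```

In this model, "-.02" sorts first as a string because "-" and "." precede the digits in ASCII sequence, and "10" sorts before "2" because the comparison stops at the first differing character.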
The default string collation is SQLUPPER; this default is set for each namespace. SQLUPPER collation converts all letters to uppercase and adds a space character to the beginning of the string. This conversion is for the purposes of collation only: in Caché SQL, strings are usually displayed in uppercase and lowercase letters, regardless of the collation applied, and the length of a string does not include the added space character.
A timestamp is a string, and therefore follows the current string collation. However, because a timestamp is in ODBC format, the string collation is the same as chronological sequence, if leading zeros are specified.
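The role of leading zeros can be checked with a small Python sketch. It is an illustration under assumptions, not Caché code: string collation is modeled as plain code point ordering (case conversion and the leading space do not affect digits and punctuation), and the sample timestamps are hypothetical.

```python
from datetime import datetime

def chronological(timestamps):
    """Sort timestamp strings by the moment in time they represent."""
    return sorted(
        timestamps,
        key=lambda t: datetime.strptime(t, "%Y-%m-%d %H:%M:%S"),
    )

# Hypothetical sample data, with and without leading zeros.
padded = ["2023-10-05 09:30:00", "2023-09-15 18:00:00", "2023-09-05 07:45:00"]
unpadded = ["2023-10-5 9:30:00", "2023-9-15 18:00:00", "2023-9-5 7:45:00"]

# With leading zeros, string order and chronological order agree.
padded_matches = sorted(padded) == chronological(padded)

# Without leading zeros, "2023-10-5" sorts before "2023-9-5" as a string.
unpadded_matches = sorted(unpadded) == chronological(unpadded)

print(padded_matches, unpadded_matches)
```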
A string expression (such as one that uses the scalar string function LEFT) makes its result collation EXACT.
Any comparison of two literals uses EXACT collation.
You can use the ObjectScript Sorts After operator to determine the relative collation sequence order of two values.
You can specify collation as follows:
Collation can be specified as a keyword in the definition of a field/property or the definition of an index.
Collation can be specified by applying a collation function to a field name in a query clause. The % prefix is required when specifying a collation function.
Collation is in ascending ASCII/Unicode sequence, with the following transformations:
EXACT: Enforces case sensitivity for string data. Not recommended for use if your string data contains values in canonical numeric format (for example, 123).
MVR: (For compatibility with MultiValue database systems.) Intended for a string containing both numeric and non-numeric characters. MVR collation divides the string into substrings, each substring containing either all numeric or all non-numeric characters. The numeric substrings are sorted in signed numeric order. The non-numeric substrings are sorted in case-sensitive ASCII collation sequence. (Note that this collation does not appear in the Studio New Index Wizard.)
SQLSTRING: Strips trailing whitespace (spaces, tabs, and so on), and adds one leading blank space to the beginning of the string. It collates any value containing only whitespace (spaces, tabs, and so on) as the SQL empty string. SQLSTRING supports an optional maxlen integer value.
SQLUPPER: Converts all alphabetic characters to uppercase, strips trailing whitespace (spaces, tabs, and so on), and then adds one leading space character to the beginning of the string. This space character is added to force numeric values to be collated as strings (because the space character is not a valid numeric character). This transformation also causes SQL to collate the SQL empty string ('') value and any value containing only whitespace (spaces, tabs, and so on) as a single space character. SQLUPPER supports an optional maxlen integer value. Note that the SQLUPPER transform is not the same as the result of the SQL function UPPER.
TRUNCATE: Enforces case sensitivity for string data and (unlike EXACT) allows you to specify a length at which to truncate the value. This is useful when indexing exact data that is longer than what is supported for use in a subscript. It takes a positive integer argument, in the form %TRUNCATE(string,n), to truncate the string to the first n characters, which improves indexing and sorting on long strings. If you do not specify a length for TRUNCATE, it behaves identically to EXACT; while this behavior is supported, your definitions and code may be easier to maintain if you use TRUNCATE only when you have a length defined and EXACT when you do not.
PLUS: Makes the value numeric. A non-numeric string value is returned as 0.
MINUS: Makes the value numeric and changes its sign. A non-numeric string value is returned as 0.
Three collation types (SQLSTRING, SQLUPPER, and TRUNCATE) support an optional maxlen integer value. If specified, maxlen truncates parsing of the string to the first n characters. This can be used to improve performance when indexing and sorting long strings. You can use maxlen in a query to sort on, group by, or return a truncated string value.
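The string transformations described above can be modeled with a short Python sketch. This is an approximation for illustration, not Caché's implementation. The function names simply mirror the collation names, the helper upper_keep_eszett is hypothetical, and the sketch does not model null values, the special treatment of whitespace-only values by SQLSTRING, or the optional maxlen argument of SQLSTRING and SQLUPPER.

```python
def upper_keep_eszett(text):
    """Uppercase only characters whose uppercase form is a single character.

    Python's str.upper() turns the German eszett into "SS"; this helper
    (a hypothetical name) leaves such characters unchanged instead.
    """
    return "".join(ch.upper() if len(ch.upper()) == 1 else ch for ch in text)

def exact(value):
    """EXACT: the value is compared as is (case-sensitive)."""
    return value

def sqlstring(value):
    """SQLSTRING: strip trailing whitespace, add one leading space."""
    return " " + value.rstrip()

def sqlupper(value):
    """SQLUPPER: uppercase, strip trailing whitespace, add one leading space."""
    return " " + upper_keep_eszett(value).rstrip()

def truncate(value, n=None):
    """TRUNCATE: case-sensitive; keep the first n characters if n is given."""
    return value if n is None else value[:n]

names = ["smith", "Jones", "SMITH", "jones "]
by_sqlupper = sorted(names, key=sqlupper)   # letter case is ignored
by_exact = sorted(names, key=exact)         # uppercase sorts before lowercase

print(by_sqlupper)
print(by_exact)
print(repr(sqlupper("123")))                # leading space: no longer numeric
print(repr(truncate("Montgomery", 5)))
```

The leading space is what keeps a value such as 123 from being treated as a number: the collated form " 123" is a string that happens to contain digits.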
Namespace-wide Default Collation
Each namespace has a current string collation setting. This string collation is defined for the data type in %Library.String. The default is SQLUPPER. This default can be changed.
You can define the collation default on a per-namespace basis. By default, namespaces have no assigned collation, which means they use SQLUPPER collation. You can assign a different default collation to a namespace. This namespace default collation applies to all processes, and persists across Caché restarts until explicitly reset.
WRITE "initial collation for ",$NAMESPACE,!
WRITE "user-assigned collation for ",$NAMESPACE,!
WRITE "restored collation default for ",$NAMESPACE,!
Note that if you have never set the namespace collation default, $$GetEnvironment returns an undefined collation variable, such as .collval in this example. This undefined collation defaults to SQLUPPER.
If your data contains German text, uppercase collation may not be a desirable default. This is because the German eszett character ($CHAR(223)) has only a lowercase form. The uppercase equivalent is the two letters SS. SQL collations that convert to uppercase do not convert eszett, which remains unchanged as a single lowercase letter.
Table Field/Property Definition Collation
Within SQL, collation can be assigned as part of field/property definition. The data type used by a field determines its default collation. The default collation for string data types is SQLUPPER. Non-string data types do not support collation assignment.
CREATE TABLE Sample.MyNames (
    LastName CHAR(30),
    FirstName CHAR(30) COLLATE SQLstring)
When specifying collation for a field using CREATE TABLE and ALTER TABLE, the % prefix is optional: COLLATE SQLstring or COLLATE %SQLstring.
You can specify collation for a property when defining a table using a persistent class definition:
Class Sample.MyNames Extends %Persistent
{
  Property LastName As %String;
  Property FirstName As %String(COLLATION = "SQLstring");
}
When specifying collation for class definitions and class methods, do not use the % prefix for collation type names.
In these examples, the LastName field takes default collation (SQLUPPER, which is not case-sensitive), and the FirstName field is defined with SQLSTRING collation, which is case-sensitive.
If you change the collation for a class property and you already have stored data for that class, any indices on the property become invalid. You must rebuild all indices based on this property.
Index Definition Collation
The CREATE INDEX command cannot specify an index collation type. The index uses the same collation as the field being indexed.
An index defined as part of class definition can specify a collation type. By default, an index on a given property (or properties) uses the collation type of the property data. For example, suppose you have defined a property Name of type %String:
Class MyApp.Person Extends %Persistent
{
  Property Name As %String;
  Index NameIDX On Name;
}
The collation for Name is SQLUPPER (the default for %String). Suppose that the Person table contains the following data:
Then an index on Name will contain the following entries:
The SQL Engine can use this index directly for ORDER BY or comparison operations using the Name field.
You can override the default collation used for an index by adding an As clause to the index definition:
Class MyApp.Person Extends %Persistent
{
  Property Name As %String;
  Index NameIDX On Name As SQLstring;
}
In this case the NameIDX index will now store values in SQLSTRING (case-sensitive) form. Using the data from the above example:
In this case, the SQL Engine can take advantage of this index for any queries requiring case-sensitive collation.
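The effect of index collation on stored index entries can be sketched in Python. This is a simplified model under assumptions, not a description of how Caché stores index globals: the index is modeled as a mapping from collated value to row IDs, the sample rows are hypothetical, and the collation helpers handle ASCII data only.

```python
from collections import defaultdict

def sqlupper(value):
    """Model of SQLUPPER for ASCII data: uppercase, strip, leading space."""
    return " " + value.upper().rstrip()

def sqlstring(value):
    """Model of SQLSTRING: strip trailing whitespace, leading space."""
    return " " + value.rstrip()

def build_index(rows, collate):
    """Map each collated value to the IDs of the rows that produce it."""
    index = defaultdict(list)
    for row_id, name in rows:
        index[collate(name)].append(row_id)
    # Keep the entries in collated order, as a sorted index would.
    return dict(sorted(index.items()))

# Hypothetical rows: (ID, Name).
people = [(1, "Jones"), (2, "JOHNSON"), (3, "Smith"), (4, "jones"), (5, "SMITH")]

upper_index = build_index(people, sqlupper)    # Index NameIDX On Name
string_index = build_index(people, sqlstring)  # Index NameIDX On Name As SQLstring

print(upper_index)
print(string_index)
```

In the SQLUPPER model, rows that differ only in letter case share one entry; in the SQLSTRING model, each letter-case variant has its own entry, which is why only the second index can serve case-sensitive lookups directly.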
In general, you should not have to change the collations of indices. If you want to use a different collation, it is better to define it at the property level and let any indices on the property pick up the correct collation.
If you are performing a property comparison using an indexed property, the property as specified in the comparison should have the same collation type as the corresponding index. For example, the Name property in the WHERE clause of a SELECT or in the ON clause of a JOIN should have the same collation as the index defined for the Name property. If there is a mismatch between the property collation and the index collation, the index may be less effective or may not be used at all. For further details, refer to Index Collation in the Defining and Building Indices chapter of the Caché SQL Optimization Guide.
If your index is defined to use multiple properties, you can specify the collation of each individually:
Index MyIDX On (Name As SQLstring, Code As Exact);
Caché SQL provides collation functions that can be used to change the collation or display of a field.
Applying a collation function to a query select-item changes the display of that item.
Letter Case: By default, a query displays strings with uppercase and lowercase letters. The exceptions to this are the DISTINCT or GROUP BY operations on a field of collation type SQLUPPER. These operations display that field in all uppercase letters. You can use the %EXACT collation function to reverse this letter case transformation and display the field in uppercase and lowercase letters. You should not use an %SQLUPPER collation function in the select-item list to display a field in all uppercase letters. This is because %SQLUPPER adds a space character to the length of the string. Use the UPPER function instead:
SELECT TOP 5 Name,$LENGTH(Name) AS NLen,
       %SQLUPPER(Name) AS UpCollN,$LENGTH(%SQLUPPER(Name)) AS UpCollLen,
       UPPER(Name) AS UpN,$LENGTH(UPPER(Name)) AS UpLen
FROM Sample.Person
String Truncation: You can use the %TRUNCATE collation function to limit the length of the string data you wish to display. %TRUNCATE is preferable to %SQLUPPER, which adds a space character to the length of the string.
SELECT TOP 5 Name,$LENGTH(Name) AS NLen,
       %TRUNCATE(Name,8) AS TruncN,$LENGTH(%TRUNCATE(Name,8)) AS TruncLen
FROM Sample.Person
Note that you cannot nest collation functions or case-transformation functions.
WHERE clause comparisons: Most WHERE clause predicate condition comparisons use the collation type of the field/property. Because string fields default to SQLUPPER, these comparisons are commonly not case-sensitive. You can use the %EXACT collation function to make them case-sensitive:
The following example returns Home_City string matches regardless of letter case:
SELECT Home_City FROM Sample.Person WHERE Home_City = 'albany'
The following example returns Home_City string matches that are case-sensitive:
SELECT Home_City FROM Sample.Person WHERE %EXACT(Home_City) = 'albany'
However, the SQL Contains operator ( [ ) uses EXACT collation, regardless of the collation type of the field/property:
SELECT Home_City FROM Sample.Person WHERE Home_City [ 'c'
ORDER BY Home_City
The %MATCHES and %PATTERN predicate conditions use EXACT collation, regardless of the collation type of the field/property. The %PATTERN predicate provides both case-sensitive wildcards and a wildcard ('A') which is not case-sensitive.
ORDER BY clause: The ORDER BY clause uses the namespace default collation to order string values. Therefore, ORDER BY does not order based on lettercase. You can use %EXACT collation to order strings based on lettercase.
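The comparison and ordering behavior described above can be modeled in a few lines of Python. This is a sketch under assumptions rather than Caché code: the default collation is taken to be SQLUPPER, the literal in a default comparison is assumed to be collated the same way as the field, EXACT is modeled as plain code point comparison, and the city values are hypothetical.

```python
def sqlupper(value):
    """Model of SQLUPPER for ASCII data: uppercase, strip, leading space."""
    return " " + value.upper().rstrip()

# Hypothetical Home_City values.
cities = ["Albany", "ALBANY", "albany", "Cairo", "Chicago"]

# WHERE Home_City = 'albany'  (field collation, not case-sensitive)
default_matches = [c for c in cities if sqlupper(c) == sqlupper("albany")]

# WHERE %EXACT(Home_City) = 'albany'  (case-sensitive)
exact_matches = [c for c in cities if c == "albany"]

# WHERE Home_City [ 'c'  (Contains uses EXACT, so it is case-sensitive)
contains_matches = [c for c in cities if "c" in c]

# ORDER BY Home_City  (letter case does not affect the order)
default_order = sorted(cities, key=sqlupper)

# ORDER BY %EXACT(Home_City)  (uppercase letters sort before lowercase)
exact_order = sorted(cities)

print(default_matches, exact_matches, contains_matches)
print(default_order)
print(exact_order)
```

Note that the Contains condition matches "Chicago" but not "Cairo": the only "c" in "Cairo" is uppercase, and the comparison is case-sensitive.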
DISTINCT and GROUP BY Collation
By default, these operations use the current namespace collation. The default namespace collation is SQLUPPER.
DISTINCT: The DISTINCT keyword uses the namespace default collation to eliminate duplicate values. Therefore, DISTINCT Name returns values in all uppercase letters. You can use EXACT collation to return values in mixed uppercase and lowercase. DISTINCT eliminates duplicates that differ only in letter case. To preserve duplicates that differ in case, but eliminate exact duplicates, use EXACT collation. The following example eliminates exact duplicates (but not lettercase variants) and returns all values in mixed uppercase and lowercase:
SELECT DISTINCT %EXACT(Name) FROM Sample.Person
A UNION involves an implicit DISTINCT operation.
GROUP BY: The GROUP BY clause uses the namespace default collation to eliminate duplicate values. Therefore, GROUP BY Name returns values in all uppercase letters. You can use EXACT collation to return values in mixed uppercase and lowercase. GROUP BY eliminates duplicates that differ only in letter case. To preserve duplicates that differ in case, but eliminate exact duplicates, you must specify the %EXACT collation function on the GROUP BY clause, not the select-item.
The following example returns values in mixed uppercase and lowercase; the GROUP BY eliminates duplicates, including those that differ in lettercase:
SELECT %EXACT(Name) FROM Sample.Person GROUP BY Name
The following example returns values in mixed uppercase and lowercase; the GROUP BY eliminates exact duplicates (but not lettercase variants):
SELECT Name FROM Sample.Person GROUP BY %EXACT(Name)
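The three behaviors can be compared side by side with a small Python model. It is an illustration under assumptions, not Caché code: the namespace default is taken to be SQLUPPER, results are listed in first-seen order (SQL itself does not guarantee an order here), the representative returned for each group is assumed to be the first value encountered, and the names are hypothetical.

```python
def sqlupper(value):
    """Model of SQLUPPER for ASCII data: uppercase, strip, leading space."""
    return " " + value.upper().rstrip()

# Hypothetical Name values.
names = ["Smith", "SMITH", "smith", "Jones", "Jones"]

def group_by_default(values):
    """SELECT Name ... GROUP BY Name: one uppercase value per group."""
    groups = {}
    for value in values:
        groups.setdefault(sqlupper(value), value.upper())
    return list(groups.values())

def select_exact_group_by_default(values):
    """SELECT %EXACT(Name) ... GROUP BY Name: mixed case, case variants merged."""
    groups = {}
    for value in values:
        groups.setdefault(sqlupper(value), value)  # first value seen (assumption)
    return list(groups.values())

def group_by_exact(values):
    """SELECT Name ... GROUP BY %EXACT(Name): only exact duplicates merged."""
    return list(dict.fromkeys(values))

print(group_by_default(names))
print(select_exact_group_by_default(names))
print(group_by_exact(names))
```

Under the same assumptions, group_by_exact also models SELECT DISTINCT %EXACT(Name), and group_by_default models SELECT DISTINCT Name.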
Caché SQL supports several legacy collation types. These are deprecated and not recommended for use with new code, as their purpose is to provide continued support for legacy systems. They are:
ALPHAUP: Removes all punctuation characters except question marks (?) and commas (,), and translates all the lowercase letters to uppercase. Used mostly for mapping legacy globals. Replaced by SQLUPPER.
STRING: Converts a logical value to uppercase, strips all punctuation and white space (except for commas), and adds one leading blank space to the beginning of the string. It collates any value containing only whitespace (spaces, tabs, and so on) as the SQL empty string. Replaced by SQLUPPER.
UPPER: Translates all lowercase letters into uppercase letters. Used mostly for mapping legacy globals. Replaced by SQLUPPER.
SPACE: SPACE collation adds a single leading space to a value, forcing it to be evaluated as a string. To establish SPACE collation, CREATE TABLE provides a SPACE collation keyword, and ObjectScript provides a SPACE option in the Collation() method of the %SYSTEM.Util class. There is no corresponding SQL collation function.
If a string data type field is defined with EXACT, UPPER, or ALPHAUP collation, and a query applies a %STARTSWITH condition on this field, inconsistent behavior may result. If the substring you specify to %STARTSWITH is a canonical number (especially a negative and/or fractional number), %STARTSWITH may give different results depending on whether the field is indexed. The %STARTSWITH condition should perform as expected if the column is not indexed. If the column is indexed, unexpected results may occur.
The SQL collations described above should not be confused with the Caché NLS collation feature, which provides subscript-level encoding that adheres to particular national language collation requirements. These are two separate systems of providing collations, and they work at different levels of the product.
Caché NLS collations can have a process-level collation for the current process, and different collations for specific globals.
To ensure proper functioning when using Caché SQL, the process-level NLS collation must exactly match the NLS collation of all globals involved, including globals used by the tables and globals used for temporary files, such as process-private globals and CACHETEMP globals; otherwise, different processing plans devised by the Query Processor might give different results. In situations where sorting occurs, such as an ORDER BY clause or a range condition, the Query Processor selects the most efficient sorting strategy. It may use an index, use a temporary file in a process-private global, sort within a local array, or use a "]]" (Sorts After) comparison. All these are subscript-type comparisons that adhere to the Caché NLS collation that is in effect, which is why it is necessary that all these types of globals use the exact same NLS collation.