Using Caché SQL
Collation
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

Collation specifies how values are ordered and compared, and is part of both Caché SQL and Caché Objects. There are two fundamental collations: numeric and string.

The default string collation is SQLUPPER; this default is set for each namespace. SQLUPPER collation converts all letters to uppercase (for the purpose of collation), and appends a space character to the beginning of the string. This conversion is for the purposes of collation only; in Caché SQL strings are usually displayed in uppercase and lowercase letters, regardless of the collation applied, and the length of a string does not include the appended space character.
A timestamp is a string, and therefore follows the current string collation. However, because a timestamp is in ODBC format, the string collation is the same as chronological sequence, if leading zeros are specified.
You can use the ObjectScript Sorts After operator to determine the relative collation sequence order of two values.
You can specify collation as follows:
Also see SQL Collation and NLS Collations,” later in this chapter.
Collation Types
Collation can be specified as a keyword in the definition of a field/property or the definition of an index.
Collation can be specified by applying a collation function to a field name in a query clause. The % prefix is required when specifying a collation function.
Collation is in ascending ASCII/Unicode sequence, with the following transformations:
Note:
There are also various legacy collation types, the use of which is not recommended.
In an SQL query, you can specify a collation function without parentheses %SQLUPPER Name or with parentheses %SQLUPPER(Name). If the collation function specifies truncation, the parentheses are required %SQLUPPER(Name,10).
Three collation types: SQLSTRING, SQLUPPER, and TRUNCATE support an optional maxlen integer value. If specified, maxlen truncates parsing of the string to the first n characters. This can be used to improve performance when indexing and sorting long strings. You can use maxlen in a query to sort on, group by, or return a truncated string value.
You can also perform collation type conversions using the %SYSTEM.Util.Collation() method.
Namespace-wide Default Collation
Each namespace has a current string collation setting. This string collation is defined for the data type in %Library.String. The default is SQLUPPER. This default can be changed.
You can define the collation default on a per-namespace basis. By default, namespaces have no assigned collation, which means they use SQLUPPER collation. You can assign a different default collation to a namespace. This namespace default collation applies to all processes, and persists across Caché restarts until explicitly reset.
   SET stat=$$GetEnvironment^%apiOBJ("collation","%Library.String",.collval)
   WRITE "initial collation for ",$NAMESPACE,!
     ZWRITE collval
SetNamespaceCollation
  DO SetEnvironment^%apiOBJ("collation","%Library.String","SQLstring")
  SET stat=$$GetEnvironment^%apiOBJ("collation","%Library.String",.collnew)
    WRITE "user-assigned collation for ",$NAMESPACE,!
     ZWRITE collnew
ResetCollationDefault
  DO SetEnvironment^%apiOBJ("collation","%Library.String",.collval)
  SET stat=$$GetEnvironment^%apiOBJ("collation","%Library.String",.collreset)
     WRITE "restored collation default for ",$NAMESPACE,!
     ZWRITE collreset
 
Note that if you have never set the namespace collation default, $$GetEnvironment returns an undefined collation variable, such as .collval in this example. This undefined collation defaults to SQLUPPER.
Note:
If your data contains German text, uppercase collation may not be a desirable default. This is because the German eszett character ($CHAR(223)) has only a lowercase form. The uppercase equivalent is the two letters “SS”. SQL collations that convert to uppercase do not convert eszett, which remains unchanged as a single lowercase letter.
Table Field/Property Definition Collation
Within SQL, collation can be assigned as part of field/property definition. The data type used by a field determines its default collation. The default collation for string data types is SQLUPPER. Non-string data types do not support collation assignment.
You can specify collation for a field in CREATE TABLE and ALTER TABLE:
CREATE TABLE Sample.MyNames (
    LastName CHAR(30),
    FirstName CHAR(30) COLLATE SQLstring)
Note:
When specifying collation for a field using CREATE TABLE and ALTER TABLE, the % prefix is optional: COLLATE SQLstring or COLLATE %SQLstring.
You can specify collation for a property when defining a table using a persistent class definition:
Class Sample.MyNames Extends %Persistent 
{
Property LastName As %String;
Property FirstName As %String(COLLATION = "SQLstring");
}
Note:
When specifying collation for class definitions and class methods do not use the % prefix for collation type names.
In these examples, the LastName field takes default collation (SQLUPPER, which is not case-sensitive), the FirstName field is defined with SQLSTRING collation, which is case-sensitive.
If you change the collation for a class property and you already have stored data for that class, any indices on the property become invalid. You must rebuild all indices based on this property.
Index Definition Collation
The CREATE INDEX command cannot specify an index collation type. The index uses the same collation as the field being indexed.
An index defined as part of class definition can specify a collation type. By default, an index on a given property (or properties) uses the collation type of the property data. For example, suppose you have defined a property Name of type %String:
Class MyApp.Person Extends %Persistent 
{
Property Name As %String;
Index NameIDX On Name;
}
The collation for Name is SQLUPPER (the default for %String). Suppose that the Person table contains the following data:
ID Name
1 Jones
2 JOHNSON
3 Smith
4 jones
5 SMITH
Then an index on Name will contain the following entries:
Name ID(s)
JOHNSON 2
JONES 1, 4
SMITH 3, 5
The SQL Engine can use this index directly for ORDER BY or comparison operations using the Name field.
You can override the default collation used for an index by adding an As clause to the index definition:
Class MyApp.Person Extends %Persistent 
{
Property Name As %String;
Index NameIDX On Name As SQLstring;
}
In this case the NameIDX index will now store values in SQLSTRING (case-sensitive) form. Using the data from the above example:
Name ID(s)
JOHNSON 2
Jones 1
jones 4
SMITH 5
Smith 3
In this case, the SQL Engine can take advantage of this index for any queries requiring case-sensitive collation.
In general, you should not have to change the collations of indices. If you want to use a different collation, it is better to define it at the property level and let any indices on the property pick up the correct collation.
If you are performing a property comparison using an indexed property, the property as specified in the comparison should have the same collation type as the corresponding index. For example, the Name property in the WHERE clause of a SELECT or in the ON clause of a JOIN should have the same collation as the index defined for the Name property. If there is a mismatch between the property collation and the index collation, the index may be less effective or may not be used at all. For further details, refer to Index Collation in the “Defining and Building Indices” chapter of the Caché SQL Optimization Guide.
If your index is defined to use multiple properties, you can specify the collation of each individually:
Index MyIDX On (Name As SQLstring, Code As Exact);
Query Collation
Caché SQL provides collation functions that can be used to change the collation or display of a field.
select-item Collation
Applying a collation function to a query select-item changes the display of that item.
DISTINCT and GROUP BY Collation
By default, these operation use the current namespace collation. The default namespace collation is SQLUPPER.
Legacy Collation Types
Caché SQL supports several legacy collation types. These are deprecated and not recommended for use with new code, as their purpose is to provide continued support for legacy systems. They are:
Note:
If a string data type field is defined with EXACT, UPPER, or ALPHAUP collation, and a query applies a %STARTSWITH condition on this field, inconsistent behavior may result. If the substring you specify to %STARTSWITH is a canonical number (especially a negative and/or fractional number), %STARTSWITH may give different results depending on whether the field is indexed. The %STARTSWITH should perform as expected if the column is not indexed. If the column is indexed, unexpected results may occur.
SQL and NLS Collations
The SQL collations described above should not be confused with the Caché NLS collation feature, which provides subscript-level encoding that adhere to particular national language collation requirements. These are two separate systems of providing collations, and they work at different levels of the product.
Caché NLS collations can have a process-level collation for the current process, and different collations for specific globals.
To ensure proper functioning when using Caché SQL, it is a requirement that the process-level NLS collation matches exactly the NLS collation of all globals involved, including globals used by the tables and globals used for temporary files such as process private globals and for CACHETEMP globals; otherwise, different processing plans devised by the Query Processor might give different results. In situations where sorting occurs, such as an ORDER BY clause or a range condition, the Query Processor selects the most efficient sorting strategy. It may use an index, use a temporary file in a process-private global, sort within a local array, or use a "]]" (Sorts After) comparison. All these are subscript-type comparisons that adhere to the Caché NLS collation that is in effect, which is why it is necessary that all these types of globals use the exact same NLS collation.
The system creates a global with the data base default collation. You can use the Create() method of the %Library.GlobalEdit class to create a global with a different collation. The only requirement is that the specified collation be either built-in (such as the Caché standard) or one of the national collations available in the current locale. See Using %Library.GlobalEdit to Set Collation For A Global in Caché Specialized System Tools and Utilities.