# Numeric Computing in InterSystems Applications

This appendix provides details on the numeric formats supported by Caché.

## Representations of Numbers

Caché has two different ways of representing numbers.

The first of these has its roots in the original implementation of Caché. This representation will be referred to as decimal format.

In class definitions, you use the %Library.Decimal datatype class when you want a property to contain a decimal format number.

The second, more recently supported, form adheres to the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754-1985). This format is referred to as $DOUBLE format, after the ObjectScript function ($DOUBLE) that is used to convert numbers into this form.

In class definitions, you use the %Library.Double datatype class when you want a property to contain a $DOUBLE format number.

### Decimal Format

Caché represents decimal numbers internally in two parts. The first is called the significand, and the second is called the exponent:

The significand contains the significant digits of the number. It is stored as a signed 64-bit integer with the decimal point assumed to be to the right of the value. The largest positive integer with an exponent of 0 that can be represented without loss of precision is 9,223,372,036,854,775,807; the largest negative integer is -9,223,372,036,854,775,808.

The exponent is stored internally as a signed byte. Its values range from 127 to -128.

This is the base-10 exponent of the value. That is, the value of the number is the significand multiplied by 10 raised to the power of the exponent.

For example, for the ObjectScript literal value 1.23, the significand is 123 and the exponent is -2.

Thus, the range of numbers that can be represented in Caché decimal format approximately covers 1.0E-128 to 9.22E145. (The first value is the smallest positive significand, 1, with the smallest exponent, -128. The second is the largest significand, 9,223,372,036,854,775,807, with the largest exponent, 127, displayed with the decimal point moved to the left and the exponent increased correspondingly.)

All numbers with 18 significant digits can be represented exactly; 19-digit numbers can be represented exactly as long as their significand lies within the bounds of the 64-bit significand given above.

Caché does not normalize the significand unless this is necessary to fit the number into decimal format. A number with a significand of 123 and an exponent of 1 and a number with a significand of 1230 and an exponent of 0 therefore have different internal forms, but both have the value 1230 and compare as equal.
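The following Python sketch models decimal format as a (significand, exponent) pair with the bounds described above. It is only an illustration: the function name `decimal_value` and the use of Python's `decimal` module are assumptions for this example, not part of Caché. It shows that 123 × 10**1 and 1230 × 10**0 are different pairs with the same value.

```python
from decimal import Decimal

# Bounds described above (model only, not a Caché API).
SIG_MIN, SIG_MAX = -(2**63), 2**63 - 1   # signed 64-bit significand
EXP_MIN, EXP_MAX = -128, 127             # signed-byte base-10 exponent


def decimal_value(significand: int, exponent: int) -> Decimal:
    """Return significand * 10**exponent, rejecting pairs outside decimal format."""
    if not SIG_MIN <= significand <= SIG_MAX:
        raise OverflowError("significand does not fit in a signed 64-bit integer")
    if not EXP_MIN <= exponent <= EXP_MAX:
        raise OverflowError("exponent does not fit in a signed byte")
    # scaleb only shifts the exponent; 19-digit significands are well within
    # the default 28-digit context precision, so no rounding occurs.
    return Decimal(significand).scaleb(exponent)


# 1.23 is stored as significand 123, exponent -2.
print(decimal_value(123, -2))
# Unnormalized pairs with the same value compare as equal.
print(decimal_value(123, 1) == decimal_value(1230, 0))
```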

### $DOUBLE Format

The Caché $DOUBLE format conforms to IEEE 754-1985, specifically the 64-bit binary (double-precision) representation. This means it consists of three parts:

A sign bit

An 11-bit power-of-two exponent. The exponent value is biased by 1023, so the internal value of the exponent for the number $DOUBLE(1.0) is 1023 rather than 0.

A positive 52-bit fractional significand. Because the significand is always treated as a positive value and normalized, a 1-bit is assumed as the leading binary digit even though it is not stored. Thus, the significand is numerically 53 bits long: the value 1, followed by the implied binary point, followed by the fractional significand. This can be thought of as a 53-bit integer implicitly divided by 2**52.
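To make the three fields concrete, the following Python sketch unpacks a binary64 value with the standard `struct` module. It is an illustration, not a Caché API; the names `double_fields` and `value_from_fields` are hypothetical, and the sketch assumes Python's `float` is an IEEE 754 double, as it is on all mainstream platforms.

```python
import math
import struct
from typing import Tuple


def double_fields(x: float) -> Tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, 52-bit fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)        # 52 stored bits; leading 1 is implied
    return sign, exponent, fraction


def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value (biased exponent 1..2046 only)."""
    significand = (1 << 52) | fraction       # restore the implied 1-bit: 53 bits
    # The 53-bit integer is implicitly divided by 2**52, hence the extra -52.
    magnitude = math.ldexp(significand, exponent - 1023 - 52)
    return -magnitude if sign else magnitude


print(double_fields(1.0))    # the exponent of 1.0 is stored as 1023
print(double_fields(-2.0))
```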

For integers, all values from 0 to 9,007,199,254,740,992 (2**53) can be represented exactly. Larger integers may or may not have exact representations, depending on their pattern of bits.
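A quick way to see the 2**53 limit is to convert integers to Python floats (IEEE doubles) and back. The helper name `is_exact_double` is hypothetical; the sketch only models $DOUBLE behavior.

```python
def is_exact_double(n: int) -> bool:
    """True if the integer n survives conversion to an IEEE double unchanged.

    float() rounds to the nearest representable double; n must be below
    about 1.8E308 or float() raises OverflowError.
    """
    return int(float(n)) == n


LIMIT = 2**53                   # 9,007,199,254,740,992
for n in (LIMIT - 1, LIMIT, LIMIT + 1, LIMIT + 2):
    print(n, is_exact_double(n))
# 2**53 + 1 is odd and would need 54 significant bits, so it is not exact;
# 2**53 + 2 is even and fits, which is why larger integers "may or may not" be exact.
```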

This representation has three optional features that are not available with Caché decimal format:

The ability to represent the results of invalid computations (such as taking the square root of a negative number) as a NaN (Not a Number).

The ability to represent both a +0 and -0.

The ability to represent infinity.

The standard also provides for the representation of numbers smaller in magnitude than 2 ** -1022, using denormalized (subnormal) numbers. This technique is referred to as “gradual underflow”: precision is lost gradually as values approach zero. Please refer to the standard for details.

These features are under program control via the IEEEError() method of the %SYSTEM.Process class for an individual process or the IEEEError() method of the Config.Miscellaneous class for the system as a whole.
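Gradual underflow can be observed directly with Python floats, which follow IEEE 754 on mainstream platforms (an assumption of this sketch). The helper `precision_bits` is hypothetical: it reports how many significant bits a positive value keeps, showing precision draining away below 2**-1022.

```python
import math
import sys

SMALLEST_NORMAL = math.ldexp(1.0, -1022)     # 2**-1022
SMALLEST_SUBNORMAL = math.ldexp(1.0, -1074)  # 2**-1074, a single significant bit


def precision_bits(x: float) -> int:
    """Significant bits available to a positive, finite double x."""
    if x >= SMALLEST_NORMAL:
        return 53                             # normal: implied 1-bit + 52 stored bits
    # Subnormal: the exponent is pinned, so each halving loses one bit.
    _, e = math.frexp(x)                      # x = m * 2**e with 0.5 <= m < 1
    return e + 1074


print(SMALLEST_NORMAL == sys.float_info.min)
for k in (1022, 1023, 1050, 1074):
    print(k, precision_bits(math.ldexp(1.0, -k)))
```

Halving the smallest subnormal value gives an exact tie between it and zero; round-to-even resolves the tie to 0.0.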

Calculations using IEEE binary floating-point representations can give different results on different hardware platforms for the same IEEE operation. InterSystems has written its own implementations for:

Conversions between $DOUBLE binary floating-point and decimal;

Conversion between $DOUBLE and numeric strings;

Comparisons between $DOUBLE and other numeric types.

This guarantees that when a $DOUBLE value is inserted into, or fetched from, a Caché database, the result is the same across all hardware platforms.

However, for all other calculations involving the $DOUBLE type, Caché uses the vendor-supplied floating-point library subroutines. This means that there can be minor differences between platforms for the same set of operations. In all cases, however, Caché $DOUBLE calculations equal the local calculations performed on the C double type; that is, the differences between platforms for Caché $DOUBLE computations are never worse than the differences exhibited by C programs computing IEEE values running on those same platforms.

### SQL Representations

The Caché SQL data types DOUBLE and DOUBLE PRECISION represent IEEE floating-point numbers, that is, $DOUBLE. The SQL FLOAT data type represents standard Caché decimal numbers.

## Choosing a Numeric Format

The choice of which format to use is largely determined by the requirements of the computation. Caché decimal format permits over 18 decimal digits of accuracy while $DOUBLE guarantees only 15.

In most cases, decimal format is simpler to use and provides more precise results. It is usually preferred for computations involving decimal values, such as currency calculations, because it gives the expected results: decimal fractions often cannot be represented exactly as binary fractions.

On the other hand, the range of numbers in $DOUBLE is significantly larger than that of decimal format: about 1.7E308 versus about 9.2E145. Applications for which range is a significant factor should use $DOUBLE.

Applications that share data externally should also consider keeping data in $DOUBLE format, because it will not be subject to implicit conversion. Most other systems use the IEEE standard as their representation of binary floating-point numbers because it is supported directly by the underlying hardware architecture. Values in decimal format must therefore be converted before they can be exchanged, for example, via ODBC/JDBC, SQL, or language binding interfaces.

If a $DOUBLE value is within the bounds defined for Caché decimal numbers, then converting it to decimal and then converting back to a $DOUBLE value will always yield the same number. The reverse is not true because $DOUBLE values have less precision than decimal values.

For this reason, InterSystems recommends that computation be done in one representation or the other, when possible. Converting values back and forth between representations may cause loss of accuracy. Most applications can use Caché decimal format for all their computations. The $DOUBLE format is intended to support those applications that exchange data with systems that use IEEE formats.
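The asymmetry of the round trip between the two formats can be modeled in Python, using the `decimal` module as a stand-in for Caché decimal format and rounding a double to 19 significant digits on the way out. This is a simplified model: the helper names are hypothetical, and the exact rounding rules of $DECIMAL are described separately. It only illustrates why double to decimal to double is lossless while decimal to double to decimal is not.

```python
from decimal import Decimal


def double_to_decimal(x: float) -> Decimal:
    """Model: round the double's exact value to 19 significant digits."""
    return Decimal(format(x, ".19g"))


def double_round_trips(x: float) -> bool:
    """double -> decimal -> double; 19 digits exceed the 17 a double needs."""
    return float(double_to_decimal(x)) == x


def decimal_round_trips(d: Decimal) -> bool:
    """decimal -> double -> decimal; fails when d needs more than 53 bits."""
    return double_to_decimal(float(d)) == d


print(double_round_trips(0.1))
print(decimal_round_trips(Decimal("9007199254740993")))   # 2**53 + 1
```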

The reasons for preferring Caché decimal over $DOUBLE are:

Caché decimal has more precision, almost 19 decimal digits compared to less than 16 decimal digits for $DOUBLE.

Caché decimal can exactly represent decimal fractions. The value 0.1 is an exact value in Caché decimal; but there is no exact equivalent in binary floating point, so 0.1 must be approximated in $DOUBLE format.
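Adding ten cents ten times shows the difference. In this Python sketch, `float` stands in for $DOUBLE and `decimal.Decimal` for Caché decimal format; the helper `add_repeatedly` is hypothetical. A plain loop is used instead of `sum()` because recent Python versions compensate for rounding error when `sum()` adds floats.

```python
from decimal import Decimal


def add_repeatedly(amount, times, zero):
    """Add amount to zero the given number of times, one operation at a time."""
    total = zero
    for _ in range(times):
        total += amount
    return total


binary_total = add_repeatedly(0.10, 10, 0.0)
decimal_total = add_repeatedly(Decimal("0.10"), 10, Decimal("0"))
print(binary_total == 1.0)      # each binary 0.1 is only an approximation
print(decimal_total)
```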

The advantages of $DOUBLE over Caché decimal for scientific numbers are:

$DOUBLE uses exactly the same representation as the IEEE double precision binary floating point used by most computing hardware.

$DOUBLE has a greater range: 1.7E308 maximum for $DOUBLE and 9.2E145 maximum for Caché decimal.

## Converting Numeric Representations

Beginning with Caché 2007.1, numbers (both numeric literals and the results of computations) were automatically converted to a $DOUBLE representation when the value exceeded the range of a decimal number.

Beginning with Caché 2008.2, only numeric literals are automatically converted to $DOUBLE. Computational results in decimal that are out of range generate the appropriate error, as discussed later in this appendix.

InterSystems recommends that your application explicitly control conversions between decimal and $DOUBLE formats.

When converting values from string to number, or when processing numeric literals when a program is compiled, only the first 38 significant digits can influence the value of the significand. All following digits are treated as if they were zero: they are used in determining the value of the exponent, but they have no further effect on the significand.

### Strings

#### Strings as Numbers

In Caché, if a string is used in an expression, the value of the string is the value of the longest numeric literal contained in the string starting at the first character. If there is no such literal present, the computed value of the string is zero.
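A rough Python model of this rule scans for the longest numeric literal at the start of the string. The regular expression is a simplification and an assumption: it accepts at most one leading sign and an uppercase `E` exponent, whereas real ObjectScript numeric syntax has further cases (for example, repeated leading signs). `string_numeric_value` is a hypothetical name.

```python
import re
from decimal import Decimal

# Optional sign, then digits with an optional fraction (or a bare fraction),
# then an optional exponent. Each part matches as much as it can.
_NUMERIC_PREFIX = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:E[+-]?\d+)?")


def string_numeric_value(s: str) -> Decimal:
    """Value of the longest numeric literal at the start of s, else 0."""
    match = _NUMERIC_PREFIX.match(s)
    return Decimal(match.group()) if match else Decimal(0)


for text in ("12abc", "3.5E2 apples", "-.5x", "1.2.3", "abc"):
    print(repr(text), string_numeric_value(text))
```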

#### Numeric Strings As Subscripts

In computation, there is no difference between the strings “04” and “4”. However, when such strings are used as subscripts for local or global arrays, Caché makes a distinction between them.

In Caché, numeric strings that contain leading zeroes (after the minus sign, if there is one), or trailing zeroes at the end of decimal fractions, will be treated as if they were strings when used as subscripts. As strings, they have a numeric value; they can be used in computations. But as subscripts for local or global variables, they are treated as strings and are collated as strings. Thus, in the list of pairs:

“4” versus “04”

“10” versus “10.0”

“.001” versus “0.001”

“-.3” versus “-0.3”

“1” versus “+01”

those on the left are considered numbers when used as subscripts and those on the right are treated as strings. (The form on the left, without the extraneous leading and trailing zero parts, is sometimes referred to as “canonical” form.)

In normal collation, numbers sort before strings, as this example shows:

```objectscript
KILL ^||TEST  // start from an empty process-private global
SET ^||TEST("2") = "standard"
SET ^||TEST("01") = "not standard"
SET NF = "Not Found"
WRITE """2""", ": ", $GET(^||TEST("2"),NF), !
WRITE 2, ": ", $GET(^||TEST(2),NF), !
WRITE """01""", ": ", $GET(^||TEST("01"),NF), !
WRITE 1, ": ", $GET(^||TEST(1),NF), !, !
// list the subscripts in collation order
SET SUBS=$ORDER(^||TEST(""))
WRITE "Subscript Order:", !
WHILE (SUBS '= "") {
    WRITE SUBS, !
    SET SUBS=$ORDER(^||TEST(SUBS))
}
```
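The collation rule can be sketched in Python as a sort key: canonical numeric strings sort first, by numeric value, and everything else sorts afterward as a string. This is a simplified model, not Caché's collation code. The canonical-form pattern ignores exponents and range limits, Python's ordinal string ordering is assumed to be close enough to the default string collation for these examples, and `subscript_sort_key` is a hypothetical name.

```python
import re
from decimal import Decimal

# Canonical form: no "+", no leading zeros, no trailing fractional zeros,
# no trailing ".", and no "-0".
_CANONICAL = re.compile(r"0|-?[1-9]\d*|-?(?:[1-9]\d*)?\.\d*[1-9]")


def subscript_sort_key(sub: str):
    """Numbers (group 0) collate before strings (group 1)."""
    if _CANONICAL.fullmatch(sub):
        return (0, Decimal(sub))
    return (1, sub)


subs = ["01", "2", "10.0", "10", "-.3", "+01", ".001", "0.001"]
print(sorted(subs, key=subscript_sort_key))
```

Because the first element of the key differs between the two groups, Python never compares a `Decimal` with a `str`.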

### Decimal to $DOUBLE

Conversion to $DOUBLE format is done explicitly via the $DOUBLE function. This function also permits the explicit construction of IEEE representations for not-a-number and infinity via the expression, $DOUBLE(<S>) where <S> is:

the string, “nan” to generate a NaN

any one of the strings “inf”, “+inf”, “-inf”, “infinity”, “+infinity”, or “-infinity” for infinity.

the numeric literal -0 or the string “-0” to generate a negative zero.

The case of the string, <S>, is ignored on input. On output, only “NAN”, “INF” and “-INF” are produced.
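The input and output rules can be mirrored with Python floats. The table of accepted strings comes from the list above; the function names are hypothetical, and this is a model of the described behavior, not how $DOUBLE is implemented.

```python
import math

# Strings accepted for special values, per the list above (lowercase).
_SPECIAL_INPUTS = {
    "nan": math.nan,
    "inf": math.inf, "+inf": math.inf, "-inf": -math.inf,
    "infinity": math.inf, "+infinity": math.inf, "-infinity": -math.inf,
    "-0": -0.0,
}


def parse_special(s: str) -> float:
    """Map a special-value string to a float; case is ignored on input."""
    return _SPECIAL_INPUTS[s.lower()]


def format_special(x: float) -> str:
    """Only NAN, INF and -INF are produced on output."""
    if math.isnan(x):
        return "NAN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    raise ValueError("not a NaN or an infinity")


print(format_special(parse_special("Infinity")), format_special(parse_special("-INF")))
```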

### $DOUBLE to Decimal

Values in $DOUBLE form are converted to decimal values with the $DECIMAL function. The result of calling the function is a string suitable for conversion to a decimal value.

Although this description assumes the value presented to $DECIMAL is a $DOUBLE value, this is not a requirement. Any numeric value may be supplied as the argument and the same rules apply for rounding.

#### $DECIMAL(x)

The single-argument form of the function converts the $DOUBLE value given as its argument to decimal, rounding the result to 19 significant digits. $DECIMAL always rounds to the nearest decimal value.

#### $DECIMAL(x, n)

The two-argument form allows precise control over the number of digits returned. If n is greater than 38, an <ILLEGAL VALUE> error occurs. If n is greater than 0, the value of x rounded to n significant digits is returned.

When n is zero, the following rules are used to determine the value:

If x is an Infinity, return “INF” or “-INF” as appropriate.

If x is a NaN, return “NAN”.

If x is a positive or negative zero, return “0”.

If x can be exactly represented in 20 or fewer significant digits, return the canonical numeric string containing exactly those significant digits.

Otherwise, truncate the decimal representation to 20 significant digits, and

If the 20th digit is a “0”, replace it with a “1”;

If the 20th digit is a “5”, replace it with a “6”.

Then, return the resulting string.

This rounding rule, which truncates toward zero at the 20th digit but never leaves an inexact result whose 20th digit is “0” or “5”, has these properties:

If a $DOUBLE value is different from a decimal value, these two values will always have unequal representation strings.

When a $DOUBLE value can be converted to decimal without generating a <MAXNUMBER> error, the result is the same as converting the $DOUBLE value to a string and then converting that string to a decimal value. There is no possibility of a “double round” error when doing the two conversions.
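The n = 0 rules can be written out as a Python sketch operating on IEEE doubles. It uses `Decimal(x)`, which converts a float exactly, to obtain all of the value's decimal digits. The function names are hypothetical, and producing positional notation with no leading zero before the decimal point is an assumption about the canonical string form; Caché's own output may differ, particularly for very large or very small magnitudes.

```python
import math
from decimal import Decimal


def _canonical(sign: int, digits: tuple, exponent: int) -> str:
    """Positional string without a leading 0 before the point (assumed form)."""
    text = format(Decimal((sign, digits, exponent)), "f")
    if text.startswith("0."):
        text = text[1:]
    elif text.startswith("-0."):
        text = "-" + text[2:]
    return text


def decimal_zero_digits(x: float) -> str:
    """Model of the $DECIMAL(x, 0) rules listed above."""
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    if math.isnan(x):
        return "NAN"
    if x == 0:
        return "0"                              # covers both +0 and -0
    sign, digits, exponent = Decimal(x).as_tuple()   # exact binary value
    digits = list(digits)
    while digits[-1] == 0:                      # trailing zeros are not significant
        digits.pop()
        exponent += 1
    if len(digits) > 20:
        exponent += len(digits) - 20            # truncate toward zero to 20 digits
        digits = digits[:20]
        if digits[-1] == 0:
            digits[-1] = 1
        elif digits[-1] == 5:
            digits[-1] = 6
    return _canonical(sign, tuple(digits), exponent)


print(decimal_zero_digits(0.5), decimal_zero_digits(0.1))
```

For example, the double nearest 0.1 is exactly 0.1000000000000000055511151231257827..., so truncation to 20 digits leaves a final 5, which the rule changes to 6.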

### Decimal to String

Decimal values are converted to strings automatically when they are used as strings, for example, as an operand of the concatenation operator. When more control over the conversion is needed, use the $FNUMBER function.

## Operations Involving Numbers

### Arithmetic

#### Homogeneous Representations

Expressions involving only decimal values will always yield a decimal result. Similarly, expressions with only $DOUBLE values will always produce a $DOUBLE result. In addition,

If the result of a computation involving decimal values overflows, a <MAXNUMBER> error will result. There is no automatic conversion to $DOUBLE in this case as there is for literals.

If a decimal expression underflows, 0 is generated as the result of the expression.

By default the IEEE errors of overflow, divide-by-zero, and invalid-operation will signal the <MAXNUMBER>, <DIVIDE>, and <ILLEGAL VALUE> errors, respectively, rather than generating an Infinity or NaN result. This behavior can be modified by the IEEEError() method of the %SYSTEM.Process class for an individual process or the IEEEError() method of the Config.Miscellaneous class for the system as a whole.

The expression 0 ** 0 (decimal) produces the decimal value 0, but the expression $DOUBLE(0) ** $DOUBLE(0) produces the $DOUBLE value 1. The former has always been true in Caché; the latter is required by the IEEE standard.

#### Heterogeneous Representations

Expressions involving both decimal and $DOUBLE representations always produce a $DOUBLE value. The conversion of the value takes place when it is used. Thus, in the expression

1 + 2 * $DOUBLE(4.0)

Because ObjectScript evaluates operators strictly from left to right, Caché first adds 1 and 2 together as decimal values. It then converts the result, 3, to $DOUBLE format and performs the multiplication. The result is $DOUBLE(12).

#### Rounding

When necessary, numeric results are rounded to the nearest representable value. When the value to be rounded is equally close to two available values, then:

$DOUBLE values are rounded to even as defined in the IEEE standard

Decimal values are rounded away from zero, that is, toward the value with the larger magnitude
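Python's `decimal` module can illustrate the two tie-breaking rules. `ROUND_HALF_EVEN` corresponds to the IEEE rule used for $DOUBLE, and `ROUND_HALF_UP`, which in Python means that ties go away from zero, corresponds to the rule described for decimal values. The helper `round_to_integer` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_integer(value: str, rounding: str) -> Decimal:
    """Round a decimal string to a whole number with the given tie rule."""
    return Decimal(value).quantize(Decimal(1), rounding=rounding)


for tie in ("2.5", "3.5", "-2.5"):
    print(tie,
          round_to_integer(tie, ROUND_HALF_EVEN),   # $DOUBLE-style ties
          round_to_integer(tie, ROUND_HALF_UP))     # decimal-style ties

# Native doubles also tie to even: 2**53 + 1 lies halfway between two doubles.
print(float(2**53 + 1) == 2**53, float(2**53 + 3) == 2**53 + 4)
```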

### Comparison

#### Homogeneous Representations

Comparisons between $DOUBLE(+0) and $DOUBLE(-0) treat these values as equal. This follows the IEEE standard. This is the same as in Caché decimal because, when either $DOUBLE(+0) or $DOUBLE(-0) is converted to a string, the result in both cases is “0”.

Comparisons between $DOUBLE(“nan”) and any other numeric value, including $DOUBLE(“nan”) itself, report that the values are not greater than, not equal to, and not less than each other. This follows the IEEE standard. It is a departure from the usual Caché rule that the equality comparison is done by converting both operands to strings and checking the strings for equality.

The expression “NAN” = $DOUBLE(“nan”), however, is true, because when one operand is a string the comparison is done as a string compare, and $DOUBLE(“nan”) converts to the string “NAN”.
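Python floats follow the same IEEE comparison rules, so a short sketch can show the behavior described above for $DOUBLE values. It models only the IEEE side, not the string-comparison behavior of the = operator, and `relations` is a hypothetical helper name.

```python
import math


def relations(a: float, b: float) -> tuple:
    """Return (a < b, a == b, a > b) under IEEE 754 comparison rules."""
    return (a < b, a == b, a > b)


print(relations(0.0, -0.0))           # +0 and -0 compare as equal
print(relations(math.nan, math.nan))  # NaN is unordered, even with itself
print(relations(math.nan, 1.0))
```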

#### Heterogeneous Representations

Comparisons between a decimal value and a $DOUBLE value are fully accurate: they are done without any rounding of either value. If only finite values are involved, these comparisons give the same answer that would result if both values were converted to strings and those strings were compared using the default collation rules.

Comparisons involving the operators <, <=, >, and >= always produce a boolean result, 0 or 1, as a decimal value. If one of the operands is a string, that operand is converted to a decimal value before the comparison is performed. Other numeric operands are not converted. As noted, the comparison of mixed numeric types is done with full accuracy and no conversion.

In the case of the string comparison operators (=, '=, ], '], [, '[, ]], ']], and so on), any numeric operand is first converted to a string before the comparison is done.
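Exact comparison of a decimal value with a double can be modeled in Python by converting the double to `Decimal`, which is an exact conversion, and comparing the two values without rounding. The helper name `compare_exact` is hypothetical; this illustrates the idea rather than Caché's implementation, and it handles finite values only.

```python
from decimal import Decimal


def compare_exact(dec: Decimal, dbl: float) -> int:
    """Return -1, 0 or 1 comparing dec with dbl, without rounding either value.

    Decimal(dbl) is an exact conversion, and comparisons between Decimal
    values do not round. Finite inputs only: NaN is not handled here.
    """
    exact = Decimal(dbl)
    if dec < exact:
        return -1
    if dec > exact:
        return 1
    return 0


print(compare_exact(Decimal("0.1"), 0.1))   # the double 0.1 is slightly larger
print(compare_exact(Decimal("0.5"), 0.5))   # 0.5 is exact in binary
```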

#### Less-Than Or Equal, Greater-Than Or Equal

In Caché, the operators “<=” and “>=” are treated as synonyms for the operators “'>” and “'<”, respectively.

If the operators “<=” or “>=” are used in comparisons where either or both of the operands may be NaNs, the results will be different from those mandated by the IEEE standard.

The expression “A >= B” when A, B, or both are NaNs is interpreted as follows:

The expression is transformed to “A '< B”.

It is further transformed to “'(A < B)”.

As noted previously, comparisons involving NaNs give results that are (a) not equal, (b) not greater-than, and (c) not less-than, so the expression in parentheses results in a value of false.

The negation of that value results in a value of true.

The expression “A >= B” can be rewritten to provide the IEEE expected results if it is expressed as “((A > B) | (A = B))”.
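The two spellings of “A >= B” can be compared with Python floats standing in for $DOUBLE values. Python's own `>=` follows IEEE, so this sketch spells both forms out explicitly; the function names are hypothetical.

```python
import math


def ge_cache_style(a: float, b: float) -> bool:
    """A >= B rewritten as A '< B, that is, not (A < B)."""
    return not (a < b)


def ge_ieee_style(a: float, b: float) -> bool:
    """A >= B rewritten as (A > B) | (A = B), which keeps IEEE results."""
    return (a > b) or (a == b)


print(ge_cache_style(math.nan, 1.0), ge_ieee_style(math.nan, 1.0))
print(ge_cache_style(2.0, 1.0), ge_ieee_style(2.0, 1.0))
```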

### Boolean Operations

For boolean operations (and, or, not, nor, nand, and so on), any string operand is converted to decimal. Any numeric operand (decimal or $DOUBLE) is left unchanged.

A numeric value that is zero is treated as FALSE; all other numeric values (including $DOUBLE(“nan”) and $DOUBLE(“inf”)) are treated as TRUE. The result is 0 or 1 (as a decimal value).

$DOUBLE(-0) is also false.

## Summary of Changes Introduced in Version 2008.2

The following are the significant changes to numeric processing for Caché 2008.2 and following:

Conversion of $DOUBLE values to a decimal string representation will now provide 20 significant digits. In prior releases, the conversion yielded 15.

Computational errors involving $DOUBLE values result in Caché errors (<MAXNUMBER>, <DIVIDE>, <ILLEGAL VALUE>) by default. They do not result in the IEEE infinity or NaN values. This can be changed via the IEEEError() method of the %SYSTEM.Process class for an individual process or the IEEEError() method of the Config.Miscellaneous class for the system as a whole.

There is an automatic conversion of numeric literal values outside the range of Caché decimal to $DOUBLE. This is true only for literals. It does not happen for the results of computations.

## Exact Representation of Values

Each of the number forms has a bound on the number of significant digits it can retain. A common belief is that numbers with more digits than this bound necessarily lose precision. This is not true in general.

This effect is most easily seen in Caché decimal values. The value 9,223,372,036,854,775,807 is naively considered the largest number that can be represented exactly, but that ignores the exponent. The value 9,223,372,036,854,775,807,000,000 can also be represented exactly (the same significand with an exponent of 6), even though it is a million times larger. The same is true when any other in-range significand is combined with a larger exponent. These values, however, do not include every number in the full decimal range.

In the $DOUBLE representation, the situation is similar, except that instead of powers of 10, the numbers are multiplied by powers of 2. The situation can be illustrated graphically with a very simple model that has the following characteristics:

It always deals with positive numbers so no sign bit is needed

It has a 3-bit unnormalized significand with values in the range 0 to 7

It has an unbiased, signed 3-bit exponent whose values run from -4 to +3.

The following table indicates the values that can be represented (each entry is the significand multiplied by 2 raised to the exponent):

| Significand \ Exponent | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0.0625 | 0.125 | 0.25 | 0.5 | 1 | 2 | 4 | 8 |
| 2 | 0.125 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 |
| 3 | 0.1875 | 0.375 | 0.75 | 1.5 | 3 | 6 | 12 | 24 |
| 4 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 |
| 5 | 0.3125 | 0.625 | 1.25 | 2.5 | 5 | 10 | 20 | 40 |
| 6 | 0.375 | 0.75 | 1.5 | 3 | 6 | 12 | 24 | 48 |
| 7 | 0.4375 | 0.875 | 1.75 | 3.5 | 7 | 14 | 28 | 56 |

Some things to note:

With only three bits for the significand, this model does not have enough precision for a single decimal digit. The number 9 is missing: representing it requires the binary significand 1001, which is four bits long.

Comparing any of these binary numbers exactly with fully precise decimal strings requires up to 4 significant decimal digits (for example, 0.4375 and 0.1875).

There are numbers in the range 0 to 56 that cannot be represented. The value 2.75 is one example. Converting its significand to an integer by stages gives 5.5 and then 11, while the exponent correspondingly becomes -1 and then -2. But 11 cannot be represented in three bits, so 2.75 does not appear in the preceding table.

The following diagram shows how sparsely the values that this form can represent fit onto the number line. For illustration purposes, the increment is the smallest nonzero representable value, 0.0625 (one-sixteenth). The number line is folded so that each line is four units long, or 64 increments. Each X marks a number that this format can represent; each underscore marks an integer that it cannot. Of the 896 positions from 0.0625 through 56, only 35 (fewer than 1 in every 25) can be represented exactly. Although the format has 64 combinations of significand and exponent, many of them denote the same value because the significand is not normalized.

```text
   +0              +1              +2              +3
   |               |               |               |
 0 XXXXXXXXX.X.X.X.X...X...X...X...X.......X.......X.......X.......
 4 X...............X...............X...............X...............
 8 X..............._...............X..............._...............
12 X..............._...............X..............._...............
16 X..............._..............._..............._...............
20 X..............._..............._..............._...............
24 X..............._..............._..............._...............
28 X..............._..............._..............._...............
32 X..............._..............._..............._...............
36 _..............._..............._..............._...............
40 X..............._..............._..............._...............
44 _..............._..............._..............._...............
48 X..............._..............._..............._...............
52 _..............._..............._..............._...............
56 X..............._..............._..............._...............
```
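The toy format is small enough to enumerate. This Python sketch uses `fractions.Fraction` so that every value is exact; `TOY_VALUES` is a hypothetical name. It counts 64 significand-exponent combinations but only 36 distinct values (35 of them nonzero), because an unnormalized significand lets several combinations denote the same value, and it confirms that 9 and 2.75 are missing.

```python
from fractions import Fraction
from itertools import product

SIGNIFICANDS = range(8)        # 3-bit unnormalized significand: 0..7
EXPONENTS = range(-4, 4)       # unbiased exponent: -4..+3

combinations = list(product(SIGNIFICANDS, EXPONENTS))
TOY_VALUES = {Fraction(s) * Fraction(2) ** e for s, e in combinations}

print(len(combinations), len(TOY_VALUES))
print(Fraction(9) in TOY_VALUES)
print(Fraction(11, 4) in TOY_VALUES)           # 2.75 = 11 * 2**-2
# Positions on the one-sixteenth grid from 1/16 up to 56 that hold a value.
hits = sum(Fraction(k, 16) in TOY_VALUES for k in range(1, 56 * 16 + 1))
print(hits)
```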

An analogous situation applies to Caché decimal values. The largest integer (with an exponent of zero) that can be represented exactly is 9,223,372,036,854,775,807. For the sake of discussion, call that value MAX. Then it is also true that MAX*10 and (MAX-3)*100 have exact representations; their exponents are nonzero but their significands are exact. However, there are values such as MAX+1 that cannot be represented exactly.

$DOUBLE values experience this phenomenon too, except that the exponent applies to a power of 2 rather than a power of 10 as in Caché decimal.
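Whether a positive integer has an exact decimal-format representation can be checked with a Python sketch: move trailing zeros into the exponent, and the remaining significand must not exceed MAX. The function name is hypothetical, the model ignores negative values, and it assumes the 64-bit significand and the exponent limit of 127 described at the start of this appendix.

```python
MAX = 2**63 - 1      # 9,223,372,036,854,775,807
EXP_MAX = 127        # largest decimal-format exponent


def exact_in_decimal_format(n: int) -> bool:
    """True if non-negative n equals s * 10**e with 0 <= s <= MAX and 0 <= e <= EXP_MAX."""
    exponent = 0
    while n != 0 and n % 10 == 0:       # move trailing zeros into the exponent
        n //= 10
        exponent += 1
    if exponent > EXP_MAX:              # too many zeros: keep the excess in s
        n *= 10 ** (exponent - EXP_MAX)
    return n <= MAX


for label, n in [("MAX", MAX), ("MAX*10", MAX * 10),
                 ("(MAX-3)*100", (MAX - 3) * 100), ("MAX+1", MAX + 1)]:
    print(label, exact_in_decimal_format(n))
```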

## See Also

For more information, see the following sources:

The IEEE 754-1985 standard. The full title of this standard is “IEEE Standard for Binary Floating-Point Arithmetic”. In the United States, its standards designation is ANSI/IEEE Std 754-1985.

This is also an international standard. Its international reference is IEC 60559:1989 entitled “Binary floating-point arithmetic for microprocessor systems.”

“What Every Computer Scientist Should Know About Floating-Point Arithmetic,” by David Goldberg, published in ACM Computing Surveys, March 1991.