Caché MultiValue Basic Reference
Returns the Soundex code for an alphabetic string.
string: An expression that resolves to an alphabetic string.
The SOUNDEX function is used to group and sort near-equivalents of alphabetic strings, such as variant spellings of a name. The Soundex algorithm takes an alphabetic string of any length, such as a name or an English word or phrase, and returns a four-character equivalence code. This code consists of the first recognized letter of the string (which may not be the first character), followed by three digits, each between 0 and 6 (inclusive). The three digits assigned by the Soundex algorithm represent up to three distinct consonant sounds that follow the initial letter. Adjacent letters that share the same code (such as mm or mn) are coded only once, so repeating them has no effect on the Soundex code.
For example, Fred is represented as F630, because F is the first character, 6 is assigned to the letter sound R, 3 is assigned to the letter sounds D or T, and 0 indicates that there are no more consonant sounds in the string. Note that vowels and unvoiced letters (A, E, I, O, U, H, W, Y) are not assigned a number. Ann, Anne, Anna, Ana, and Annie are all represented by A500. Anita, Anida, Annette, and Ann T. are all represented by A530. Anton, Anthony, and Antoinette are all represented by A535.
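To illustrate how equivalence codes are used for grouping, the following Python sketch (not MVBasic) buckets the names from the paragraph above by the codes quoted there. The `codes` dictionary is hand-copied from the text, not computed; in a Caché application the values would come from SOUNDEX itself.

```python
from collections import defaultdict

# Name -> Soundex code pairs copied from the examples in the text above.
codes = {
    "Fred": "F630",
    "Ann": "A500", "Anne": "A500", "Anna": "A500", "Ana": "A500", "Annie": "A500",
    "Anita": "A530", "Anida": "A530", "Annette": "A530",
    "Anton": "A535", "Anthony": "A535", "Antoinette": "A535",
}

# Group names that share a code, so variant spellings land in one bucket.
groups = defaultdict(list)
for name, code in codes.items():
    groups[code].append(name)

for code in sorted(groups):
    print(code, sorted(groups[code]))
# A500 ['Ana', 'Ann', 'Anna', 'Anne', 'Annie']
# A530 ['Anida', 'Anita', 'Annette']
# A535 ['Anthony', 'Antoinette', 'Anton']
# F630 ['Fred']
```

Sorting on the code rather than on the name is what keeps near-equivalent spellings adjacent.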
Caché MVBasic uses the Soundex algorithm used by the United States Census Bureau; this is not the same algorithm used by other MultiValue implementations. Therefore, all files using Soundex should be regenerated when moving them to Caché MultiValue. The MVBasic Soundex numeric codes for English consonants are as follows: 1=B,F,P,V; 2=C,G,J,K,Q,S,X,Z; 3=D,T; 4=L; 5=M,N; 6=R.
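The rules described in this section can be approximated outside Caché with the following Python sketch. It is not the Caché implementation: the function name `soundex_sketch` is hypothetical, and the treatment of H and W (skipped without separating two letters of the same code, as in the commonly published Census rules) is an assumption that this reference does not spell out. The sketch reproduces the codes quoted on this page, but other inputs may differ from what SOUNDEX returns.

```python
# Digit assigned to each consonant, following the table in the text above.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex_sketch(text):
    """Approximate a four-character Soundex code (illustrative only)."""
    if text == "":
        return ""                      # null string in, null string out
    # Keep only unaccented Latin letters; digits, punctuation, blanks
    # and accented letters are ignored.
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    if not letters:
        return "0000"                  # no recognizable letter at all
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue                   # assumption: H and W do not separate codes
        digit = CODES.get(letter, "")  # vowels and Y have no digit
        if digit and digit != previous:
            result += digit
            if len(result) == 4:
                break
        previous = digit
    return result.ljust(4, "0")        # pad with zeros to four characters


print(soundex_sketch("Fred"))         # F630
print(soundex_sketch("Anthony"))      # A535
print(soundex_sketch("McDufflebag"))  # M231
```

Remembering the code of the previous letter, including the initial letter, is what makes a run such as MMMM collapse to M000.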
The Soundex algorithm is not case-sensitive; all Soundex codes return the first recognized letter as an uppercase letter, regardless of its case in the input string. All non-alphabetic characters are ignored, including numbers, punctuation characters, and blank spaces. Soundex does not recognize accented letters or non-Latin letters. For example, Ü-boat returns B300, exactly the same as Boat. If SOUNDEX cannot recognize at least one letter in string, it returns 0000 (four zeros). If string is the null string, SOUNDEX returns the null string.
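The input handling described above can be pictured with a short Python sketch (not MVBasic). The helper names `recognized_letters` and `special_case_result` are hypothetical and only mirror the behavior stated in this section; they are not part of Caché.

```python
def recognized_letters(text):
    """Return the unaccented Latin letters of text, uppercased."""
    # str.isalpha() alone would accept accented letters such as U-umlaut,
    # so the ASCII check is needed to match the behavior described above.
    return [c.upper() for c in text if c.isascii() and c.isalpha()]


def special_case_result(text):
    """Return the fixed result for the two special inputs, else None."""
    if text == "":
        return ""          # null string in, null string out
    if not recognized_letters(text):
        return "0000"      # no letter could be recognized
    return None            # ordinary input: normal coding applies


print(recognized_letters("Ü-boat"))       # ['B', 'O', 'A', 'T']
print(recognized_letters("boat"))         # ['B', 'O', 'A', 'T']
print(repr(special_case_result("")))      # ''
print(special_case_result("1234 !?"))     # 0000
```

The null-string test comes first because an empty input also contains no recognizable letter; checking in the other order would return 0000 for it.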
The following examples use the SOUNDEX function to return equivalence codes. Note how the Soundex code is established by the initial letter and the next three significant consonants:
PRINT SOUNDEX("M"); ! Returns M000
PRINT SOUNDEX("MMMM"); ! Returns M000
PRINT SOUNDEX("Mc"); ! Returns M200
PRINT SOUNDEX("Mac"); ! Returns M200
PRINT SOUNDEX("McD"); ! Returns M230
PRINT SOUNDEX("McT"); ! Returns M230
PRINT SOUNDEX("McDuff"); ! Returns M231
PRINT SOUNDEX("McDufflebag"); ! Returns M231