
VECTOR_COSINE (SQL)

Finds the cosine similarity between two vectors and returns the result.

Synopsis

VECTOR_COSINE(vec1, vec2)

Description

The VECTOR_COSINE function finds the cosine similarity of two input vectors. These vectors must be of a numeric type, either integer, double, or decimal. The result is the cosine of the angle between them, represented as a double between -1 and 1. This value is calculated as the dot product of the two vectors divided by the product of the lengths of the two vectors. The formula is represented below:

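\[
\text{VECTOR\_COSINE}(\text{vec1}, \text{vec2}) = \frac{\text{vec1} \cdot \text{vec2}}{\lVert \text{vec1} \rVert \, \lVert \text{vec2} \rVert}
\]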
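As a rough illustration of the calculation described above (dot product divided by the product of the two lengths), the following Python sketch computes the same quantity outside the database. The function name cosine_similarity is an assumption made for this sketch and is not part of InterSystems SQL; it also does not attempt to reproduce how the server handles types or errors.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths.

    Hypothetical helper for illustration only; raises ZeroDivisionError
    if either vector has length 0.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

# dot = 3*4 + 4*3 = 24, and both lengths are 5, so the ratio is 24 / 25
result = cosine_similarity([3, 4], [4, 3])
```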

Cosine similarity is one measure of similarity between two vectors. If the cosine similarity is positive, the two vectors are considered to be similar; if it is 1, the two vectors point in exactly the same direction (they are identical once normalized to unit length). If the cosine similarity is negative, the two vectors are considered to be dissimilar; if it is -1, the two vectors point in exactly opposite directions. A cosine similarity of 0 means the two vectors are orthogonal.
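The ends and midpoint of the range can be checked numerically. The sketch below is plain Python, not InterSystems SQL, and cosine_similarity is a hypothetical helper defined only for this illustration. It evaluates one pair of vectors that point the same way but differ in length, one orthogonal pair, and one opposite pair.

```python
import math

def cosine_similarity(a, b):
    """Hypothetical helper: dot product over the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

# Same direction, different lengths: the value is 1 even though the vectors differ.
same_direction = cosine_similarity([3, 4], [6, 8])

# Perpendicular vectors: the dot product is 0, so the value is 0.
orthogonal = cosine_similarity([3, 4], [-4, 3])

# Opposite directions: the value is -1.
opposite = cosine_similarity([3, 4], [-3, -4])
```

Because the measure depends only on direction, scaling either vector by a positive factor leaves the value unchanged.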

InterSystems recommends using this function only on unit vectors (that is, vectors normalized to a length of 1). When this operation is performed on non-unit vectors, the computation can rapidly approach the numeric overflow boundary. The following table lists the overflow and underflow boundaries for each valid VECTOR type:

Type      Overflow Boundary           Underflow Boundary
Integer   2,147,483,647 (2^31-1)      -2,147,483,648 (-2^31)
Double    1.79769e+308                2.22507e-308
Decimal   1.79769e+308                2.22507e-308
Float     3.402823e+38                1.175494e-38
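One way to follow the unit-vector recommendation is to normalize vectors before storing or comparing them. The following Python sketch shows the idea under stated assumptions: normalize and dot are hypothetical helper names, the code runs outside the database, and it says nothing about how InterSystems IRIS performs the calculation internally.

```python
import math

def normalize(v):
    """Scale v to length 1 (hypothetical helper for illustration)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([6.0, 4.0, 5.0])
b = normalize([1.0, 4.0, 3.0])

# For unit vectors both lengths are 1, so the cosine similarity
# reduces to the dot product and every element stays within [-1, 1].
unit_cosine = dot(a, b)
```

With unit vectors, every product in the sum has a magnitude of at most 1, which keeps the intermediate values far from the overflow boundaries listed in the table.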

Arguments

vec1, vec2

Two vectors of the same numeric type and the same length. If the vectors have different numeric types, the function fails with SQLCODE -259.

If you specify a vector that has a non-numeric type, such as string, the function fails with SQLCODE -258.

If the two vectors have different lengths, the function fails with SQLCODE -257.
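The three error conditions can be summarized as a client-side pre-check. The Python sketch below is only an illustration: expected_sqlcode is a hypothetical helper, the set of numeric type names is taken from the table in the Description, and the order in which the checks run is an assumption, since the text does not state which error takes precedence when more than one condition applies.

```python
# Assumption: type names as listed in the overflow table above.
NUMERIC_TYPES = {"integer", "double", "decimal", "float"}

def expected_sqlcode(type1, len1, type2, len2):
    """Return the SQLCODE documented for this combination, or None if no
    documented error condition applies. The check order is an assumption."""
    if type1 not in NUMERIC_TYPES or type2 not in NUMERIC_TYPES:
        return -258  # non-numeric vector type, such as string
    if type1 != type2:
        return -259  # differing numeric types
    if len1 != len2:
        return -257  # differing lengths
    return None

mismatched_lengths = expected_sqlcode("double", 3, "double", 4)
```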

Examples

The following example uses the VECTOR_COSINE function on two integer vectors.

SELECT VECTOR_COSINE(TO_VECTOR('6,4,5',integer), TO_VECTOR('1,4,3',integer))
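Applying the formula from the Description to these two vectors by hand gives the value the query should approximate. This is worked arithmetic, not captured query output, so the number of digits displayed by the server may differ.

```latex
\[
\frac{6 \cdot 1 + 4 \cdot 4 + 5 \cdot 3}
     {\sqrt{6^2 + 4^2 + 5^2}\,\sqrt{1^2 + 4^2 + 3^2}}
= \frac{37}{\sqrt{77}\,\sqrt{26}}
= \frac{37}{\sqrt{2002}}
\approx 0.8269
\]
```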

The following example uses the VECTOR_COSINE function to return the five text descriptions that are most similar to an input, ranked by the cosine similarity between the embedding vector stored in each row and an input vector. In practice, an embedding vector has many more than three elements; to keep this example brief, the input vector is shown with three elements.

SELECT TOP 5 Description FROM Sample.DescAndEmbedding emb
ORDER BY VECTOR_COSINE(Embedding,TO_VECTOR('0.3,0.4,0.6',double)) DESC
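The ranking that this query performs can be sketched in Python to make the ordering explicit. Everything here is illustrative: the rows, the helper names cosine_similarity and top_k, and the in-memory list standing in for the table are assumptions, and the sketch does not reflect how the database executes the query.

```python
import math

def cosine_similarity(a, b):
    """Hypothetical helper: dot product over the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

def top_k(rows, query, k=5):
    """Return the descriptions of the k rows most similar to query.

    rows is a list of (description, embedding) pairs, mirroring the
    Description and Embedding columns used in the query above.
    """
    ranked = sorted(rows, key=lambda row: cosine_similarity(row[1], query),
                    reverse=True)  # highest similarity first, like ORDER BY ... DESC
    return [description for description, _ in ranked[:k]]

# Made-up sample rows for illustration.
rows = [
    ("opposite", [-0.3, -0.4, -0.6]),
    ("orthogonal", [0.4, -0.3, 0.0]),
    ("same direction as the query", [0.3, 0.4, 0.6]),
]
best = top_k(rows, [0.3, 0.4, 0.6], k=2)
```

Sorting in descending order matters: the most similar rows have the largest cosine values, so an ascending sort would return the least similar rows instead.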

