Skip to main content
InterSystems IRIS for Health 2024.3
AskMe (beta)
Loading icon

Using Vector Search

Vector search is a foundational concept to systems that use machine learning and artificial intelligence. After using an embedding model to transform unstructured data, such as text or images, into embedding vectors, users can perform operations on these vectors to process input and return the vectors that are the most semantically similar.

InterSystems IRIS SQL supports the storage of vectors in the compact and performant VECTOR and EMBEDDING types, enabling you to efficiently store vectorized data as a part of your traditional relational schemas. By leveraging the EMBEDDING type, InterSystems IRIS SQL converts text into embedding vectors through familiar SQL syntax without interacting with an embedding model directly.

Vectors and Embeddings

Vectors can be used to represent the semantic meaning in embeddings. These embeddings are determined by an embedding model, which is a machine learning model that maps text, images, or audio to a geometric space with high dimensionality. This page talks specifically using an embedding model with text, but the procedures described here can be used on many different types of data.

Modern embedding vectors typically range between hundreds and thousands of dimensions. Words that share similar semantic meaning occupy nearby positions in that space, while words with disparate semantic meaning occupy positions distant from each other. These spatial positions allow an application to algorithmically determine the similarity between two words or sentences by computing the distance between them on their embedding vectors.

In vector search, a user compares an input vector with the vectors stored in the database using a distance function, such as the dot product. A vector search enables you to algorithmically determine which pieces of text are semantically most similar to an input. As a result, vector search is a great fit for tasks that involve information retrieval.

InterSystems IRIS SQL supports a dedicated VECTOR type that leverages SIMD CPU instructions to efficiently perform distance computations and an EMBEDDING type that uses all the optimizations of the VECTOR type, but also simplifies the conversion of a string value from a source field in the same table (typically of type VARCHAR) into an embedding vector. There are three numeric vector types: double (widely used in embeddings), decimal, and integer.

As VECTOR and EMBEDDING are standard SQL data types, you can store them alongside other data in a relational table, converting a SQL database transparently into a hybrid vector database. To insert VECTOR-typed data with an INSERT statement, use the TO_VECTOR function; note that this requires you to manually create embeddings before you can include them in the INSERT statement. To insert EMBEDDING-typed data, define an embedding configuration, then use an INSERT statement to insert the string into the EMBEDDING column’s source field. InterSystems recommends using the EMBEDDING type for ease of use.

Inserting VECTOR-typed Data

Translate Text to Embeddings

Before you can store embeddings as vectors in InterSystems IRIS, you first need to create them from a source. In general, you can transform a piece of text into an embedding in four steps.

  1. Import a package to use to turn your text into a series of embeddings.

  2. Pre-process your text to best fit your chosen embedding model’s input specifications.

  3. Instantiate the model and convert your text to the embeddings, using your chosen package’s workflow.

With Embedded Python, you can execute Python code that converts text to embeddings (using whichever package you like) alongside ObjectScript code that can insert those embeddings directly into your database. For information on importing a Python package into InterSystems IRIS, see Install and Import Python Packages.

Example: Create Embeddings with Embedded Python

The following example takes an input list of sentences, passed as a built-in Python list, converts them to embeddings using the sentence_transformers package, and returns a list of the embeddings for later insertion into a table. See Performing an INSERT for more information on inserting the embeddings. This example assumes that the input sentences have already been pre-processed to match the embedding model’s specifications.

ClassMethod GetEmbeddingPy(sentences) [ Language = python ]
{
  import json

  # import the package
  import sentence_transformers

  # perform any preprocessing

  # create the model and form the embeddings
  model = sentence_transformers.SentenceTransformer('all-MiniLM-L6-v2')
  embeddings = model.encode(sentences)

  return embeddings
}

Perform an INSERT

Once you have a list of strings that represent embeddings, you can insert them into your table as VECTORs either with an INSERT statement or by creating an object and storing the embedding as a property of the object. The following example demonstrates how to insert data with an INSERT.

Example

For each embedding, execute an INSERT statement that adds the embedding to the desired table. Use TO_VECTOR to convert the string representation of the embedding to a VECTOR.

In the following command, one embedding is inserted into a table called Sample.Description that has two columns: one for the embedding that represents a textual description and the other for a unique identifier that can be used to link the embedding with the text it is derived from (implicitly stored in a separate table with the same unique identifier). (Note that the example uses ?s as placeholders for the embedding and the unique identifier, as these are typically supplied programmatically as parameters to the statement.)

INSERT INTO Sample.Descriptions (DescriptionEmbedding, UID) 
   VALUES (TO_VECTOR(?,double), ?)

The following code samples use this query to insert a single embedding into the table.

 ClassMethod InsertEmbeddings(embedding As %String, uid As %Integer)
 {
   set sc=$$$OK
   try {
     set myquery = "INSERT INTO Sample.Descriptions (DescriptionEmbedding, UID)" 
                  _"VALUES (TO_VECTOR(?,double), ?)"
     set tStatement = ##class(%SQL.Statement).%New()
     $$$ThrowOnError(tStatement.%Prepare(query))
 
     set rset = tStatement.%Execute(embedding, uid)
     if (rset.%SQLCODE < 0) {
       throw ##class(%Exception.SQL).CreateFromSQLCODE(rset.%SQLCODE,rset.%Message)
     }
   }
   catch e {
     set sc = e.AsStatus()
     return sc
   }
   return 0
 }
 public void InsertEmbeddings(String embeddings, Integer uid) {
   try {
     // set connection parameters
     IRISDataSource ds = new IRISDataSource();
     ds.setServerName("127.0.0.1");
     ds.setPortNumber(51776);
     ds.setDatabaseName("USER");
     ds.setUser("_SYSTEM");
     ds.setPassword("SYS");
     IRISConnection conn = ds.GetConnection();
 
     String sql = "INSERT INTO Sample.Embeddings (Embedding, UID) " + 
                  "VALUES (TO_VECTOR(?,double), ?)";
     PreparedStatement pstmt = conn.prepareStatement(sql);
 
     pstmt.SetString(embedding);
     pstmt.SetInt(uid);
  
     pstmt.executeUpdate();
  
     pstmt.close();
   } catch (Exception ex) {
     System.out.println("caught exception: "
             + ex.GetClass().getName() + ": " + ex.GetMessage());
   }
 }
 def insertEmbeddings(embeddings, uid):
   // set the connection parameters
   conn_string = "localhost:1972/USER"
   username = "_system"
   password = "SYS"
   connection = iris.connect(conn_string, username, password)
   cursor = connection.cursor()
   
   try:
     sql = "INSERT INTO Sample.Embeddings (Embedding, UID) " +
           "VALUES (TO_VECTOR(?,double), ?)"
     params = [embeddings,uid]
     cursor.execute(sql,params))
     cursor.close()
   except Exception as ex:
     print(ex)
   finally:
     if cursor:
       cursor.close()
     if connection:
       connection.close()
   
 static void InsertEmbeddings(string emb, Integer uid)
 {
    // set the connection parameters
    String host = "127.0.0.1";
    String port = "51776";
    String username = "_SYSTEM";
    String password = "SYS";
    String namespace = "USER";

    IRISConnection conn = new IRISConnection();
    IRISConnect.ConnectionString = "Server = " + host
        + "; Port = " + port + "; Namespace = " + namespace
        + "; Password = " + password + "; User ID = " + username;

    conn.Open();

    String sql = "INSERT INTO Sample.Embeddings (Embedding, UID) " +
                 "VALUES (TO_VECTOR(?,double), ?)";
    IRISCommand cmd = new IRISCommand(sql, conn);
    cmd.ExecuteNonQuery();

    cmd.Dispose();
    conn.Close();
 }

Inserting EMBEDDING-typed Data

Create an Embedding Configuration

Before you can insert embeddings into a table, you must decide on an embedding model you wish to use to convert text into an embedding. Once you have selected a model, you can insert metadata for that model into the %Embedding.Config table to create an embedding configuration. The embedding configuration stores the information needed to make API calls to your chosen embedding model.

The %Embedding.Config table has five columns:

  • Name: The unique and valid identifier used to refer to the model.

  • Configuration: A JSON-formatted string that includes particular data for a particular source.

  • EmbeddingClass: The name of the ObjectScript class that extends %Embedding.Interface and defines logic for retrieving embeddings from the endpoint. InterSystems IRIS provides two such classes out of the box: %Embedding.OpenAI and %Embedding.SentenceTransformers. To use an embedding model from a different source, you must manually define a new class that extends the %Embedding.Interface class.

  • VectorLength: The length of the vector (number of dimensions) the embedding model returns.

  • Description: An optional description of this endpoint.

The following example inserts an embedding configuration for an OpenAI embedding modelOpens in a new tab into the %Embedding.Config table.

INSERT INTO %Embedding.Config (Name, Configuration, EmbeddingClass, VectorLength, Description)
  VALUES ('my-opanai-config', 
          '{"apiKey":"<api key>", 
            "sslConfig": "llm_ssl", 
            "modelName": "text-embedding-3-small"}',
          '%Embedding.OpenAI', 
          1536,  
          'a small embedding model provided by OpenAI') 

Define a Table with EMBEDDING-type Columns

Once you have stored an embedding configuration in the %Embedding.Config table, you can create a table that uses this configuration to populate EMBEDDING-typed columns.

When defining a column with the embedding type, you must specify two arguments for the model and source parameters. The model parameter is the name of the embedding configuration that converts text into an embedding. The source parameter is a comma separated list of properties in the class that are used to calculate the embedding.

The following example creates a table with four columns, two of which store strings and two of which store embeddings of those strings. Both of the EMBEDDING-typed fields use an embedding configuration called my-openai-config.

CREATE TABLE Embedding.Example (
  Description VARCHAR(200),
  Name VARCHAR(30),
  DescriptionEmbedding EMBEDDING('my-opanai-config','Description'),
  NameEmbedding EMBEDDING('my-opanai-config','Name')
)

See More

Data Types
FeedbackOpens in a new tab