# Vector Search
Typesense can index embeddings generated by any machine learning model and then run a nearest-neighbor (KNN) search on this data.
- Use-cases
- What is an embedding?
- Index Embeddings
- Option A: Importing externally-generated embeddings into Typesense
- Option B: Auto-embedding generation within Typesense
- Creating an auto-embedding field
- Using Built-in Models
- Using a GPU (optional)
- Using OpenAI API
- Using OpenAI-compatible APIs
- Using Google PaLM API
- Using GCP Vertex AI API
- Remote Embedding API parameters
- Using your own models
- Nearest-neighbor vector search
- Querying for similar documents
- Semantic Search
- Hybrid Search
- Searching with historical queries
- Rank keyword search via vector search
- Brute-force searching
- Configuring HNSW parameters
- UI Examples
# Use-cases
Here are some example use-cases you can build, using vector search as the foundation:
- Semantic search
- Recommendations
- Hybrid search (Keyword Search + Semantic Search + Filtering)
- Visual image search
- Integrate with LLMs, to get them to respond to queries using your own dataset (RAG)
You can also combine any of the above with features like filtering, faceting, sorting, grouping, etc to build a user-friendly search experience.
# What is an embedding?
An embedding for a JSON document is just an array of floating point numbers (e.g. `[0.4422, 0.49292, 0.1245, ...]`) that is an alternate numeric representation of the document.
These embeddings are generated by Machine Learning models in such a way that documents that are "similar" to each other (for different definitions of similarity depending on the model used), have embeddings that are "closer" to each other (cosine similarity).
Here are some common models you can use to generate these document embeddings:
- Sentence-BERT
- E-5
- CLIP
- OpenAI's Text Embeddings model
- Google's PaLM API
- Google's Vertex API
You can import embeddings generated by these models into Typesense into a special vector field and then do a nearest neighbor search, giving another set of vectors or a document ID as the input, and get the documents that are closest (cosine similarity) to your input.
You can also have Typesense generate these embeddings for you, using OpenAI, PaLM API or one of the built-in ML models listed here.
# Live Demo
Here is one (of many possible) practical applications of vector search: a "Find Similar" feature in an ecommerce store, ecommerce-store.typesense.org. (Click on Find Similar below each product.)
# Read More
Here are two articles that talk about embeddings in more detail:
- What Are Word and Sentence Embeddings?
- Getting Started With Embeddings
Let's now discuss how to index and search embeddings in Typesense.
# Index Embeddings
# Option A: Importing externally-generated embeddings into Typesense
If you have already generated embeddings using your own models outside Typesense, you can import them into Typesense.
TIP
Here's a quick example of how to use the Sentence-BERT model to generate embeddings outside Typesense.
Once your document embeddings are ready, create a collection that contains a `float[]` field with a `num_dim` property for indexing them. The `num_dim` property specifies the number of dimensions (length of the float array) that your embeddings contain.
Let's create a collection called `docs` with a vector field called `embedding` that contains just 4 dimensions.
TIP
We're creating a vector with 4 dimensions in the examples to keep the code snippets readable.
Depending on what model you use, real world use will require creating vector fields with at least 256 dimensions to produce good results.
Let's now index a document with a vector.
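As a rough sketch of both steps, the snippet below builds the request for creating the `docs` collection and the request for indexing one document. The `title` field, the document values, the `xyz` API key and the `localhost:8108` address are placeholders chosen for illustration; the requests are only constructed here, not sent.

```python
import json
import urllib.request

TYPESENSE_URL = "http://localhost:8108"  # assumed local node, as in the curl examples
API_KEY = "xyz"  # placeholder API key

# Collection schema: `embedding` is a float[] field with num_dim = 4.
docs_schema = {
    "name": "docs",
    "fields": [
        {"name": "title", "type": "string"},  # illustrative extra field
        {"name": "embedding", "type": "float[]", "num_dim": 4},
    ],
}

# A document whose vector length matches num_dim.
document = {
    "id": "0",
    "title": "Louvre Museum",
    "embedding": [0.04, 0.234, 0.113, 0.001],
}


def build_request(path, payload):
    """Build (but do not send) a JSON POST request to Typesense."""
    return urllib.request.Request(
        TYPESENSE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


create_collection = build_request("/collections", docs_schema)
index_document = build_request("/collections/docs/documents", document)

# To send them against a running server:
# urllib.request.urlopen(create_collection)
# urllib.request.urlopen(index_document)
```

The length of the `embedding` array in every document has to equal the `num_dim` declared in the schema.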
# Option B: Auto-embedding generation within Typesense
To simplify the process of embedding generation, Typesense can automatically use your JSON data and either OpenAI API, PaLM API or any of the built-in embedding models listed here to generate & store embeddings.
When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.
# Creating an auto-embedding field
To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.
Here's an example:
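A minimal sketch of such a schema, written as a Python dictionary that can be serialized and posted to the collections endpoint. The model name `ts/e5-small` is just one possible choice, and the `embedding_input` helper is purely illustrative: it mimics how the source field values are joined, and is not Typesense code.

```python
import json

# Schema with an auto-embedding field. "ts/e5-small" is only an example model name.
products_schema = {
    "name": "products",
    "fields": [
        {"name": "product_name", "type": "string"},
        {"name": "categories", "type": "string[]"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["product_name", "categories"],
                "model_config": {"model_name": "ts/e5-small"},
            },
        },
    ],
}


def embedding_input(document, source_fields):
    """Illustrate the text that gets embedded: source field values joined by spaces."""
    parts = []
    for field in source_fields:
        value = document[field]
        # String arrays contribute each of their elements.
        parts.extend(value if isinstance(value, list) else [value])
    return " ".join(parts)


doc = {"product_name": "Oak desk", "categories": ["furniture", "office"]}
source_fields = products_schema["fields"][2]["embed"]["from"]
print(json.dumps(products_schema, indent=2))
print(embedding_input(doc, source_fields))
```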
In this example, the `embedding` vector field will be generated automatically while indexing a document, using the concatenated values of the `product_name` and `categories` fields (separated by spaces).
# Using Built-in Models
These models are officially supported by Typesense and stored in the Typesense Hugging Face repository.
You can specify them by adding the `ts` namespace before the model name. Typesense will automatically download these models and make them available for use when you index documents after creating the collection.
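For instance, a schema that uses a built-in model could look roughly like the following. The `builtin_model_field` helper is a hypothetical convenience for this sketch; only the resulting dictionary matters.

```python
import json


def builtin_model_field(field_name, source_fields, model):
    """Return an auto-embedding field that uses a built-in model (ts/ namespace)."""
    return {
        "name": field_name,
        "type": "float[]",
        "embed": {
            "from": source_fields,
            "model_config": {"model_name": "ts/" + model},
        },
    }


schema = {
    "name": "products",
    "fields": [
        {"name": "product_name", "type": "string"},
        builtin_model_field("embedding", ["product_name"], "all-MiniLM-L12-v2"),
    ],
}
print(json.dumps(schema, indent=2))
```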
When you create a collection with the schema above, the `all-MiniLM-L12-v2` model will be downloaded, and your documents will be automatically embedded by this model and stored in the `embedding` field.
See our Hugging Face repo for all officially supported models. If you need support for additional publicly-available models, feel free to convert the model to ONNX format and send a PR to our Hugging Face models repo.
# Using a GPU (optional)
Embedding models are computationally intensive to run. So when using one of the built-in models, you might want to consider running Typesense on a server with a GPU to improve the performance of embedding generation, especially for large datasets.
# On Typesense Cloud:
For select RAM / CPU configurations, you'll find the option to turn on "GPU Acceleration" when provisioning a new cluster or under Cluster Configuration > Modify for Typesense versions `0.25.0` and above.
# When Self Hosting:
Follow the installation guide.
# Using OpenAI API
You can also have Typesense send specific fields in your JSON data to OpenAI's API to generate text embeddings.
You can use any of the OpenAI models listed here.
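A sketch of a schema that delegates embedding generation to OpenAI. The model name `openai/text-embedding-ada-002` is one example; the `OPENAI_API_KEY` environment variable name and the fallback string are assumptions made for this snippet.

```python
import json
import os

# Read the key from the environment rather than hard-coding it.
# OPENAI_API_KEY is an assumed variable name; the fallback is a placeholder.
openai_key = os.environ.get("OPENAI_API_KEY", "your_openai_api_key")

schema = {
    "name": "products",
    "fields": [
        {"name": "product_name", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["product_name"],
                "model_config": {
                    "model_name": "openai/text-embedding-ada-002",
                    "api_key": openai_key,
                },
            },
        },
    ],
}
body = json.dumps(schema)  # request body for POST /collections
```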
When you create the collection above, Typesense will call the OpenAI API to create embeddings from the `product_name` field and store them in the `embedding` field every time you index a document.
You have to provide a valid OpenAI API key in `model_config` to use this feature.
# Using OpenAI-compatible APIs
You can also use OpenAI-API-compatible API providers like Azure by customizing the base URL in the `model_config`:
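A sketch of what that could look like, assuming the base URL is supplied through a `url` key in `model_config`. The model name and API key are placeholders.

```python
import json

schema = {
    "name": "products",
    "fields": [
        {"name": "product_name", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["product_name"],
                "model_config": {
                    "model_name": "openai/text-embedding-ada-002",  # placeholder
                    "api_key": "your_api_key",  # placeholder
                    # Base URL of the OpenAI-compatible server.
                    "url": "https://your-custom-openai-compatible-api.domain.com",
                },
            },
        },
    ],
}
body = json.dumps(schema)  # request body for POST /collections
```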
When you create the collection above, Typesense will call the OpenAI-API-compatible server running behind `https://your-custom-openai-compatible-api.domain.com` to create embeddings from the `product_name` field and store them in the `embedding` field every time you index a document.
The custom API server behind the specified URL should provide the following endpoint:
Endpoint:
POST /v1/embeddings
Request Body:
Parameter | Type | Description |
---|---|---|
model | string | Model name |
input | string or string[] | Input string or string array |
Response Body:
{
"data": [
{
"embedding": [
....
]
}
]
}
Response body might have additional data, but the embeddings MUST be returned in the format above.
Error Response Body:
{
"error": {
"message": "Error message",
"type": "error_type",
"param": null,
"code": "error_code"
}
}
# Using Google PaLM API
This API is provided by Google MakerSuite to generate embeddings.
Note: The only supported model is `embedding-gecko-001` for now.
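A sketch of the corresponding field definition, assuming the model is addressed under a `google/` namespace and authenticated with an `api_key`. The key value is a placeholder.

```python
import json

palm_field = {
    "name": "embedding",
    "type": "float[]",
    "embed": {
        "from": ["product_name"],
        "model_config": {
            "model_name": "google/embedding-gecko-001",  # assumed namespace prefix
            "api_key": "your_palm_api_key",  # placeholder
        },
    },
}
schema = {
    "name": "products",
    "fields": [{"name": "product_name", "type": "string"}, palm_field],
}
body = json.dumps(schema)  # request body for POST /collections
```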
# Using GCP Vertex AI API
This API is also provided by Google, under the Google Cloud Platform (GCP) umbrella.
You would need the following authentication information to use this method:
- GCP access token (must be valid while creating the field)
- GCP refresh token
- GCP application client ID
- GCP application client secret
- GCP project ID
Please refer to the Vertex AI docs for more information on how to fetch these values.
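A sketch of how the five credentials listed above might map onto `model_config`. The key names (`access_token`, `refresh_token`, `client_id`, `client_secret`, `project_id`) and the `gcp/` model namespace are assumptions to verify against the API reference; all values are placeholders.

```python
import json

vertex_model_config = {
    "model_name": "gcp/embedding-gecko-001",  # assumed namespace and model
    "access_token": "your_gcp_access_token",
    "refresh_token": "your_gcp_refresh_token",
    "client_id": "your_gcp_app_client_id",
    "client_secret": "your_gcp_client_secret",
    "project_id": "your_gcp_project_id",
}

schema = {
    "name": "products",
    "fields": [
        {"name": "product_name", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {"from": ["product_name"], "model_config": vertex_model_config},
        },
    ],
}
body = json.dumps(schema)  # request body for POST /collections
```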
# Remote Embedding API parameters
You can use any of the following parameters to fine-tune how API calls are made to remote embedding services:
# During Search
Parameter | Description | Default |
---|---|---|
remote_embedding_timeout_ms | How long to wait (in milliseconds) until an API call to a remote embedding service is considered a timeout, during a search | 30000 (30s) |
remote_embedding_num_tries | The number of times to retry an API call to a remote embedding service on failure, during a search | 2 |
# During Indexing
Parameter | Description | Default |
---|---|---|
remote_embedding_batch_size | Max size of each batch that will be sent to remote APIs while importing multiple documents at once. A lower value reduces the risk of timeouts but increases the number of requests made. | 200 |
remote_embedding_timeout_ms | How long to wait (in milliseconds) until an API call to a remote embedding service is considered a timeout, during indexing. | 60000 (60s) |
remote_embedding_num_tries | The number of times to retry an API call to a remote embedding service on failure during indexing. | 2 |
# Using your own models
You can also use your own models to generate embeddings from within Typesense. They must be in the ONNX file format.
Create a directory under `<data_dir>/models` and store your ONNX model file, vocab file, and a JSON file for model config there.
Note: Your model file MUST be named `model.onnx` and the config file MUST be named `config.json`.
# Model config file
This file will contain information about the type of model you want to use.
The JSON file must contain the `model_type` (type of the model; we support `bert` and `xlm_roberta` at the moment) and `vocab_file_name` keys.
Directory Structure:
<data_dir>/models/test_model/model.onnx
<data_dir>/models/test_model/vocab.txt
<data_dir>/models/test_model/config.json
Contents of `config.json`:
{
"model_type": "bert",
"vocab_file_name": "vocab.txt"
}
Create an embedding field using the directory name as `model_name` in `model_config`.
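The sketch below shows a field definition that points at the `test_model` directory, plus a small pre-flight check of the directory layout described above. The `check_model_dir` helper is hypothetical (it is not part of Typesense) and is exercised here on a temporary directory.

```python
import json
import os
import tempfile


def check_model_dir(model_dir):
    """Return a list of problems found in a custom model directory (empty if none)."""
    problems = []
    if not os.path.isfile(os.path.join(model_dir, "model.onnx")):
        problems.append("missing model.onnx")
    config_path = os.path.join(model_dir, "config.json")
    if not os.path.isfile(config_path):
        return problems + ["missing config.json"]
    with open(config_path) as f:
        config = json.load(f)
    if config.get("model_type") not in ("bert", "xlm_roberta"):
        problems.append("unsupported model_type")
    vocab = config.get("vocab_file_name")
    if not vocab or not os.path.isfile(os.path.join(model_dir, vocab)):
        problems.append("missing vocab file")
    return problems


# Field definition: model_name is the directory name under <data_dir>/models.
embedding_field = {
    "name": "embedding",
    "type": "float[]",
    "embed": {"from": ["product_name"], "model_config": {"model_name": "test_model"}},
}

# Demo on a throwaway directory that mimics <data_dir>/models/test_model.
with tempfile.TemporaryDirectory() as data_dir:
    model_dir = os.path.join(data_dir, "models", "test_model")
    os.makedirs(model_dir)
    for name in ("model.onnx", "vocab.txt"):
        open(os.path.join(model_dir, name), "w").close()
    with open(os.path.join(model_dir, "config.json"), "w") as f:
        json.dump({"model_type": "bert", "vocab_file_name": "vocab.txt"}, f)
    problems = check_model_dir(model_dir)
print(problems)
```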
# Optional Model Parameters
These are optional model parameters, which may be required by your custom models.
# Indexing prefix and query prefix
Some models may require a prefix to know whether a text is a query or a passage to be queried (see `intfloat/e5-small`, for example).
If you set the `indexing_prefix` and `query_prefix` properties in `model_config`, the indexing prefix will be added to the text used to create embeddings when you index a document, and the query prefix will be added to the actual query before it is embedded.
Example:
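A minimal sketch of such a configuration. The model name `e5-small` stands for a hypothetical custom model directory, and the `apply_prefix` helper only illustrates the resulting text; it assumes a single space separates the prefix from the text.

```python
model_config = {
    "model_name": "e5-small",  # hypothetical custom model directory name
    "indexing_prefix": "passage:",
    "query_prefix": "query:",
}

embedding_field = {
    "name": "embedding",
    "type": "float[]",
    "embed": {"from": ["product_name"], "model_config": model_config},
}


def apply_prefix(prefix, text):
    """Illustrate the text handed to the model (assumes a single space separator)."""
    return f"{prefix} {text}"


indexed_text = apply_prefix(model_config["indexing_prefix"], "ABCD")
query_text = apply_prefix(model_config["query_prefix"], "EFGH")
print(indexed_text)
print(query_text)
```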
For this example, when you index a document:
{
"product_name": "ABCD"
}
The text used to generate embeddings for the `embedding` field will be `passage: ABCD` instead of `ABCD`. And when you query, if your query is `EFGH`, it will be embedded as `query: EFGH` instead of `EFGH`.
# Nearest-neighbor vector search
Once you've indexed your embeddings in a vector field, you can search for documents that are "closest" to a given query vector.
To control the number of documents that are returned, you can either use the `per_page` pagination parameter or the `k` parameter within the vector query.
NOTE: If both `per_page` and `k` parameters are provided, the larger value is used.
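As an illustration, the following sketch assembles a `multi_search` request body containing a vector query with `k` set to 100. The `vector_query` helper is a hypothetical convenience for formatting the query string; the vector values are arbitrary.

```python
import json


def vector_query(field, vector, **params):
    """Format a vector_query string such as 'embedding:([0.1, 0.2], k:100)'."""
    parts = ["[" + ", ".join(str(v) for v in vector) + "]"]
    parts += [f"{name}:{value}" for name, value in params.items()]
    return f"{field}:({', '.join(parts)})"


search_body = {
    "searches": [
        {
            "collection": "docs",
            "q": "*",
            "vector_query": vector_query(
                "embedding", [0.96826, 0.94, 0.39557, 0.306488], k=100
            ),
        }
    ]
}
# POST this JSON to /multi_search with the X-TYPESENSE-API-KEY header.
print(json.dumps(search_body, indent=2))
```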
Every matching hit in the response will contain a `vector_distance` field that indicates how "close" the document's vector value is to the query vector. Typesense uses cosine similarity, so this distance will be a value between `0` and `2`.
- If the document's vector perfectly matches the query vector, the distance will be `0`.
- If the document's vector is extremely different from the query vector, the distance will be `2`.
The hits are automatically sorted in ascending order of `vector_distance`, i.e. best matching documents appear first.
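To make the 0 to 2 range concrete, here is a small sketch that assumes the distance is computed as one minus the cosine similarity, which is consistent with the range stated above. The vectors are toy values.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # unrelated direction
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction
print(same, orthogonal, opposite)
```

Identical directions give the minimum distance, opposite directions give the maximum, and unrelated (orthogonal) vectors land in the middle.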
TIP
Since vector search queries tend to be large because of the large dimension of the query vector, we are using the `multi_search` endpoint, which sends the search parameters as a POST request body.
Network Bandwidth Optimization
By default Typesense returns all fields in the document as part of the search API response.
So if your documents contain a vector field, Typesense could return a lot of floating point vector data for each search query, unnecessarily eating into your network bandwidth and wasting CPU cycles.
To prevent this, add `exclude_fields: "your_embedding_field_name"` as a search parameter.
Sample Response
# Querying for similar documents
If you have a particular document `id` and want to find documents that are "similar" to this document, you can do a vector query that references this `id` directly.
curl 'http://localhost:8108/multi_search' \
-X POST \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "docs",
"q": "*",
"vector_query": "embedding:([], id: foobar)"
}
]
}'
# Be sure to replace `embedding` with the name of the field that stores your embeddings.
By specifying an empty query vector `[]` and passing an `id` parameter, this query would return all documents whose `embedding` value is closest to the `foobar` document's `embedding` value.
TIP
The `foobar` document itself will not be returned in the results.
# Semantic Search
When using auto-embedding, you can directly set `query_by` to the auto-embedding field to do a semantic search on this field.
Typesense will use the same embedding model that was used to generate the auto-embedding field to generate vectors for the `q` parameter and then do a nearest neighbor search internally.
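A sketch of such a request body, assuming a `products` collection whose auto-embedding field is named `embedding`:

```python
import json

semantic_search = {
    "searches": [
        {
            "collection": "products",  # assumed collection name
            "q": "chair",
            "query_by": "embedding",  # the auto-embedding field
            "exclude_fields": "embedding",  # keep vectors out of the response
        }
    ]
}
# POST this JSON to /multi_search with the X-TYPESENSE-API-KEY header.
body = json.dumps(semantic_search)
```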
This will automatically embed the `chair` query with the same model used for the `embedding` field and will perform a nearest neighbor vector search.
# Hybrid Search
When using auto-embedding, you can set `query_by` to a list of both regular fields and auto-embedding fields, to do a hybrid search on multiple fields.
Typesense will do a keyword search on all the regular fields, and a semantic search on the auto-embedding field and combine the results using Rank Fusion to arrive at a fusion score that is used to rank the hits.
K = rank of document in keyword search
S = rank of document in semantic search
rank_fusion_score = 0.7 * K + 0.3 * S
The `0.7` and `0.3` values can be changed using the `alpha` parameter.
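A sketch of a hybrid query against an auto-embedding field, together with a tiny helper showing how `alpha` splits the weights. The helper assumes the keyword weight is simply `1 - alpha`, which matches the default 0.7 / 0.3 split; the collection and field names are placeholders.

```python
import json


def fusion_weights(alpha=0.3):
    """Return (keyword_weight, vector_weight), assuming keyword weight = 1 - alpha."""
    return 1.0 - alpha, alpha


hybrid_search = {
    "searches": [
        {
            "collection": "products",  # assumed collection name
            "q": "chair",
            # One regular field plus the auto-embedding field.
            "query_by": "product_name,embedding",
            "exclude_fields": "embedding",
        }
    ]
}
keyword_weight, vector_weight = fusion_weights()
# POST this JSON to /multi_search with the X-TYPESENSE-API-KEY header.
body = json.dumps(hybrid_search)
```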
TIP
During hybrid search, the `_text_match` clause in `sort_by` will refer to the combined fusion score.
If you are populating the embedding field externally, without using auto-embedding, you can still do a hybrid search by passing the embedding of the query string manually via the `vector_query` parameter.
curl 'http://localhost:8108/multi_search' \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X POST \
-d '{
"searches": [
{
"collection": "products",
"q": "chair",
"query_by": "product_name,embedding",
"vector_query": "embedding:([0.2, 0.4, 0.1])",
"sort_by": "_text_match:desc"
}
]
}'
Typesense will do a keyword search using the `q` parameter and a nearest neighbor search using the `vector_query` field, and combine the results into a ranked set of results using rank fusion as described earlier.
# Weightage for Semantic vs Keyword matches
By default, Typesense assigns a weight of `0.3` for vector search rank and a weight of `0.7` for keyword search rank.
You can adjust the weight assigned to vector search ranking via the `alpha` option of the `vector_query` parameter.
For example, to set a weight of `0.8` to vector search ranking, set `alpha` to `0.8`:
curl 'http://localhost:8108/multi_search' \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X POST \
-d '{
"searches": [
{
"collection": "products",
"query_by": "embedding,product_name",
"q": "chair",
"vector_query": "embedding:([], alpha: 0.8)",
"exclude_fields": "embedding"
}
]
}'
TIP
When querying on both an embedding field and regular search fields, some parameters like `query_by_weights` won't have an impact on an embedding field mentioned in `query_by`. However, since the length of `query_by_weights` must match the length of `query_by`, you can use a placeholder value like `0`.
# Distance Threshold
You can also set a maximum vector distance threshold for results of semantic search and hybrid search. To do so, set `distance_threshold` in the `vector_query` parameter.
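For example, a semantic search that drops hits farther than 0.30 from the query might be expressed as follows. The threshold value, collection name and field names are arbitrary choices for this sketch.

```python
import json

threshold = 0.30  # arbitrary example value

search = {
    "searches": [
        {
            "collection": "products",  # assumed collection name
            "q": "chair",
            "query_by": "embedding",
            # Empty vector: the query text in `q` is embedded automatically.
            "vector_query": f"embedding:([], distance_threshold:{threshold:.2f})",
        }
    ]
}
# POST this JSON to /multi_search with the X-TYPESENSE-API-KEY header.
body = json.dumps(search)
```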
# Sorting hybrid matches on vector distance
If you want to fetch both keyword and vector search matches but sort the results only on the vector distance, you can use the special sort keyword `_vector_distance` in `sort_by`.
Here's an example:
{
"q": "chair",
"query_by": "title,embedding",
"sort_by": "popularity_score:desc,_vector_distance:asc"
}
We are searching on both the `title` text field and the `embedding` vector field, but the final results are sorted first on `popularity_score` and then on vector distance.
# Searching with historical queries
You can send a list of search queries via the `queries` parameter to make vector search compute a weighted query embedding from the queries. You can use this for personalizing search results based on historical search queries.
In the following example, vector search is made from the embedding calculated from the embeddings of the queries `smart phone` and `apple ipad`.
We can also use the optional `query_weights` parameter to assign appropriate weights to the queries. If the `query_weights` parameter is not passed, all queries will have equal weightage.
{
"vector_query": "embedding:([], queries:[smart phone, apple ipad], query_weights:[0.9, 0.1])"
}
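The exact formula Typesense uses to combine the query embeddings is not spelled out here. One plausible reading is a weighted average of the individual embeddings, sketched below with made-up two-dimensional vectors standing in for the embeddings of the two queries.

```python
def weighted_query_embedding(embeddings, weights=None):
    """Combine embeddings as a weighted sum; equal weights if none are given.

    This is an assumed interpretation, not Typesense's documented formula.
    """
    if weights is None:
        weights = [1.0 / len(embeddings)] * len(embeddings)
    dims = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dims)
    ]


# Toy stand-ins for the embeddings of "smart phone" and "apple ipad".
smart_phone = [1.0, 0.0]
apple_ipad = [0.0, 1.0]

combined = weighted_query_embedding([smart_phone, apple_ipad], [0.9, 0.1])
equal = weighted_query_embedding([smart_phone, apple_ipad])
print(combined, equal)
```

With weights of 0.9 and 0.1, the combined vector stays close to the first query, which is the intended effect of weighting recent or more relevant queries higher.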
# Rank keyword search via vector search
Instead of combining the scores from both keyword and vector search, you can also use vector search distances as a sorting clause for reordering keyword search hits.
In the example below, we are using the vector distance as a secondary sorting condition to text match score.
{
"q": "shoes",
"query_by": "title,brand",
"sort_by": "_text_match:desc,_vector_query(embedding:([0.43, 0.13, 0.21])):asc"
}
# Brute-force searching
By default, Typesense uses the built-in HNSW index to do approximate nearest neighbor vector searches. This scales well for large datasets. However, if you wish to bypass the HNSW index and do a flat / brute-force ranking of vectors, you can do that via the `flat_search_cutoff` parameter.
For example, if you wish to do brute-force vector search when a given query matches fewer than 20 documents, sending `flat_search_cutoff=20` will bypass the HNSW index when the number of results found is less than 20.
Here's an example where we are filtering on the `category` field and asking the vector search to use direct flat searching if the filtering operation produces fewer than 20 results.
curl 'http://localhost:8108/multi_search' \
-X POST \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "docs",
"q": "*",
"filter_by": "category:shoes",
"vector_query": "embedding:([0.96826, 0.94, 0.39557, 0.306488], k:100, flat_search_cutoff: 20)"
}
]
}'
# Be sure to replace `embedding` with the name of the field that stores your embeddings.
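To illustrate what a flat search means conceptually, the sketch below ranks every candidate vector against the query by cosine distance and keeps the top `k`. It is a simplified stand-in written for this page, not Typesense's implementation, and the document IDs and vectors are toy data.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_search(query, vectors_by_id, k):
    """Exhaustively score every vector and return the k closest (id, distance) pairs."""
    scored = [(cosine_distance(query, vec), doc_id) for doc_id, vec in vectors_by_id.items()]
    scored.sort()  # ascending distance: best matches first
    return [(doc_id, dist) for dist, doc_id in scored[:k]]


# Toy candidates, e.g. the documents left after a filter_by clause.
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
top = flat_search([1.0, 0.0], candidates, k=2)
print(top)
```

An exhaustive scan like this is exact but grows linearly with the number of candidates, which is why it only makes sense for small filtered result sets.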
# Configuring HNSW parameters
# Indexing parameters
You can set `ef_construction` (default: `200`) and `M` (default: `16`) for vector and embedding fields while creating the collection.
{
"name": "docs",
"fields": [
{
"name": "vec",
"type": "float[]",
"num_dim": 768,
"hnsw_params": {
"ef_construction": 100,
"M": 8
}
}
]
}
# Search Parameters
You can set a custom `ef` via the `vector_query` parameter (default value is `10`).
{
"vector_query" : "vec:([], ef:100)"
}
# UI Examples
Here's a demo that shows you how to implement Hybrid Search (Semantic Search + Keyword Search + Filtering + Faceting) using Typesense's built-in embedding generation mechanism.
Here's a demo that shows you how to implement a "Find Similar" feature using Vector Search in an ecommerce store.
Click on "Find Similar" below each product tile for notes on how to implement this.
Here's a demo that shows you how to implement Semantic Search, using an external embeddings API and Vector Search.