# Conversational Search (RAG)
Typesense can answer free-form questions with conversational responses and maintain context for follow-up questions and answers.
Think of this feature as a ChatGPT-style Q&A interface, but with the data you've indexed in Typesense.
Typesense uses a technique called Retrieval Augmented Generation (RAG) to enable this style of conversational search.
Instead of having to build your own RAG pipeline, Typesense essentially has RAG built in, using its Vector Store for semantic search and its pre-built integration with LLMs for formulating conversational responses.
# Create a Conversation History collection
Let's start by creating a Typesense collection called `conversation_store` that will store the conversation history generated by the conversational search feature.
The collection can be named anything, but it must use the schema below, since Typesense populates this collection internally as conversations happen.
```bash
curl "http://localhost:8108/collections" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "name": "conversation_store",
    "fields": [
      {
        "name": "conversation_id",
        "type": "string"
      },
      {
        "name": "model_id",
        "type": "string"
      },
      {
        "name": "timestamp",
        "type": "int32"
      },
      {
        "name": "role",
        "type": "string",
        "index": false
      },
      {
        "name": "message",
        "type": "string",
        "index": false
      }
    ]
  }'
```
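Since the collection name is free but the fields are not, it can be handy to check a schema before sending it. The following is a minimal Python sketch of such a check; `REQUIRED_HISTORY_FIELDS` and `history_schema_problems` are hypothetical helper names, not part of Typesense, and the field list is copied from the schema above.

```python
# Field names and types the history collection needs (copied from the schema above).
REQUIRED_HISTORY_FIELDS = {
    "conversation_id": "string",
    "model_id": "string",
    "timestamp": "int32",
    "role": "string",
    "message": "string",
}


def history_schema_problems(schema):
    """Return a list of problems; an empty list means every required field is declared."""
    declared = {field["name"]: field["type"] for field in schema.get("fields", [])}
    problems = []
    for name, expected in REQUIRED_HISTORY_FIELDS.items():
        if name not in declared:
            problems.append(f"missing field: {name}")
        elif declared[name] != expected:
            problems.append(f"{name}: expected {expected}, got {declared[name]}")
    return problems
```

This only compares names and types; it does not look at the `index` setting or at any extra fields you may add.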
# Create a Conversation Model
Once we've created the conversation history collection above, we can then create a conversation model resource using a large language model (LLM) of our choice.
Typesense currently supports the following LLM platforms:
- OpenAI
- Cloudflare Workers AI
- vLLM (useful when running local LLMs)
The parameters below configure a Conversation Model on any of these platforms.
# Parameters
| Parameter | Description |
|---|---|
| `model_name` | Name of the LLM offered by OpenAI, Cloudflare or vLLM |
| `api_key` | The LLM service's API key |
| `history_collection` | Typesense collection that stores the historical conversations |
| `account_id` | The LLM service's account ID (only applicable for Cloudflare) |
| `system_prompt` | The system prompt that contains special instructions to the LLM |
| `ttl` | Time interval in seconds after which messages are deleted. Default: 86400 (24 hours) |
| `max_bytes` | The maximum number of bytes to send to the LLM in every API call. Consult the LLM's documentation on the number of bytes supported in the context window. |
| `vllm_url` | URL of the vLLM service |
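As a minimal sketch of the create call for an OpenAI-backed model, the request could be built like this in Python. It assumes the model is created with a `POST` to the same `/conversations/models` path used by the management examples on this page; `build_create_model_request` and `TYPESENSE_URL` are hypothetical names, and the payload values mirror the example response on this page.

```python
import json
import urllib.request

TYPESENSE_URL = "http://localhost:8108"  # assumption: same host as the curl examples


def build_create_model_request(typesense_api_key, llm_api_key):
    """Build (but do not send) the request that creates a conversation model."""
    payload = {
        "id": "conv-model-1",  # optional; omit it to get an auto-generated id
        "model_name": "openai/gpt-3.5-turbo",
        "history_collection": "conversation_store",
        "api_key": llm_api_key,
        "system_prompt": (
            "You are an assistant for question-answering. You can only make "
            "conversations based on the provided context. If a response cannot be "
            "formed strictly using the provided context, politely say you do not "
            "have knowledge about that topic."
        ),
        "max_bytes": 16384,
    }
    return urllib.request.Request(
        f"{TYPESENSE_URL}/conversations/models",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": typesense_api_key,
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body. For Cloudflare or vLLM you would add `account_id` or `vllm_url` as described in the table.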
Response:
```json
{
  "api_key": "sk-7K**********************************************",
  "id": "conv-model-1",
  "max_bytes": 16384,
  "model_name": "openai/gpt-3.5-turbo",
  "history_collection": "conversation_store",
  "system_prompt": "You are an assistant for question-answering. You can only make conversations based on the provided context. If a response cannot be formed strictly using the provided context, politely say you do not have knowledge about that topic."
}
```
TIP
If you don't pass an explicit `id` for the model, the API returns an auto-generated conversation model `id` in its response, which we can then use in our search queries.
# Start a Conversation
Once we've created a conversation model, we can start a conversation using the search endpoint and the following search parameters:
- `conversation = true`
- `conversation_model_id = X`
- `q = <Any conversational question>`
- `query_by = <an auto-embedding field>`

Where `X` is the Conversation Model ID from the step above (auto-generated by Typesense if you didn't pass an explicit `id`).
Here's an example where we ask the question "Can you suggest an action series?" in the `q` parameter, using data we've indexed in a collection called `tv_shows` in Typesense.
```bash
curl 'http://localhost:8108/multi_search?q=can+you+suggest+an+action+series&conversation=true&conversation_model_id=conv-model-1' \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "searches": [
      {
        "collection": "tv_shows",
        "query_by": "embedding",
        "exclude_fields": "embedding"
      }
    ]
  }'
```
IMPORTANT
It's important to specify `"exclude_fields": "embedding"` in the search request so that the raw floating point numbers aren't sent to the LLM unnecessarily, where they would consume most of the context window.
You should also exclude any other fields that aren't relevant to generating a conversational response by listing them in `exclude_fields`.
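The request above can also be assembled programmatically. Here is a minimal Python sketch that URL-encodes the question into the query string and always excludes the embedding field from the results; `build_conversation_search` is a hypothetical helper, and `poster_url` in the usage note is an invented field name used only for illustration.

```python
import json
from urllib.parse import urlencode


def build_conversation_search(question, model_id, collection,
                              embedding_field="embedding", irrelevant_fields=()):
    """Return (url, json_body) for a conversational multi_search request."""
    # q and the conversation parameters go in the query string, not in the body.
    query = urlencode({
        "q": question,
        "conversation": "true",
        "conversation_model_id": model_id,
    })
    url = f"http://localhost:8108/multi_search?{query}"
    body = {
        "searches": [
            {
                "collection": collection,
                "query_by": embedding_field,
                # Keep raw vectors and other irrelevant fields away from the LLM.
                "exclude_fields": ",".join([embedding_field, *irrelevant_fields]),
            }
        ]
    }
    return url, json.dumps(body)
```

For example, `build_conversation_search("can you suggest an action series", "conv-model-1", "tv_shows", irrelevant_fields=["poster_url"])` yields the same URL as the curl command, with `exclude_fields` set to `embedding,poster_url`.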
Response:
Typesense will now return a new field in the search API response called `conversation`.
You'd display the `conversation.answer` key to your user as the response to their question.
```json
{
  "conversation": {
    "answer": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild.",
    "conversation_history": {
      "conversation": [
        {
          "user": "can you suggest an action series"
        },
        {
          "assistant": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild."
        }
      ],
      "id": "771aa307-b445-4987-b100-090c00a13f1b",
      "last_updated": 1694962465,
      "ttl": 86400
    },
    "conversation_id": "771aa307-b445-4987-b100-090c00a13f1b",
    "query": "can you suggest an action series"
  },
  "results": [
    {
      "facet_counts": [],
      "found": 10,
      "hits": [
        ...
      ],
      "out_of": 47,
      "page": 1,
      "request_params": {
        "collection_name": "tv_shows",
        "per_page": 10,
        "q": "can you suggest an action series"
      },
      "search_cutoff": false,
      "search_time_ms": 3908
    }
  ]
}
```
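On the client side, reading this response mostly comes down to picking out a few keys. A minimal Python sketch, assuming the response has already been parsed into a dict with the shape shown above (`read_conversation` is a hypothetical helper name):

```python
def read_conversation(response):
    """Return (answer, conversation_id, hits) from a conversational search response."""
    conversation = response["conversation"]
    # Flatten the hits of every search in the multi_search response into one list.
    hits = [
        hit
        for result in response.get("results", [])
        for hit in result.get("hits", [])
    ]
    return conversation["answer"], conversation["conversation_id"], hits
```

The answer is what you'd show to the user, the `conversation_id` is what you'd keep around for follow-up questions, and the hits are the documents the search returned alongside the answer.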
Excluding Conversation History
You can exclude conversation history from the search API response by setting `exclude_fields: conversation_history` as a search parameter.
Multi-Search
When using the `multi_search` endpoint with the Conversations feature, the `q` parameter has to be set as a query parameter and not as a body parameter inside a particular search.
You can search multiple collections within the multi_search endpoint, and Typesense will use the top results from each collection when communicating with the LLM.
Auto-Embedding Model
In our experience, models that are specifically meant for Q&A use-cases (like the `ts/all-MiniLM-L12-v2` S-BERT model) perform well for conversations. You can also use OpenAI's text embedding models.
# Follow-up Questions
We can continue a conversation that we started previously and ask follow-up questions by passing the `conversation_id` that Typesense returned when the conversation started.
Continuing our example from above, let's ask the follow-up question "How about another one" in the `q` parameter:
```bash
curl 'http://localhost:8108/multi_search?q=how+about+another+one&conversation=true&conversation_model_id=conv-model-1&conversation_id=771aa307-b445-4987-b100-090c00a13f1b' \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "searches": [
      {
        "collection": "tv_shows",
        "query_by": "embedding",
        "exclude_fields": "embedding"
      }
    ]
  }'
```
Notice the addition of `conversation_id` as a query parameter above.
This parameter causes Typesense to include prior context when communicating with the LLM.
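In an application, threading the `conversation_id` through successive questions is easy to get wrong, so it can help to keep it in one place. The following Python sketch shows one possible way to do that. `ConversationSession` is a hypothetical class, and `send` is an assumed callable that you supply: it takes the query parameters, performs the `multi_search` request, and returns the parsed JSON response.

```python
class ConversationSession:
    """Remembers the conversation_id so follow-up questions keep their context."""

    def __init__(self, model_id, send):
        self.model_id = model_id
        self.send = send  # assumed: callable(query_params: dict) -> response dict
        self.conversation_id = None

    def query_params(self, question):
        params = {
            "q": question,
            "conversation": "true",
            "conversation_model_id": self.model_id,
        }
        # Only follow-up questions carry a conversation_id.
        if self.conversation_id is not None:
            params["conversation_id"] = self.conversation_id
        return params

    def ask(self, question):
        response = self.send(self.query_params(question))
        # Store the id returned by Typesense for the next question.
        self.conversation_id = response["conversation"]["conversation_id"]
        return response["conversation"]["answer"]
```

The first call to `ask` starts a new conversation; every later call on the same object continues it.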
Response:
```json
{
  "conversation": {
    "answer": "Sure! How about \"Galactic Quest\"? It could follow a group of intergalactic adventurers as they travel through different planets and encounter various challenges and mysteries along the way.",
    "conversation_history": {
      "conversation": [
        {
          "user": "can you suggest an action series"
        },
        {
          "assistant": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild."
        },
        {
          "user": "how about another one"
        },
        {
          "assistant": "Sure! How about \"Galactic Quest\"? It follows a group of intergalactic adventurers as they travel through different planets and encounter various challenges and mysteries along the way."
        }
      ],
      "id": "771aa307-b445-4987-b100-090c00a13f1b",
      "last_updated": 1694963173,
      "ttl": 86400
    },
    "conversation_id": "771aa307-b445-4987-b100-090c00a13f1b",
    "query": "how about another one"
  },
  "results": [
    {
      "facet_counts": [],
      "found": 10,
      "hits": [
        ...
      ],
      "out_of": 47,
      "page": 1,
      "request_params": {
        "collection_name": "tv_shows",
        "per_page": 10,
        "q": "how about another one"
      },
      "search_cutoff": false,
      "search_time_ms": 3477
    }
  ]
}
```
Under the hood, for each follow-up question, Typesense makes an API call to the LLM to generate a standalone question that captures all relevant context from the conversation history, using the following prompt:
```
Rewrite the follow-up question on top of a human-assistant conversation history as a standalone question that encompasses all pertinent context.
<Conversation history>
{actual conversation history without system prompts}
<Question>
{follow up question}
<Standalone Question>
```
The generated standalone question will be used for semantic/hybrid search within the collection, and the results will then be forwarded to the LLM as context for answering the generated standalone question.
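To make the template concrete, here is a rough Python sketch of how such a prompt could be filled in from a conversation history shaped like the `conversation_history.conversation` array in the responses above. This is an illustration only: how Typesense actually serializes each turn inside the prompt is not documented here, so the `role: text` line format and the `build_standalone_prompt` name are assumptions.

```python
PROMPT_HEADER = (
    "Rewrite the follow-up question on top of a human-assistant conversation "
    "history as a standalone question that encompasses all pertinent context."
)


def build_standalone_prompt(history, follow_up):
    """Fill the standalone-question template from a list of {role: text} turns."""
    turns = []
    for turn in history:
        for role, text in turn.items():
            turns.append(f"{role}: {text}")  # assumed line format, for illustration
    return "\n".join([
        PROMPT_HEADER,
        "<Conversation history>",
        *turns,
        "<Question>",
        follow_up,
        "<Standalone Question>",
    ])
```

With the two turns from our example and the follow-up "how about another one", the LLM would be expected to come back with something like a fully spelled-out request for another action series, which is then used as the search query.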
Context Window Limits
Although we retain the entire conversation history in Typesense, only the most recent 3000 tokens (approximately 1200 characters) of the conversation history will be sent for generating the standalone question due to the context limit.
Similar to the conversation history, only the top search results, limited to 3000 tokens, will be sent along with the standalone question.
# Managing Past Conversations
Typesense stores all questions and answers (the conversation history) in the Typesense collection specified by the `history_collection` parameter. Each conversation has a `conversation_id` associated with it and is persisted by default for 24 hours to support follow-ups. This persistence period can be configured with the model's `ttl` parameter.
You can use the collection APIs to manage these messages. Since the conversation history is not tied to a specific search collection, it can be used with different search collections at any time.
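As a sketch of what managing messages through the collection APIs might look like, the Python helpers below build the URLs for listing and for deleting the messages of one conversation. They assume the history collection is named `conversation_store` as in our example, and that the standard document search and delete-by-filter endpoints behave on this collection as they do on any other; both function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the history collection from the first step of this guide.
DOCUMENTS_URL = "http://localhost:8108/collections/conversation_store/documents"


def history_search_url(conversation_id):
    """URL for a GET request listing one conversation's messages, oldest first."""
    params = {
        "q": "*",  # role and message are not indexed, so match everything and filter
        "filter_by": f"conversation_id:={conversation_id}",
        "sort_by": "timestamp:asc",
    }
    return f"{DOCUMENTS_URL}/search?{urlencode(params)}"


def history_delete_url(conversation_id):
    """URL for a DELETE request removing every message of one conversation."""
    params = {"filter_by": f"conversation_id:={conversation_id}"}
    return f"{DOCUMENTS_URL}?{urlencode(params)}"
```

Both requests need the usual `X-TYPESENSE-API-KEY` header when you send them.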
# Managing Conversation Models
# Retrieve all models
```bash
curl 'http://localhost:8108/conversations/models' \
  -X GET \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```
# Retrieve a single model
```bash
curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X GET \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```
# Update a model
You can update the model parameters like this:
```bash
curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X PUT \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "id": "conv-model-1",
    "model_name": "openai/gpt-3.5-turbo",
    "history_collection": "conversation_store",
    "api_key": "OPENAI_API_KEY",
    "system_prompt": "Hey, you are an **intelligent** assistant for question-answering. You can only make conversations based on the provided context. If a response cannot be formed strictly using the provided context, politely say you do not have knowledge about that topic.",
    "max_bytes": 16384
  }'
```
# Delete a model
```bash
curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X DELETE \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```
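The four management calls above differ only in the HTTP method, the optional model ID in the path, and the optional JSON body. A small Python sketch that captures this pattern could look as follows; `model_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

MODELS_URL = "http://localhost:8108/conversations/models"


def model_request(method, typesense_api_key, model_id=None, payload=None):
    """Build a request for the conversation model endpoints shown above."""
    url = MODELS_URL if model_id is None else f"{MODELS_URL}/{model_id}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": typesense_api_key,
        },
    )
```

For instance, `model_request("GET", key)` corresponds to retrieving all models, `model_request("DELETE", key, "conv-model-1")` to deleting one, and `model_request("PUT", key, "conv-model-1", payload)` to the update call, where `payload` is the JSON object from the update example.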