# Conversational Search (RAG)

Typesense has the ability to respond to free-form questions, with conversational responses and also maintain context for follow-up questions and answers.

Think of this feature as a ChatGPT-style Q&A interface, but with the data you've indexed in Typesense.

Typesense uses a technique called Retrieval Augmented Generation (opens new window) (RAG) to enable this style of conversational searches.

Instead of having to build your own RAG pipeline, Typesense essentially has built-in RAG using it's Vector Store for semantic search, and it's pre-built integration with LLMs for formulating conversational responses.

# Create a Conversation History collection

Let's start by creating a Typesense collection called conversation_store that will store the conversation history generated by the conversational search feature.

The collection can be named anything, but it needs to have the following defined schema, since this collection is auto-populated by Typesense internally, as conversations happen.

curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
        "name": "conversation_store",
        "fields": [
            {
                "name": "conversation_id",
                "type": "string"
            },
            {
                "name": "model_id",
                "type": "string"
            },
            {
                "name": "timestamp",
                "type": "int32"
            },
            {
                "name": "role",
                "type": "string",
                "index": false
            },
            {
                "name": "message",
                "type": "string",
                "index": false
            }
        ]
    }'

# Create a Conversation Model

Once we've created the conversation history collection above, we can then create a conversation model resource using a large language model (LLM) of our choice.

Typesense currently supports the following LLM platforms:

OpenAI (opens new window)
Cloudflare Workers AI (opens new window)
vLLM (opens new window) (useful when running local LLMs)

Here's how to use each of these platforms to create a Conversational Model. (Use the tabs in the code snippet below to navigate between each platform).

# Parameters

Parameter	Description
model_name	Name of the LLM model offered by OpenAI, Cloudflare or vLLM
api_key	The LLM service's API Key
history_collection	Typesense collection that stores the historical conversations
account_id	LLM service's account ID (only applicable for Cloudflare)
system_prompt	The system prompt that contains special instructions to the LLM
ttl	Time interval in seconds after which the messages would be deleted. Default: `86400` (24 hours)
max_bytes	The maximum number of bytes to send to the LLM in every API call. Consult the LLM's documentation on the number of bytes supported in the context window.
vllm_url	URL of vLLM service

Response:

{
  "api_key": "sk-7K**********************************************",
  "id": "conv-model-1",
  "max_bytes": 16384,
  "model_name": "openai/gpt-3.5-turbo",
  "history_collection": "conversation_store",
  "system_prompt": "You are an assistant for question-answering. You can only make conversations based on the provided context. If a response cannot be formed strictly using the provided context, politely say you do not have knowledge about that topic."
}

TIP

If you don't pass an explicit id for the model, the API will return a response with an auto-generated conversation model id, that we can use in our search queries:

# Start a Conversation

Once we've created a conversation model, we can start a conversation using the search endpoint and the following search parameters:

conversation = true
conversation_model_id = X
q = <Any conversational question>
query_by = <an auto-embedding field>

Where X is the auto-generated Conversation Model ID returned by Typesense in the step above and query_by is an auto-embedding field.

Here's an example, where we ask the question "Can you suggest an action series?" in the q parameter, using data we've indexed in a collection called tv_shows in Typesense.

curl 'http://localhost:8108/multi_search?q=can+you+suggest+an+action+series&conversation=true&conversation_model_id=conv-model-1' \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
              "searches": [
                {
                  "collection": "tv_shows",
                  "query_by": "embedding",
                  "exclude_fields": "embedding"
                }
              ]
            }'

IMPORTANT

It's important to specify "exclude_fields": "embedding" in the search request, so that the raw floating point numbers aren't being sent to the LLM unnecessarily, which will end up consuming most of the context window.

You also want to remove any other fields that might not be relevant to generating a conversational response by specifying them in exclude_fields.

Response:

Typesense will now return a new field in the search API response called conversation.

You'd display the conversation.answer key to your user, as the response to their question.

{
  "conversation": {
    "answer": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild.",
    "conversation_history": {
      "conversation": [
        {
          "user": "can you suggest an action series"
        },
        {
          "assistant": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild."
        }
      ],
      "id": "771aa307-b445-4987-b100-090c00a13f1b",
      "last_updated": 1694962465,
      "ttl": 86400
    },
    "conversation_id": "771aa307-b445-4987-b100-090c00a13f1b",
    "query": "can you suggest an action series"
  },
  "results": [
    {
      "facet_counts": [],
      "found": 10,
      "hits": [
        ...
      ],
      "out_of": 47,
      "page": 1,
      "request_params": {
        "collection_name": "tv_shows",
        "per_page": 10,
        "q": "can you suggest an action series"
      },
      "search_cutoff": false,
      "search_time_ms": 3908
    }
  ]
}

Excluding Conversation History

You can exclude conversation history from the search API response by setting exclude_fields: conversation_history as a search parameter.

Multi-Search

When using the multi_search endpoint with the Conversations feature, the q parameter has to be set as a query parameter and not as a body parameter inside a particular search.

You can search multiple collections within the multi_search endpoint, and Typesense will use the top results from each collection when communicating with the LLM.

Auto-Embedding Model

In our experience, we've found that models that are specifically meant for Q&A use-cases (like the ts/all-MiniLM-L12-v2 S-BERT model) perform well for conversations. You can also use OpenAI's text embedding models.

# Follow-up Questions

We can continue a conversation that we started previously and ask follow-up questions, using the conversation_id parameter returned by Typesense, when starting a conversation.

Continuing our example from above, let's ask the follow-up question - "How about another one" in the q parameter:

curl 'http://localhost:8108/multi_search?q=how+about+another+one&conversation=true&conversation_model_id=conv-model-1&conversation_id=771aa307-b445-4987-b100-090c00a13f1b' \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
              "searches": [
                {
                  "collection": "tv_shows",
                  "query_by": "embedding",
                  "exclude_fields": "embedding"
                }
              ]
            }'

Notice the addition of the conversation_id as a query parameter above.

This parameter causes Typesense to include prior context when communicating with the LLM.

Response:

{
  "conversation": {
    "answer": "Sure! How about \"Galactic Quest\"? It could follow a group of intergalactic adventurers as they travel through different planets and encounter various challenges and mysteries along the way.",
    "conversation_history": {
      "conversation": [
        {
          "user": "can you suggest an action series"
        },
        {
          "assistant": "I would suggest \"Starship Salvage\", a sci-fi action series set in a post-war galaxy where a crew of salvagers faces dangers and ethical dilemmas while trying to rebuild."
        },
        {
          "user": "how about another one"
        },
        {
          "assistant": "Sure! How about \"Galactic Quest\"? It follows a group of intergalactic adventurers as they travel through different planets and encounter various challenges and mysteries along the way."
        }
      ],
      "id": "771aa307-b445-4987-b100-090c00a13f1b",
      "last_updated": 1694963173,
      "ttl": 86400
    },
    "conversation_id": "771aa307-b445-4987-b100-090c00a13f1b",
    "query": "how about another one"
  },
  "results": [
    {
      "facet_counts": [],
      "found": 10,
      "hits": [
        ...
      ],
      "out_of": 47,
      "page": 1,
      "request_params": {
        "collection_name": "tv_shows",
        "per_page": 10,
        "q": "how about another one"
      },
      "search_cutoff": false,
      "search_time_ms": 3477
    }
  ]
}

Under the hood, for each follow-up question, Typesense makes an API call to the LLM to generate a standalone question that captures all relevant context from the conversation history, using the following prompt:

Rewrite the follow-up question on top of a human-assistant conversation history as a standalone question that encompasses all pertinent context.

<Conversation history>
{actual conversation history without system prompts}

<Question>
{follow up question}

<Standalone Question>

The generated standalone question will be used for semantic/hybrid search within the collection, and the results will then be forwarded to the LLM as context for answering the generated standalone question.

Context Window Limits

Although we retain the entire conversation history in Typesense, only the most recent 3000 tokens (approximately 1200 characters) of the conversation history will be sent for generating the standalone question due to the context limit.

Similar to the conversation history, only the top search results, limited to 3000 tokens, will be sent along with the standalone question.

# Managing Past Conversations

Typesense stores all questions and answers (conversation history), in the Typesense collection specified by the history_collection parameter. Each conversation has a conversation_id parameter associated with it and is persisted by default for 24 hours to support follow-ups. This persistence period can be configured by the model's ttl parameter.

You can use the collection APIs to manage these messages. Since the conversation history is not tied to specific search collections, they are versatile and compatible with different search collections at any time.

# Managing Conversation Models

# Retrieve all models

curl 'http://localhost:8108/conversations/models' \
  -X GET \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

# Retrieve a single model

curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X GET \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

# Update a model

You can update the model parameters like this:

curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X PUT \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
        "id": "conv-model-1",
        "model_name": "openai/gpt-3.5-turbo",
        "history_collection": "conversation_store",
        "api_key": "OPENAI_API_KEY",
        "system_prompt": "Hey, you are an **intelligent** assistant for question-answering. You can only make conversations based on the provided context. If a response cannot be formed strictly using the provided context, politely say you do not have knowledge about that topic.",
        "max_bytes": 16384
      }'

# Delete a model

curl 'http://localhost:8108/conversations/models/conv-model-1' \
  -X DELETE \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

← Image Search JOINs →