Getting Started Guide

Let's begin by installing Typesense, indexing some documents and exploring the data with some search queries.

For a detailed dive into the Typesense API, refer to our API documentation.

Installing Typesense

We have pre-built binaries available for Linux (x86_64) and Mac OS X from our downloads page.

We also publish official Docker images for Typesense on Docker Hub.

Starting the Typesense server

You can start Typesense with minimal options like this:

mkdir /tmp/typesense-data
./typesense-server --data-dir=/tmp/typesense-data --api-key=$TYPESENSE_API_KEY

On Docker, you can run Typesense like this:

mkdir /tmp/typesense-data
docker run -p 8108:8108 -v/tmp/typesense-data:/data typesense/typesense:0.8.0 \
  --data-dir /data --api-key=$TYPESENSE_API_KEY

Server arguments

Parameter            Required  Description
data-dir             true      Path to the directory where data will be stored on disk.
api-key              true      API key that allows all operations.
search-only-api-key  false     API key that allows only searches. Use this to define a separate key for making requests directly from JavaScript.
listen-address       false     Address to which the Typesense server binds. Default: 0.0.0.0
listen-port          false     Port on which the Typesense server listens. Default: 8108
master               false     Starts the server as a read-only replica by defining the master Typesense server's address in http(s)://<master_address>:<master_port> format.
ssl-certificate      false     Path to the SSL certificate file. You must also define ssl-certificate-key to enable HTTPS.
ssl-certificate-key  false     Path to the SSL certificate key file. You must also define ssl-certificate to enable HTTPS.
log-dir              false     By default, Typesense logs to stdout and stderr. To enable logging to a file, provide a path to a logging directory.

Installing a client

At the moment, we have clients for JavaScript, Python, and Ruby.

We recommend that you use our API client if it's available for your language. It's also easy to interact with Typesense through its simple, RESTful HTTP API.

# Ruby
gem install typesense

# Python
pip install typesense

// Node.js
npm install typesense

// Browser
<script src="dist/typesense.min.js"></script>

Example application

At this point, we are all set to start using Typesense. We will create a Typesense collection, index some documents in it and try searching for them.

To follow along, download this small dataset that we've put together for this walk-through.

Initializing the client

Let's begin by configuring the Typesense client and pointing it to the Typesense master node.

Be sure to use the same API key that you used to start the Typesense server earlier.

require 'typesense'

client = Typesense::Client.new(
  master_node: {
    host:     'localhost',
    port:     8108,
    protocol: 'http',
    api_key:  '<API_KEY>'
  },
  timeout_seconds: 2
)
import typesense

client = typesense.Client({
  'master_node': {
    'host': 'localhost',
    'port': '8108',
    'protocol': 'http',
    'api_key': '<API_KEY>'
  },
  'timeout_seconds': 2
})
/*
 *  Our JavaScript client library works on both the server and the browser.
 *  When using the library in the browser, be sure to use the
 *  search-only API key rather than the master API key, since the latter
 *  has write access to Typesense and you don't want to expose that.
 */
let client = new Typesense.Client({
  'masterNode': {
    'host': 'localhost',
    'port': '8108',
    'protocol': 'http',
    'apiKey': '<API_KEY>'
  },
  'timeoutSeconds': 2
})
export TYPESENSE_API_KEY='<API_KEY>'
export TYPESENSE_MASTER='http://localhost:8108'

That's it - we're now ready to start interacting with the Typesense server.

Creating a "books" collection

In Typesense, a collection is a group of related documents that is roughly equivalent to a table in a relational database. When we create a collection, we give it a name and describe the fields that will be indexed when a document is added to the collection.

require 'typesense'

books_schema = {
  'name' => 'books',
  'fields' => [
    {'name' => 'title', 'type' => 'string' },
    {'name' => 'authors', 'type' => 'string[]' },
    {'name' => 'image_url', 'type' => 'string' },

    {'name' => 'publication_year', 'type' => 'int32' },
    {'name' => 'ratings_count', 'type' => 'int32' },
    {'name' => 'average_rating', 'type' => 'float' },

    {'name' => 'authors_facet', 'type' => 'string[]', 'facet' => true },
    {'name' => 'publication_year_facet', 'type' => 'string', 'facet' => true }
  ],
  'default_sorting_field' => 'ratings_count'
}

client.collections.create(books_schema)
import typesense

books_schema = {
  'name': 'books',
  'fields': [
    {'name': 'title', 'type': 'string' },
    {'name': 'authors', 'type': 'string[]' },
    {'name': 'image_url', 'type': 'string' },

    {'name': 'publication_year', 'type': 'int32' },
    {'name': 'ratings_count', 'type': 'int32' },
    {'name': 'average_rating', 'type': 'float' },

    {'name': 'authors_facet', 'type': 'string[]', 'facet': True },
    {'name': 'publication_year_facet', 'type': 'string', 'facet': True },
  ],
  'default_sorting_field': 'ratings_count'
}

client.collections.create(books_schema)
let booksSchema = {
  'name': 'books',
  'fields': [
    {'name': 'title', 'type': 'string' },
    {'name': 'authors', 'type': 'string[]' },
    {'name': 'image_url', 'type': 'string' },

    {'name': 'publication_year', 'type': 'int32' },
    {'name': 'ratings_count', 'type': 'int32' },
    {'name': 'average_rating', 'type': 'float' },

    {'name': 'authors_facet', 'type': 'string[]', 'facet': true },
    {'name': 'publication_year_facet', 'type': 'string', 'facet': true },
  ],
  'default_sorting_field': 'ratings_count'
}

client.collections().create(booksSchema)
  .then(function (data) {
    console.log(data)
  })
curl "http://localhost:8108/collections" -X POST -H "Content-Type: application/json" \
      -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
        "name": "books",
        "fields": [
          {"name": "title", "type": "string" },
          {"name": "authors", "type": "string[]" },
          {"name": "image_url", "type": "string" },

          {"name": "publication_year", "type": "int32" },
          {"name": "ratings_count", "type": "int32" },
          {"name": "average_rating", "type": "float" },

          {"name": "authors_facet", "type": "string[]", "facet": true },
          {"name": "publication_year_facet", "type": "string", "facet": true }
        ],
        "default_sorting_field": "ratings_count"
      }'

For each field, we define its name, its type, and whether it's a facet field. A facet field allows us to cluster the search results into categories and lets us drill into each of those categories. We will see faceted results in action at the end of this guide.

We also define a default_sorting_field that determines how the results must be sorted when no sort_by clause is provided. In this case, books that have more ratings will be ranked higher.

Adding books to the collection

We're now ready to index some books into the collection we just created.

require 'rubygems'
require 'json'
require 'typesense'

File.readlines('/tmp/books.jsonl').each do |json_line|
  book_document = JSON.parse(json_line)
  client.collections['books'].documents.create(book_document)
end
import json
import typesense

with open('/tmp/books.jsonl') as infile:
  for json_line in infile:
    book_document = json.loads(json_line)
    client.collections['books'].documents.create(book_document)
var fs = require('fs');
var readline = require('readline');

readline.createInterface({
    input: fs.createReadStream('/tmp/books.jsonl'),
    terminal: false
}).on('line', function(line) {
   let bookDocument = JSON.parse(line);
   client.collections('books').documents().create(bookDocument)
});
#!/bin/bash
input="/tmp/books.jsonl"
while IFS= read -r line
do
  curl "$TYPESENSE_MASTER/collections/books/documents" -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
  -d "$line"
done < "$input"
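The loops above assume that every line of the file is a valid JSON document. A minimal sketch of a more defensive reader follows; `read_jsonl` is a hypothetical helper, not part of the Typesense client, and the sample data is made up so that the snippet runs without the dataset.

```python
import io
import json


def read_jsonl(lines):
    """Yield (line_number, document) for each non-blank line.

    Raises ValueError naming the offending line if a line is not valid JSON.
    """
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline at the end of the file
        try:
            yield number, json.loads(raw)
        except json.JSONDecodeError as error:
            raise ValueError(f"line {number}: {error.msg}") from error


# Stand-in for open('/tmp/books.jsonl'); the two documents are made up.
sample = io.StringIO(
    '{"title": "Book A", "ratings_count": 10}\n'
    '\n'
    '{"title": "Book B", "ratings_count": 5}\n'
)
documents = [document for _, document in read_jsonl(sample)]
print(len(documents))
```

Each yielded document could then be passed to the `documents.create` call shown above.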

Searching for books

We will start with a really simple search query - let's search for harry potter and ask Typesense to rank books that have more ratings higher in the results.

search_parameters = {
  'q'         => 'harry potter',
  'query_by'  => 'title',
  'sort_by'   => 'ratings_count:desc'
}

client.collections['books'].documents.search(search_parameters)
search_parameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'sort_by'   : 'ratings_count:desc'
}

client.collections['books'].documents.search(search_parameters)
let searchParameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'sort_by'   : 'ratings_count:desc'
}

client.collections('books')
  .documents()
  .search(searchParameters)
  .then(function (searchResults) {
    console.log(searchResults)
  })
curl -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
"$TYPESENSE_MASTER/collections/books/documents/search\
?q=harry+potter&query_by=title&sort_by=ratings_count:desc"

Sample response
{
  "facet_counts": [],
  "found": 62,
  "hits": [
    {
      "highlight": {
        "title": "<mark>Harry</mark> <mark>Potter</mark> and the Philosopher's Stone"
      },
      "document": {
        "authors": [
          "J.K. Rowling", "Mary GrandPré"
        ],
        "authors_facet": [
          "J.K. Rowling", "Mary GrandPré"
        ],
        "average_rating": 4.44,
        "id": "2",
        "image_url": "https://images.gr-assets.com/books/1474154022m/3.jpg",
        "publication_year": 1997,
        "publication_year_facet": "1997",
        "ratings_count": 4602479,
        "title": "Harry Potter and the Philosopher's Stone"
      }
    },
    ...
  ]
}

In addition to returning the matching documents, Typesense also highlights where the query terms appear in a document via the highlight property.
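As a small sketch of how the `highlight` property might be consumed, the snippet below pulls the matched terms out of each hit. The response is an abbreviated copy of the sample above; `highlighted_terms` and `summarize_hits` are hypothetical helper names, and the sketch assumes highlights are always wrapped in `<mark>` tags as in the sample.

```python
import re

# Matches the text wrapped in <mark>...</mark>, as seen in the sample response.
MARK = re.compile(r"<mark>(.*?)</mark>")


def highlighted_terms(snippet):
    """Return the terms wrapped in <mark> tags, in order of appearance."""
    return MARK.findall(snippet)


def summarize_hits(response, field="title"):
    """Pair each hit's stored value for the field with its highlighted terms."""
    summary = []
    for hit in response.get("hits", []):
        snippet = hit.get("highlight", {}).get(field, "")
        summary.append((hit["document"][field], highlighted_terms(snippet)))
    return summary


# Abbreviated from the sample response above.
response = {
    "found": 62,
    "hits": [
        {
            "highlight": {
                "title": "<mark>Harry</mark> <mark>Potter</mark> and the Philosopher's Stone"
            },
            "document": {"id": "2", "title": "Harry Potter and the Philosopher's Stone"},
        }
    ],
}

for title, terms in summarize_hits(response):
    print(title, terms)
```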

Want to see the newest Harry Potter books returned first? No problem, we can change the sort_by clause to publication_year:desc:

search_parameters = {
  'q'         => 'harry potter',
  'query_by'  => 'title',
  'sort_by'   => 'publication_year:desc'
}

client.collections['books'].documents.search(search_parameters)
search_parameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'sort_by'   : 'publication_year:desc'
}

client.collections['books'].documents.search(search_parameters)
let searchParameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'sort_by'   : 'publication_year:desc'
}

client.collections('books')
  .documents()
  .search(searchParameters)
  .then(function (searchResults) {
    console.log(searchResults)
  })
curl -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
"$TYPESENSE_MASTER/collections/books/documents/search\
?q=harry+potter&query_by=title&sort_by=publication_year:desc"

Sample response
{
  "facet_counts": [],
  "found": 62,
  "hits": [
  {
    "highlight": {
      "title": "<mark>Harry</mark> <mark>Potter</mark> and the Cursed Child..."
    },
    "document": {
      "authors": [
        "John Tiffany", "Jack Thorne", "J.K. Rowling"
      ],
      "authors_facet": [
        "John Tiffany", "Jack Thorne", "J.K. Rowling"
      ],
      "average_rating": 3.75,
      "id": "279",
      "image_url": "https://images.gr-assets.com/books/1470082995m/29056083.jpg",
      "publication_year": 2016,
      "publication_year_facet": "2016",
      "ratings_count": 270603,
      "title": "Harry Potter and the Cursed Child, Parts One and Two"
    }
  },
  ...
  ]
}

Now, let's tweak our query to only fetch books that are published before the year 1998. To do that, we just have to add a filter_by clause to our query:

search_parameters = {
  'q'         => 'harry potter',
  'query_by'  => 'title',
  'filter_by' => 'publication_year:<1998',
  'sort_by'   => 'publication_year:desc'
}

client.collections['books'].documents.search(search_parameters)
search_parameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'filter_by' : 'publication_year:<1998',
  'sort_by'   : 'publication_year:desc'
}

client.collections['books'].documents.search(search_parameters)
let searchParameters = {
  'q'         : 'harry potter',
  'query_by'  : 'title',
  'filter_by' : 'publication_year:<1998',
  'sort_by'   : 'publication_year:desc'
}

client.collections('books')
  .documents()
  .search(searchParameters)
  .then(function (searchResults) {
    console.log(searchResults)
  })
curl -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
"$TYPESENSE_MASTER/collections/books/documents/search\
?q=harry+potter&query_by=title&sort_by=publication_year:desc\
&filter_by=publication_year:<1998"

Sample response
{
  "facet_counts": [],
  "found": 24,
  "hits": [
    {
      "highlight": {
        "title": "<mark>Harry</mark> <mark>Potter</mark> and the Philosopher's Stone"
      },
      "document": {
        "authors": [
            "J.K. Rowling", "Mary GrandPré"
        ],
        "authors_facet": [
            "J.K. Rowling", "Mary GrandPré"
        ],
        "average_rating": 4.44,
        "id": "2",
        "image_url": "https://images.gr-assets.com/books/1474154022m/3.jpg",
        "publication_year": 1997,
        "publication_year_facet": "1997",
        "ratings_count": 4602479,
        "title": "Harry Potter and the Philosopher's Stone"
      }
    },
    ...
  ]
}
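The curl example for the filtered query places characters such as `<` directly in the URL. When the search URL is built in code, percent-encoding the parameters is the safer route. The sketch below uses only the Python standard library; `search_url` is a hypothetical helper, the base URL is the `TYPESENSE_MASTER` value used earlier in this guide, and the sketch assumes the server decodes percent-encoded query strings in the usual way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url, collection, parameters):
    """Build a search URL with percent-encoded query parameters."""
    query = urlencode(parameters)  # encodes spaces as '+', and ':' and '<' as %3A and %3C
    return f"{base_url}/collections/{collection}/documents/search?{query}"


url = search_url("http://localhost:8108", "books", {
    "q": "harry potter",
    "query_by": "title",
    "filter_by": "publication_year:<1998",
    "sort_by": "publication_year:desc",
})
print(url)

# Decoding the query string gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filter_by"])
```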

Finally, let's see how Typesense handles typographic errors. Let's search for experyment (notice the typo?). We will also facet the search results by the authors_facet field to see how that works.

search_parameters = {
  'q'         => 'experyment',
  'query_by'  => 'title',
  'facet_by'  => 'authors_facet',
  'sort_by'   => 'average_rating:desc'
}

client.collections['books'].documents.search(search_parameters)
search_parameters = {
  'q'         : 'experyment',
  'query_by'  : 'title',
  'facet_by'  : 'authors_facet',
  'sort_by'   : 'average_rating:desc'
}

client.collections['books'].documents.search(search_parameters)
let searchParameters = {
  'q'         : 'experyment',
  'query_by'  : 'title',
  'facet_by'  : 'authors_facet',
  'sort_by'   : 'average_rating:desc'
}

client.collections('books')
  .documents()
  .search(searchParameters)
  .then(function (searchResults) {
    console.log(searchResults)
  })
curl -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
"$TYPESENSE_MASTER/collections/books/documents/search\
?q=experyment&query_by=title&sort_by=average_rating:desc\
&facet_by=authors_facet"

As we can see in the result below, Typesense handled the typographic error gracefully and fetched the results correctly. The facet_by clause also gives us a neat break-down of the number of books written by each author in the returned search results.

Sample response
{
  "facet_counts": [
    {
      "field_name": "authors_facet",
      "counts": [
          {
              "count": 2,
              "value": " Käthe Mazur"
          },
          {
              "count": 2,
              "value": "Gretchen Rubin"
          },
          {
              "count": 2,
              "value": "James Patterson"
          },
          {
              "count": 2,
              "value": "Mahatma Gandhi"
          }
      ]
    }
  ],
  "found": 3,
  "hits": [
    {
      "highlight": {
        "title": "The Angel <mark>Experiment</mark>"
      },
      "document": {
        "authors": [
            "James Patterson"
        ],
        "authors_facet": [
            "James Patterson"
        ],
        "average_rating": 4.08,
        "id": "569",
        "image_url": "https://images.gr-assets.com/books/1339277875m/13152.jpg",
        "publication_year": 2005,
        "publication_year_facet": "2005",
        "ratings_count": 172302,
        "title": "The Angel Experiment"
      }
    },
    ...
  ]
}
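To make the shape of `facet_counts` concrete, the following sketch reproduces the idea in plain Python: count how often each value of a facet field occurs across the matching documents. It only illustrates the concept and says nothing about how Typesense computes facets internally. The `facet_counts` function and the documents are made up for this example.

```python
from collections import Counter


def facet_counts(documents, field_name):
    """Count occurrences of each value of a facet field across documents."""
    counter = Counter()
    for document in documents:
        values = document.get(field_name, [])
        if isinstance(values, str):
            values = [values]  # treat a plain string facet as a single value
        counter.update(values)
    # most_common() orders by count, highest first.
    counts = [{"count": count, "value": value} for value, count in counter.most_common()]
    return {"field_name": field_name, "counts": counts}


# Made-up matching documents, for illustration only.
matching_documents = [
    {"title": "Book A", "authors_facet": ["James Patterson"]},
    {"title": "Book B", "authors_facet": ["James Patterson"]},
    {"title": "Book C", "authors_facet": ["Gretchen Rubin"]},
]

print(facet_counts(matching_documents, "authors_facet"))
```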

We've come to the end of our little example. For a detailed dive into Typesense, refer to our API documentation.

Ranking and relevance

Typesense ranks search results using a simple tie-breaking algorithm that relies on two components:

  1. String similarity.
  2. User-defined sort_by numerical fields.

Typesense computes a string similarity score based on how much a search query overlaps with the fields of a given document. Typographic errors are also taken into account here. Let's see how.

When there is a typo in the query, or during prefix search, multiple tokens could match a given token in the query. For example, both “john” and “joan” are one typo away from “jofn”. Similarly, in a prefix search, both “apple” and “apply” match “app”. In such scenarios, Typesense uses the value of the default_sorting_field to decide whether documents containing “john” or “joan” should be ranked first.
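The notion of tokens being "one typo away" can be sketched with a plain Levenshtein edit distance. This guide does not describe the matching algorithm Typesense actually uses, so treat the function below as an illustration of the idea under that assumption rather than a description of the engine.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Tokens within one typo of the query token.
candidates = ["john", "joan", "jane"]
within_one_typo = [token for token in candidates if edit_distance("jofn", token) <= 1]
print(within_one_typo)

# Prefix search: every token that starts with the query matches.
prefix_matches = [token for token in ["apple", "apply", "banana"] if token.startswith("app")]
print(prefix_matches)
```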

When multiple documents share the same string similarity score, user-defined numerical fields are used to break the tie. You can specify up to two such numerical fields.

For example, let's say that we're searching for books with a query like short story. If there are multiple books containing these exact words, then all those documents would have the same string similarity score.

To break the tie, we could specify up to two additional sort_by fields. For instance, we could say sort_by=average_rating:DESC,publication_year:DESC. This would sort the results in the following manner:

  1. All matching records are sorted by string similarity score.
  2. If any two records share the same string similarity score, sort them by their average rating.
  3. If there is still a tie, sort the records by year of publication.
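The three steps above amount to sorting on a composite key. A minimal sketch, assuming each record already carries a numeric `similarity` score (a hypothetical field standing in for the string similarity score; the records themselves are made up):

```python
def rank(records):
    """Sort by similarity, then average_rating, then publication_year, all descending."""
    return sorted(
        records,
        key=lambda record: (
            -record["similarity"],        # 1. string similarity score
            -record["average_rating"],    # 2. first tie-breaker
            -record["publication_year"],  # 3. second tie-breaker
        ),
    )


# Made-up records, for illustration only.
records = [
    {"title": "A", "similarity": 2, "average_rating": 4.1, "publication_year": 2001},
    {"title": "B", "similarity": 2, "average_rating": 4.5, "publication_year": 1999},
    {"title": "C", "similarity": 2, "average_rating": 4.5, "publication_year": 2010},
    {"title": "D", "similarity": 3, "average_rating": 3.0, "publication_year": 1990},
]

ordered_titles = [record["title"] for record in rank(records)]
print(ordered_titles)
```

Record D ranks first despite its low rating because the tie-breakers only apply among records with equal similarity.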

High Availability

You can run one or more Typesense servers as read-only replicas that asynchronously pull data from a master Typesense server. This way, if your primary Typesense server fails, search requests can be sent to the replicas.

Server configuration

To start Typesense as a read-only replica, pass the master Typesense server's address via the --master argument:

--master=http(s)://<master_address>:<master_port>

NOTE: The master Typesense server maintains a replication log for 24 hours. If you are pointing the replica to a master instance that has been running for longer than 24 hours, first stop the master, take a copy of the data directory, and then start the replica server by pointing it to this backup data directory.

Client configuration

Typesense clients allow you to configure one or more replica nodes during client initialization.

Client libraries send all writes to the master. Reads are sent to the master first; if it returns a 500 status code or the connection times out, the reads are sent to the configured read replicas in round-robin fashion.

require 'typesense'

client = Typesense::Client.new(
  master_node: {
    host:     'localhost',
    port:     8108,
    protocol: 'http',
    api_key:  '<API_KEY>'
  },

  read_replica_nodes: [
    {
      host:     'read_replica_1',
      port:     8108,
      protocol: 'http',
      api_key:  '<API_KEY>'
    }
  ],

  timeout_seconds: 2
)
import typesense

client = typesense.Client({
  'master_node': {
    'host': 'localhost',
    'port': '8108',
    'protocol': 'http',
    'api_key': '<API_KEY>'
  },
  'read_replica_nodes': [{
    'host': 'read_replica_1',
    'port': '8108',
    'protocol': 'http',
    'api_key': '<API_KEY>'
  }],
  'timeout_seconds': 2
})
let client = new Typesense.Client({
  'masterNode': {
    'host': 'master',
    'port': '8108',
    'protocol': 'http',
    'apiKey': '<API_KEY>'
  },
  'readReplicaNodes': [{
    'host': 'read_replica_1',
    'port': '8108',
    'protocol': 'http',
    'apiKey': '<API_KEY>'
  }],
  'timeoutSeconds': 2
})
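The read path described in this section (master first, then replicas in round-robin order) can be sketched as follows. This is not the client libraries' actual implementation: `FailoverReader`, `ServerError`, and `fake_fetch` are hypothetical names, and the transport is a stand-in function so the example runs without a server.

```python
import itertools


class ServerError(Exception):
    """Stand-in for a 500 response from a node (hypothetical)."""


class FailoverReader:
    def __init__(self, master, replicas, fetch):
        self.master = master
        self.replicas = list(replicas)
        # The cycle persists across calls, which gives round-robin behaviour.
        self._cycle = itertools.cycle(self.replicas)
        self.fetch = fetch  # callable(node, path) -> response

    def read(self, path):
        """Try the master first; on a server error or timeout, try each replica once."""
        try:
            return self.fetch(self.master, path)
        except (ServerError, TimeoutError) as error:
            last_error = error
        for _ in range(len(self.replicas)):
            replica = next(self._cycle)
            try:
                return self.fetch(replica, path)
            except (ServerError, TimeoutError) as error:
                last_error = error
        raise last_error  # every node failed


def fake_fetch(node, path):
    """Simulated transport: the master always fails, replicas answer."""
    if node == "master":
        raise ServerError("500 from master")
    return f"{node} served {path}"


reader = FailoverReader("master", ["read_replica_1", "read_replica_2"], fake_fetch)
first = reader.read("/collections/books")
second = reader.read("/collections/books")
print(first)
print(second)
```

Because the replica cycle is shared across calls, consecutive failed reads are spread over the replicas rather than always landing on the first one.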