# Build A Search Application

Now that you have Typesense installed and running, we're ready to create a Typesense collection, index some documents in it, and try searching for them.

# Sample Dataset

To follow along, download this small dataset that we've put together for this walk-through.

```shell
cd /tmp
curl -O https://dl.typesense.org/datasets/books.jsonl.gz
gunzip books.jsonl.gz
```

This should give you a file called books.jsonl which we'll use below.

# Initializing the client

Let's begin by configuring the Typesense client to point to a Typesense node.

  • Be sure to use the same API key that you used to start the Typesense server earlier.
  • Or if you're using Typesense Cloud, click on the "Generate API key" button on the cluster page. This will give you a set of hostnames and API keys to use.
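As a minimal sketch of this configuration step, the snippet below talks to the Typesense HTTP API directly using only Python's standard library, rather than an official client library. The node address `http://localhost:8108`, the placeholder API key `xyz`, and the helper names `ts_request` and `ts_send` are assumptions for illustration; substitute the values for your own server or Typesense Cloud cluster.

```python
import json
import urllib.request

# Assumed values: replace with your own node address and API key.
TYPESENSE_URL = "http://localhost:8108"
TYPESENSE_API_KEY = "xyz"


def ts_request(method, path, body=None):
    """Build (but do not send) an authenticated request to the Typesense node."""
    data = None
    if body is not None:
        # Accept raw bytes as-is; serialize anything else as JSON.
        data = body if isinstance(body, bytes) else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        TYPESENSE_URL + path,
        data=data,
        method=method,
        headers={
            # Typesense reads the API key from this header.
            "X-TYPESENSE-API-KEY": TYPESENSE_API_KEY,
            "Content-Type": "application/json",
        },
    )


def ts_send(request):
    """Send a request built by ts_request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node running, `ts_send(ts_request("GET", "/health"))` is a quick way to check that the address is reachable.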

That's it - we're now ready to start interacting with the Typesense server.

# Creating a "books" collection

In Typesense, a Collection is a group of related Documents that is roughly equivalent to a table in a relational database. When we create a collection, we give it a name and describe the fields that will be indexed when a document is added to the collection.

For each field, we define its name, type and whether it's a facet field. A facet field allows us to cluster the search results into categories and lets us drill into each of those categories. We will be seeing faceted results in action at the end of this guide.

We also define a default_sorting_field that determines how the results must be sorted when no sort_by clause is provided. In this case, books that have more ratings will be ranked higher.
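The schema described above might look like the following sketch. The field names (`title`, `authors`, `publication_year`, `ratings_count`, `average_rating`) and their types are assumed to match the keys in `books.jsonl`, and the node address and API key are placeholders. The request is only built here; sending it is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

# Assumed schema for the sample dataset; adjust names and types to your data.
books_schema = {
    "name": "books",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "authors", "type": "string[]", "facet": True},
        {"name": "publication_year", "type": "int32", "facet": True},
        {"name": "ratings_count", "type": "int32"},
        {"name": "average_rating", "type": "float"},
    ],
    # Used to order results when a search has no sort_by clause.
    "default_sorting_field": "ratings_count",
}

# Collections are created by POSTing the schema to /collections.
create_request = urllib.request.Request(
    "http://localhost:8108/collections",  # assumed local node
    data=json.dumps(books_schema).encode("utf-8"),
    method="POST",
    headers={
        "X-TYPESENSE-API-KEY": "xyz",  # placeholder API key
        "Content-Type": "application/json",
    },
)

# Uncomment to create the collection on a running node:
# with urllib.request.urlopen(create_request) as response:
#     print(json.loads(response.read().decode("utf-8")))
```

Marking `authors` and `publication_year` as facets is what allows grouping results by those fields later in the guide.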

Indexed fields vs un-indexed fields

You only need to include fields that you want to search / filter / facet / sort / group_by in the collection schema. We call these indexed fields. Indexed fields are stored in RAM with a backup on disk.

You can still send additional fields that you might use for display purposes (e.g., image URLs) when importing documents into Typesense. Any fields that are present in an imported document but not mentioned in the schema are only stored on disk and are returned when the document is a hit. We call these un-indexed fields. Keeping them out of the schema conserves memory and avoids spending CPU cycles on building in-memory indices that would never be used.

# Adding books to the collection

We're now ready to index some books into the collection we just created.
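One way to sketch the import step is to send the whole JSONL file to the collection's bulk import endpoint, which replies with one JSON result per line. The helper names below are hypothetical, and the node address, API key, and file path `/tmp/books.jsonl` are assumptions carried over from the earlier steps.

```python
import json
import urllib.request


def build_import_request(jsonl_bytes, base_url="http://localhost:8108", api_key="xyz"):
    """Build a bulk-import request for the books collection (not sent here)."""
    return urllib.request.Request(
        base_url + "/collections/books/documents/import?action=create",
        data=jsonl_bytes,
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )


def count_failures(response_text):
    """Count result lines in an import response that did not report success."""
    results = [json.loads(line) for line in response_text.splitlines() if line.strip()]
    return sum(1 for result in results if not result.get("success"))


def import_jsonl_file(path="/tmp/books.jsonl"):
    """Read the dataset, import it, and return the number of failed documents."""
    with open(path, "rb") as dataset:
        request = build_import_request(dataset.read())
    with urllib.request.urlopen(request) as response:
        return count_failures(response.read().decode("utf-8"))
```

Calling `import_jsonl_file()` against a running node returns the number of documents that failed to import, which is worth checking because a bulk import can succeed overall while individual documents are rejected.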

# Searching for books

We will start with a really simple search query - let's search for harry potter and ask Typesense to rank books that have more ratings higher in the results.
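A sketch of that query, assuming the book titles live in a `title` field and the rating counts in a `ratings_count` field (both assumed schema names), against a placeholder local node and API key:

```python
import json
import urllib.request
from urllib.parse import urlencode

search_parameters = {
    "q": "harry potter",
    "query_by": "title",               # assumed field holding the book title
    "sort_by": "ratings_count:desc",   # most-rated books first
}


def build_search_request(params, base_url="http://localhost:8108", api_key="xyz"):
    """Build a search request against the books collection (not sent here)."""
    url = base_url + "/collections/books/documents/search?" + urlencode(params)
    return urllib.request.Request(url, headers={"X-TYPESENSE-API-KEY": api_key})


def search(params):
    """Run the search on a live node and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(params)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the collection populated, `search(search_parameters)` returns the response as a dictionary whose `hits` list holds the matching documents.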

# Sample Response

In addition to returning the matching documents, Typesense also highlights where the query terms appear in a document via the highlight property.
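To illustrate reading the highlight information, here is a small sketch that collects the highlighted title snippets from a decoded search response. It assumes each hit carries a `highlight` object keyed by field name, with a `snippet` string for string fields; older server versions may expose a differently shaped `highlights` list instead, so treat the exact shape as an assumption to check against your own responses.

```python
def title_snippets(search_response):
    """Return the highlighted title snippet of every hit that has one."""
    snippets = []
    for hit in search_response.get("hits", []):
        title_highlight = hit.get("highlight", {}).get("title")
        # Only string fields are handled; array fields are skipped.
        if isinstance(title_highlight, dict) and "snippet" in title_highlight:
            snippets.append(title_highlight["snippet"])
    return snippets
```

The function takes the dictionary produced by decoding the JSON body of a search response and ignores hits that have no title highlight.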

Want to see the newest Harry Potter books returned first? No problem, we can change the sort_by clause to publication_year:desc:
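Only the `sort_by` value changes. A minimal sketch of the resulting request URL, again assuming a `title` field and a local node at `localhost:8108`:

```python
from urllib.parse import urlencode

search_parameters = {
    "q": "harry potter",
    "query_by": "title",                  # assumed field name
    "sort_by": "publication_year:desc",   # newest books first
}

# The API key still has to be sent in the X-TYPESENSE-API-KEY header.
search_url = (
    "http://localhost:8108/collections/books/documents/search?"
    + urlencode(search_parameters)
)
print(search_url)
```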

Typesense also supports the ability to return semantic matches, in addition to keyword matches.

For example, if your dataset contains the phrase "Harry Potter" and the user searches for "famous boy wizard", semantic search will return the record with "Harry Potter", since it is conceptually related to the search term.

Read more about Semantic Search in the dedicated guide article.

# Filtering results

Now, let's tweak our query to only fetch books that are published before the year 1998. To do that, we just have to add a filter_by clause to our query:
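A sketch of the filtered query, assuming the publication year is stored in a numeric `publication_year` field and using the same placeholder node address as before:

```python
from urllib.parse import urlencode

search_parameters = {
    "q": "harry potter",
    "query_by": "title",                     # assumed field name
    "filter_by": "publication_year:<1998",   # only books published before 1998
    "sort_by": "publication_year:desc",
}

# urlencode percent-escapes the ":" and "<" characters in the filter expression.
filtered_url = (
    "http://localhost:8108/collections/books/documents/search?"
    + urlencode(search_parameters)
)
print(filtered_url)
```

Building the query string with `urlencode` avoids having to escape the filter expression by hand.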

# Sample Response

# Faceting

Let's facet the search results by the authors field to see how that works. We'll also use this example to see how Typesense handles typographic errors by searching for experyment (notice the typo!).
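The faceted query can be sketched as follows, together with a small helper for reading the per-author counts out of a decoded response. The `title` field name, the node address, and the helper name `facet_breakdown` are assumptions; the helper expects the response's `facet_counts` list, where each entry has a `field_name` and a list of `value`/`count` pairs.

```python
from urllib.parse import urlencode

search_parameters = {
    "q": "experyment",       # deliberate typo
    "query_by": "title",     # assumed field name
    "facet_by": "authors",
}

facet_url = (
    "http://localhost:8108/collections/books/documents/search?"
    + urlencode(search_parameters)
)


def facet_breakdown(search_response, field_name="authors"):
    """Map each facet value of the given field to its document count."""
    for facet in search_response.get("facet_counts", []):
        if facet.get("field_name") == field_name:
            return {entry["value"]: entry["count"] for entry in facet.get("counts", [])}
    return {}
```

Passing the decoded search response to `facet_breakdown` gives a dictionary of author names to the number of matching books by each.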

As we can see in the result below, Typesense handled the typographic error gracefully and fetched the results correctly. The facet_by clause also gives us a neat break-down of the number of books written by each author in the returned search results.

# Sample Response

We've come to the end of our little walk-through. For a detailed dive into Typesense, refer to our API documentation.


We used a single node in this example, but Typesense can also run in a clustered mode. See the high availability section for more details.

Last Updated: 4/15/2024, 4:27:30 PM