# Build A Search Application
Now that you have Typesense installed and running, we're ready to create a Typesense collection, index some documents in it, and try searching for them.
# Sample Dataset
To follow along, download this small dataset that we've put together for this walk-through.
cd /tmp
curl -O https://dl.typesense.org/datasets/books.jsonl.gz
gunzip books.jsonl.gz
This should give you a file called `books.jsonl`, which we'll use below.
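Before indexing anything, it can help to look at a few records. The sketch below assumes the file sits at `/tmp/books.jsonl` (as in the commands above) and that each line is one JSON object; `peek_jsonl` is a helper name we made up for this walk-through.

```python
import json
from pathlib import Path


def peek_jsonl(path, limit=3):
    """Return up to `limit` records from a JSON Lines file as dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            records.append(json.loads(line))
    return records
```

Calling `peek_jsonl("/tmp/books.jsonl")` should show which fields each book carries, which is useful when we describe the collection schema later on.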
# Initializing the client
Let's begin by configuring the Typesense client by pointing it to a Typesense node.
- Be sure to use the same API key that you used to start the Typesense server earlier.
- Or if you're using Typesense Cloud, click on the "Generate API key" button on the cluster page. This will give you a set of hostnames and API keys to use.
That's it - we're now ready to start interacting with the Typesense server.
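As a rough sketch, this is what the configuration might look like with the Python client (`pip install typesense`). The host, port and protocol assume a local single-node server on Typesense's default port, and `"xyz"` is only a placeholder for your own API key; `build_client_config` and `make_client` are helper names introduced here for illustration.

```python
def build_client_config(host="localhost", port=8108, protocol="http", api_key="xyz"):
    """Build the configuration dict passed to typesense.Client.

    All defaults are assumptions about a local setup; "xyz" is a placeholder key.
    """
    return {
        "nodes": [{"host": host, "port": str(port), "protocol": protocol}],
        "api_key": api_key,
        "connection_timeout_seconds": 2,
    }


def make_client(config):
    """Create a Typesense client from a configuration dict."""
    import typesense  # imported here so the config helper works without the package

    return typesense.Client(config)
```

For Typesense Cloud, you would pass the generated hostname, port `443` and protocol `"https"` instead of the local defaults.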
# Creating a "books" collection
In Typesense, a Collection is a group of related Documents that is roughly equivalent to a table in a relational database. When we create a collection, we give it a name and describe the fields that will be indexed when a document is added to the collection.
For each field, we define its `name`, `type` and whether it's a `facet` field. A facet field allows us to cluster the search results into categories and lets us drill into each of those categories. We will be seeing faceted results in action at the end of this guide.
We also define a `default_sorting_field` that determines how the results must be sorted when no `sort_by` clause is provided. In this case, books that have more ratings will be ranked higher.
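Here is a sketch of such a schema for the Python client. The `authors` and `publication_year` fields appear later in this guide; the other field names (`title`, `ratings_count`, `average_rating`) and all of the types are our assumptions about the sample dataset, so check them against your copy of `books.jsonl`.

```python
# Schema for the "books" collection. Field names and types other than
# `authors` and `publication_year` are assumptions about the dataset.
BOOKS_SCHEMA = {
    "name": "books",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "authors", "type": "string[]", "facet": True},
        {"name": "publication_year", "type": "int32", "facet": True},
        {"name": "ratings_count", "type": "int32"},
        {"name": "average_rating", "type": "float"},
    ],
    # Used for ranking when a search has no sort_by clause.
    "default_sorting_field": "ratings_count",
}


def create_books_collection(client, schema=BOOKS_SCHEMA):
    """Create the collection using an already configured Typesense client."""
    return client.collections.create(schema)
```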
Indexed fields vs un-indexed fields
You only need to include fields that you want to search / filter / facet / sort / group_by in the collection schema. We call these indexed fields. Indexed fields are stored in RAM with a backup on disk.
You can still send additional fields that you might use for display purposes (e.g. image URLs) when importing documents into Typesense. Any field that is present in an imported document but not mentioned in the schema is stored only on disk and is returned when the document is a hit. We call these un-indexed fields. Keeping them out of memory conserves RAM and avoids wasting CPU cycles on building indices that would never be used.
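To make the distinction concrete, the small helper below splits a document into the fields a schema would index and the ones that would stay un-indexed. It is purely illustrative and runs on the client side; `split_fields` is a hypothetical name, and the sketch skips `id`, which Typesense manages itself.

```python
def split_fields(document, schema):
    """Split a document into (indexed, unindexed) dicts for a given schema.

    The special `id` field is left out of both, since it is not declared
    in the schema but is still handled by Typesense.
    """
    indexed_names = {field["name"] for field in schema["fields"]}
    indexed, unindexed = {}, {}
    for key, value in document.items():
        if key == "id":
            continue
        target = indexed if key in indexed_names else unindexed
        target[key] = value
    return indexed, unindexed
```

For a schema that only declares `title`, a document with `title` and `image_url` would come back with `image_url` in the un-indexed half.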
# Adding books to the collection
We're now ready to index some books into the collection we just created.
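A minimal sketch of the import step with the Python client follows. It assumes the dataset is at `/tmp/books.jsonl` and that the import call returns one JSON result per line with a `success` flag, which is how the import endpoint reports per-document outcomes; `failed_imports` and `import_books` are names we chose for this example.

```python
import json


def failed_imports(result_text):
    """Return the per-document results whose `success` flag is not true."""
    if isinstance(result_text, bytes):
        result_text = result_text.decode("utf-8")
    results = [json.loads(line) for line in result_text.splitlines() if line.strip()]
    return [result for result in results if not result.get("success")]


def import_books(client, jsonl_path="/tmp/books.jsonl"):
    """Send the whole JSONL file to the "books" collection in one request."""
    with open(jsonl_path, "rb") as f:
        payload = f.read()
    result = client.collections["books"].documents.import_(payload, {"action": "create"})
    return failed_imports(result)
```

An empty list from `import_books(client)` would mean every document was accepted.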
# Searching for books
We will start with a really simple search query - let's search for `harry potter` and ask Typesense to rank books that have more ratings higher in the results.
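With the Python client, that query could be sketched as follows. We assume the book title lives in a `title` field and the number of ratings in `ratings_count`; adjust both if your schema differs.

```python
# Search parameters: match the query against `title` (assumed field name)
# and rank books with more ratings first (`ratings_count` is assumed too).
POPULAR_HARRY_POTTER = {
    "q": "harry potter",
    "query_by": "title",
    "sort_by": "ratings_count:desc",
}


def search_by_popularity(client, params=POPULAR_HARRY_POTTER):
    """Run the search against the "books" collection."""
    return client.collections["books"].documents.search(params)
```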
# Sample Response
In addition to returning the matching documents, Typesense also highlights where the query terms appear in a document via the `highlight` property.
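As a sketch of how the highlights might be read, the helper below collects the highlighted snippet for one field from each hit. It assumes the response has a `hits` list and that each hit's `highlight` object is keyed by field name with a `snippet` string inside; the exact shape can vary between Typesense versions, so treat this as a starting point.

```python
def highlighted_snippets(response, field="title"):
    """Collect the highlighted snippet of `field` from every hit that has one."""
    snippets = []
    for hit in response.get("hits", []):
        entry = hit.get("highlight", {}).get(field)
        # Only plain string fields are handled here; array fields may be
        # shaped differently and are skipped by this sketch.
        if isinstance(entry, dict) and "snippet" in entry:
            snippets.append(entry["snippet"])
    return snippets
```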
Want to actually see the newest `harry potter` books returned first? No problem, we can change the `sort_by` clause to `publication_year:desc`.
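In the Python client, only the `sort_by` value needs to change. As before, querying by a `title` field is our assumption about the schema.

```python
# Same query, but the most recently published books come first.
NEWEST_HARRY_POTTER = {
    "q": "harry potter",
    "query_by": "title",  # assumed field name
    "sort_by": "publication_year:desc",
}


def search_newest_first(client, params=NEWEST_HARRY_POTTER):
    """Run the search against the "books" collection."""
    return client.collections["books"].documents.search(params)
```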
# Semantic Search
Typesense also supports the ability to return semantic matches, in addition to keyword matches.
For example, if your dataset contains the phrase "Harry Potter" and the user searches for "famous boy wizard", semantic search will return the record with "Harry Potter", since it is conceptually related to the search term.
Read more about Semantic Search in the dedicated guide article.
# Filtering results
Now, let's tweak our query to only fetch books that were published before the year 1998. To do that, we just have to add a `filter_by` clause to our query.
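A sketch of the filtered query with the Python client is shown below. The filter expression uses the `publication_year` field from this guide; `title` and `ratings_count` remain assumptions about the schema.

```python
# Only books published before 1998, most-rated first.
BEFORE_1998 = {
    "q": "harry potter",
    "query_by": "title",  # assumed field name
    "filter_by": "publication_year:<1998",
    "sort_by": "ratings_count:desc",  # assumed field name
}


def search_before_1998(client, params=BEFORE_1998):
    """Run the filtered search against the "books" collection."""
    return client.collections["books"].documents.search(params)
```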
# Sample Response
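One way to check a filtered response is to read the publication year of every hit. The helper below assumes each hit exposes the stored record under `document` and that the record has a `publication_year` field; both function names are made up for this example.

```python
def publication_years(response):
    """Return the publication year of every hit in a search response."""
    return [hit["document"]["publication_year"] for hit in response.get("hits", [])]


def all_published_before(response, year):
    """True when every returned book was published before `year`."""
    return all(y < year for y in publication_years(response))
```

If the filter was applied as intended, `all_published_before(response, 1998)` should hold for the response to the filtered query.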
# Faceting
Let's facet the search results by the `authors` field to see how that works. Let's also use this example to see how Typesense handles typographic errors. Let's search for `experyment` (notice the typo!).
As we can see in the result below, Typesense handled the typographic error gracefully and fetched the results correctly. The `facet_by` clause also gives us a neat break-down of the number of books written by each author in the returned search results.
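The faceted query might look like this with the Python client. We leave out `sort_by` so that the collection's default sorting applies, and we again assume the query runs against a `title` field.

```python
# Misspelled query on purpose; facet the matches by author.
FACETED_TYPO_QUERY = {
    "q": "experyment",
    "query_by": "title",  # assumed field name
    "facet_by": "authors",
}


def search_with_facets(client, params=FACETED_TYPO_QUERY):
    """Run the faceted search against the "books" collection."""
    return client.collections["books"].documents.search(params)
```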
# Sample Response
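To pull the per-author break-down out of a response, something like the helper below could be used. It assumes the response carries a `facet_counts` list whose entries have a `field_name` and a `counts` list of `value`/`count` pairs; `facet_breakdown` is a name introduced for this sketch.

```python
def facet_breakdown(response, field="authors"):
    """Map each facet value of `field` to the number of matching documents."""
    for facet in response.get("facet_counts", []):
        if facet.get("field_name") == field:
            return {entry["value"]: entry["count"] for entry in facet.get("counts", [])}
    return {}  # the field was not faceted in this response
```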
We've come to the end of our little walk-through. For a detailed dive into Typesense, refer to our API documentation.
TIP
We used a single node in this example, but Typesense can also run in a clustered mode. See the high availability section for more details.