# Documents

# Index a document

A document to be indexed in a given collection must conform to the schema of the collection.

If the document contains an id field of type string, Typesense uses that field as the identifier for the document. Otherwise, Typesense assigns an identifier of its choice to the document.

# Sample Response

# Definition

POST ${TYPESENSE_HOST}/collections/:collection/documents
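
As a rough illustration of the endpoint above, the following Python sketch builds (but does not send) an indexing request using only the standard library. The host `http://localhost:8108`, the collection name `companies`, the API key value, the sample document, and the helper name `build_index_request` are all placeholders assumed for illustration. The `X-TYPESENSE-API-KEY` header is the usual way to authenticate against Typesense, but check the reference for your server version.

```python
import json
import urllib.request

# Assumed values for illustration only; replace with your own.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "xyz"


def build_index_request(collection: str, document: dict) -> urllib.request.Request:
    """Build a POST request that would index `document` into `collection`."""
    url = f"{TYPESENSE_HOST}/collections/{collection}/documents"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": API_KEY,
        },
    )


# A hypothetical document with a string `id`, which Typesense would use
# as the document identifier.
request = build_index_request(
    "companies",
    {"id": "124", "company_name": "Stark Industries", "num_employees": 5215, "country": "USA"},
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The document must still conform to the collection schema; this sketch does no validation of its own.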

# Search

In Typesense, a search consists of a query against one or more text fields and a list of filters against numerical or facet fields. You can also sort and facet your results.

# Sample Response

When a string[] field is queried, the highlights structure includes the array indices of the matching snippets. For example:
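
The original sample response is not reproduced here, so the following is only a sketch of how such a highlight entry might be consumed. It assumes the entry exposes the matching positions under an `indices` key and the corresponding highlighted strings under a `snippets` key; both key names, the field name `addresses`, and the sample values are assumptions for illustration and should be checked against a real response.

```python
from typing import Dict


def snippets_by_index(highlight: dict) -> Dict[int, str]:
    """Map each matching array position to its highlighted snippet."""
    return dict(zip(highlight["indices"], highlight["snippets"]))


# Hypothetical highlight entry for a string[] field named `addresses`.
highlight = {
    "field": "addresses",
    "indices": [0, 2],
    "snippets": ["<mark>Stark</mark> Tower", "<mark>Stark</mark> Expo"],
}

matches = snippets_by_index(highlight)
for position, snippet in matches.items():
    print(position, snippet)
```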

# Group by

You can aggregate search results into groups or buckets by specifying one or more group_by fields.

Grouping hits this way is useful for:

- Deduplication: By using one or more group_by fields, you can consolidate items and remove duplicates in the search results. For example, if there are multiple shoes of the same size, setting group_by=size&group_limit=1 ensures that only a single shoe of each size is returned in the search results.
- Correcting skew: When your results are dominated by documents of a particular type, you can use group_by and group_limit to correct that skew. For example, if the search results for a query contain too many documents of the same brand, setting group_by=brand&group_limit=3 ensures that only the top 3 results of each brand are returned in the search results.

NOTE: To group on a particular field, it must be a faceted field.
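
To make the semantics of group_by and group_limit concrete, here is a small local emulation in Python. It is not how Typesense implements grouping; it only mimics the described behavior on an in-memory list that is assumed to be in ranked order already. The function name `group_hits` and the sample shoe data are hypothetical.

```python
def group_hits(hits, group_by, group_limit=3):
    """Bucket ranked hits by the `group_by` fields, keeping at most
    `group_limit` hits per bucket (the first ones seen, i.e. the top ranked)."""
    groups = {}
    for hit in hits:
        key = tuple(hit[field] for field in group_by)
        bucket = groups.setdefault(key, [])
        if len(bucket) < group_limit:
            bucket.append(hit)
    return groups


# Hypothetical ranked hits: three shoes share size 9.
shoes = [
    {"name": "Runner", "size": 9},
    {"name": "Trail", "size": 9},
    {"name": "Court", "size": 10},
    {"name": "Sprint", "size": 9},
]

# Emulates group_by=size&group_limit=1: one shoe per size.
deduplicated = group_hits(shoes, group_by=["size"], group_limit=1)
for key, bucket in deduplicated.items():
    print(key, [hit["name"] for hit in bucket])
```

Raising `group_limit` to 3 in the same call emulates the skew-correction case, where each group keeps its top three hits.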

Grouping returns the hits in a nested structure that differs from the plain JSON response format shown earlier. Let's repeat the earlier query with a group_by parameter:

# Definition

GET ${TYPESENSE_HOST}/collections/:collection/documents/search
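
Search parameters are passed in the query string of this endpoint. The sketch below shows one way to assemble such a URL in Python with proper percent-encoding. The host, the collection name `companies`, the parameter values, and the helper name `build_search_url` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed host for illustration only.
TYPESENSE_HOST = "http://localhost:8108"


def build_search_url(collection: str, **params: str) -> str:
    """Return the search URL for `collection` with URL-encoded parameters."""
    return f"{TYPESENSE_HOST}/collections/{collection}/documents/search?{urlencode(params)}"


search_url = build_search_url(
    "companies",
    q="stark",
    query_by="company_name",
    filter_by="num_employees:>100",
    sort_by="num_employees:desc",
)
print(search_url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(search_url).query)
print(decoded["filter_by"])
```

Encoding matters because filter expressions contain characters such as `:`, `>`, and `&` that are not safe to place in a query string verbatim.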

# Arguments

| Parameter | Required | Description |
|---|---|---|
| q | yes | The query text to search for in the collection. Use `*` as the search string to return all documents. This is typically useful in conjunction with `filter_by`. For example, to return all documents that match a filter, use: `q=*&filter_by=num_employees:10` |
| query_by | yes | One or more `string / string[]` fields that should be queried against. Separate multiple fields with a comma: `company_name, country`. The order of the fields is important: a record that matches on a field earlier in the list is considered more relevant than a record matched on a field later in the list. So, in the example above, documents that match on the `company_name` field are ranked above documents matched on the `country` field. |
| prefix | no | Boolean field to indicate that the last word in the query should be treated as a prefix, and not as a whole word. This is necessary for building autocomplete and instant search interfaces. Default: `true` |
| filter_by | no | Filter conditions for refining your search results. A field can be matched against one or more values: `country: USA`, `country: [USA, UK]`. Separate multiple conditions with the `&&` operator. For example: `num_employees:>100 && country: [USA, UK]`. More examples: `num_employees:10`, `num_employees:<=10` |
| sort_by | no | A list of numerical fields and their corresponding sort orders that will be used for ordering your results. Separate multiple fields with a comma. For example: `num_employees:desc,year_started:asc`. Up to 3 sort fields can be specified in a single search query, and they are used as tie-breakers: if the values in the first `sort_by` field tie for a set of documents, the value in the second `sort_by` field is used to break the tie, and if that also ties, the value in the third field is used. If all 3 fields tie, the document insertion order is used to break the final tie. The text similarity score is exposed as a special `_text_match` field that you can use in the list of sorting fields. If one or two sorting fields are specified, `_text_match` is used for tie breaking, as the last sorting field. Default: if no `sort_by` parameter is specified, results are sorted by `_text_match:desc,default_sorting_field:desc`. |
| facet_by | no | A list of fields that will be used for faceting your results. Separate multiple fields with a comma. |
| max_facet_values | no | Maximum number of facet values to be returned. |
| facet_query | no | Filters the facet values that are returned. The matching facet text is also highlighted. For example, when faceting by `category`, you can set `facet_query=category:shoe` to return only facet values that contain the prefix "shoe". |
| num_typos | no | Number of typographical errors (1 or 2) that would be tolerated. Damerau–Levenshtein distance is used to calculate the number of errors. Default: `2` |
| page | no | Results from this specific page number would be fetched. |
| per_page | no | Number of results to fetch per page. Default: `10` |
| group_by | no | You can aggregate search results into groups or buckets by specifying one or more `group_by` fields. Separate multiple fields with a comma. For example: `group_by=country,company_name`. NOTE: To group on a particular field, it must be a faceted field. |
| group_limit | no | Maximum number of hits to be returned for every group. If `group_limit` is set to `K`, only the top K hits in each group are returned in the response. Default: `3` |
| include_fields | no | Comma-separated list of fields from the document to include in the search result. |
| exclude_fields | no | Comma-separated list of fields from the document to exclude from the search result. |
| highlight_full_fields | no | Comma-separated list of fields which should be highlighted fully without snippeting. Default: all fields will be snippeted. |
| snippet_threshold | no | Field values under this length will be fully highlighted, instead of showing a snippet of the relevant portion. Default: `30` |
| drop_tokens_threshold | no | If the number of results found for a specific query is less than this number, Typesense will attempt to drop tokens from the query until enough results are found. Tokens that have the fewest individual hits are dropped first. Set `drop_tokens_threshold` to `0` to disable dropping of tokens. Default: `10` |
| typo_tokens_threshold | no | If the number of results found for a specific query is less than this number, Typesense will attempt to look for tokens with more typos until enough results are found. Default: `100` |
| pinned_hits | no | A list of records to unconditionally include in the search results at specific positions. An example use case would be to feature or promote certain items at the top of search results. Specified as a comma-separated list of `record_id:hit_position`. For example, to include a record with ID 123 at position 1 and another record with ID 456 at position 5, you'd specify `123:1,456:5`. You could also use the Overrides feature to override search results based on rules. Overrides are applied first, followed by `pinned_hits` and finally `hidden_hits`. |
| hidden_hits | no | A list of records to unconditionally hide from search results. Specified as a comma-separated list of record IDs to hide. For example, to hide records with IDs 123 and 456, you'd specify `123,456`. You could also use the Overrides feature to override search results based on rules. Overrides are applied first, followed by `pinned_hits` and finally `hidden_hits`. |
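
The tie-breaking order described for `sort_by` can be illustrated with a local emulation in Python. This is a sketch of the described behavior only, not Typesense's implementation: list order stands in for document insertion order, `_text_match` is not modeled, and the helper names and sample records are hypothetical.

```python
def parse_sort_by(value):
    """Split a sort_by string such as 'a:desc,b:asc' into (field, order) pairs."""
    return [tuple(part.strip().split(":")) for part in value.split(",")]


def sort_hits(hits, sort_by):
    """Order hits by the given (field, order) pairs, earlier pairs taking precedence."""
    ordered = list(hits)
    # Sort by the last field first. Python's sort is stable (also with
    # reverse=True), so earlier fields win and the original list order
    # breaks any remaining ties.
    for field, order in reversed(sort_by):
        ordered.sort(key=lambda hit: hit[field], reverse=(order == "desc"))
    return ordered


# Hypothetical records, listed in insertion order.
companies = [
    {"id": "a", "num_employees": 100, "year_started": 1990},
    {"id": "b", "num_employees": 500, "year_started": 2001},
    {"id": "c", "num_employees": 500, "year_started": 1985},
    {"id": "d", "num_employees": 100, "year_started": 1990},
]

criteria = parse_sort_by("num_employees:desc,year_started:asc")
ranked_ids = [hit["id"] for hit in sort_hits(companies, criteria)]
print(ranked_ids)
```

Records `b` and `c` tie on `num_employees`, so `year_started` decides between them; records `a` and `d` tie on both fields, so they keep their insertion order.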

# Retrieve a document

Fetch an individual document from a collection by using its id.

# Sample Response

# Definition

GET ${TYPESENSE_HOST}/collections/:collection/documents/:id

# Delete a document

Delete an individual document from a collection by using its id.

# Sample Response

# Definition

DELETE ${TYPESENSE_HOST}/collections/:collection/documents/:id
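
The retrieve and delete endpoints share the same URL and differ only in the HTTP method. The Python sketch below builds both requests without sending them. The host, API key, collection name, document id, and the helper name `build_document_request` are assumptions for illustration.

```python
import urllib.request
from urllib.parse import quote

# Assumed values for illustration only; replace with your own.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "xyz"


def build_document_request(method: str, collection: str, document_id: str) -> urllib.request.Request:
    """Build a request addressing a single document by its id."""
    # Percent-encode the id so that characters such as '/' stay inside the path segment.
    url = f"{TYPESENSE_HOST}/collections/{collection}/documents/{quote(document_id, safe='')}"
    return urllib.request.Request(url, method=method, headers={"X-TYPESENSE-API-KEY": API_KEY})


fetch = build_document_request("GET", "companies", "124")
delete = build_document_request("DELETE", "companies", "124")
print(fetch.get_method(), fetch.full_url)
print(delete.get_method(), delete.full_url)
# To actually send one of them: urllib.request.urlopen(fetch)
```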

# Export documents

# Sample Response

# Definition

GET ${TYPESENSE_HOST}/collections/:collection/documents/export

# Import documents

The documents to be imported must be formatted as newline-delimited JSON: one JSON document per line. You can feed the output file from a Typesense export operation directly into an import.

Here's an example file:

You can import the above documents.jsonl file like this.
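
The original example file and import command are not reproduced here. As a stand-in, this Python sketch serializes a list of documents as newline-delimited JSON and builds (but does not send) the import request. The host, API key, collection name, sample documents, and helper names are assumptions for illustration.

```python
import json
import urllib.request

# Assumed values for illustration only; replace with your own.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "xyz"


def to_jsonl(documents) -> str:
    """Serialize documents as one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each document stays on one line.
    return "\n".join(json.dumps(doc) for doc in documents)


def from_jsonl(text: str):
    """Parse newline-delimited JSON back into a list of documents."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_import_request(collection: str, jsonl: str) -> urllib.request.Request:
    """Build a POST request carrying the JSONL payload as its body."""
    url = f"{TYPESENSE_HOST}/collections/{collection}/documents/import"
    return urllib.request.Request(
        url,
        data=jsonl.encode("utf-8"),
        method="POST",
        headers={"X-TYPESENSE-API-KEY": API_KEY},
    )


# Hypothetical documents; the second contains a newline inside a string value.
documents = [
    {"id": "124", "company_name": "Stark Industries", "num_employees": 5215},
    {"id": "125", "company_name": "Acme\nCorp", "num_employees": 1002},
]
payload = to_jsonl(documents)
import_request = build_import_request("companies", payload)
print(payload)
```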

# Sample Response

The response consists of an items array that indicates the result of importing each document in the request, in the same order. If the import of a single document fails, it does not affect the remaining documents.

If there is a failure, the response item will include a corresponding error message. For example, the second document had an import failure in the following response:
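
Since the original sample response is not shown here, the following Python sketch only illustrates how per-document failures might be collected from such a response. The `items` array comes from the description above; the `success` and `error` key names and the sample body are assumptions that should be verified against a real response from your server.

```python
import json


def failed_items(response_body: str):
    """Return (position, error message) pairs for documents that failed to import."""
    response = json.loads(response_body)
    return [
        (position, item.get("error", ""))
        for position, item in enumerate(response["items"])
        if not item.get("success")
    ]


# Hypothetical response body in which the second document failed.
sample_body = '{"items": [{"success": true}, {"success": false, "error": "Bad JSON."}, {"success": true}]}'

failures = failed_items(sample_body)
for position, message in failures:
    print(f"document at position {position} failed: {message}")
```

Because items are reported in request order, the position can be used to look up the offending line in the submitted file.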

Note: we recommend importing documents 1MB at a time, to keep import speeds fast.
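
One way to follow this recommendation is to split the JSONL lines into batches that stay under a byte budget and import each batch separately. The Python sketch below is a minimal version of that idea; the function name `batch_lines` and the default budget of 1,000,000 bytes are assumptions, and a single line larger than the budget is still emitted as its own batch.

```python
def batch_lines(lines, max_bytes=1_000_000):
    """Yield newline-joined batches of lines, each at most `max_bytes` in size
    (unless a single line alone exceeds the budget)."""
    batch, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if batch and size + line_bytes > max_bytes:
            yield "\n".join(batch)
            batch, size = [], 0
        batch.append(line)
        size += line_bytes
    if batch:
        yield "\n".join(batch)


# Five short hypothetical lines and a tiny budget, to make the batching visible.
lines = ['{"id": "%d"}' % i for i in range(5)]
batches = list(batch_lines(lines, max_bytes=30))
for batch in batches:
    print(repr(batch))
```

Each yielded string is itself valid newline-delimited JSON and could be sent as the body of a separate import request.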

# Definition

POST ${TYPESENSE_HOST}/collections/:collection/documents/import

Last Updated: 3/31/2024, 9:38:47 AM