# JOINs
Typesense supports JOINing documents from one or more collections based on a related column between them.
Note: Adding a reference field in an existing collection via alter operation is not yet supported.
# One-to-One relation
When you create a collection, you can create a field that connects a document to a field in another collection
via the reference
property.
For example, we could connect a books
collection to an authors
collection by using the id
field of the authors
collection as a reference:
{
"name": "books",
"fields": [
{"name": "title", "type": "string"},
{"name": "author_id", "type": "string", "reference": "authors.id"}
]
}
When we search the books
collection, we can fetch author fields from the authors
collection via include_fields
.
curl "http://localhost:8108/multi_search" -X POST \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "books",
"include_fields": "$authors(first_name,last_name)",
"q": "famous"
}
]
}'
By requesting the first_name
and last_name
via $authors(first_name,last_name)
, the response contains an
authors
object with the corresponding author information:
{
"document": {
"id": "0",
"title": "Famous Five",
"author_id": "0",
"authors": {
"first_name": "Enid",
"last_name": "Blyton"
}
}
}
To include all fields in the collection we should use an asterisk *
:
{
"collection": "books",
"include_fields": "$authors(*)",
"q": "famous"
}
We can also use a filter_by
clause to join on a collection like this:
{
"filter_by": "$authors(id: *)"
}
Fields of type int32
, int64
, and string
can be used in the case of "one-to-one" reference, i.e. one document
being related to exactly one reference document. Fields of type int32[]
, int64[]
, and string[]
can be used in
the case of multiple references, i.e. one document being related to zero or more documents in another collection.
# One-to-many relation
# Simple Example
For a simple example, let's suppose that we have a orders
collection, and we want to keep track of different orders for each customer,
i.e. each customer could have a number of different orders.
The schema of the customers
collection will look like this
{
"name": "customers",
"fields": [
{"name": "forename", "type": "string"},
{"name": "surname", "type": "string"},
{"name": "email", "type": "string"}
]
}
Let's create a orders
collection that stores a order's details, and it will
also reference the corresponding user that placed it in the user collection.
{
"name": "orders",
"fields": [
{"name": "total_price", "type": "float"},
{"name": "initial_date", "type": "int64"},
{"name": "accepted_date", "type": "int64", "optional": true},
{"name": "completed_date", "type": "int64", "optional": true},
{"name": "customer_id", "type": "string", "reference": "customers.id"}
]
}
We can now search the customers
collection, and filter for the orders for a particular customer from
the orders
collection via filter_by
:
{
"q":"*",
"collection":"customers",
"filter_by":"$orders(customer_id:=customer_a)"
}
We can filter by other fields as well, like fetching customers with orders with a total price under 100
:
{
"q": "*",
"collection": "customers",
"filter_by": "$orders(total_price:<100)"
}
By default, the above queries will include all the fields from the referenced orders
collection. To include only the total_price
field from the referenced collection, you could do:
{
"include_fields": "$orders(total_price)"
}
# Specialized Example
For a more specialized example, suppose that we have a products
collection, and we want to offer personalized pricing for customers,
i.e. each product would have a different price for every customer. The join feature comes handy here too.
The schema of the products
collection will look like this
{
"name": "products",
"fields": [
{"name": "product_id", "type": "string"},
{"name": "product_name", "type": "string"},
{"name": "product_description", "type": "string"}
]
}
Let's create a customer_product_prices
collection that stores a custom price for each customer, and it will
also reference the corresponding document in the product collection.
{
"name": "customer_product_prices",
"fields": [
{"name": "customer_id", "type": "string"},
{"name": "custom_price", "type": "float"},
{"name": "product_id", "type": "string", "reference": "products.product_id"}
]
}
We can now search the products
collection, and filter for the prices for a particular customer from
the customer_product_prices
collection via filter_by
:
{
"q":"*",
"collection":"products",
"filter_by":"$customer_product_prices(customer_id:=customer_a)"
}
Want to fetch products with price under 100
? That's easy to do too.
{
"q": "*",
"collection": "products",
"filter_by": "$customer_product_prices(customer_id:=customer_a && custom_price:<100)"
}
Similarly with the simpler example, you can include only the custom_price
field from the referenced collection:
{
"include_fields": "$customer_product_prices(custom_price)"
}
# Many-to-many relation
Consider a collection with documents that we want to provide access to users such that a document can be accessed by many users and a given user can access many documents.
To do this, we can create three collections: documents
, users
and user_doc_access
with the following schemas:
{
"name": "documents",
"fields": [
{"name": "id", "type": "string"},
{"name": "title", "type": "string"}
{"name": "content", "type": "string"}
]
}
{
"name": "users",
"fields": [
{"name": "id", "type": "string"},
{"name": "username", "type": "string"}
]
}
{
"name": "user_doc_access",
"fields": [
{"name": "user_id", "type": "string", "reference": "users.id"},
{"name": "document_id", "type": "string", "reference": "documents.id"},
]
}
To fetch all the documents accessible to a user_a
, we can query this way:
{
"q": "*",
"collection": "documents",
"filter_by": "$user_doc_access(user_id:=user_a)"
}
To get the user ids that can access a particular document:
{
"q": "*",
"collection": "documents",
"query_by": "title",
"filter_by": "$user_doc_access(id: *)",
"include_fields": "$users(id) as user_identifier"
}
# Sort by joined collection field
Following the $JoinedCollectionName( ... )
convention, we can sort_by
on a field that's present in the joined collection this way:
{
"sort_by": "$JoinedCollectionName( field_name:desc )"
}
Similarly, to specify an _eval
operation using the joined collection's field:
{
"sort_by": "$JoinedCollectionName( _eval(field_name:foo):desc )"
}
# Merging / nesting joined fields
By default, when we join on a collection, the collection's fields are returned as a nested document.
For example, when we join the books
collection with authors
collection above, notice how the fields of the
authors
collection appear as an object in the response document:
{
"document": {
"id": "0",
"title": "Famous Five",
"author_id": "0",
"authors": {
"first_name": "Enid",
"last_name": "Blyton"
}
}
}
We could instead make the fields of the authors
collection be merged with the fields of the books
document
by using the merge
strategy:
{
"collection": "books",
"include_fields": "$authors(*, strategy: merge)",
"q": "famous"
}
The default behavior is strategy: nest
.
# Forcing nested array for joined fields
In a one-to-many join query, you might want the joined collection's fields to be always represented as an array of objects, even if there is only a single match.
For example, given the following authors and books:
{"id": "0", ",first_name": "Enid", "last_name": "Blyton"}
{"id": "1", ",first_name": "JK", "last_name": "Rowling"}
{"title": "Famous Five", "author_id": "0"}
{"title": "Secret Seven", "author_id": "0"}
{"title": "Harry Potter", "author_id": "1"}
When we query the authors
collection and join on the books
collection, like this:
{
"collection": "authors",
"q": "*",
"filter_by": "$books(id:*)",
"include_fields": "$books(*)"
}
We might end up with the books being either a nested object or a nested array of objects, depending on whether there are 1 or more matched books for each author.
[
{
"document": {
"id": "1",
"first_name": "JK",
"last_name": "Rowling",
"books": {
"author_id": "1",
"id": "2",
"title": "Harry Potter"
}
}
},
{
"document": {
"id": "0",
"first_name": "Enid",
"last_name": "Blyton",
"books": [
{
"author_id": "0",
"id": "0",
"title": "Famous Five"
},
{
"author_id": "0",
"id": "1",
"title": "Secret Seven"
}
]
}
}
]
To always make the fields of the joined books
collection be an array of objects, you can use the nest_array
field merging strategy.
{
"collection": "authors",
"q": "*",
"filter_by": "$books(id:*)",
"include_fields": "$books(*, strategy: nest_array)"
}
This will always return an array of objects for the fields of books
collection.
[
{
"document": {
"id": "1",
"first_name": "JK",
"last_name": "Rowling",
"books": [
{
"author_id": "1",
"id": "2",
"title": "Harry Potter"
}
]
}
},
{
"document": {
"id": "0",
"first_name": "Enid",
"last_name": "Blyton",
"books": [
{
"author_id": "0",
"id": "0",
"title": "Famous Five"
},
{
"author_id": "0",
"id": "1",
"title": "Secret Seven"
}
]
}
}
]
# References inside an object
Let's say there is an object
field called order
in a orders
collection. We can make the order refer to a
product in a products
collection like this:
{
"name": "orders",
"fields": [
{"name": "order", "type": "object"},
{"name": "order.product_id", "type": "string", "reference": "products.id"}
]
}
Alternatively, if we had an array of order
objects with each order object containing a reference, then
the type of the reference field would have to be an array as well.
{
"name": "orders",
"fields": [
{"name": "orders", "type": "object[]"},
{"name": "orders.product_id", "type": "string[]", "reference": "products.id"}
]
}
# Using aliases with Joins
You can use a collection alias in a reference field definition. In the example below, products
could be an alias:
{"name": "product_id", "type": "string", "reference": "products.product_id"}
However, it's important to keep in mind that a collection's documents store the internal IDs of the referenced collection's documents. These internal IDs are sequential in nature and are assigned to documents based on the order in which they are indexed.
Therefore, it's crucial to treat the referenced collections as a group, i.e., if you intend to switch any one of the collections via an alias update, you must reindex all the related collections simultaneously. This will guarantee that the internal IDs remain in sync across all the collections involved in the join operation.
# Left Join
By default, Typesense performs inner join. To perform left join,
{
"filter_by": "id:* || $join_collection_name( <join_condition> )"
}
can be specified. id:*
matches all documents of the collection being searched. So the result will include the referenced documents if a reference exists otherwise the document will be returned as is.