Fuzzy Search refers to the process of approximately searching for a given search query. It may also be called as a “typo tolerant search”.
Let’s say you want to search a recipe database for szechuan noodles
but you can’t remember how exactly szechuan
is
spelt. So you can type in a query that you think is the closest to the actual query:
schezwan noodles
and it will give you results for similar words and you will end up finding the
intended recipe for Szechuan Noodles.
Implementing fuzzy search in Javascript using Fuse.js
The easiest way to set up fuzzy search capabilities in Javascript is by using one of the
most popular fuzzy search libraries: Fuse.js
.
Note that the use case for this library is for relatively small data that can be handled from within your Javascript files. For larger datasets, we need to use a full-text search engine. For completeness, we’ll cover that as well, towards the end of this article.
Installation
You can set up fuse.js
by adding it to your project using npm or yarn:
$ npm install --save fuse.js # using npm
$ yarn add fuse.js # using yarn
You can alternatively use a CDN or Deno link.
We can then import the installed library into your project:
import Fuse from 'fuse.js' // ES6
const Fuse = require('fuse.js') // ES5
Putting together some sample data
Let’s take a look at the data that needs a little searching! We will continue with the example of a recipe database. The data can be stored in an array of javascript objects or as a plain array of strings.
Here’s a sample dataset with a bunch of Harry Potter cuisine mixed with what we are actually looking for,
szechuan noodles
.
const dishes = [
{
"dish": "Bertie Bott's Beans",
"chef": "Bertie Bott",
"flavor": "Spinach"
},
{
"dish": "Pumpkin Pasty",
"chef": "Trolley Witch",
"flavor": "Pumpkiny"
},
{
"dish": "Butterbeer",
"chef": "Madam Rosmerta",
"flavor": "Fizzy Soda"
},
{
"dish": "Szechuan noodles",
"chef": "Muggle Malfoy",
"flavor": "Spicy"
},
]
We are now ready to initialize the Fuse.js client and index some data.
There are a plethora of configuration options that allow us to tweak how the search occurs.
For example, if we want to index only the dish
and chef
values in the dataset, we can do that via the keys
array.
const options = {
keys: [
"dish",
"chef"
]
};
const fuse = new Fuse(dishes, options);
Nested keys can be accessed with the dot operator. For example, if “chef” was a nested JSON object that comprised of
two keys “name” and “nationality”, then we can tell fuse.js to index just the name of the chef by defining
chef.name
inside the keys
array.
Searching for fuzzy string matches
We’re now ready to make Fuse.js search through our dataset. It’s as simple as passing a query string.
fuse.search("Schezwan")
The cool thing here is that you don’t have to provide the exact word or phrase as present in the dataset.
> fuse.search("Schezwan")
[
{
"item": {
"dish": "Szechuan noodles",
"chef": "Muggle Malfoy",
"flavor": "Spicy"
},
"refIndex": 3
}
]
As we can see, Fuse.js is able to correct identify the record containing the word Szechuan
even though our query is
Schezwan
.
Let’s look at a few more examples to get a feel for how the fuzzy searching works.
> fuse.search("Schezwan")
[
{
"item": {
"dish": "Szechuan noodles",
"chef": "Muggle Malfoy",
"flavor": "Spicy"
},
"refIndex": 3
}
]
> fuse.search("shezvan")
[
{
"item": {
"dish": "Szechuan noodles",
"chef": "Muggle Malfoy",
"flavor": "Spicy"
},
"refIndex": 3
}
]
> fuse.search("betty")
[
{
"item": {
"dish": "Bertie Bott's Beans",
"chef": "Bertie Bott",
"flavor": "Spinach"
},
"refIndex": 0
},
{
"item": {
"dish": "Butterbeer",
"chef": "Madam Rosmerta",
"flavor": "Fizzy Soda"
},
"refIndex": 2
}
]
> fuse.search("betty bear")
[
{
"item": {
"dish": "Butterbeer",
"chef": "Madam Rosmerta",
"flavor": "Fizzy Soda"
},
"refIndex": 2
},
{
"item": {
"dish": "Bertie Bott's Beans",
"chef": "Bertie Bott",
"flavor": "Spinach"
},
"refIndex": 0
}
]
Notice how the query betty
is similar to both Bertie
and Butterbeer
,
but since it is more similar to Bertie
, the dish Bertie Bott’s Beans
is ranked higher than Butterbeer
.
Similarly, betty bear
is similar to both Bertie
and Butterbeer
but Fuzzy.js is able to
correctly rank Butterbeer
first.
Tuning for relevancy
Fuse.js ranks the matching items on a relevance score calculated for each item, and is determined by three factors:
- Fuzziness score
- Key weight
- Field-length norm
Fuzziness score
The fuzziness score determines how fuzzy or exact a word in the dataset is to the query.
It is determined by location
, distance
, and threshold
parameters.
The default values for these options are:
location : 0
distance : 100
threshold : 0.6
With this default configuration, a match must be within 60 characters (0 + (100 * 0.6) = 60
) from the start of
the string (i.e. location
set to 0
).
Let’s say that you search for the term Szechuan
in the string:
Hey there! I am looking to buy some noodles and specifically the Szechuan noodles!
Even though the word is present in the string, a search with the default configuration won’t yield any results because
Szechuan
starts from the 65th
index, while the default configuration only accepts strings till the 60th
index.
If you want to match a string regardless of where it appears, you can also disable location
based matching
by setting ignoreLocation
to true
.
Key weight
You can use the key weight to boost matches on certain fields over others. It’s optional and defaulted to the value 1.
const fuse = new Fuse(list, {
keys: [
'dish',
{
name: 'flavor',
weight: 2
}
]
})
In the example above, dish
has a default weight of 1 but flavor
gets a weight of 2. So string matches on the
flavor
field value will be prioritized over matches against the dish
field value.
Field-length norm
This ensures that the relevancy of a shorter field like title
is more than that of a
field containing longer text like description
.
You can ignore this by setting ignoreFieldNorm
option to true
.
Handling larger datasets
Witht that, we’ve come to the end of our little tour of implementing fuzzy search in Javascript using the Fuse.js library.
The biggest catch with this approach is that it is viable for searcing only small amounts of data.
It would not be be wise to index and search on millions of records using Fuse.js as it will require your users to download a large index on page load. This can slow down your site and also consumes a lot of bandwidth.
For implementing fuzzy search on larger datasets, you can use Typesense and its Javascript client to seamlessly search through millions of records in no time.