While MySQL is not a fully featured text search engine, it has enough tricks up its sleeve to be useful for implementing basic search features in your application. Let’s take a quick walk through.
First, let’s open up the mysql prompt and create a new Database and call it restaurant
.
Now, we can proceed to create a TABLE
to store our records. We must be careful here as we need to clearly specify the fields that need to be ‘Full-text indexed'. Full-text search queries are only valid on fields that are full-text indexed. Full-text indexes can be created only for VARCHAR
, CHAR
, or TEXT
columns.
CREATE TABLE food
(
id INT unsigned NOT NULL AUTO_INCREMENT, # Unique ID
dish VARCHAR(120) NOT NULL, # Name of the dish
chef VARCHAR(120) NOT NULL, # Chef's name
flavor VARCHAR(100) NOT NULL, # Flavor of the dish
PRIMARY KEY (id) # Making the id the primary key
FULLTEXT (dish) # Full-text indexes dish
FULLTEXT (chef) # Full-text indexes chef
FULLTEXT (flavor) # Full-text indexes flavor
);
If FULLTEXT
indexing was not initialized along with the table, it can be enabled later using the ALTER
command.
ALTER TABLE food ADD FULLTEXT (dish,chef);
Note that
(dish,chef)
will act as a pair. This means you can only use full-text queries on both fields at the same time and not either one. You need to specify them separately for them to be indexed independently. Similarly, you cannot full-text match with multiple columns if you indexed them individually.
Feeding values to the Table
We can implement this with normal SQL, INSERT INTO
INSERT INTO food (dish, chef, flavor) VALUES
('Ramen Noodles', 'Gordon Ramensay', 'Chinesey with a hint of sweetness and with a hint of bitterness'),
('Szechuan Noodles', 'Jackie Chan', 'Extra Spicy with a tiny hint of tanginess'),
('Soupy Noodles', 'Julie Chan', 'Cheesy and extremely wet with hints of pepper');
Full-Text string matching
Full text search queries consists of two major parts:
- Querying a set of columns with
MATCH (col1, col2, ..)
- The query and the search modifier to be used against these columns with
AGAINST (query [search_modifier])
MySQL supports 3 types of searches. We will be taking a look at each of these three modes.
- Natural Language Full-Text Searches
- Boolean Full-Text searches
- Query expansion searches
Natural Language Searches
Natural language full-text search interprets the search string as a free text (in the human language) and no special operators are required. One example would be to look up Chan
in the chef
column.
SELECT * FROM food WHERE MATCH(chef) AGAINST ('Chan' IN NATURAL LANGUAGE MODE);
# Output
+----+------------------+-------------+-------------+
| id | dish | chef | flavor |
+----+------------------+-------------+-------------+
| 2 | Szechuan Noodles | Jackie Chan | Extra Spicy |
| 3 | Soupy Noodles | Julie Chan | Cheesy |
+----+------------------+-------------+-------------+
2 rows in set (0.003 sec)
It must be noted that the searches are case-insensitive.
We can also search for multiple words. For example, let’s search for with hint
against the flavor
column to identify dishes containing a hint of a sub-flavor.
SELECT * FROM food WHERE MATCH(flavor) AGAINST ('with hint' IN NATURAL LANGUAGE MODE);
# Output
+----+------------------+-----------------+-----------------------------------------------------------------+
| id | dish | chef | flavor |
+----+------------------+-----------------+-----------------------------------------------------------------+
| 1 | Ramen Noodles | Gordon Ramensay | Chinesey with a hint of sweetness and with a hint of bitterness |
| 2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess |
+----+------------------+-----------------+-----------------------------------------------------------------+
2 rows in set (0.006 sec)
Since it is full-text matching, the record with the plural form of the word (hints) did not get matched.
The search results are sorted with a relevance value. Relevance is computed based on:
- The number of words in the row
- The number of unique words in that row
- The total number of words in the collection
- The number of documents (rows) that contain a particular word.
Zero relevance means no similarity. We can get the relevance scores to be displayed with:
SELECT id, MATCH(flavor) AGAINST ('with hint' IN NATURAL LANGUAGE MODE) AS score FROM food;
# Output
+----+----------------------------+
| id | score |
+----+----------------------------+
| 1 | 0.000000003771856604828372 |
| 2 | 0.000000001885928302414186 |
| 3 | 0 |
+----+----------------------------+
3 rows in set (0.004 sec)
The first row had a better relevance score as with hint
appears twice in the row and that is more than just once in the second row.
Boolean Searches
A boolean search interprets the search string using the rules of a special query language. The query can contain words to search for, along with special operators that modify how the search results must appear; operators that check if the word must be present or absent in the matching row, or determines the weightage of the given word.
We use BOOLEAN MODE
in MySQL to enable boolean full-text searches. The special operators are placed at the beginning or end of the words. A leading +
denotes that a word must be present in the row, whereas a leading -
denotes that it should not be present in the row. You can check out a full list of operators here.
SELECT * FROM food WHERE MATCH(chef) AGAINST ('+Chan -Julie' IN BOOLEAN MODE);
# Output
+----+------------------+-------------+-------------------------------------------+
| id | dish | chef | flavor |
+----+------------------+-------------+-------------------------------------------+
| 2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess |
+----+------------------+-------------+-------------------------------------------+
1 row in set (0.127 sec)
The query given above, asks MySQL to search for rows containing the word chan
but not containing the word Julie
and as a result, we only get Jackie Chan
in the output.
Query Expansion Searches
Query expansion searches give you a powerful option to do a deeper search with more flexibility. In natural language mode, only the matching hits are listed. If the search query was Jackie
then only the row with Jackie Chan
as the chef is enlisted. But if we were to do a query expansion search on chef
with the query Jackie
, it would find that Jackie Chan
is the resulting match and then do another search for Chan
(and any other word appearing in the chef column if any) and adds Julie Chan
to the list as well.
The search takes place twice. In the first phase, the records matching the user-provided query are fetched. In the second phase, relevant words from the first set of records are used for another round of search.
SELECT * FROM food WHERE MATCH(chef) AGAINST ('Jackie' WITH QUERY EXPANSION);
# Output
+----+------------------+-------------+-----------------------------------------------+
| id | dish | chef | flavor |
+----+------------------+-------------+-----------------------------------------------+
| 2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess |
| 3 | Soupy Noodles | Julie Chan | Cheesy and extremely wet with hints of pepper |
+----+------------------+-------------+-----------------------------------------------+
2 rows in set (0.015 sec)
You can see that both Jackie and Julie Chans are listed, owing to Query expansion. What’s more interesting is, if we do something like this:
SELECT * FROM food WHERE MATCH(dish,chef) AGAINST ('Jackie' WITH QUERY EXPANSION);
# Output
+----+------------------+-----------------+-----------------------------------------------------------------+
| id | dish | chef | flavor |
+----+------------------+-----------------+-----------------------------------------------------------------+
| 2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess |
| 3 | Soupy Noodles | Julie Chan | Cheesy and extremely wet with hints of pepper |
| 1 | Ramen Noodles | Gordon Ramensay | Chinesey with a hint of sweetness and with a hint of bitterness |
+----+------------------+-----------------+-----------------------------------------------------------------+
3 rows in set (0.013 sec)
We get all three rows! This is because we are matching both dish
and chef
columns against the query. So the process behind it is:
Jackie
query yields one result: chef namedJackie Chan
with dishSzechuan Noodles
- The second search takes other words from the result of the first search like
Chan
,Szechuan
, andNoodles
and runs another search acrossdish
andchef
columns - Second search yields the other two results:
Julie Chan (Soupy Noodles)
result having bothChan
andNoodles
in the row gets higher relevancy compared toRamen Noodles
that only hasNoodles
in the row. - All three results are ranked by relevancy in displayed
Full-text searching is available on both InnoDB and MyISAM database engines. It should however be noted that the minimum number of characters for full-text searching is 3 in InnoDB and 4 in MyISAM. Certain stop words like on
, the
or it
are ignored while searching. You can find the total list of stopwords and more information about the same, here. You can also follow this process to ignore them while querying.
Full-text searches can be truly powerful when it comes to searching through large amounts of data in your database. Happy searching!