While MySQL is not a fully featured text search engine, it has enough tricks up its sleeve to be useful for implementing basic search features in your application. Let’s take a quick walk through.

First, let’s open up the mysql prompt and create a new Database and call it restaurant.

Now, we can proceed to create a TABLE to store our records. We must be careful here as we need to clearly specify the fields that need to be ‘Full-text indexed'. Full-text search queries are only valid on fields that are full-text indexed. Full-text indexes can be created only for VARCHAR, CHAR, or TEXT columns.

CREATE TABLE food
(
  id              INT unsigned NOT NULL AUTO_INCREMENT, # Unique ID
  dish            VARCHAR(120) NOT NULL,                # Name of the dish
  chef            VARCHAR(120) NOT NULL,                # Chef's name
  flavor          VARCHAR(100) NOT NULL,                # Flavor of the dish
  PRIMARY KEY     (id)                                  # Making the id the primary key

  FULLTEXT        (dish)                                # Full-text indexes dish
  FULLTEXT        (chef)                                # Full-text indexes chef
  FULLTEXT        (flavor)                              # Full-text indexes flavor
);

If FULLTEXT indexing was not initialized along with the table, it can be enabled later using the ALTER command.

ALTER TABLE food ADD FULLTEXT (dish,chef);

Note that (dish,chef) will act as a pair. This means you can only use full-text queries on both fields at the same time and not either one. You need to specify them separately for them to be indexed independently. Similarly, you cannot full-text match with multiple columns if you indexed them individually.

Feeding values to the Table

We can implement this with normal SQL, INSERT INTO

INSERT INTO food (dish, chef, flavor) VALUES
	('Ramen Noodles', 'Gordon Ramensay', 'Chinesey with a hint of sweetness and with a hint of bitterness'),
	('Szechuan Noodles', 'Jackie Chan', 'Extra Spicy with a tiny hint of tanginess'),
	('Soupy Noodles', 'Julie Chan', 'Cheesy and extremely wet with hints of pepper');

Full-Text string matching

Full text search queries consists of two major parts:

  1. Querying a set of columns with MATCH (col1, col2, ..)
  2. The query and the search modifier to be used against these columns with AGAINST (query [search_modifier])

MySQL supports 3 types of searches. We will be taking a look at each of these three modes.

  • Natural Language Full-Text Searches
  • Boolean Full-Text searches
  • Query expansion searches

Natural Language Searches

Natural language full-text search interprets the search string as a free text (in the human language) and no special operators are required. One example would be to look up Chan in the chef column.

SELECT * FROM food WHERE MATCH(chef) AGAINST ('Chan' IN NATURAL LANGUAGE MODE);

# Output
+----+------------------+-------------+-------------+
| id | dish             | chef        | flavor      |
+----+------------------+-------------+-------------+
|  2 | Szechuan Noodles | Jackie Chan | Extra Spicy |
|  3 | Soupy Noodles    | Julie Chan  | Cheesy      |
+----+------------------+-------------+-------------+
2 rows in set (0.003 sec)

It must be noted that the searches are case-insensitive.

We can also search for multiple words. For example, let’s search for with hint against the flavor column to identify dishes containing a hint of a sub-flavor.

SELECT * FROM food WHERE MATCH(flavor) AGAINST ('with hint' IN NATURAL LANGUAGE MODE);

# Output
+----+------------------+-----------------+-----------------------------------------------------------------+
| id | dish             | chef            | flavor                                                          |
+----+------------------+-----------------+-----------------------------------------------------------------+
|  1 | Ramen Noodles    | Gordon Ramensay | Chinesey with a hint of sweetness and with a hint of bitterness |
|  2 | Szechuan Noodles | Jackie Chan     | Extra Spicy with a tiny hint of tanginess                       |
+----+------------------+-----------------+-----------------------------------------------------------------+
2 rows in set (0.006 sec)

Since it is full-text matching, the record with the plural form of the word (hints) did not get matched.

The search results are sorted with a relevance value. Relevance is computed based on:

  • The number of words in the row
  • The number of unique words in that row
  • The total number of words in the collection
  • The number of documents (rows) that contain a particular word.

Zero relevance means no similarity. We can get the relevance scores to be displayed with:

SELECT id, MATCH(flavor) AGAINST ('with hint' IN NATURAL LANGUAGE MODE) AS score FROM food;

# Output
+----+----------------------------+
| id | score                      |
+----+----------------------------+
|  1 | 0.000000003771856604828372 |
|  2 | 0.000000001885928302414186 |
|  3 |                          0 |
+----+----------------------------+
3 rows in set (0.004 sec)

The first row had a better relevance score as with hint appears twice in the row and that is more than just once in the second row.

Boolean Searches

A boolean search interprets the search string using the rules of a special query language. The query can contain words to search for, along with special operators that modify how the search results must appear; operators that check if the word must be present or absent in the matching row, or determines the weightage of the given word.

We use BOOLEAN MODE in MySQL to enable boolean full-text searches. The special operators are placed at the beginning or end of the words. A leading + denotes that a word must be present in the row, whereas a leading - denotes that it should not be present in the row. You can check out a full list of operators here.

SELECT * FROM food WHERE  MATCH(chef) AGAINST ('+Chan -Julie' IN BOOLEAN MODE);

# Output
+----+------------------+-------------+-------------------------------------------+
| id | dish             | chef        | flavor                                    |
+----+------------------+-------------+-------------------------------------------+
|  2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess |
+----+------------------+-------------+-------------------------------------------+
1 row in set (0.127 sec)

The query given above, asks MySQL to search for rows containing the word chan but not containing the word Julie and as a result, we only get Jackie Chan in the output.

Query Expansion Searches

Query expansion searches give you a powerful option to do a deeper search with more flexibility. In natural language mode, only the matching hits are listed. If the search query was Jackie then only the row with Jackie Chan as the chef is enlisted. But if we were to do a query expansion search on chef with the query Jackie, it would find that Jackie Chan is the resulting match and then do another search for Chan (and any other word appearing in the chef column if any) and adds Julie Chan to the list as well.

The search takes place twice. In the first phase, the records matching the user-provided query are fetched. In the second phase, relevant words from the first set of records are used for another round of search.

SELECT * FROM food WHERE  MATCH(chef) AGAINST ('Jackie' WITH QUERY EXPANSION);

# Output
+----+------------------+-------------+-----------------------------------------------+
| id | dish             | chef        | flavor                                        |
+----+------------------+-------------+-----------------------------------------------+
|  2 | Szechuan Noodles | Jackie Chan | Extra Spicy with a tiny hint of tanginess     |
|  3 | Soupy Noodles    | Julie Chan  | Cheesy and extremely wet with hints of pepper |
+----+------------------+-------------+-----------------------------------------------+
2 rows in set (0.015 sec)

You can see that both Jackie and Julie Chans are listed, owing to Query expansion. What’s more interesting is, if we do something like this:

SELECT * FROM food WHERE  MATCH(dish,chef) AGAINST ('Jackie' WITH QUERY EXPANSION);

# Output
+----+------------------+-----------------+-----------------------------------------------------------------+
| id | dish             | chef            | flavor                                                          |
+----+------------------+-----------------+-----------------------------------------------------------------+
|  2 | Szechuan Noodles | Jackie Chan     | Extra Spicy with a tiny hint of tanginess                       |
|  3 | Soupy Noodles    | Julie Chan      | Cheesy and extremely wet with hints of pepper                   |
|  1 | Ramen Noodles    | Gordon Ramensay | Chinesey with a hint of sweetness and with a hint of bitterness |
+----+------------------+-----------------+-----------------------------------------------------------------+
3 rows in set (0.013 sec)

We get all three rows! This is because we are matching both dish and chef columns against the query. So the process behind it is:

  1. Jackie query yields one result: chef named Jackie Chan with dish Szechuan Noodles
  2. The second search takes other words from the result of the first search like Chan, Szechuan, and Noodles and runs another search across dish and chef columns
  3. Second search yields the other two results: Julie Chan (Soupy Noodles) result having both Chan and Noodles in the row gets higher relevancy compared to Ramen Noodles that only has Noodles in the row.
  4. All three results are ranked by relevancy in displayed

Full-text searching is available on both InnoDB and MyISAM database engines. It should however be noted that the minimum number of characters for full-text searching is 3 in InnoDB and 4 in MyISAM. Certain stop words like on, the or it are ignored while searching. You can find the total list of stopwords and more information about the same, here. You can also follow this process to ignore them while querying.

Full-text searches can be truly powerful when it comes to searching through large amounts of data in your database. Happy searching!