1/3 Implementing an AutoSuggest feature using MySQL fulltext indices

The MySQL full-text index

Current MySQL versions provide a full-text index (FTI) which is generally used to index and search MyISAM (the default storage engine in MySQL) tables like this:

SELECT id, content FROM documents WHERE MATCH(content) AGAINST ("tes*" IN BOOLEAN MODE)

Internally, every indexed (text) column of a row is splitted into its words. Then the full-text index, which basically is a hash {word, count} => {weight, record id}, is updated. For instance, if you have a table “documents” with columns “id” and “content”, filled with two records: { 1, “a test row” } and { 2, “another testing row” }, the index keys would be “another”, “row, “test”, “testing”. “a” is ignored because it is a stopword. For more details about the index internals, see the excellent article The Full-Text Stuff That We Didn’t Put In The Manual.

AutoSuggest

Many Web sites have a search field that lets users initiate a full-text search on the Web site’s database. Some of these search fields implement a so-called AutoSuggest or AutoComplete feature. An early and widely known implemention is Google Suggest. If you enter the first characters of a search query, for instance “goo”, a list with possible query strings (“Google”, …) appears.

It’s not possible to utilize the MySQL FTI for such a feature only using SQL because the SQL statements always return the full rows where a search string matches. In our SQL example from above, we would get all rows of the table “documents” that contain at least one column matching to “tes*”. However, for AutoSuggest, we don’t need the whole column but only the indexed words that begin with “tes”. The result rows itself are not needed for AutoSuggest (only for the search that is initiated after the user selected a search phrase and clicked OK).

A solution

One solution is to access the FTI directly like the myisamchk or myisam_ftdump tools that come with MySQL do. For instance, if our user enters “tes” in the search field, this tool would have to:

  1. Open/lock the table that contains the full-text index (in our example, “documents”).
  2. Traverse the FTI until “tes”. Because the index has a tree structure (B-Tree), this is very fast (no scanning).
  3. Read the index keys (= the indexed words) until a key doesn’t match to “tes” anymore.
  4. Close/unlock the table.

For our example table and query, this procedure would return “test” and “testing” — exactly the words that should be suggested for “tes”.

For a working implementation, see “2/3 myisam_suggest: an AutoComplete tool for MySQL fulltext indices”.
For a working integration with an AJAX AutoSuggest script, see “3/3 Populate an AJAX AutoSuggest field on your Web site with myisam_suggest”.

2 Responses to “1/3 Implementing an AutoSuggest feature using MySQL fulltext indices”

  1. dotmusik said:

    Jan 21, 09 at 22:43

    Is there a working example of this? Looks very interesting, but I’m just concerned how resource intensive it might be.

  2. rfc2822 said:

    Jan 21, 09 at 22:48

    It shouldn’t be resource intensive because it uses the index that already exists (when you use the MyISAM full-text index feature). When searching the index, it uses the tree structure so it doesn’t need to scan all rows.

    As mentioned in the third article of this series (3/3), you can view a demo on http://www.gimpusers.com (use the search field)


Leave a Reply


blog.dev001.net is Digg proof thanks to caching by WP Super Cache