Simple fuzzy search with Laravel and PostgreSQL

Published by at 3rd October 2017 10:56 pm

When implementing fuzzy search, many developers reach straight for specialised tools like Elasticsearch. However, for simple implementations, this is often overkill. PostgreSQL, my relational database of choice, can natively handle fuzzy search quite easily if you know how. Here's how you might use this with Laravel.

Suppose we have the following migration to create a locations table, storing towns, cities and villages:

1<?php
2
3use Illuminate\Database\Schema\Blueprint;
4use Illuminate\Database\Migrations\Migration;
5
6class CreateLocations extends Migration
7{
8 /**
9 * Run the migrations.
10 *
11 * @return void
12 */
13 public function up()
14 {
15 // Create locations table
16 Schema::create('locations', function (Blueprint $table) {
17 $table->increments('id')->unsigned();
18 $table->string('name');
19 $table->timestamps();
20 });
21 }
22
23 /**
24 * Reverse the migrations.
25 *
26 * @return void
27 */
28 public function down()
29 {
30 // Drop locations table
31 Schema::drop('locations');
32 }
33}

The key to this implementation of fuzzy search is trigrams. A trigram is a group of three consecutive characters taken from a string. Using the pg_trgm module, which comes with PostgreSQL, we can break a string into as many trigrams as possible, and then return the strings with the most matching trigrams.

We can ensure that pg_trgm is set up on the database by creating a migration:

1<?php
2
3use Illuminate\Support\Facades\Schema;
4use Illuminate\Database\Schema\Blueprint;
5use Illuminate\Database\Migrations\Migration;
6
7class AddTrgmExtension extends Migration
8{
9 /**
10 * Run the migrations.
11 *
12 * @return void
13 */
14 public function up()
15 {
16 DB::statement('CREATE EXTENSION IF NOT EXISTS pg_trgm');
17 }
18
19 /**
20 * Reverse the migrations.
21 *
22 * @return void
23 */
24 public function down()
25 {
26 DB::statement('DROP EXTENSION IF EXISTS pg_trgm');
27 }
28}

Make sure you run the migration as well. Once that is done, we can make a raw fuzzy query against the name field as follows:

SELECT * FROM locations WHERE 'burgh' % name;

Translating that to work with the Eloquent ORM, we can perform fuzzy queries against the name field as follows:

$location = Location::whereRaw("'burgh' % name")->get();

This query might match both Aldeburgh and Edinburgh. It's also able to handle slight misspellings, as in this example:

$location = Location::whereRaw("'hendrad' % name")->get();

This query will match East Hendred or West Hendred successfully. As you can see, we can match strings at any point in the name string, and handle slight mis-spellings without any problems.

In practice, rather than using whereRaw() every time, you'll probably want to create a local scope that accepts the name you want to match against. You'll also want to use query parameters to prevent SQL injection:

$location = Location::whereRaw("? % name", [$name])->get();

Improving performance with an index

The performance of these queries isn't that great out of the box. We can improve them by creating an index:

1<?php
2
3use Illuminate\Support\Facades\Schema;
4use Illuminate\Database\Schema\Blueprint;
5use Illuminate\Database\Migrations\Migration;
6
7class AddTrgmExtension extends Migration
8{
9 /**
10 * Run the migrations.
11 *
12 * @return void
13 */
14 public function up()
15 {
16 DB::statement('CREATE EXTENSION IF NOT EXISTS pg_trgm');
17 DB::statement('CREATE INDEX locations_name_trigram ON locations USING gist(name gist_trgm_ops);');
18 }
19
20 /**
21 * Reverse the migrations.
22 *
23 * @return void
24 */
25 public function down()
26 {
27 DB::statement('DROP INDEX IF EXISTS locations_name_trigram');
28 DB::statement('DROP EXTENSION IF EXISTS pg_trgm');
29 }
30}

Adding an index should produce a noticeable improvement in the response time.

Final thoughts

PostgreSQL's pg_trgm module is a fairly straightforward way of implementing fuzzy search. It's not much more involved than a LIKE or ILIKE clause in your query, and for many use cases, it's more than sufficient. If you don't have a huge number of records, it's probably a more appropriate choice than something like Elasticsearch, and has the advantage of a simpler stack. However, if you have a larger dataset, you may be better off with a dedicated search solution. As always, if you're unsure it's a good idea to try both and see what works best for that particular use case.