Simple fuzzy search with Laravel and PostgreSQL
Published by Matthew Daly at 3rd October 2017 10:56 pm
When implementing fuzzy search, many developers reach straight for specialised tools like Elasticsearch. However, for simple implementations, this is often overkill. PostgreSQL, my relational database of choice, can natively handle fuzzy search quite easily if you know how. Here's how you might use this with Laravel.
Suppose we have the following migration to create a locations
table, storing towns, cities and villages:
1<?php23use Illuminate\Database\Schema\Blueprint;4use Illuminate\Database\Migrations\Migration;56class CreateLocations extends Migration7{8 /**9 * Run the migrations.10 *11 * @return void12 */13 public function up()14 {15 // Create locations table16 Schema::create('locations', function (Blueprint $table) {17 $table->increments('id')->unsigned();18 $table->string('name');19 $table->timestamps();20 });21 }2223 /**24 * Reverse the migrations.25 *26 * @return void27 */28 public function down()29 {30 // Drop locations table31 Schema::drop('locations');32 }33}
The key to this implementation of fuzzy search is trigrams. A trigram is a group of three consecutive characters taken from a string. Using the pg_trgm
module, which comes with PostgreSQL, we can break a string into as many trigrams as possible, and then return the strings with the most matching trigrams.
We can ensure that pg_trgm
is set up on the database by creating a migration:
1<?php23use Illuminate\Support\Facades\Schema;4use Illuminate\Database\Schema\Blueprint;5use Illuminate\Database\Migrations\Migration;67class AddTrgmExtension extends Migration8{9 /**10 * Run the migrations.11 *12 * @return void13 */14 public function up()15 {16 DB::statement('CREATE EXTENSION IF NOT EXISTS pg_trgm');17 }1819 /**20 * Reverse the migrations.21 *22 * @return void23 */24 public function down()25 {26 DB::statement('DROP EXTENSION IF EXISTS pg_trgm');27 }28}
Make sure you run the migration as well. Once that is done, we can make a raw fuzzy query against the name
field as follows:
SELECT * FROM locations WHERE 'burgh' % name;
Translating that to work with the Eloquent ORM, we can perform fuzzy queries against the name
field as follows:
$location = Location::whereRaw("'burgh' % name")->get();
This query might match both Aldeburgh
and Edinburgh
. It's also able to handle slight misspellings, as in this example:
$location = Location::whereRaw("'hendrad' % name")->get();
This query will match East Hendred
or West Hendred
successfully. As you can see, we can match strings at any point in the name string, and handle slight mis-spellings without any problems.
In practice, rather than using whereRaw()
every time, you'll probably want to create a local scope that accepts the name you want to match against. You'll also want to use query parameters to prevent SQL injection:
$location = Location::whereRaw("? % name", [$name])->get();
Improving performance with an index
The performance of these queries isn't that great out of the box. We can improve them by creating an index:
1<?php23use Illuminate\Support\Facades\Schema;4use Illuminate\Database\Schema\Blueprint;5use Illuminate\Database\Migrations\Migration;67class AddTrgmExtension extends Migration8{9 /**10 * Run the migrations.11 *12 * @return void13 */14 public function up()15 {16 DB::statement('CREATE EXTENSION IF NOT EXISTS pg_trgm');17 DB::statement('CREATE INDEX locations_name_trigram ON locations USING gist(name gist_trgm_ops);');18 }1920 /**21 * Reverse the migrations.22 *23 * @return void24 */25 public function down()26 {27 DB::statement('DROP INDEX IF EXISTS locations_name_trigram');28 DB::statement('DROP EXTENSION IF EXISTS pg_trgm');29 }30}
Adding an index should produce a noticeable improvement in the response time.
Final thoughts
PostgreSQL's pg_trgm
module is a fairly straightforward way of implementing fuzzy search. It's not much more involved than a LIKE
or ILIKE
clause in your query, and for many use cases, it's more than sufficient. If you don't have a huge number of records, it's probably a more appropriate choice than something like Elasticsearch, and has the advantage of a simpler stack. However, if you have a larger dataset, you may be better off with a dedicated search solution. As always, if you're unsure it's a good idea to try both and see what works best for that particular use case.