Matthew Daly's Blog

I'm a web developer in Norfolk. This is my blog...

28th July 2019 8:55 pm

Skipping Environment-Specific PHPUnit Tests

If you’re doing client work, you don’t generally have to worry too much about working with any services other than those that will be installed in your production environment. For instance, if you’re using Memcached as your cache backend, you needn’t go to the trouble of checking that it works with Redis too unless the project actively switches. However, for more general purpose software that may be deployed to a variety of different environments, you may have to test it in all of those environments, which can be a chore.

Lately I’ve been working on a micro CMS for a personal project, and ran into a bit of an issue. This CMS uses the Stash caching library, and I wanted it to actively support all of the cache backends Stash provides. The CMS is configured using YAML, and I’d written a factory class that takes in the cache configuration and returns an adapter. The problem was that there are three adapters that require additional software to be installed, namely the APC, Redis and Memcached adapters. Installing all the packages to use all three of the adapters is onerous, and while it’s a good idea to test them all, it’s generally not worth the bother of adding all of them to your local development environment where you need your tests to run as fast as possible. Instead you’re better off deferring those tests that require additional dependencies to your continuous integration server, which can afford to be a lot slower.

Fortunately, PHPUnit allows you to mark a test as skipped by calling markTestSkipped(). In the past I’ve used this or the similar markTestIncomplete() method when a test wasn’t finished, but it’s also useful for skipping tests based on the environment. We can either test for the presence of the dependency and mark the test as skipped if it’s not present, or set the test up inside a try…catch block and call markTestSkipped() if the test throws an exception due to a missing dependency, as in this example:

<?php

declare(strict_types=1);

namespace Tests\Unit\Factories;

use App\Factories\CacheFactory;
use Stash\Exception\RuntimeException;
use Tests\TestCase;

final class CacheFactoryTest extends TestCase
{
    public function testRedis()
    {
        $factory = new CacheFactory;

        try {
            $pool = $factory->make([
                'driver' => 'redis',
                'servers' => [[
                    '127.0.0.1',
                    '6379',
                ]],
            ]);
        } catch (RuntimeException $e) {
            $this->markTestSkipped('Dependency not installed');
        }

        $this->assertInstanceOf('Stash\Pool', $pool);
        $this->assertInstanceOf('Stash\Driver\Redis', $pool->getDriver());
    }
}
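The other approach mentioned above - checking for the dependency up front - can be sketched as follows. This is a minimal standalone sketch: extension_loaded() is a PHP builtin, and in a real PHPUnit test the check would live in setUp() and call $this->markTestSkipped() directly.

```php
<?php

declare(strict_types=1);

// A sketch of the "check first" alternative: rather than catching the
// adapter's RuntimeException, test for the extension before running the
// test at all. In a PHPUnit test this would go in setUp() and call
// $this->markTestSkipped() instead of returning a value.
function shouldSkipRedisTests(): bool
{
    // extension_loaded() reports whether a given extension (here, the
    // redis extension) is available in the current PHP runtime
    return !extension_loaded('redis');
}

var_dump(shouldSkipRedisTests());
```

This has the advantage of skipping the test before any setup work is done, rather than relying on the adapter throwing at the right moment.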

As a general rule of thumb, when running your tests locally it’s more important that your test suite runs quickly than that it provides 100% coverage. Tests that are slower or require multiple services to be installed can still be run by your continuous integration server, which can afford to be slower since it’s not a blocker in the same way. In addition, I’m only ever really interested in coverage stats on the CI server, since enabling coverage slows PHPUnit down considerably, so coverage is a non-issue locally and we can happily leave covering those adapters to the CI server too. In this case, the project is hosted on GitHub and uses Travis CI for running the tests and Coveralls for recording coverage, so we can leave the full test suite to run on Travis CI, ensuring full coverage, while skipping the tests that require Redis, Memcached or APC locally.
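As a sketch of what that CI setup might look like, a .travis.yml along these lines would install the extra services and push coverage to Coveralls. The file below is an assumption for illustration, not taken from the actual project, and the PHP version and paths are placeholders:

```yaml
language: php

php:
  - "7.3"

# Travis CI can provide the services the environment-specific tests need
services:
  - redis-server
  - memcached

install:
  - composer install --no-interaction

script:
  # Run the full suite with coverage enabled - something we only do on CI
  - vendor/bin/phpunit --coverage-clover build/logs/clover.xml

after_success:
  # Push the coverage report to Coveralls
  - vendor/bin/php-coveralls -v
```

With this in place, the local run stays fast while the CI run exercises every adapter and records coverage.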

Having a comprehensive test suite, and running it regularly during development, is important, but that doesn’t mean it’s compulsory you run every test regularly. In a case like this, where there are multiple adapters for the same basic functionality, you can often afford to avoid running those that test adapters with more exacting requirements.

19th June 2019 10:00 pm

Powering Up Git Bisect With the Run Command

The bisect command in Git can be very useful when trying to catch any regressions. If you know that a bug was not present at some point in the past, and now is, you can often use bisect to track it down quickly and easily.

The basic functionality is fairly simple. You start the process by tracking down a known “good” commit in the past, and a known “bad” commit, which will usually be the head of the branch. Then, you start bisecting:

$ git bisect start

You then specify your bad commit:

$ git bisect bad HEAD

And your good commit:

$ git bisect good fe0616f0cd523455a0e5bc536c09bfb1d8fd0c3f

And it will then step through the commits in between. Note that not every commit is loaded - it instead picks a commit between those you entered, and from there quickly narrows down the range. For each commit, you test it and mark it as good or bad with git bisect good or git bisect bad as appropriate. Once it’s tracked down the commit that introduced the problem, it will tell you what that commit was, making any remaining debugging much easier. There are situations that are more difficult to handle, such as when database migrations have been created and run in the intervening period, but for many cases bisect can be a very valuable tool.

However, it can still be a chore to step through those commits manually. Fortunately, in situations where you can produce some sort of script to determine if the issue is present or not, there’s an easy way to automate it with the bisect run command.

One of the personal projects I have on the go right now is a micro-CMS intended primarily for brochure-style sites. It includes an AJAX search that uses Fuse.js on the front end, the index for which is generated by a console task built on top of the Symfony Console component. Recently I noticed that although the unit tests still passed, the console task to generate the index no longer worked as expected due to an issue with Flysystem. Since it threw an error in the console, that could be used as input to git bisect. I was therefore able to automate the process of finding the bug by running this command:

$ git bisect run php console index:generate

This was somewhat unusual in that it was an ideal situation - the problem was the console command throwing an explicit error, which was perfect as input to bisect run. In many cases it’s more likely that, if you want to automate catching the error, you’ll need to write an automated test that reproduces it, and run that test with git bisect run. Given that TDD already recommends writing a test to reproduce a bug before fixing it, it’s prudent to write the test first, then use it to run the bisect command, and only then fix the bug and commit both the fix and the new test - that way you not only minimise the manual work required, but also ensure the bug won’t crop up again unnoticed.

Certain classes of issues are more difficult to automate in this way - for example, visual regressions in CSS. If you’re using a library like React or Vue, snapshot testing may be a good way to automate the bisect process for HTML rendered by components, or you could try the approach I’ve mentioned before for snapshot testing PHP applications. For legacy applications that can’t create and tear down a database for testing purposes due to gaps in the migration history, it can also be tricky and time-consuming to ensure consistency between runs. However, if you can do it, automating the bisect command makes it much quicker, and leaves you with a test you can retain to ensure that bug never returns again.

14th May 2019 12:15 pm

Writing Golden Master Tests for Laravel Applications

Last year I wrote a post illustrating how to write golden master tests for PHP applications in general. This approach works, but has a number of issues:

  • Because it uses a headless browser such as Goutte, it’s inevitably slow (a typical test run for the legacy application I wrote those tests for is 3-4 minutes)
  • It can’t allow for differing content, so any changes to the content will break the tests

These factors limit its utility for many PHP applications. However, for a Laravel application you’re in a much better position:

  • You can use Browserkit rather than a headless browser, resulting in much faster response times
  • You can set up a testing database, and populate it with the same data each time, ensuring that the only thing that can change is how that data is processed to create the required HTML

Here I’ll show you how to adapt that approach to work with a Laravel application.

We rely on Browserkit testing for this approach, so you need to install that:

$ composer require --dev laravel/browser-kit-testing

Next, we need to create our base golden master test case:

<?php

namespace Tests;

use Tests\BrowserTestCase;

class GoldenMasterTestCase extends BrowserTestCase
{
    use CreatesApplication;

    public $baseUrl = 'http://localhost';

    protected $snapshotDir = "tests/snapshots/";

    protected $response;

    protected $path;

    public function goto($path)
    {
        $this->path = $path;
        $this->response = $this->call('GET', $path);
        $this->assertNotEquals(404, $this->response->status());
        return $this;
    }

    public function saveHtml()
    {
        if (!$this->snapshotExists()) {
            $this->saveSnapshot();
        }
        return $this;
    }

    public function assertSnapshotsMatch()
    {
        $path = $this->getPath();
        $newHtml = $this->processHtml($this->getHtml());
        $oldHtml = $this->getOldHtml();
        $diff = "";
        if (function_exists('xdiff_string_diff')) {
            $diff = xdiff_string_diff($oldHtml, $newHtml);
        }
        $message = "The path $path does not match the snapshot\n$diff";
        self::assertThat($newHtml == $oldHtml, self::isTrue(), $message);
    }

    protected function getHtml()
    {
        return $this->response->getContent();
    }

    protected function getPath()
    {
        return $this->path;
    }

    protected function getEscapedPath()
    {
        return $this->snapshotDir.str_replace('/', '_', $this->getPath()).'.snap';
    }

    protected function snapshotExists()
    {
        return file_exists($this->getEscapedPath());
    }

    protected function processHtml($html)
    {
        return preg_replace('/(<input type="hidden"[^>]+\>|<meta name="csrf-token" content="([a-zA-Z0-9]+)">)/i', '', $html);
    }

    protected function saveSnapshot()
    {
        $html = $this->processHtml($this->getHtml());
        file_put_contents($this->getEscapedPath(), $html);
    }

    protected function getOldHtml()
    {
        return file_get_contents($this->getEscapedPath());
    }
}

The goto() method sets the current path on the object, then fetches the page. It verifies the page was found, and then returns an instance of the object, to allow for method chaining.

Another method of note is the saveHtml() method. This checks to see if the snapshot exists - if not, it saves it. The snapshot is essentially just the HTML returned from that route, but certain content may need to be stripped out, which is done in the processHtml() method. In this case we’ve stripped out hidden fields and the CSRF token meta tag, as CSRF tokens are generated anew each time and will break the snapshots.
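To see that stripping step in isolation, here is the regular expression from processHtml() run against a small hypothetical fragment containing a CSRF meta tag and a hidden token field:

```php
<?php

declare(strict_types=1);

// The same regular expression used in processHtml(), applied to a small
// hypothetical fragment: both the CSRF meta tag and the hidden input are
// removed, so the snapshot no longer changes between requests
$html = '<meta name="csrf-token" content="abc123XYZ">'
    . '<form><input type="hidden" name="_token" value="abc123XYZ"></form>';

$processed = preg_replace(
    '/(<input type="hidden"[^>]+\>|<meta name="csrf-token" content="([a-zA-Z0-9]+)">)/i',
    '',
    $html
);

echo $processed, "\n"; // prints: <form></form>
```

Anything else that varies between requests - timestamps, nonces, randomised asset hashes - would need stripping out in the same way.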

The last method we’ll look at is the assertSnapshotsMatch() method. This will get the current HTML, and that for any snapshot for that route, and then compare them. If they differ, it will fail the assertion. In addition, if xdiff_string_diff is available, it will show a diff of the two files - be warned, these can sometimes be large, but they can be helpful in debugging.

Also, note our snapshots directory - tests/snapshots. If you do make a breaking change and want to delete a snapshot, then you can find it in there - the format replaces forward slashes with underscores, and appends a file extension of .snap, but feel free to customise this to your needs.
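The naming scheme is easy to see if you pull the logic of getEscapedPath() out into a standalone function (directory name as above; the routes here are just examples):

```php
<?php

declare(strict_types=1);

// The snapshot-naming logic from getEscapedPath(), extracted so we can
// see which file a given route maps to
function escapedPath(string $path, string $snapshotDir = 'tests/snapshots/'): string
{
    return $snapshotDir . str_replace('/', '_', $path) . '.snap';
}

echo escapedPath('/register'), "\n";    // prints: tests/snapshots/_register.snap
echo escapedPath('/admin/users'), "\n"; // prints: tests/snapshots/_admin_users.snap
```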

Next, we’ll create a test for routes that don’t require authentication, at tests/GoldenMaster/ExampleTest.php:

<?php

namespace Tests\GoldenMaster;

use Illuminate\Foundation\Testing\RefreshDatabase;
use Tests\GoldenMasterTestCase;

class ExampleTest extends GoldenMasterTestCase
{
    use RefreshDatabase;

    /**
     * @dataProvider nonAuthDataProvider
     */
    public function testNonAuthPages($data)
    {
        $this->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function nonAuthDataProvider()
    {
        return [
            ['/register'],
            ['/login'],
        ];
    }
}

Note the use of the data provider. We want to be able to step through a list of routes, and verify each in turn, so it makes sense to set up a data provider method as nonAuthDataProvider(), which will return an array of routes. If you haven’t used data providers before, they are an easy way to reduce boilerplate in your tests when you need to test the same thing over and over with different data, and the PHPUnit documentation covers them in more detail.

Now, having seen the methods used, it should be easy to understand testNonAuthPages(). It goes through the following steps:

  • Visit the route passed through, eg /register
  • Save the HTML to a snapshot, if not already saved
  • Assert that the current content matches the snapshot

Using this method, you can test a lot of routes for unexpected changes quite easily. If you’ve used snapshot tests with something like Jest, this is a similar approach.

Authenticated routes

This won’t quite work with authenticated routes. You’ll still get a response for each of them, but if you look at the HTML it will clearly show the user is being redirected in every case, so there’s not much point in testing them as things stand; a few more changes are required.

If your content does not differ between users, you can add the trait Illuminate\Foundation\Testing\WithoutMiddleware to your test to disable the authentication and allow the test to get the content without being redirected.

If, however, your content does differ between users, you need to instead create a user object, and use the actingAs() method already available in Laravel tests to set the user, as follows:

<?php

namespace Tests\GoldenMaster;

use App\User;
use Illuminate\Foundation\Testing\RefreshDatabase;
use Tests\GoldenMasterTestCase;

class ExampleTest extends GoldenMasterTestCase
{
    use RefreshDatabase;

    /**
     * @dataProvider authDataProvider
     */
    public function testAuthPages($data)
    {
        $user = factory(User::class)->create([
            'email' => 'eric@example.com',
            'name' => 'Eric Smith',
            'password' => 'password',
        ]);

        $this->actingAs($user)
            ->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function authDataProvider()
    {
        return [
            ['/'],
        ];
    }
}

This will allow us to visit a specific page as a user, without being redirected.

Summary

This can be a useful technique to catch unexpected breakages in applications, particularly ones which have little or no conventional test coverage. While I originated this technique on a Zend 1 legacy code base, leveraging the tools available in Laravel makes this technique much faster and more useful. If your existing Laravel application is not as well tested as you’d like, and you have some substantial changes to make that risk breaking some of the functionality, having these sorts of golden master tests set up can be a quick and easy way of catching any problems as soon as possible.

4th March 2019 9:26 pm

How Much Difference Does Adding An Index to a Database Table Make?

For the last few weeks, I’ve been kept busy at work building out a new homepage for the legacy intranet system I maintain. The new homepage is built virtually from scratch with React, and has a completely new set of queries. In addition, I’ve also rebuilt the UI for the navigation to use React too. This has allowed me to bypass a lot of the worst code in the whole code base with the intent to get rid of it once the new home page is live - something I’m very pleased about!

As part of this, I built some new functionality to show items added in the last seven days. This section of the home page can be sorted by several parameters, including popularity. I also added the facility to expand that to 31 days via an AJAX request. However, the AJAX request was painfully slow, often taking 20-30 seconds. Also, the home page was quite slow to load in the first place, and examining the query time in Clockwork indicated that the culprit was the query for the new items.

Further examination of the query behind the new items (both on initial page load and the 31 day AJAX request) indicated that the problem was a join. Last year, one of my first tasks had been to add the facility to record a track for any media item when it was visited. This was accomplished using a polymorphic relationship. While Zend 1 doesn’t have the kind of out-of-the-box support for polymorphic relationships that Laravel has, it’s possible to fake it, so I created a tracks table whose columns included trackable_id for the primary key of the tracked object, trackable_type for its class, and user_id for the ID of the user who visited it. Now I was using that same table to determine the number of times each item had been viewed, by joining it onto each of the media items. This was the first time the table was being read for anything other than a report generated in the admin, and performance was dog slow.

Once I’d established that removing that join from the query eliminated the performance issue, it became apparent I was going to need to add an index to the tracks table. The table had grown fairly large (low hundreds of thousands of rows), so there was a lot to sort through. As the join matched on the trackable_id field, that seemed like a good candidate, so I added the index there.
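For reference, the index itself is a one-liner - something along these lines, though the exact DDL here is a sketch based on the column names described above, not the actual migration:

```sql
-- Add an index on the column used in the join, so MySQL can look up
-- matching tracks directly rather than scanning the whole table
ALTER TABLE tracks ADD INDEX tracks_trackable_id_index (trackable_id);
```

Running the query through EXPLAIN before and after is a good way to confirm that the join has switched from a full table scan to an index lookup.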

The results were dramatic, to put it mildly. The initial page load time dropped from 4.44s to 1.29s - around a third of the previous time. For the AJAX request to fetch the last 31 days’ new items, the results were even more impressive - the loading time dropped from 22.44s to 1.61s. Overall, figuring out which part of the query was causing the poor performance and resolving it took about ten minutes, and resulted in a staggering improvement.

If you don’t have a particularly strong theoretical background with relational databases, knowledge of indices can fall by the wayside somewhat. However, as you can see from this example, if you have a particularly slow query, then adding an index can make a staggering difference, so it’s really worth taking the time to understand a bit more about indices and when they can be useful.

20th February 2019 5:25 pm

Searching Content With Fuse.js

Search is a problem I’m currently taking a big interest in. The legacy project I maintain has an utterly abominable search facility, one that I’m eager to replace with something like Elasticsearch. But smaller sites that are too small for Elasticsearch to be worth the bother can still benefit from having a decent search implementation. Despite some recent improvements, relational databases aren’t generally that good a fit for search because they don’t really understand the concept of relevance - you can’t easily order something by how good a match it is, and your database may not deal with fuzzy matching well.

I’m currently working on a small flat-file CMS as a personal project. It’s built with PHP, but it’s intended to be as simple as possible, with no database, no caching service, and certainly no search service, so it needs something small and simple, but still effective for search.

In the past I’ve used Lunr.js on my own site, and it works very well for this use case. However, it’s problematic for this case as the index needs to be generated in Javascript on the server side, and adding Node.js to the stack for a flat-file PHP CMS is not really an option. What I needed was something where I could generate the index in any language I chose, load it via AJAX, and search it on the client side. I recently happened to stumble across Fuse.js, which was pretty much exactly what I was after.

Suppose we have the following index:

[
    {
        "title": "About me",
        "path": "about/"
    },
    {
        "title": "Meet the team",
        "path": "about/meet-the-team/"
    },
    {
        "title": "Alice",
        "path": "about/meet-the-team/alice/"
    },
    {
        "title": "Bob",
        "path": "about/meet-the-team/bob/"
    },
    {
        "title": "Chris",
        "path": "about/meet-the-team/chris/"
    },
    {
        "title": "Home",
        "path": "index/"
    }
]

This index can be generated in any way you see fit. In this case, the page content is stored in Markdown files with YAML front matter, so I wrote a Symfony console command which gets all the Markdown files in the content folder, parses them to get the titles, and retrieves the path. You could also retrieve other items in front matter such as categories or tags, and the page content, and include that in the index. The data then gets converted to JSON and saved to the index file. As you can see, there’s nothing special about this JSON - these two fields happen to be the ones I’ve chosen.
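As a sketch of that console task’s core logic: the real version is a Symfony Console command, but here it’s reduced to a standalone function, and the directory layout and field names are purely illustrative.

```php
<?php

declare(strict_types=1);

// A minimal sketch of index generation: walk a content directory of
// Markdown files with YAML front matter, pull out each title, and emit
// the JSON index the search will consume
function buildIndex(string $contentDir): array
{
    $index = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($contentDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->getExtension() !== 'md') {
            continue;
        }
        $markdown = file_get_contents($file->getPathname());
        // Pull the title line out of the YAML front matter
        if (preg_match('/^title:\s*(.+)$/m', $markdown, $matches)) {
            $relative = substr($file->getPathname(), strlen(rtrim($contentDir, '/')) + 1);
            $index[] = [
                'title' => trim($matches[1]),
                'path'  => preg_replace('/\.md$/', '/', $relative),
            ];
        }
    }
    // Sort by path so the index is stable between runs
    usort($index, fn ($a, $b) => strcmp($a['path'], $b['path']));
    return $index;
}

// Build a throwaway content directory to demonstrate
$dir = sys_get_temp_dir() . '/content_' . uniqid();
mkdir($dir . '/about', 0777, true);
file_put_contents($dir . '/index.md', "---\ntitle: Home\n---\nWelcome\n");
file_put_contents($dir . '/about/index.md', "---\ntitle: About me\n---\nHello\n");

echo json_encode(buildIndex($dir), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

The output of json_encode() is then saved as the index file for the front end to load.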

Now we can load the JSON file via AJAX, and pass it to a new Fuse instance. You can search the index using the .search() method, as shown below:

import Fuse from 'fuse.js';

window.$ = window.jQuery = require('jquery');

$(document).ready(function () {
    $.getJSON('/storage/index.json', function (response) {
        const fuse = new Fuse(response, {
            keys: ['title'],
            shouldSort: true
        });

        $('#search').on('keyup', function () {
            let result = fuse.search($(this).val());
            let resultdiv = $('ul.searchresults');
            if (result.length === 0) {
                // Hide the results list when there are no matches
                resultdiv.hide();
            } else {
                // Render the top four matches and show the list
                resultdiv.empty();
                result.slice(0, 4).forEach(function (item) {
                    resultdiv.append('<li><a href="/' + item.path + '">' + item.title + '</a></li>');
                });
                resultdiv.show();
            }
        });
    });
});

The really great thing about Fuse.js is that it can search just about any JSON content, making it extremely flexible. For a site with a MySQL database, you could generate the JSON from one or more tables in the database, cache it in Redis or Memcached indefinitely until such time as the content changes again, and only regenerate it then, making for an extremely efficient client-side search that doesn’t need to hit the database during normal operation. Or you could generate it from static files, as in this example. It also means the backend language is not an issue, since you can easily generate the JSON file in PHP, Javascript, Python or any other language.

As you can see, it’s pretty straightforward to use Fuse.js to create a working search field out of the box, but the website lists a number of options allowing you to customise the search for your particular use case, and I’d recommend looking through these if you’re planning on using it on a project.

