An approach to writing golden master tests for PHP web applications

Published on 8th October 2018 at 10:20 am

Apologies if some of the spelling or formatting on this post is off - I wrote it on a long train journey down to London, with sunlight at an inconvenient angle.

Recently I had to make some substantial changes to the legacy web app that takes up the lion's share of my current job. The client has several channels, each representing a different part of the business and each expecting to see different content on the home page. Access to content is limited first by channel, and then by location. The client wanted an additional channel added. Due to bad design earlier in the application's lifetime that isn't yet practical to refactor away, each type of location has its own model, so it was necessary to add a new location model. It also had to work seamlessly, in the same way as the other location types. Unfortunately, these location types didn't use polymorphism, and instead relied on large switch statements, and it wasn't practical to refactor all that away in one go. This was therefore quite a high-risk job, especially considering the paucity of tests on a legacy code base.

I'd heard of the concept of a golden master test before. If you haven't, the idea is to run a process, capture its output from a known good version, and then compare the output of future runs against it. It's very much a test of last resort: in the context of a web app it's potentially very brittle, because the state of the application has to remain the same between runs to avoid false positives. I needed a set of simple "snapshot tests", similar to how snapshot testing works with Jest, to catch unexpected breakages in a large number of pages, and this approach seemed to fit the bill. Unfortunately, I hadn't been able to find a good example of how to do this for PHP applications, so it took a while to figure out something that worked.
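
To make the idea concrete, here is a minimal sketch in plain PHP, independent of any framework or HTTP client. The function name `matchesGoldenMaster` and the use of a temporary file are hypothetical, chosen purely for illustration: the first run records the output as the known good version, and later runs are compared against it.

```php
<?php

/**
 * Compare $output with a stored golden master, recording it on the first run.
 * Hypothetical helper for illustration only.
 *
 * @return bool true if the output matches the master (or was just recorded)
 */
function matchesGoldenMaster(string $snapshotFile, string $output): bool
{
    if (!file_exists($snapshotFile)) {
        // First run: treat the current output as the known good version.
        file_put_contents($snapshotFile, $output);
        return true;
    }

    // Later runs: any difference from the recorded output counts as a failure.
    return file_get_contents($snapshotFile) === $output;
}

// Usage: record a master, then check an unchanged and a changed output.
$file = tempnam(sys_get_temp_dir(), 'snap');
unlink($file); // tempnam() creates the file; remove it so the first call records

$first     = matchesGoldenMaster($file, '<h1>Home</h1>');
$unchanged = matchesGoldenMaster($file, '<h1>Home</h1>');
$changed   = matchesGoldenMaster($file, '<h1>Homepage</h1>');

unlink($file); // clean up the temporary snapshot
```

The important property is that the test knows nothing about what the output should be, only that it should not change, which is why it suits code that is too tangled to test any other way.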

Here is an example base test case I used for this approach:

```php
<?php

namespace Tests;

use PHPUnit_Framework_TestCase as BaseTestCase;
use Behat\Mink\Driver\GoutteDriver;
use Behat\Mink\Session;

class GoldenMasterTestCase extends BaseTestCase
{
    protected $driver;

    protected $session;

    protected $baseUrl = 'http://localhost:8000';

    protected $snapshotDir = "tests/snapshots/";

    public function setUp()
    {
        $this->driver = new GoutteDriver();
        $this->session = new Session($this->driver);
    }

    public function tearDown()
    {
        $this->session = null;
        $this->driver = null;
    }

    // Log in through the real login form so later requests are authenticated.
    public function loginAs($username, $password)
    {
        $this->session->visit($this->baseUrl.'/login');
        $page = $this->session->getPage();
        $page->fillField("username", $username);
        $page->fillField("password", $password);
        $page->pressButton("Sign In");
        return $this;
    }

    // Note: "goto" is a reserved word, so this method name needs PHP 7.0+.
    public function goto($path)
    {
        $this->session->visit($this->baseUrl.$path);
        $this->assertNotEquals(404, $this->session->getStatusCode());
        return $this;
    }

    // Record a snapshot only if none exists yet for the current page.
    public function saveHtml()
    {
        if (!$this->snapshotExists()) {
            $this->saveSnapshot();
        }
        return $this;
    }

    public function assertSnapshotsMatch()
    {
        $path = $this->getPath();
        $newHtml = $this->processHtml($this->getHtml());
        $oldHtml = $this->getOldHtml();
        $diff = "";
        // Include a diff in the failure message if the xdiff extension is installed.
        if (function_exists('xdiff_string_diff')) {
            $diff = xdiff_string_diff($oldHtml, $newHtml);
        }
        $message = "The path $path does not match the snapshot\n$diff";
        // Strict comparison, so the strings must be byte-for-byte identical.
        self::assertThat($newHtml === $oldHtml, self::isTrue(), $message);
    }

    protected function getHtml()
    {
        return $this->session->getPage()->getHtml();
    }

    // Path, query string and fragment joined without separators;
    // used in the failure message and to build the snapshot file name.
    protected function getPath()
    {
        $url = $this->session->getCurrentUrl();
        $path = parse_url($url, PHP_URL_PATH);
        $query = parse_url($url, PHP_URL_QUERY);
        $frag = parse_url($url, PHP_URL_FRAGMENT);
        return $path.$query.$frag;
    }

    // Turn the path into a flat file name, e.g. /foo/bar becomes _foo_bar.snap.
    protected function getEscapedPath()
    {
        return $this->snapshotDir.str_replace('/', '_', $this->getPath()).'.snap';
    }

    protected function snapshotExists()
    {
        return file_exists($this->getEscapedPath());
    }

    // Strip hidden inputs (such as CSRF tokens) that change on every request.
    protected function processHtml($html)
    {
        return preg_replace('/<input type="hidden"[^>]+\>/i', '', $html);
    }

    protected function saveSnapshot()
    {
        $html = $this->processHtml($this->getHtml());
        file_put_contents($this->getEscapedPath(), $html);
    }

    protected function getOldHtml()
    {
        return file_get_contents($this->getEscapedPath());
    }
}
```

Because this application is built with Zend 1 and doesn't have an easy way to get the HTML response without actually running the application, I was forced to use an actual HTTP client to fetch the content while the web server was running. I've used Mink together with Behat many times in the past, and the Goutte driver is fast and doesn't rely on JavaScript, so that was the best bet for a simple way of retrieving the HTML. Had I been taking this approach with a Laravel application, I could have populated the testing database with a common set of fixtures, passed a request object through the application, and captured the response object's output rather than using an HTTP client, thereby eliminating the need to run a web server and making the tests faster and less brittle.

Another issue was CSRF handling. A CSRF token is random and, in this application, was regenerated each time a page was loaded, so every page containing a form with a CSRF token failed the comparison. The solution I came up with was to strip out the hidden input fields before saving or comparing the HTML.
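
The stripping step can be tried in isolation. One thing to be aware of is that the regular expression in the base test case only matches inputs where `type="hidden"` is the first attribute. The sketch below is a hypothetical variant, `stripHiddenInputs`, that matches regardless of attribute order. It assumes attribute values never contain a `>` character, and the sample markup is invented for the example.

```php
<?php

/**
 * Remove hidden inputs (including CSRF tokens) so that values which change
 * between requests do not cause spurious mismatches.
 * Hypothetical variant; assumes attribute values contain no ">" character.
 */
function stripHiddenInputs(string $html): string
{
    return preg_replace('/<input\b[^>]*\stype=(["\'])hidden\1[^>]*>/i', '', $html);
}

// Two renderings of the same form that differ only in the CSRF token.
$first  = '<form><input name="csrf" type="hidden" value="a1b2"><input type="text" name="q"></form>';
$second = '<form><input name="csrf" type="hidden" value="c3d4"><input type="text" name="q"></form>';

$normalisedFirst  = stripHiddenInputs($first);
$normalisedSecond = stripHiddenInputs($second);

// After stripping, both renderings are identical and the visible field remains.
var_dump($normalisedFirst === $normalisedSecond);
```

The trade-off is that any genuine change to a hidden field will also go unnoticed, which seemed acceptable for a test whose job is to catch visible breakage.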

When each page is tested, the first step is to fetch the content of that page. The test case then checks to see if there's an existing snapshot. If not, the content is saved as a new snapshot file. Otherwise, the freshly fetched content is compared with the stored snapshot, and the test fails if they do not match.
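
The base test case only adds a diff to the failure message when the xdiff extension is installed. Where it isn't, something like the following could serve as a rough fallback. This is a sketch rather than a real diff algorithm: the hypothetical helper `firstDifference` reports only the first line at which the two versions diverge, and the sample HTML is made up.

```php
<?php

/**
 * Describe the first line at which two strings differ, or return an empty
 * string if they are identical. Hypothetical helper, not a full diff.
 */
function firstDifference(string $old, string $new): string
{
    $oldLines = explode("\n", $old);
    $newLines = explode("\n", $new);
    $max = max(count($oldLines), count($newLines));

    for ($i = 0; $i < $max; $i++) {
        // A missing line (one version is shorter) is represented as null.
        $oldLine = $oldLines[$i] ?? null;
        $newLine = $newLines[$i] ?? null;
        if ($oldLine !== $newLine) {
            return sprintf(
                "Line %d differs:\n- %s\n+ %s",
                $i + 1,
                $oldLine ?? '(no line)',
                $newLine ?? '(no line)'
            );
        }
    }

    return '';
}

$old = "<ul>\n<li>Leeds</li>\n</ul>";
$new = "<ul>\n<li>London</li>\n</ul>";

$report = firstDifference($old, $new); // describes the mismatch on line 2
$same   = firstDifference($old, $old); // empty string: nothing differs

echo $report, "\n";
```

Even this much is usually enough to tell a real regression from a false positive caused by changed data.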

Once that base test case was in place, it was then straightforward to extend it to test multiple pages. I wrote one test to check pages that did not require login, and another to check pages that did require login, and the paths for those pages were passed through using a data provider method, as shown below:

```php
<?php

namespace Tests\GoldenMaster;

use Tests\GoldenMasterTestCase;

class GoldenMasterTest extends GoldenMasterTestCase
{
    /**
     * @dataProvider nonAuthDataProvider
     */
    public function testNonAuthPages($data)
    {
        $this->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function nonAuthDataProvider()
    {
        return [
            ['/login'],
        ];
    }

    /**
     * @dataProvider dataProvider
     */
    public function testPages($data)
    {
        $this->loginAs('foo', 'bar')
            ->goto($data)
            ->saveHtml()
            ->assertSnapshotsMatch();
    }

    public function dataProvider()
    {
        return [
            ['/foo'],
            ['/bar'],
        ];
    }
}
```

Be warned, this is not an approach I would advocate as a matter of course. It should only ever be a last resort, an alternative to onerous manual testing for things that can't be tested in their current form. It's extremely brittle, and I've had to deal with a lot of false positives, although these would be easier to avoid if I could populate a testing database beforehand and use that as the basis of the tests. It's also very slow, with each test taking three or four seconds to run, although again this would be less of an issue if I could pass through a request object and get the response HTML directly. Nonetheless, I've found it to be a useful technique as a test of last resort for legacy applications.