Matthew Daly's Blog

I'm a web developer in Norfolk. This is my blog...

27th January 2019 11:10 pm

Understanding Query Objects

The project I’ve been maintaining for the last year has inherited a rather dubious database structure that would currently be very difficult to refactor, and which makes many queries more convoluted than they should be. At present, I’m involved in building a whole new home page, which has necessitated adding some new queries. Since some of these involve carrying out unions between several similar tables (that should have been one table, grr…), each one can involve quite a large chunk of query-building code.

As a result, it’s made sense to break those queries down further. Since Zend 1 doesn’t have anything analogous to scopes in Eloquent, I don’t really have an easy way to break these queries up in the models (and I’m trying to get the query logic out of the models at present anyway), so I opted to make them into query objects instead, which is a pattern I hadn’t used before (but probably should have).

A query object is pretty much what it says on the tin - a PHP object that builds and executes a single, very specific query. This may seem like overkill, and indeed it’s only really worthwhile for the most complex and convoluted of queries. It can accept parameters, as you’d expect, and some parts of the query may be optional based on those, but fundamentally it should build and run only one single query.

In this post I’ll go through how you might create one, how it relates to the repository pattern, and when to create one.

Creating a query object class

I’m a big fan of the __invoke() magic method in PHP. For the uninitiated, it lets you instantiate the class, and then use it in the same way you would a function, making it very useful for callbacks. This also brings some other advantages:

  • Unlike with a function, you can create private methods to do other parts of the work, making it easier to understand the main method.
  • It can have a constructor, and can therefore both accept dependencies via the constructor, and be instantiated via dependency injection, simplifying setup and testing when compared to using a callback.
  • Since __invoke() is an innate part of the PHP language, it makes more sense for a class with a single responsibility to use that method name, rather than picking an arbitrary one like handle() or run().

As a result, my query objects generally use the __invoke() method to trigger the query.
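
As a quick illustration of the feature itself, any class that defines __invoke() can be called as if it were a function (the Greeter class here is just a made-up example):

<?php

final class Greeter
{
    public function __invoke(string $name): string
    {
        return "Hello, {$name}";
    }
}

$greet = new Greeter;
// The instance can now be called like a function
echo $greet('world');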

Since Zend 1 is no longer supported, I won’t bother showing how I’d write the query in that specific context. I have yet to use this pattern with Laravel, but if I did, it would look something like this:

<?php

namespace App\Queries;

use Illuminate\Database\DatabaseManager;

final class DashboardItems
{
    protected $db;

    public function __construct(DatabaseManager $db)
    {
        $this->db = $db;
    }

    public function __invoke(int $days = 7)
    {
        // Combine the two tables, then filter the results by date
        return $this->fooTable()
            ->union($this->barTable())
            ->whereRaw('start_date >= (NOW() - INTERVAL ? DAY)', [$days])
            ->get();
    }

    private function fooTable()
    {
        return $this->db->table('foo')
            ->where('type', '=', 'fooType');
    }

    private function barTable()
    {
        return $this->db->table('bar')
            ->where('type', '=', 'barType');
    }
}

Note that we break each one of the tables we want to perform a UNION on into a private method. This is probably the biggest advantage of query objects - it lets you break particularly unwieldy queries up into logical steps, making them more readable. You could do this by adding private methods on a repository class too, but I’d be reluctant to add private methods to a repository that were only used in one query - to my mind, a query object is a better home for that.

What about repositories?

I regularly use the repository pattern in my code bases, whether that’s for Laravel projects or the current Zend 1-based legacy project. It’s an ongoing effort to refactor it so that all the queries are called from repository classes, leaving the models to act as containers for the data. So how do query objects fit in here?

It’s important to note that while a repository represents all queries relating to a table, a query object represents only a single query, and so the repository should still be the place where the query is called from. However, the repository should just defer the actual querying to the query object. The relevant parts of the application structure for my current application look a bit like this:

└── app
    ├── Queries
    │   └── DashboardItems.php
    └── Repositories
        └── DashboardRepository.php

And the repository might call the query object as follows:

<?php

namespace App\Repositories;

use App\Queries\DashboardItems;

final class DashboardRepository
{
    public static function dashboardItems(int $days = 7)
    {
        $query = new DashboardItems;
        return $query($days);
    }
}

At present my repositories all use static methods as I’m still in the process of migrating the queries over to the repository classes. That also means I can’t easily use dependency injection. For a Laravel application, a similar call might look like this:

<?php

namespace App\Repositories;

use App\Queries\DashboardItems;

final class DashboardRepository
{
    protected $dashboardQuery;

    public function __construct(DashboardItems $dashboardQuery)
    {
        $this->dashboardQuery = $dashboardQuery;
    }

    public function dashboardItems(int $days = 7)
    {
        // The parentheses are needed so PHP invokes the property,
        // rather than looking for a dashboardQuery() method
        return ($this->dashboardQuery)($days);
    }
}

The only real difference is that we can instantiate the query object out of the container, simplifying setup.
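
For illustration, here’s how a controller might consume the repository - Laravel’s container builds the whole chain of dependencies automatically. The controller and view names here are hypothetical:

<?php

namespace App\Http\Controllers;

use App\Repositories\DashboardRepository;

final class DashboardController extends Controller
{
    public function index(DashboardRepository $repository)
    {
        // The container resolves the repository, which in turn receives
        // the query object, which receives the database manager
        return view('dashboard', [
            'items' => $repository->dashboardItems(30),
        ]);
    }
}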

When to use query objects

I think it probably goes without saying, but it should be a rare query that actually needs to be implemented as a query object, especially if you’re using an ORM like Eloquent that provides features like scopes. As yet I have only two queries using this pattern, along with two others that were implemented as “reporter” classes but could be query objects instead. So far, my experience has been that the sorts of queries that are large enough to be worth considering include:

  • Queries that generate reports, particularly if they have various options
  • Queries that use unions, as in the above example, since it makes sense to use a private method to fetch each table
  • Queries with multiple complex joins

Smaller queries will typically fit happily inside a single method on your repository classes, and if so, they can live there without trouble. However, if a query is becoming too big to fit inside a single method, then rather than adding private methods to your repository class, it may make more sense to refactor it out into a query object in its own right. You can still call it via the same method on your repository class - the repository just defers to the query object. As I usually use decorators to cache the responses from my repository classes anyway, it makes sense to stick with this approach to keep caching consistent too, as in the sketch below.
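
For instance, a minimal caching decorator along those lines might look something like the following sketch. The class names and cache key are illustrative, and note that the TTL units for remember() are minutes in Laravel 5.7 and below, but seconds from 5.8 onwards:

<?php

namespace App\Repositories\Decorators;

use App\Repositories\DashboardRepository;
use Illuminate\Contracts\Cache\Repository as Cache;

final class CachedDashboardRepository
{
    protected $repository;

    protected $cache;

    public function __construct(DashboardRepository $repository, Cache $cache)
    {
        $this->repository = $repository;
        $this->cache = $cache;
    }

    public function dashboardItems(int $days = 7)
    {
        // Cache each variation of the query separately, deferring to the
        // underlying repository on a cache miss
        return $this->cache->remember("dashboard.items.{$days}", 60, function () use ($days) {
            return $this->repository->dashboardItems($days);
        });
    }
}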

Query objects only really offer any value for particularly large queries. However, they can be invaluable in those circumstances. By enabling you to break those big queries up into a series of steps, they help make them easier to understand.

13th January 2019 6:50 pm

Writing a Custom Sniff for PHP Codesniffer

I’ve recently come around to the idea that in PHP all classes should be final by default, and have started marking them as such as a matter of course. However, when you adopt a convention like this it’s easy to miss a few files that haven’t been updated, or forget to apply it, so I wanted a way to detect PHP classes that are not set as either abstract or final, and if possible, set them as final automatically. I’ve mentioned before that I use PHP CodeSniffer extensively, and it has the capability to both find and resolve deviations from a coding style, so last night I started looking into the possibility of creating a coding standard for this. It took a little work to understand how to do it, so I thought I’d use this sniff as a simple example.

The first part is to set out the directory structure. There’s a very specific layout you have to follow for PHP CodeSniffer:

  • The folder for the standard must have the name of the standard, and be in the source folder set by Composer (in this case, src/AbstractOrFinalClassesOnly).
  • This folder must contain a ruleset.xml file defining the name and description of the standard, and any other required content.
  • Any defined sniffs must be in a Sniffs folder.
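
For this standard, that works out as a layout like this (the sniff file itself must also have a name ending in Sniff.php):

└── src
    └── AbstractOrFinalClassesOnly
        ├── ruleset.xml
        └── Sniffs
            └── AbstractOrFinalSniff.php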

The ruleset.xml file was fairly simple in this case, as this is a very simple standard:

<?xml version="1.0"?>
<ruleset name="AbstractOrFinalClassesOnly">
    <description>Checks all classes are marked as either abstract or final.</description>
</ruleset>

The sniff is intended to do the following:

  • Check all classes have either the final keyword or the abstract keyword set
  • When running the fixer, make all classes without the abstract keyword final

First of all, our class must implement the interface PHP_CodeSniffer\Sniffs\Sniff, which requires the following methods:

public function register(): array;
public function process(File $file, $position): void;

Note that File here is an instance of PHP_CodeSniffer\Files\File. The first method registers the code the sniff should operate on. Here we’re only interested in classes, so we return an array containing T_CLASS. This is defined in PHP’s list of parser tokens, and represents the class keyword:

public function register(): array
{
    return [T_CLASS];
}

For the process() method, we receive two arguments, the file itself, and the position. We need to keep a record of the tokens we check for, so we do so in a private property:

private $tokens = [
    T_ABSTRACT,
    T_FINAL,
];

Then, we need to find the error:

if (!$file->findPrevious($this->tokens, $position)) {
    $file->addFixableError(
        'All classes should be declared using either the "abstract" or "final" keyword',
        $position - 1,
        self::class
    );
}

We use the file object’s findPrevious() method to get the token before class, passing the $tokens property as the list of acceptable values. If the preceding token is not either abstract or final, we add a fixable error. The first argument is the string error message, the second is the location, and the third is the class of the sniff that has failed.

That will catch the issue, but won’t actually fix it. To do that, we need to get the fixer from the file object, and call its addContent() method to add the final keyword. We amend process() to extract the fixer, add it as a property, and then call the fix() method when we come across a fixable error:

public function process(File $file, $position): void
{
    $this->fixer = $file->fixer;
    $this->position = $position;
    if (!$file->findPrevious($this->tokens, $position)) {
        $file->addFixableError(
            'All classes should be declared using either the "abstract" or "final" keyword',
            $position - 1,
            self::class
        );
        $this->fix();
    }
}

Then we define the fix() method:

private function fix(): void
{
    $this->fixer->addContent($this->position - 1, 'final ');
}
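
To illustrate, given a class declared like the first one below, the sniff will report a fixable error, and running the fixer should rewrite it as the second (Foo is just a made-up example, and the two declarations are shown together purely as a before-and-after comparison):

<?php

// Before: neither abstract nor final, so the sniff flags this
class Foo
{
}

// After running the fixer: the 'final' keyword has been prepended
final class Foo
{
}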

Here’s the finished class:

<?php declare(strict_types=1);

namespace Matthewbdaly\AbstractOrFinalClassesOnly\Sniffs;

use PHP_CodeSniffer\Sniffs\Sniff;
use PHP_CodeSniffer\Files\File;

/**
 * Sniff for catching classes not marked as abstract or final
 */
final class AbstractOrFinalSniff implements Sniff
{
    private $tokens = [
        T_ABSTRACT,
        T_FINAL,
    ];

    private $fixer;

    private $position;

    public function register(): array
    {
        return [T_CLASS];
    }

    public function process(File $file, $position): void
    {
        $this->fixer = $file->fixer;
        $this->position = $position;
        if (!$file->findPrevious($this->tokens, $position)) {
            $file->addFixableError(
                'All classes should be declared using either the "abstract" or "final" keyword',
                $position - 1,
                self::class
            );
            $this->fix();
        }
    }

    private function fix(): void
    {
        $this->fixer->addContent($this->position - 1, 'final ');
    }
}

I’ve made the resulting standard available via Github.
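
If you want to try it out against a code base, PHP CodeSniffer lets you point at a standard by path, so something like vendor/bin/phpcs --standard=src/AbstractOrFinalClassesOnly path/to/code should report the errors, and phpcbf with the same arguments should apply the fixes.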

This is a bit rough and ready, and I’ll probably refactor it a bit when I have time. In addition, it’s not quite displaying the behaviour I want, since ideally it should only be looking for the abstract and final keywords in classes that implement an interface. However, it proved fairly easy to create this sniff, except that I had to go rooting around various tutorials that weren’t all that clear. Hopefully this example is a bit simpler and easier to follow.

3rd January 2019 11:55 pm

You Don't Need That Module Package

Lately I’ve seen a number of Laravel packages being posted on places like Reddit that offer ways to make your project more modular by letting you break your classes out of the usual structure and place them in a separate folder called something like packages/ or modules/. However, these packages are completely redundant: it takes very little work to achieve the same thing with Composer alone. In addition, much of this approach is not specific to Laravel, and can be applied to any other framework that uses Composer.

There are two main approaches I’m aware of - keeping it in a single project, and moving the modules to separate Composer packages.

Single project

Suppose we have a brand new Laravel project with the namespace left as the default App. This is what the autoload section of the composer.json file will look like:

"autoload": {
"psr-4": {
"App\\": "app/"
},
"classmap": [
"database/seeds",
"database/factories"
]
},

Composer allows for numerous ways to autoload classes and you can add additional namespaces as you wish. Probably the best approach is to use PSR-4 autoloading, as in this example:

"autoload": {
"psr-4": {
"App\\": "app/",
"Packages\\": "packages"
},
"classmap": [
"database/seeds",
"database/factories"
]
},

Now, if you put the model Post.php in the folder packages/Blog/Models/, it will map to the namespace Packages\Blog\Models\Post. Set the namespace to this in the file, run composer dump-autoload, and you should be able to import it from that namespace without trouble. As with the App\ namespace, because it’s using PSR-4 you only specify the top-level namespace, and the folders and files underneath have to mirror the namespace - for instance, Packages\Foo\Bar maps to packages/Foo/Bar.php, as in the layout below. If for some reason PSR-4 autoloading doesn’t map well to what you want to do, there are other autoloading methods you can use - refer to the relevant section of the Composer documentation.
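
In other words, the Post model example above would be laid out like this on disk:

└── packages
    └── Blog
        └── Models
            └── Post.php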

The controllers are the toughest part, because by default Laravel’s routing assumes the controllers are all under the App\Http\Controllers namespace, allowing you to shorten the controller references in your route files. There are two ways around this that I’m aware of. One is to specify the full namespace when referencing each controller:

Route::get('/', '\Packages\Blog\Http\Controllers\FooController@index');

The other option is to update the RouteServiceProvider.php‘s namespace property. It defaults to this:

protected $namespace = 'App\Http\Controllers';

If there’s a more convenient namespace you want to place all your controllers under, then you can replace this, and it will become the default namespace applied in your route files.
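
For instance, using the hypothetical Packages namespace from earlier, it might become:

protected $namespace = 'Packages\Blog\Http\Controllers';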

Other application components such as migrations, routes and views can be loaded from a service provider very easily. Just create a service provider for your module, register it in config/app.php, and set up the boot() method to load whichever components you want from the appropriate place, as in this example:

$this->loadMigrationsFrom(__DIR__.'/../database/migrations');
$this->loadRoutesFrom(__DIR__.'/../routes.php');
$this->loadViewsFrom(__DIR__.'/../views', 'comments');
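
Putting that together, a minimal service provider for a hypothetical blog module might look something like this (the module name and paths are illustrative, and assume the provider lives inside the module itself):

<?php

namespace Packages\Blog\Providers;

use Illuminate\Support\ServiceProvider;

final class BlogServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Load the module's migrations, routes and views from its own folders
        $this->loadMigrationsFrom(__DIR__.'/../database/migrations');
        $this->loadRoutesFrom(__DIR__.'/../routes.php');
        $this->loadViewsFrom(__DIR__.'/../views', 'blog');
    }
}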

Separate packages

The above approach works particularly well in the initial stages of a project, when you may need to jump around a lot to edit different parts of the project. However, later on, once many parts of the project have stabilised, it may make more sense to pull the modules out into separate repositories and use Composer to pull them in as dependencies, using its support for private repositories. I’ve also often taken this approach right from the start without issue.
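
As an example, pulling a module in from a private Git repository only needs an entry along these lines in the main project’s composer.json (the package name and URL here are made up):

"repositories": [
    {
        "type": "vcs",
        "url": "git@example.com:acme/blog-module.git"
    }
],
"require": {
    "acme/blog-module": "^1.0"
}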

This approach has a number of advantages. It makes it easier to reuse parts of the project in other projects if need be. Also, if you put your tests in the packages containing the components they test, then rather than running one monolithic test suite for the whole project, you can run each module’s tests whenever you change that module. The main project’s test suite can then be limited to the integration and acceptance tests that verify the whole thing, along with unit tests for any code that remains in the main repository, resulting in quicker test runs.

Don’t get me wrong, making your code more modular is definitely a good thing and I’m wholly in favour of it. However, it only takes a little knowledge of Composer to be able to achieve this without any third party package at all, which is good because you’re no longer dependent on a package that may at any time fall behind the curve or be abandoned.

2nd January 2019 11:00 pm

Why Bad Code Is Bad

This may sound a little trite, but why is it bad to write bad code?

Suppose you’re a client, or a line manager for a team of developers. You work with developers regularly, but when they say that a code base is bad, what are the consequences of that, and how can you justify spending time and money to fix it? I’ve often heard the refrain “If it works, it doesn’t matter”, which may have a grain of truth, but is somewhat disingenuous. In this post, I’ll explain some of the consequences when your code base is bad. It can be hard to put a definitive price tag on the costs associated with delivering bad code, but this should give some idea of the sort of issues you should take into account.

Bad code kills developer productivity

Bad code is harder to understand, navigate and reason about than good code. Developers are not superhuman, and we can only hold so much in our heads at one time, which is why many of the principles behind a clean and maintainable code base can essentially be boiled down to “break it into bite-sized chunks so developers can understand each one in isolation before seeing how they fit together”.

If one particular class or function gets too big and starts doing too much, it quickly becomes very, very hard to get your head around what that code does. Developers typically have to build a mental model of how a class or function works before they can use it effectively, and the smaller and simpler you can keep each unit of code, the less time and effort it takes to do so. The mark of a skilled developer is not the complexity of their code bases, but their simplicity - they’ve learned to make their code as small, simple, and readable as possible. A clean and well laid-out code base makes it easy for developers to get into the mental state called “flow” that is significantly more productive.

In addition, if an application doesn’t conform to accepted conventions in some way, such as using inappropriate HTTP verbs (e.g. GET to change the state of something), then quite apart from the fact that it won’t play well with proxy servers, it imposes an additional mental load on developers by forcing them to drop a reasonable set of assumptions about how the application works. If the application used the correct HTTP verbs, experienced developers would know without being told that to create a new report, you’d send a POST request to the reports API endpoint.

During the initial stages of a project, functionality can be delivered quite quickly, but if the code quality is poor, then over time developer velocity can decrease. Ensuring a higher quality code base helps to maintain velocity at a consistent level as it gets bigger. This also means estimates will be more accurate, so if you quote a given number of hours for a feature, you’re more likely to deliver inside that number of hours.

Bad code is bad for developer welfare

A code base that’s repetitive, badly organised, overly complex and hard to read is a recipe for stressed developers, making burnout more likely. If a developer suffers burnout, their productivity will drop substantially.

In the longer term, if developer burnout isn’t managed correctly, it could easily increase developer turnover as stressed developers quit. It’s also harder to recruit new developers if they’re faced with the prospect of dealing with a messy, stressful code base.

Bad code hampers your ability to pivot

If the quality of your code base is poor, it can mean that if functionality needs to be changed or added, then more work is involved. Repetitive code can mean something has to be updated in more than one place, and if it becomes too onerous, it can make it too time-consuming or expensive to justify the changes.

Bad code may threaten the long-term viability of your project

One thing that is certain in our industry is that things change. Libraries, languages and frameworks are constantly being updated, and sometimes there will be potentially breaking changes to some of these. On occasion, a library or framework will be discontinued, making it necessary to migrate to a replacement.

Bad code is often tightly coupled to a particular framework or library, and sometimes even to a particular version, making it harder to migrate when that becomes necessary. If a project was written against a language or framework version that turned out to have a serious issue, and was too tightly coupled to migrate to a newer version, it might be too risky to keep it running - or it might be necessary to keep running an insecure application in spite of the risks it poses.

Bad code is more brittle

A poor code base will break, a lot, and often in ways that are clearly visible to end users. Duplicate code makes it easy to miss cases where something needs to be updated in more than one place, and if the code base lacks tests, a serious error may not be noticed for a long time, especially if it’s something comparatively subtle.

Bad code is hard, if not impossible, to write automated tests for

If a particular class or function does too much, it becomes much harder to write automated tests for it because there are more variables going in and more expected outcomes. A sufficiently messy code base may only really be testable by automating the browser, which tends to be very slow and brittle, making test-driven development impractical. Manual testing is no substitute for a proper suite of automated tests, since it’s slower, less consistent and not repeatable in the same way, and it’s only sufficient by itself for the most trivial of web apps.

Bad code is often insecure

A bad code base may inadvertently expose users’ data, or be at risk from attacks such as cross-site scripting and SQL injection that can also potentially expose too much data.

For any business with EU-based users, the risks of exposing users’ data are very serious. Under the GDPR, there’s a potential fine of up to €20 million or 4% of annual global turnover, whichever is greater. That’s potentially an existential risk for many companies.

In addition, a bad code base is often more vulnerable to denial-of-service attacks. If it has poor or no caching, excessive queries, or inefficient queries, then every time a page loads it will carry out more queries than a more optimised site would. Given the same server specs, the inefficient site will be overwhelmed quicker than the efficient one.

Summary

It’s all too easy to focus solely on delivering a working product and not worry about the quality of the code base when time spent cleaning it up doesn’t pay the bills, and it can be hard to justify the cost of cleaning it up later to clients.

There are tools you can use to help keep up code quality, such as linters and static analysers, and it’s never a bad idea to investigate the ones available for the language(s) you work in. For best results they should form part of your continuous integration pipeline, so you can monitor changes over time and prompt developers who check in problematic code to fix the issues. Code reviews are another good way to avoid bad code, since they allow developers to find problematic code and offer more elegant solutions.

I’m not suggesting that a code base that has a few warts has no value, or that you should sink huge amounts of developer time into refactoring messy code when money is tight, as commercial concerns do have to come first. But a bad code base does cause serious issues that have financial implications, and it’s prudent to recognise the problems it could cause, and take action to resolve them, or better yet, prevent them occurring in the first place.

27th December 2018 6:37 pm

Improving Search in Vim and Neovim With FZF and Ripgrep

A while back I was asked to make some changes to a legacy project that was still using Subversion. This was troublesome because my usual method of searching in files is to use Tim Pope’s Fugitive Vim plugin as a frontend for git grep, and so it would be harder than usual to navigate the project. I therefore started looking around for alternative search systems, and one combination that kept on coming up was FZF and Ripgrep, so I decided to give them a try. FZF is a fuzzy file finder, written in Go, while Ripgrep is an extremely fast grep, written in Rust, that respects gitignore rules by default. Both have proven so useful they’re now a permanent part of my setup.

On Mac OS X, both are available via Homebrew, so they’re easy to install. On Ubuntu, Ripgrep is in the repositories, but FZF isn’t, so it was necessary to install it in my home directory. There’s a Vim plugin for FZF and Ripgrep integration which, since I use vim-plug, I could install by adding the following to my init.vim, then running :PlugUpdate from Neovim:

" Search
Plug '~/.fzf'
Plug 'junegunn/fzf.vim'

The plugin exposes a number of commands that are very useful, and I’ll go through the ones I use most often:

  • :Files is for finding files by name. I used to use Ctrl-P for this, but FZF is so much better and quicker that I ditched Ctrl-P almost immediately (though you can map :Files to the same key if you want).
  • :Rg uses Ripgrep to search for content in files, so you can search for a specific string. This makes it an excellent replacement for the Ggrep command from Fugitive.
  • :Snippets works with UltiSnips to provide a filterable list of available snippets you can insert, making it much easier to find the right one.
  • :Tags allows you to filter and search tags in the project as a whole.
  • :BTags does the same, but solely in the current buffer.
  • :Lines allows you to find lines in the project and navigate to them.
  • :BLines does the same, but solely in the current buffer.

In addition to being useful in Neovim, FZF can also be helpful in Bash. You can use Ctrl-T to find file paths, and it enhances the standard Ctrl-R history search, making it faster and more easily navigable. The performance of both tools is also excellent - they work very fast, even on the very large legacy project I maintain or on slower machines, and I never find myself waiting for them to finish. Both have quickly become an indispensable part of my workflow.

