Understanding query objects

Published by at 27th January 2019 11:10 pm

The project I've been maintaining for the last year has inherited a rather dubious database structure that would currently be very difficult to refactor, which also makes many queries more convoluted than they should be. At present, I'm involved in building a whole new home page, which has necessitated adding some new queries. Since some of these involve carrying out unions between several similar tables (that should have been one table, grr...), they can involve some quite large chunks for each query.

As a result, it's made sense to break those queries down further. Since Zend 1 doesn't have anything analogous to scopes in Eloquent, I don't really have an easy way to break these queries up in the models (and I'm trying to get the query logic out of the models at present anyway), so I opted to make them into query objects instead, which is a pattern I hadn't used before (but probably should have).

A query object is pretty much what it says on the tin - it's a PHP object that executes a single, very specific query. This may seem like overkill, but it's only really useful for the most complex and convoluted of queries. It can accept parameters, as you'd expect, and some parts of the query may be optional based on that, but fundamentally it should build and run only one single query.

In this post I'll go through how you might create one, how it relates to the repository pattern, and when to create one.

Creating a query object class

I'm a big fan of the __invoke() magic method in PHP. For the uninitiated, it lets you instantiate the class, and then use it in the same way you would a function, making it very useful for callbacks. This also brings some other advantages:

  • Unlike with a function, you can create private methods to do other parts of the work, making it easier to understand the main method.
  • It can have a constructor, and can therefore both accept dependencies via the constructor, and be instantiated via dependency injection, simplifying setup and testing when compared to using a callback.
  • Since __invoke() is an innate part of the PHP language, it makes more sense for classes that have a single responsibility to use that method name to do that, rather than picking something like handle() or run().

As a result, my query objects generally use the __invoke() method to trigger the query.

Since Zend 1 is no longer supported, I won't bother displaying how I'd write the query in that specific context. I have yet to use this pattern with Laravel, but if I did, it would look something like this:

1<?php
2
3namespace App\Queries;
4
5use Illuminate\Database\DatabaseManager;
6
7final class DashboardItems
8{
9 protected $db;
10
11 public function __construct(DatabaseManager $db)
12 {
13 $this->db = $db;
14 }
15
16 public function __invoke(int $days = 7)
17 {
18 return $this->fooTable()
19 ->union($this->barTable())
20 ->whereRaw('start_date >= (NOW() - INTERVAL ? DAY)', [$days]);
21 ->get();
22 }
23
24 private function fooTable()
25 {
26 return $this->db->table('foo')
27 ->where('type', '=', 'fooType');
28 }
29
30 private function barTable(int $days)
31 {
32 return $this->db->table('bar')
33 ->where('type', '=', 'barType');
34 }
35}

Note that we break each one of the tables we want to perform a UNION on into a private method. This is probably the biggest advantage of query objects - it lets you break particularly unwieldy queries up into logical steps, making them more readable. You could do this by adding private methods on a repository class too, but I'd be reluctant to add private methods to a repository that were only used in one query - to my mind, a query object is a better home for that.

What about repositories?

I regularly use the repository pattern in my code bases, whether that's for Laravel projects or the current Zend 1-based legacy project. It's an ongoing effort to refactor it so that all the queries are called from repository classes, leaving the models to act as containers for the data. So how do query objects fit in here?

It's important to note that while a repository represents all queries relating to a table, a query object represents only a single query, and so the repository should still be the place where the query is called from. However, the repository should just defer the actual querying to the query object. The relevant parts of the application structure for my current application look a bit like this:

1└── app
2 ├── Queries
3 │   └── DashboardItems.php
4 └── Repositories
5 └── DashboardRepository.php

And the repository might call the query object as follows:

1<?php
2
3namespace App\Repositories;
4
5use App\Queries\DashboardItems;
6
7final class DashboardRepository
8{
9 public static function dashboardItems(int $days = 7)
10 {
11 $query = new DashboardItems;
12 return $query($days);
13 }
14}

At present my repositories all use static methods as I'm still in the process of migrating the queries over to the repository classes. That also means I can't easily use dependency injection. For a Laravel application, a similar call might look like this:

1<?php
2
3namespace App\Repositories;
4
5use App\Queries\DashboardItems;
6
7final class DashboardRepository
8{
9 protected $dashboardQuery;
10
11 public function __construct(DashboardItems $dashboardQuery)
12 {
13 $this->dashboardQuery = $dashboardQuery;
14 }
15
16 public function dashboardItems(int $days = 7)
17 {
18 return $this->dashboardQuery($days);
19 }
20}

The only real difference is that we can instantiate the query object out of the container, simplifying setup.

When to use query objects

I think it probably goes without saying, but it should be a rare query that actually needs to be implemented as a query object, especially if you're using an ORM like Eloquent that provides features like scopes, and as yet I only have two using this pattern, as well as two others that were implemented as "reporter" classes, but could be query objects instead. So far, my experience has been that the sort of queries that are large enough to be worth considering include:

  • Queries that generate reports, particularly if they have various options
  • Queries that use unions, as in the above example, since it makes sense to use a private method to fetch each table
  • Queries with multiple complex joins

Smaller queries will typically fit happily inside a single method in your repository classes. If that's the case, then they can live there without trouble. However, if you have a query that's becoming too big to fit inside a single method, rather than adding private methods to your repository class, it may make more sense to refactor it out into a query object in its own right. You can still call it via the same method on your repository class, but the repository can just defer to the query object. As I usually use decorators to cache the responses from my repository classes anyway, then it makes sense to stick with this approach to keep caching consistent too.

Query objects only really offer any value for particularly large queries. However, they can be invaluable in those circumstances. By enabling you to break those big queries up into a series of steps, they help make them easier to understand.