Matthew Daly's Blog

I'm a web developer in Norfolk. This is my blog...

19th September 2015 7:42 pm

A Quick and Easy Varnish Primer

As I mentioned in an earlier post, I recently had the occasion to use Varnish to improve the performance of a website that otherwise would have been unreliable and unusably slow due to WordPress making an excessive number of queries. The difference it made was nothing short of staggering, and I’m not exaggerating when I say it saved the day. I now use Ansible for provisioning new WordPress sites, and Varnish is now a standard part of my WordPress site setup playbook.

However, Varnish can be quite fiddly to configure, and it was something of a baptism of fire for me to learn how to configure it appropriately for this use case. I did make a few mistakes that caused problems down the line, so I thought I’d share the details of how I got it working for that particular site.

What is Varnish?

From the website:

Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 - 1000x, depending on your architecture.

In other words, you run it on the usual HTTP or HTTPS port, move your usual web server to a different port, and configure it, and it will cache web pages so they can be served more quickly to subsequent visitors.

Be warned - Varnish is not something where you can generally stick with the default settings. The default behaviour does make a lot of sense, but in practice almost no-one will be able to get away with leaving the configuration unchanged.

Installing Varnish

If you’re using Debian or a derivative such as Ubuntu, Varnish is available via apt-get:

$ sudo apt-get install varnish

You may also want to install the documentation:

$ sudo apt-get install varnish-doc

If you’re using Apache I’d also recommend installing libapache2-mod-rpaf and enabling it with sudo a2enmod rpaf - without this, Apache will log all incoming requests as coming from the same server.

I’m assuming you already have a normal web server installed. I’ll assume you’re using Apache, but it shouldn’t be hard to adapt these instructions to work with Nginx. I’m also assuming that the site you want to use Varnish for is a WordPress site with WooCommerce and W3 Total Cache installed. However, this is only for example purposes. If you want to use Varnish for a different web app, you’ll need to plan your caching strategy around that web app yourself.

Please also note that this is using Varnish 4.0, which is the version available with Debian Jessie. If you’re using an older operating system, you may have Varnish 3.0 in the repositories - be warned, the configuration language changed in Varnish 4.0, so the examples here will not work with older versions of Varnish.

By default, Varnish runs on port 6081, which is fine for testing it out, but once you want to go live it’s not what you want. When it’s time to go live, you’ll need to open up /etc/default/varnish and edit the value of DAEMON_OPTS to something like this:

DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-S /etc/varnish/secret \
-s malloc,256m"

Note that the -a flag represents the port Varnish is running on.

If you’re using an operating system that uses systemd, such as Debian Jessie, this alone won’t be sufficient. Create a new file at /etc/systemd/system/varnish.service and enter the following:

[Unit]
Description=Varnish HTTP accelerator
[Service]
Type=forking
LimitNOFILE=131072
LimitMEMLOCK=82000
ExecStartPre=/usr/sbin/varnishd -C -f /etc/varnish/default.vcl
ExecStart=/usr/sbin/varnishd -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
ExecReload=/usr/share/varnish/reload-vcl
[Install]
WantedBy=multi-user.target

Next, we need to move our web server to a different port. We’ll use port 8080. Replace the contents of /etc/apache2/ports.conf with this:

# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default
# This is also true if you have upgraded from before 2.2.9-3 (i.e. from
# Debian etch). See /usr/share/doc/apache2.2-common/NEWS.Debian.gz and
# README.Debian.gz
NameVirtualHost *:8080
Listen 8080
<IfModule mod_ssl.c>
# If you add NameVirtualHost *:443 here, you will also have to change
# the VirtualHost statement in /etc/apache2/sites-available/default-ssl
# to <VirtualHost *:443>
# Server Name Indication for SSL named virtual hosts is currently not
# supported by MSIE on Windows XP.
Listen 443
</IfModule>
<IfModule mod_gnutls.c>
Listen 443
</IfModule>

You’ll also need to change the ports for the individual site files under /etc/apache2/sites-available, as in this example:

<VirtualHost *:8080>
ServerAdmin webmaster@localhost
DocumentRoot /var/www
<Directory />
Options FollowSymLinks
AllowOverride All
</Directory>
<Directory /var/www/>
Options FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
allow from all
</Directory>
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
</Directory>
ErrorLog ${APACHE_LOG_DIR}/error.log
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Writing our VCL file

Next, we come to our Varnish configuration proper, which resides at /etc/varnish/default.vcl. The vcl stands for Varnish Configuration Language, and it has a syntax somewhat reminiscent of C.

The default behaviour for Varnish is as follows:

  • It does not cache requests that contain cookie or authorization headers
  • It does not cache requests which the backend HTTP server indicates should not be cached
  • It will only cache GET and HEAD requests

This behaviour is unlikely to meet your needs. We’ll therefore work through the Varnish config file I wrote for this WordPress site in the hope that it will teach you enough to adapt it to your own needs.

vcl 4.0;
backend default {
.host = "127.0.0.1";
.port = "8080";
}
acl purge {
"127.0.0.1";
"localhost";
}
sub vcl_recv {
# Never cache PUT, PATCH, DELETE or POST requests
if (req.method == "PUT" || req.method == "PATCH" || req.method == "DELETE" || req.method == "POST") {
return (pass);
}
# Never cache cart, account, checkout or addons
if (req.url ~ "^/(cart|my-account|checkout|addons)") {
return (pass);
}
# Never cache adding to cart
if ( req.url ~ "\?add-to-cart=" ) {
return (pass);
}
# Never cache admin or login
if ( req.url ~ "^/wp-(admin|login|cron)" ) {
return (pass);
}
# Never cache WooCommerce API
if ( req.url ~ "wc-api" ) {
return (pass);
}
# Remove has_js and CloudFlare/Google Analytics __* cookies and statcounter is_unique
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js|is_unique)=[^;]*", "");
# Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
# Remove the wp-settings-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-1=[^;]+(; )?", "");
# Remove the wp-settings-time-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-time-1=[^;]+(; )?"
, "");
# Remove the wp test cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wordpress_test_cookie=[^;]+(; )?", "");
# Static content unique to the theme can be cached (so no user uploaded images)
# The reason I don't take the wp-content/uploads is because of cache size on bigger blogs
# that would fill up with all those files getting pushed into cache
if (req.url ~ "wp-content/themes/" && req.url ~ "\.(css|js|png|gif|jp(e)?g)") {
unset req.http.cookie;
}
# Even if no cookies are present, I don't want my "uploads" to be cached due to their potential size
if (req.url ~ "/wp-content/uploads/") {
return (pass);
}
# any pages with captchas need to be excluded
if (req.url ~ "^/contact/")
{
return(pass);
}
# Check the cookies for wordpress-specific items
if (req.http.Cookie ~ "wordpress_" || req.http.Cookie ~ "comment_") {
# A wordpress specific cookie has been set
return (pass);
}
# allow PURGE from localhost
if (req.method == "PURGE") {
if (!client.ip ~ purge) {
return(synth(405, "Not allowed."));
}
return (purge);
}
# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ "no-cache") {
return (pass);
}
# Try a cache-lookup
return (hash);
}
sub vcl_backend_response {
set beresp.grace = 5m;
}

Let’s take a closer look at the first part of the config:

vcl 4.0;
backend default {
.host = "127.0.0.1";
.port = "8080";
}

Here we define that we’re using version 4.0 of VCL, and that the host to use as a back end is port 8080 on the same server. If your normal HTTP server is running on a different port, you will need to set it here. Also, note that you can use a different host as the backend.

acl purge {
"127.0.0.1";
"localhost";
}

We also set which hosts can trigger a purge of the cache, namely localhost and 127.0.0.1. The web app hosted on the server can then make an HTTP PURGE request to a given path, which will clear that path from the cache. In our case, W3 Total Cache supports this - if it’s a custom web app, you’ll need to implement this functionality yourself to clear the cache when new content is added.

Next, we start the vcl_recv subroutine. This is where we define our rules for deciding whether or not to serve content from the cache. Let’s look at our first rule:

sub vcl_recv {
# Never cache PUT, PATCH, DELETE or POST requests
if (req.method == "PUT" || req.method == "PATCH" || req.method == "DELETE" || req.method == "POST") {
return (pass);
}

Here, we declare that we should never cache any PUT, PATCH, DELETE or POST requests, on the basis that these change the state of the application. This ensures that things like contact forms will work as expected.

Note that we’re getting the value of req.method to determine the HTTP verb used. The req object has many other properties we’ll see being used.

# Never cache cart, account, checkout or addons
if (req.url ~ "^/(cart|my-account|checkout|addons)") {
return (pass);
}
# Never cache adding to cart
if ( req.url ~ "\?add-to-cart=" ) {
return (pass);
}
# Never cache admin or login
if ( req.url ~ "^/wp-(admin|login|cron)" ) {
return (pass);
}
# Never cache WooCommerce API
if ( req.url ~ "wc-api" ) {
return (pass);
}

Next, we define a series of regular expressions, and if the URL (represented by req.url) matches that regex, then the request is passed straight through to Apache without Varnish getting involved. In this case, we never want to cache the following sections:

  • The shopping cart, checkout, addons page or account page
  • The Add to cart button
  • The WordPress admin and login screen, and cron requests
  • The WooCommerce API

You’ll need to consider which parts of your site must always serve the latest content and which don’t need everything to be fully up to date. Typically admin areas any anything interactive must not be cached, while the front page is usually fine.

# Remove has_js and CloudFlare/Google Analytics __* cookies and statcounter is_unique
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js|is_unique)=[^;]*", "");
# Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
# Remove the wp-settings-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-1=[^;]+(; )?", "");
# Remove the wp-settings-time-1 cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wp-settings-time-1=[^;]+(; )?"
, "");
# Remove the wp test cookie
set req.http.Cookie = regsuball(req.http.Cookie, "wordpress_test_cookie=[^;]+(; )?", "");

Cookies, even ones set on the client side such as those for Google Analytics, can prevent content from being cached. To prevent this, you need to configure Varnish to discard these cookies before passing them on to Apache. In this case, we want to exclude Google Analytics and various WordPress cookies.

# Static content unique to the theme can be cached (so no user uploaded images)
if (req.url ~ "wp-content/themes/" && req.url ~ "\.(css|js|png|gif|jp(e)?g)") {
unset req.http.cookie;
}

Here we allow static content that’s part of the site theme to be cached since that doesn’t change often, so we unset the cookies for that request.

# Even if no cookies are present, I don't want my "uploads" to be cached due to their potential size
if (req.url ~ "/wp-content/uploads/") {
return (pass);
}

Here we prevent any user-uploaded content from being cached, since that can change often.

# any pages with captchas need to be excluded
if (req.url ~ "^/contact/")
{
return(pass);
}

Captchas must obviously never be cached since that will break them. In this case, we assume that the contact form has a captcha, so it gets excluded from the cache.

# Check the cookies for wordpress-specific items
if (req.http.Cookie ~ "wordpress_" || req.http.Cookie ~ "comment_") {
# A wordpress specific cookie has been set
return (pass);
}

Here we check for remaining WordPress-specific cookies. These would indicate that a user is signed in, in which case we may want to serve them all the latest content rather than displaying content from the cache.

# allow PURGE from localhost
if (req.method == "PURGE") {
if (!client.ip ~ purge) {
return(synth(405, "Not allowed."));
}
return (purge);
}

Remember where we allowed the local server to clear the cache? This section actually carries out the purge when it receives a request from an authorised client.

# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ "no-cache") {
return (pass);
}

Here we check to see if the Cache-Control HTTP header is set to no-cache. If so, we pass it straight through to Apache.

# Try a cache-lookup
return (hash);
}

This is the last rule under vcl_recv, because it only reaches this point if the request has got past all the other rules. It tries to fetch the page from the cache. If the page is not in the cache, it passes it on to Apache and will cache the response.

sub vcl_backend_response {
set beresp.grace = 5m;
}

This is where we set how long responses are cached for. Here we’ve set it to 5 minutes.

With that done, we should be ready to restart Varnish and Apache. If you are using an operating system with systemd, then the following commands should restart Apache and Varnish:

$ sudo systemctl reload apache2.service
$ sudo systemctl reload varnish.service

For those not yet using systemd, try this instead:

$ sudo service apache2 restart
$ sudo service varnish restart

If you then visit your site and inspect the HTTP headers using your browser’s dev tools, you’ll notice the new HTTP header X-Varnish in the response. This tells you that Varnish is up and running. If you make sure you’re logged out, you should hopefully see that if you load a page, and then load it again, the second response is noticeably quicker.

Installing and configuring Varnish is a relatively quick and easy way of helping your website scale to be able to serve many more users, and if the site becomes popular all of a sudden, it can make a huge difference as to whether the site can stand up to the load or not. If you need more information on how to configure Varnish for your own needs, I recommend consulting the excellent documentation.

22nd August 2015 7:32 pm

When You Should Not Use Wordpress

I must admit, I’ve had a rather bad experience with WordPress recently. The site in question was an e-commerce site, built with WordPress and WooCommerce. In development, we originally put the site on shared hosting, but after a while the hosting company told us off because it was using too much database space, so we moved to a VPS earlier than we normally would. With the benefit of hindsight, we probably should have seen that as the first warning sign.

Then, once the site was up and running on the VPS, it got slower and slower, and eventually the server was killing MySQL off because it was using too many resources. I decided to install a benchmarking plugin and investigate why it was so slow. On loading the home page, it became obvious why the site was so slow - there were in excess of 300 queries on the home page. Looking elsewhere, some other pages were even worse, with one making over 1,000 queries!

At this point, I was practically hyperventilating. If I had written a web app that made that many queries on one page from scratch, I’d be seriously considering whether I was cut out for this industry. With an off-the-shelf CMS, you do have to accept some degree of bloat as a trade-off for quicker development time, but these numbers beggar belief.

I was able to mitigate this to some extent. First, I cut down the number of products shown on individual pages and audited the installed plugins, removing ones we could do without. This still left a lot more queries than I liked.

The next step was to enable caching. I installed Memcached and Varnish (incidentally, if you haven’t used Varnish before, you should check it out - it can make a huge difference for slow sites). I then installed and configured W3 Total Cache to work with them. This didn’t solve the fundamental problem of the initial page loads being too database-intensive, but it did mean that the result was cached for some time afterwards, making things easier on subsequent users.

This still wasn’t enough, however. The admin was still very slow, and often crashed. I actually wound up having to write a shell script that would check to see if MySQL was running and restart it if it wasn’t, and set up a cron job to run it every minute, just to ensure I wasn’t having to restart it myself. The issue was only really dealt with once we upped the specs on the VPS from 1GB RAM and 1 core to 3GB RAM and 2 cores, which should really have been overkill for something like WordPress.

As it turned out, the issue wasn’t exactly helped by the fact that someone had been making an unusually persistent attempt to brute-force wp-login.php. I was able to mitigate this by password-protecting it in the .htaccess file and adding some custom rules to fail2ban. But the fundamental problem remained that the resources used by WordPress to load a single page were grossly excessive.

Since then, we’ve continued to have some difficulties with it. There are some rather arcane criteria for calculating the shipping costs, and implementing them has been a real uphill struggle. We’ve also had to deal with breakages in the theme when updating WooCommerce, and other painful issues. It feels at times like the site will never be “done done”.

Now, I’ve had some issues with WordPress before, but this was by far the nastiest I’d ever seen, and it made me think very hard about when we should and should not consider WordPress as a solution. In hindsight, it would have been much easier to use Laravel to build the site from scratch - it would have made for a much leaner, more efficient site, updating the templates would have been a breeze, and implementing additional functionality would have been straightforward.

NB: I’m trying hard to make sure this is NOT one of those “WordPress sucks” blog posts. I’ll admit that I agree with many of the points from a lot of those, and I abandoned WordPress for my own site a long time ago in favour of a static site generator, but there are times when it is appropriate to use it. What I’m trying to do here is to help others avoid making the mistakes we did recently by giving some advice on when you should and should not use WordPress. Of course, your mileage may vary.

Why was WordPress inappropriate here?

With the benefit of hindsight, I can say that WordPress was definitely not the right solution in this case, and I will be advising against using it in similar circumstances. But why was it inappropriate?

  • Less flexible than rolling a custom solution - While the ecosystem of plugins and themes make it possible to use WordPress for a lot of use cases outside the core functionality of the platform, those plugins and themes aren’t infinitely flexible. If you want to do something one way and the plugin you’re using doesn’t support that, you’re out of luck unless you can fork the plugin or write a new one.
  • Dependence on third party plugins - While we were working on the site, WooCommerce made some changes that broke the theme we were using. We were using a child theme, but updating the parent theme alone didn’t fix it - we had to then apply some of the changes to the child theme as well, which was extremely fiddly. As a result, we’re now very wary about updating plugins and themes. Yet we don’t dare put it off too long, because in my experience attempts to break into WordPress are common, and if you fail to install an upgrade that fixes a vulnerability in good time, you can easily find yourself getting a phone call about a site having been hacked (as I did in December last year).
  • Poor performance - This is a big one, and I have therefore broken it down further:
    • Loading styling from the database - Many of the high end, customisable themes have large numbers of configuration options that can be used to style the site. The downside of these is that it creates additional queries to the database to fetch that data. Unless you have some form of caching in place, that data is loaded for every single request to the front end, generating a significant number of additional queries. You can mitigate this by rolling your own custom WordPress theme for the site, however.
    • Too many queries - My experience has been that as a general rule of thumb, it’s much quicker to make a smaller number of more complex queries to a database than to make a larger number of simple queries. If you build a custom web app, you will always know exactly what data you want to retrieve on a particular page and through careful use of joins, can retrieve exactly the data you need with as few queries as possible. Being a generic solution, WordPress doesn’t know exactly what data you need on any one page, and so may fetch the data using an excessive number of queries. It may also fetch data you don’t actually need.
    • Suboptimal database layout - The database schema for WordPress was originally created with a blog in mind, and may not always be optimal for your particular use case.
    • Caching is not a silver bullet - You can do a lot to improve performance by installing Memcached and Varnish, and configuring a caching plugin to work with them. However, this doesn’t solve the problem of the excessive number of queries, it only mitigates the effects somewhat. Not everything can be cached, and the expensive queries will still have to be run at some point. Caching only increases the time between the queries. Also, configuring Varnish in particular can be something of a black art, and it’s easy to miss something and find out some functionality or other hasn’t been working.

WordPress has a lot of technical limitations and deficiencies from a programmer’s point of view. For all that, it works, it’s easy to set up, and there’s a wide variety of plugins and themes available, so it’s often an appropriate choice. While the performance is poorer than I would like, the harsh truth is that often it doesn’t matter - if your site isn’t serving a huge amount of page requests, a few extra queries don’t actually make all that much difference (within reason, of course). My concern is that use of WordPress when it’s entirely inappropriate is widespread.

Is WordPress being overused?

Archer - WordPress? The Dane Cook of content management systems?

I suspect I’m running the risk of being branded a hipster for saying this (“Now it’s popular, you hate WordPress…”), but the fact that WordPress is widespread and popular does not mean that it’s the best solution for your project. Nor does the fact that it’s technically possible to use it for your project.

A few years ago, I built a now-defunct site and mobile app for a client that monitored web pages, or product prices on web pages, for changes, and notified the user when a change occurred. It was built using CodeIgniter 2, and had an integrated blog. At one point, the client was unhappy because it wasn’t built with WordPress, believing that this was the reason why few people were signing up. To use WordPress for this project would have involved building the additional functionality, including the API for the mobile app, as a plugin, which would have slowed down development considerably - in my experience it’s generally much harder to build something as a WordPress plugin than using an MVC framework due to the lack of separation of concerns, which makes the code base more confusing.

This is a good example of the alarming trend I’ve noticed in the last few years whereby a large number of people seem to be under the mistaken impression that WordPress is some kind of all-singing, all-dancing general purpose solution for building websites. I suspect that the reason for this may be that WordPress is commonplace enough that people outside of the web industry have often heard of it, and therefore they often ask for it since it’s what they’ve heard of, not knowing whether or not it’s actually appropriate for their needs. What isn’t always apparent to non-developers is that it’s often considerably easier for a developer to implement the core functionality of WordPress using a modern MVC framework than it is for them to implement the other functionality using WordPress, and as the functionality is being built with your exact use case in mind, the user interface is often more straightforward than the WordPress admin. Also, the WordPress privilege system can make it difficult for you to limit the user to just the functionality you want them to have, resulting in a situation where either you give the users a potentially dangerous level of access, or force them to contact you to make certain changes, making more work for you.

I’ve heard plenty of people say things like “WordPress is a framework” and “A competent developer can build anything with WordPress”. These claims are utter hogwash. A competent developer is smart enough to recognise that WordPress is not a one-size fits all solution and it’s not always appropriate to use it - you can easily spend more time trying to get it to do something off the beaten track than it would take to build that functionality from scratch. I think the way that Automattic are trying to promote WordPress as an application framework is a really bad idea - trying to use it for this is much more cumbersome than using a modern PHP framework like Laravel.

Even if you ignore the technical deficiencies of WordPress, it is too opinionated to be a good solution for use as a framework, and as such you’ll spend a lot of time trying to work around the existing implementations of existing functionality when they don’t quite meet your requirements.

Conclusion

For all its flaws, WordPress is very useful. It’s generally a good choice for blogs, brochure-style sites, and small e-commerce solutions where the client is not too fussy about the details of how it works. For virtually every other situation, I plan on looking elsewhere in future.

2nd August 2015 5:58 pm

Testing Django Views in Isolation

One thing you may hear said often about test-driven development is that as far as possible, you should test everything in isolation. However, it’s not always immediately clear how you actually go about doing this. In Django, it’s fairly easy to get your head around testing models in isolation because they’re single objects that you can just create, save, and then check their attributes. Forms are also quite easy to test, because you can just set the parameters with the appropriate values and check that the validation works as expected. With views, it’s much harder to imagine how you’d go about testing them in isolation, and often people just settle for writing higher-level functional tests instead. While functional tests are important, they’re also slower than unit tests, which makes it less likely they’ll be run often. So I thought I’d show you a quick and simple example of testing a Django view in isolation.

One of the little projects I’ve written in the past to help get my head around certain aspects of Django is a code-snippet sharing Django application which I named Snippetr. The index route of this application is a form for submitting a brand-new code snippet and I’ll show you how we would write a test for that.

Testing a GET request

Before now, you may well have used the Django test client to test views. That is fine for higher-level tests, but if you want to test a view in isolation, it’s no use because it emulates a real web server and all of the middleware and authentication, which we want to keep out of the way. Instead, we need to use RequestFactory:

from django.test import RequestFactory

RequestFactory actually implements a subset of the functionality of the Django test client, so while it will feel somewhat familiar, it won’t have all the same functionality. For instance, it doesn’t support middleware, so rather than logging in using the test client’s login() method, you instead attach a user directly to the request, as in this example:

request = RequestFactory()
request.user = user

You have to specify the URL in the request, but you also have to explicitly pass the request through to the view you want to test, which can be a bit confusing. Let’s see it in context. First of all, we want to write a test for making a GET request:

class SnippetCreateViewTest(TestCase):
"""
Test the snippet create view
"""
def setUp(self):
self.user = UserFactory()
self.factory = RequestFactory()
def test_get(self):
"""
Test GET requests
"""
request = self.factory.get(reverse('snippet_create'))
request.user = self.user
response = SnippetCreateView.as_view()(request)
self.assertEqual(response.status_code, 200)
self.assertEqual(response.context_data['user'], self.user)
self.assertEqual(response.context_data['request'], request)

First of all, we define a setUp() method that creates a user and an instance of RequestFactory() for use in the test. Note that I’m using Factory Boy to define UserFactory in order to make it easier to work with. Also, if you have more than one view to test, you should create a base class containing the setUp() method that your view tests inherit from.

Next, we have our test for making a GET request. Note that we’re using the reverse() method to get the route for the view named snippet_create. You’ll need to import this as follows if you’re not yet using it:

from django.core.urlresolvers import reverse

We then attach our user object to the request manually, and fetch the response by passing the request to the view as follows:

    response = SnippetCreateView.as_view()(request)

Note that this is the syntax used for class-based views - we call the view’s as_view() method. For a function-based view, the syntax is a bit simpler:

    response = my_view(request)

We then test our response as usual. In this case, the view adds some additional context data, and we check that we can access that, as well as checking the status code.

Testing a POST request

Testing a POST request is a little more challenging in this case because submitting the form will create a new Snippet object and we don’t want to interact with the model layer at all if we can help it. We want to test the view in isolation, partly because it will be faster, and partly because it’s a good idea. We can do this by mocking the Snippet model’s save() method.

To do so, we need to import two things from the mock library. If you’re using Python 3.4 or later, then mock is part of unittest as unittest.mock. Otherwise, it’s a separate library you need to install with pip. Here’s the import statement for those on Python 3.4 or later:

from unittest.mock import patch, MagicMock

And for those on earlier versions:

from mock import patch, MagicMock

Now, our test for the POST requests should look like this:

@patch('snippets.models.Snippet.save', MagicMock(name="save"))
def test_post(self):
"""
Test post requests
"""
# Create the request
data = {
'title': 'My snippet',
'content': 'This is my snippet'
}
request = self.factory.post(reverse('snippet_create'), data)
request.user = self.user
# Get the response
response = SnippetCreateView.as_view()(request)
self.assertEqual(response.status_code, 302)
# Check save was called
self.assertTrue(Snippet.save.called)
self.assertEqual(Snippet.save.call_count, 1)

Note first of all the following line:

    @patch('snippets.models.Snippet.save', MagicMock(name="save"))

Here we’re saying that in this test, when the save() method of the Snippet model is called, it should instead call a mocked version, which lacks the functionality and only registers that it has been called and a few details about it.

Next, we put together the data to be passed through and create a POST request for it. As before, we attach the user to the request. We then pass the request through in the same way as for the GET request. We also check that the response code was 302, meaning that the user would be redirected elsewhere after the form was submitted correctly.

Finally, we assert that Snippet.save.called is true. called is a Boolean value, representing whether the method was called or not. We also check the value of Snippet.save.call_count, which is a count of the number of times the method was called - here we check that it’s set to 1.

As you can see, while the request factory is a little harder than the Django test client to figure out, it’s not too difficult once you get the hang of it. By combining it with judicious use of mock, you can easily test your views in isolation, and without having to interact with the database or set up any middleware, these tests will be much faster than those using the Django test client.

1st August 2015 6:26 pm

Exploring the Hstorefield in Django 1.8

One of the most interesting additions in Django 1.8 is the new Postgres-specific fields. I started using PostgreSQL in preference to MySQL for Django apps last year, and so I was interested in the additional functionality they offer.

By far the biggest deal out of all of these was the new HStoreField type. PostgreSQL added a JSON data type a little while back, and HStoreField allows you to use that field type. This is a really big deal because it allows you to store arbitrary data as JSON and query it. Previously, you could of course just store data as JSON in a text field, but that lacked the same ability to query it. This gives you many of the advantages of a NoSQL document database such as MongoDB in a relational database. For instance, you can store different products with different data about them, and crucially, query them by that data. Previously, the only way to add arbitrary product data and be able to query it was to have it in a separate table, and it was often cumbersome to join them when fetching multiple products.

Let’s see a working example. We might be building an online store where products can have all kinds of arbitrary data stored about them. One product might be a plastic box, and you’d need to list the capacity as an additional attribute. Another product might be a pair of shoes, which have no capacity, but do have a size. It might be difficult to model this otherwise, but HStoreField is perfect for this kind of data.

First, let’s set up our database. I’ll assume you already have PostgreSQL up and running via your package manager. First, we need to create our database:

$ createdb djangostore

Next, we need to create a new user for this database with superuser access:

$ createuser store -s -P

You’ll be prompted for a password - I’m assuming this will just be password here. Next, we need to connect to PostgreSQL using the psql utility:

$ psql djangostore -U store -W

You’ll be prompted for your new password. Next, run the following command:

# CREATE EXTENSION hstore;
# GRANT ALL PRIVILEGES ON DATABASE djangostore TO store;
# \q

The first command installs the HStore extension. Next we make sure our new user has the privileges required on the new database:

We’ve now created our database and a user to interact with it. Next, we’ll set up our Django install:

$ cd Projects
$ mkdir djangostore
$ cd djangostore
$ pyvenv venv
$ source venv/bin/activate
$ pip install Django psycopg2 ipdb
$ django-admin.py startproject djangostore
$ python manage.py startapp store

I’m assuming here that you’re using Python 3.4. On Ubuntu, getting it working is a bit more involved.

Next, open up djangostore/settings.py and amend INSTALLED_APPS to include the new app and the PostgreSQL-specific functionality:

INSTALLED_APPS = (
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.postgres',
'store',
)

You’ll also need to configure the database settings:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'djangostore',
'USER': 'store',
'PASSWORD': 'password',
'HOST': 'localhost',
'PORT': '',
}
}

We need to create an empty migration to use HStoreField:

$ python manage.py makemigrations --empty store

This command should create the file store/migrations/0001_initial.py. Open this up and edit it to look like this:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import models, migrations
from django.contrib.postgres.operations import HStoreExtension
class Migration(migrations.Migration):
dependencies = [
]
operations = [
HStoreExtension(),
]

This will make sure the HStore extension is installed. Next, let’s run these migrations:

$ python manage.py migrate
Operations to perform:
Synchronize unmigrated apps: messages, staticfiles, postgres
Apply all migrations: sessions, store, admin, auth, contenttypes
Synchronizing apps without migrations:
Creating tables...
Running deferred SQL...
Installing custom SQL...
Running migrations:
Rendering model states... DONE
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying sessions.0001_initial... OK
Applying store.0001_initial... OK

Now, we’re ready to start creating our Product model. Open up store/models.py and amend it as follows:

from django.contrib.postgres.fields import HStoreField
from django.db import models
# Create your models here.
class Product(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
name = models.CharField(max_length=200)
description = models.TextField()
price = models.FloatField()
attributes = HStoreField()
def __str__(self):
return self.name

Note that HStoreField is not part of the standard group of model fields, and needs to be imported from the Postgres-specific fields module. Next, let’s create and run our migrations:

$ python manage.py makemigrations
$ python manage.py migrate

We should now have a Product model where the attributes field can be any arbitrary data we want. Note that we installed ipdb earlier - if you’re not familiar with it, this is an improved Python debugger, and also pulls in ipython, an improved Python shell, which Django will use if available.

Open up the Django shell:

$ python manage.py shell

Then, import the Product model:

from store.models import Product

Let’s create our first product - a plastic storage box:

box = Product()
box.name = 'Box'
box.description = 'A big box'
box.price = 5.99
box.attributes = { 'capacity': '1L', "colour": "blue"}
box.save()

If we take a look, we can see that the attributes can be returned as a Python dictionary:

In [12]: Product.objects.all()[0].attributes
Out[12]: {'capacity': '1L', 'colour': 'blue'}

We can easily retrieve single values:

In [15]: Product.objects.all()[0].attributes['capacity']
Out[15]: '1L'

Let’s add a second product - a mop:

mop = Product()
mop.name = 'Mop'
mop.description = 'A mop'
mop.price = 12.99
mop.attributes = { 'colour': "red" }
mop.save()

Now, we can filter out only the red items easily:

In [2]: Product.objects.filter(attributes__contains={'colour': 'red'})
Out[2]: [<Product: Mop>]

Here we search for items where the colour attribute is set to red, and we only get back the mop. Let’s do the same for blue items:

In [3]: Product.objects.filter(attributes__contains={'colour': 'blue'})
Out[3]: [<Product: Box>]

Here it returns the box. Let’s now search for an item with a capacity of 1L:

In [4]: Product.objects.filter(attributes__contains={'capacity': '1L'})
Out[4]: [<Product: Box>]

Only the box has the capacity attribute at all, and it’s the only one returned. Let’s see what happens when we search for an item with a capacity of 2L, which we know is not present:

In [5]: Product.objects.filter(attributes__contains={'capacity': '2L'})
Out[5]: []

No items returned, as expected. Let’s look for any item with the capacity attribute:

In [6]: Product.objects.filter(attributes__has_key='capacity')
Out[6]: [<Product: Box>]

Again, it only returns the box, as that’s the only one where that key exists. Note that all of this is tightly integrated with the existing API for the Django ORM. Let’s add a third product, a food hamper:

In [3]: hamper = Product()
In [4]: hamper.name = 'Hamper'
In [5]: hamper.description = 'A food hamper'
In [6]: hamper.price = 19.99
In [7]: hamper.attributes = {
...: 'contents': 'ham, cheese, coffee',
...: 'size': '90cmx60cm'
...: }
In [8]: hamper.save()

Next, let’s return only those items that have a contents attribute that contains cheese:

In [9]: Product.objects.filter(attributes__contents__contains='cheese')
Out[9]: [<Product: Hamper>]

As you can see, the HStoreField type allows for quite complex queries, while allowing you to set arbitrary values for an individual item. This overcomes one of the biggest issues with relational databases - the inability to set arbitrary data. Previously, you might have to work around it in some fashion, such as creating a table containing attributes for individual items which had to be joined on the product table. This is very cumbersome and difficult to use, especially when you wanted to work with more than one product. With this approach, it’s easy to filter products by multiple values in the HStore field, and you get back all of the attributes at once, as in this example:

In [13]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')
Out[13]: [<Product: Box>]
In [14]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')[0].attributes
Out[14]: {'capacity': '1L', 'colour': 'blue'}

Similar functionality is coming in a future version of MySQL, so it wouldn’t be entirely surprising to see HStoreField become more generally available in Django in the near future. For now, this functionality is extremely useful and makes for a good reason to ditch MySQL in favour of PostgreSQL for your future Django apps.

21st July 2015 9:15 pm

New Laptop

For a while now it’s been obvious that I needed a new laptop. My main workhorse for a while has been a 2008 MacBook, but I’m not really a fan of Mac OS X and it was stuck on Snow Leopard, so it was somewhat behind the times. It was also painfully slow by modern standards - regenerating this site took a couple of minutes. I had two other reasonably modern laptops, but one was too big and cumbersome, while the other was a Dell Mini, which isn’t really fast enough for a developer. When I last bought a laptop, I wasn’t even a developer, so it was long past time I got a more suitable machine.

I therefore took the plunge and ordered a new Dell XPS 13 Developer Edition, which arrived today. It’s an absolutely beautiful machine, and it’s extremely light. It’s also a lot faster than any other machine I own. The screen is exceptionally sharp, and setting it up was nice and easy.

After an hour or so with this machine, I’m already really happy with it. We’ll have to see whether I still think so after a few months using it.

Recent Posts

Input Components With the Usestate and Useeffect Hooks in React

Flexible Data Types With the JSON Field

Storing Wordpress Configuration in Environment Variables

Using Mix Versioning Outside Laravel

Setting Private Properties in Tests

About me

I'm a web and mobile app developer based in Norfolk. My skillset includes Python, PHP and Javascript, and I have extensive experience working with CodeIgniter, Laravel, Zend Framework, Django, Phonegap and React.js.