Matthew Daly's Blog

I'm a web developer in Norfolk. This is my blog...

1st August 2015 6:26 pm

Exploring the Hstorefield in Django 1.8

One of the most interesting additions in Django 1.8 is the new Postgres-specific fields. I started using PostgreSQL in preference to MySQL for Django apps last year, and so I was interested in the additional functionality they offer.

By far the biggest deal out of all of these was the new HStoreField type. PostgreSQL added a JSON data type a little while back, and HStoreField allows you to use that field type. This is a really big deal because it allows you to store arbitrary data as JSON and query it. Previously, you could of course just store data as JSON in a text field, but that lacked the same ability to query it. This gives you many of the advantages of a NoSQL document database such as MongoDB in a relational database. For instance, you can store different products with different data about them, and crucially, query them by that data. Previously, the only way to add arbitrary product data and be able to query it was to have it in a separate table, and it was often cumbersome to join them when fetching multiple products.

Let’s see a working example. We might be building an online store where products can have all kinds of arbitrary data stored about them. One product might be a plastic box, and you’d need to list the capacity as an additional attribute. Another product might be a pair of shoes, which have no capacity, but do have a size. It might be difficult to model this otherwise, but HStoreField is perfect for this kind of data.

First, let’s set up our database. I’ll assume you already have PostgreSQL up and running via your package manager. First, we need to create our database:

$ createdb djangostore

Next, we need to create a new user for this database with superuser access:

$ createuser store -s -P

You’ll be prompted for a password - I’m assuming this will just be password here. Next, we need to connect to PostgreSQL using the psql utility:

$ psql djangostore -U store -W

You’ll be prompted for your new password. Next, run the following command:

# CREATE EXTENSION hstore;
# GRANT ALL PRIVILEGES ON DATABASE djangostore TO store;
# \q

The first command installs the HStore extension. Next we make sure our new user has the privileges required on the new database:

We’ve now created our database and a user to interact with it. Next, we’ll set up our Django install:

$ cd Projects
$ mkdir djangostore
$ cd djangostore
$ pyvenv venv
$ source venv/bin/activate
$ pip install Django psycopg2 ipdb
$ django-admin.py startproject djangostore
$ python manage.py startapp store

I’m assuming here that you’re using Python 3.4. On Ubuntu, getting it working is a bit more involved.

Next, open up djangostore/settings.py and amend INSTALLED_APPS to include the new app and the PostgreSQL-specific functionality:

INSTALLED_APPS = (
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.postgres',
'store',
)

You’ll also need to configure the database settings:

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'djangostore',
'USER': 'store',
'PASSWORD': 'password',
'HOST': 'localhost',
'PORT': '',
}
}

We need to create an empty migration to use HStoreField:

$ python manage.py makemigrations --empty store

This command should create the file store/migrations/0001_initial.py. Open this up and edit it to look like this:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import models, migrations
from django.contrib.postgres.operations import HStoreExtension
class Migration(migrations.Migration):
dependencies = [
]
operations = [
HStoreExtension(),
]

This will make sure the HStore extension is installed. Next, let’s run these migrations:

$ python manage.py migrate
Operations to perform:
Synchronize unmigrated apps: messages, staticfiles, postgres
Apply all migrations: sessions, store, admin, auth, contenttypes
Synchronizing apps without migrations:
Creating tables...
Running deferred SQL...
Installing custom SQL...
Running migrations:
Rendering model states... DONE
Applying contenttypes.0001_initial... OK
Applying auth.0001_initial... OK
Applying admin.0001_initial... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying sessions.0001_initial... OK
Applying store.0001_initial... OK

Now, we’re ready to start creating our Product model. Open up store/models.py and amend it as follows:

from django.contrib.postgres.fields import HStoreField
from django.db import models
# Create your models here.
class Product(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
name = models.CharField(max_length=200)
description = models.TextField()
price = models.FloatField()
attributes = HStoreField()
def __str__(self):
return self.name

Note that HStoreField is not part of the standard group of model fields, and needs to be imported from the Postgres-specific fields module. Next, let’s create and run our migrations:

$ python manage.py makemigrations
$ python manage.py migrate

We should now have a Product model where the attributes field can be any arbitrary data we want. Note that we installed ipdb earlier - if you’re not familiar with it, this is an improved Python debugger, and also pulls in ipython, an improved Python shell, which Django will use if available.

Open up the Django shell:

$ python manage.py shell

Then, import the Product model:

from store.models import Product

Let’s create our first product - a plastic storage box:

box = Product()
box.name = 'Box'
box.description = 'A big box'
box.price = 5.99
box.attributes = { 'capacity': '1L', "colour": "blue"}
box.save()

If we take a look, we can see that the attributes can be returned as a Python dictionary:

In [12]: Product.objects.all()[0].attributes
Out[12]: {'capacity': '1L', 'colour': 'blue'}

We can easily retrieve single values:

In [15]: Product.objects.all()[0].attributes['capacity']
Out[15]: '1L'

Let’s add a second product - a mop:

mop = Product()
mop.name = 'Mop'
mop.description = 'A mop'
mop.price = 12.99
mop.attributes = { 'colour': "red" }
mop.save()

Now, we can filter out only the red items easily:

In [2]: Product.objects.filter(attributes__contains={'colour': 'red'})
Out[2]: [<Product: Mop>]

Here we search for items where the colour attribute is set to red, and we only get back the mop. Let’s do the same for blue items:

In [3]: Product.objects.filter(attributes__contains={'colour': 'blue'})
Out[3]: [<Product: Box>]

Here it returns the box. Let’s now search for an item with a capacity of 1L:

In [4]: Product.objects.filter(attributes__contains={'capacity': '1L'})
Out[4]: [<Product: Box>]

Only the box has the capacity attribute at all, and it’s the only one returned. Let’s see what happens when we search for an item with a capacity of 2L, which we know is not present:

In [5]: Product.objects.filter(attributes__contains={'capacity': '2L'})
Out[5]: []

No items returned, as expected. Let’s look for any item with the capacity attribute:

In [6]: Product.objects.filter(attributes__has_key='capacity')
Out[6]: [<Product: Box>]

Again, it only returns the box, as that’s the only one where that key exists. Note that all of this is tightly integrated with the existing API for the Django ORM. Let’s add a third product, a food hamper:

In [3]: hamper = Product()
In [4]: hamper.name = 'Hamper'
In [5]: hamper.description = 'A food hamper'
In [6]: hamper.price = 19.99
In [7]: hamper.attributes = {
...: 'contents': 'ham, cheese, coffee',
...: 'size': '90cmx60cm'
...: }
In [8]: hamper.save()

Next, let’s return only those items that have a contents attribute that contains cheese:

In [9]: Product.objects.filter(attributes__contents__contains='cheese')
Out[9]: [<Product: Hamper>]

As you can see, the HStoreField type allows for quite complex queries, while allowing you to set arbitrary values for an individual item. This overcomes one of the biggest issues with relational databases - the inability to set arbitrary data. Previously, you might have to work around it in some fashion, such as creating a table containing attributes for individual items which had to be joined on the product table. This is very cumbersome and difficult to use, especially when you wanted to work with more than one product. With this approach, it’s easy to filter products by multiple values in the HStore field, and you get back all of the attributes at once, as in this example:

In [13]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')
Out[13]: [<Product: Box>]
In [14]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')[0].attributes
Out[14]: {'capacity': '1L', 'colour': 'blue'}

Similar functionality is coming in a future version of MySQL, so it wouldn’t be entirely surprising to see HStoreField become more generally available in Django in the near future. For now, this functionality is extremely useful and makes for a good reason to ditch MySQL in favour of PostgreSQL for your future Django apps.

21st July 2015 9:15 pm

New Laptop

For a while now it’s been obvious that I needed a new laptop. My main workhorse for a while has been a 2008 MacBook, but I’m not really a fan of Mac OS X and it was stuck on Snow Leopard, so it was somewhat behind the times. It was also painfully slow by modern standards - regenerating this site took a couple of minutes. I had two other reasonably modern laptops, but one was too big and cumbersome, while the other was a Dell Mini, which isn’t really fast enough for a developer. When I last bought a laptop, I wasn’t even a developer, so it was long past time I got a more suitable machine.

I therefore took the plunge and ordered a new Dell XPS 13 Developer Edition, which arrived today. It’s an absolutely beautiful machine, and it’s extremely light. It’s also a lot faster than any other machine I own. The screen is exceptionally sharp, and setting it up was nice and easy.

After an hour or so with this machine, I’m already really happy with it. We’ll have to see whether I still think so after a few months using it.

4th July 2015 1:01 pm

Handling Images As Base64 Strings With Django REST Framework

I’m currently working on a Phonegap app that involves taking pictures and uploading them via a REST API. I’ve done this before, and I found at that time that the best way to do so was to fetch the image as a base-64 encoded string and push that up, rather than the image file itself. However, the last time I did so, I was using Tastypie to build the API, and I’ve since switched over to Django REST Framework as my API toolkit of choice.

It didn’t take long to find this gist giving details of how to do so, but it didn’t work as is, partly because I was using Python 3, and partly because the from_native method has gone as at Django REST Framework 3.0. It was, however, straightforward to adapt it to work. Here’s my solution:

import base64, uuid
from django.core.files.base import ContentFile
from rest_framework import serializers
# Custom image field - handles base 64 encoded images
class Base64ImageField(serializers.ImageField):
def to_internal_value(self, data):
if isinstance(data, str) and data.startswith('data:image'):
# base64 encoded image - decode
format, imgstr = data.split(';base64,') # format ~= data:image/X,
ext = format.split('/')[-1] # guess file extension
id = uuid.uuid4()
data = ContentFile(base64.b64decode(imgstr), name = id.urn[9:] + '.' + ext)
return super(Base64ImageField, self).to_internal_value(data)

This solution will handle both base 64 encoded strings and image files. Then, just use this field as normal.

17th June 2015 8:34 pm

Getting Django-behave and Celery to Work Together

I ran into a small issue today. I’m working on a Django app which uses Celery to handle certain tasks that don’t need to return a response within the context of the HTTP request. I also wanted to use django_behave for running BDD tests. The trouble is that both django_behave and Celery provide their own custom test runners that extend the default Django test runner, and so it looked like I might have to choose between the two.

However, it turned out that the Celery one was actually very simple, with only a handful of changes needing to be made to the default test runner to make it work with Celery. I was therefore able to create my own custom test runner that inherited from DjangoBehaveTestSuiteRunner and applied the changes necessary to get Celery working with it. Here is the test runner I wrote, which was saved as myproject/runner.py:

from django.conf import settings
from djcelery.contrib.test_runner import _set_eager
from django_behave.runner import DjangoBehaveTestSuiteRunner
class CeleryAndBehaveRunner(DjangoBehaveTestSuiteRunner):
def setup_test_environment(self, **kwargs):
_set_eager()
settings.BROKER_BACKEND = 'memory'
super(CeleryAndBehaveRunner, self).setup_test_environment(**kwargs)

To use it, you need to set the test runner in settings.py

TEST_RUNNER = 'myproject.runner.CeleryAndBehaveRunner'

Once that was done, my tests worked flawlessly with Celery, and the Behave tests ran as expected.

14th June 2015 9:29 pm

Setting Etags in Laravel 5

Although I’d prefer to use Python or Node.js, there are some times when circumstances dictate that I need to use PHP for a project at work. In the past, I used CodeIgniter, but that was through nothing more than inertia. For some time I’d been planning to switch to Laravel, largely because of the baked-in PHPUnit support, but events conspired against me - one big project that came along had a lot in common with an earlier one, so I forked it rather than starting over.

Recently I built a REST API for a mobile app, and I decided to use that to try out Laravel (if it had been available at the time, I’d have gone for Lumen instead). I was very pleased with the results - I was able to quickly put together the back end I wanted, with good test coverage, and the tinker command in particular was useful in debugging. The end result is fast and efficient, with query caching in place using Memcached to improve response times.

I also implemented a simple middleware to add ETags to HTTP responses and compare them on incoming requests, returning a 304 Not Modified status code if they are the same, which is given below:

<?php namespace App\Http\Middleware;
use Closure;
class ETagMiddleware {
/**
* Implement Etag support
*
* @param \Illuminate\Http\Request $request
* @param \Closure $next
* @return mixed
*/
public function handle($request, Closure $next)
{
// Get response
$response = $next($request);
// If this was a GET request...
if ($request->isMethod('get')) {
// Generate Etag
$etag = md5($response->getContent());
$requestEtag = str_replace('"', '', $request->getETags());
// Check to see if Etag has changed
if($requestEtag && $requestEtag[0] == $etag) {
$response->setNotModified();
}
// Set Etag
$response->setEtag($etag);
}
// Send response
return $response;
}
}

This is based on a solution for Laravel 4 by Nick Verwymeren, but implemented as Laravel 5 middleware, not a Laravel 4 filter. To use this with Laravel 5, save this as app/Http/Middleware/ETagMiddleware.php. Then add this to the $middleware array in app/Http/Kernel.php:

        'App\Http\Middleware\ETagMiddleware',

It’s quite simple to write this kind of middleware with Laravel, and using something like this is a no-brainer for most web apps considering the bandwidth it will likely save your users.

Recent Posts

How Much Difference Does Adding An Index to a Database Table Make?

Searching Content With Fuse.js

Higher-order Components in React

Creating Your Own Dependency Injection Container in PHP

Understanding Query Objects

About me

I'm a web and mobile app developer based in Norfolk. My skillset includes Python, PHP and Javascript, and I have extensive experience working with CodeIgniter, Laravel, Zend Framework, Django, Phonegap and React.js.