Exploring the HStoreField in Django 1.8

Published by at 1st August 2015 5:26 pm

One of the most interesting additions in Django 1.8 is the new Postgres-specific fields. I started using PostgreSQL in preference to MySQL for Django apps last year, and so I was interested in the additional functionality they offer.

By far the biggest deal out of all of these was the new HStoreField type. PostgreSQL added a JSON data type a little while back, and HStoreField allows you to use that field type. This is a really big deal because it allows you to store arbitrary data as JSON and query it. Previously, you could of course just store data as JSON in a text field, but that lacked the same ability to query it. This gives you many of the advantages of a NoSQL document database such as MongoDB in a relational database. For instance, you can store different products with different data about them, and crucially, query them by that data. Previously, the only way to add arbitrary product data and be able to query it was to have it in a separate table, and it was often cumbersome to join them when fetching multiple products.

Let's see a working example. We might be building an online store where products can have all kinds of arbitrary data stored about them. One product might be a plastic box, and you'd need to list the capacity as an additional attribute. Another product might be a pair of shoes, which have no capacity, but do have a size. It might be difficult to model this otherwise, but HStoreField is perfect for this kind of data.

First, let's set up our database. I'll assume you already have PostgreSQL up and running via your package manager. First, we need to create our database:

$ createdb djangostore

Next, we need to create a new user for this database with superuser access:

$ createuser store -s -P

You'll be prompted for a password - I'm assuming this will just be password here. Next, we need to connect to PostgreSQL using the psql utility:

$ psql djangostore -U store -W

You'll be prompted for your new password. Next, run the following command:

1# CREATE EXTENSION hstore;
2# GRANT ALL PRIVILEGES ON DATABASE djangostore TO store;
3# \q

The first command installs the HStore extension. Next we make sure our new user has the privileges required on the new database:

We've now created our database and a user to interact with it. Next, we'll set up our Django install:

1$ cd Projects
2$ mkdir djangostore
3$ cd djangostore
4$ pyvenv venv
5$ source venv/bin/activate
6$ pip install Django psycopg2 ipdb
7$ django-admin.py startproject djangostore
8$ python manage.py startapp store

I'm assuming here that you're using Python 3.4. On Ubuntu, getting it working is a bit more involved.

Next, open up djangostore/settings.py and amend INSTALLED_APPS to include the new app and the PostgreSQL-specific functionality:

1INSTALLED_APPS = (
2 'django.contrib.admin',
3 'django.contrib.auth',
4 'django.contrib.contenttypes',
5 'django.contrib.sessions',
6 'django.contrib.messages',
7 'django.contrib.staticfiles',
8 'django.contrib.postgres',
9 'store',
10)

You'll also need to configure the database settings:

1DATABASES = {
2 'default': {
3 'ENGINE': 'django.db.backends.postgresql_psycopg2',
4 'NAME': 'djangostore',
5 'USER': 'store',
6 'PASSWORD': 'password',
7 'HOST': 'localhost',
8 'PORT': '',
9 }
10}

We need to create an empty migration to use HStoreField:

$ python manage.py makemigrations --empty store

This command should create the file store/migrations/0001_initial.py. Open this up and edit it to look like this:

1# -*- coding: utf-8 -*-
2from __future__ import unicode_literals
3
4from django.db import models, migrations
5from django.contrib.postgres.operations import HStoreExtension
6
7
8class Migration(migrations.Migration):
9
10 dependencies = [
11 ]
12
13 operations = [
14 HStoreExtension(),
15 ]

This will make sure the HStore extension is installed. Next, let's run these migrations:

1$ python manage.py migrate
2Operations to perform:
3 Synchronize unmigrated apps: messages, staticfiles, postgres
4 Apply all migrations: sessions, store, admin, auth, contenttypes
5Synchronizing apps without migrations:
6 Creating tables...
7 Running deferred SQL...
8 Installing custom SQL...
9Running migrations:
10 Rendering model states... DONE
11 Applying contenttypes.0001_initial... OK
12 Applying auth.0001_initial... OK
13 Applying admin.0001_initial... OK
14 Applying contenttypes.0002_remove_content_type_name... OK
15 Applying auth.0002_alter_permission_name_max_length... OK
16 Applying auth.0003_alter_user_email_max_length... OK
17 Applying auth.0004_alter_user_username_opts... OK
18 Applying auth.0005_alter_user_last_login_null... OK
19 Applying auth.0006_require_contenttypes_0002... OK
20 Applying sessions.0001_initial... OK
21 Applying store.0001_initial... OK

Now, we're ready to start creating our Product model. Open up store/models.py and amend it as follows:

1from django.contrib.postgres.fields import HStoreField
2from django.db import models
3
4# Create your models here.
5class Product(models.Model):
6 created_at = models.DateTimeField(auto_now_add=True)
7 updated_at = models.DateTimeField(auto_now=True)
8 name = models.CharField(max_length=200)
9 description = models.TextField()
10 price = models.FloatField()
11 attributes = HStoreField()
12
13 def __str__(self):
14 return self.name

Note that HStoreField is not part of the standard group of model fields, and needs to be imported from the Postgres-specific fields module. Next, let's create and run our migrations:

1$ python manage.py makemigrations
2$ python manage.py migrate

We should now have a Product model where the attributes field can be any arbitrary data we want. Note that we installed ipdb earlier - if you're not familiar with it, this is an improved Python debugger, and also pulls in ipython, an improved Python shell, which Django will use if available.

Open up the Django shell:

$ python manage.py shell

Then, import the Product model:

from store.models import Product

Let's create our first product - a plastic storage box:

1box = Product()
2box.name = 'Box'
3box.description = 'A big box'
4box.price = 5.99
5box.attributes = { 'capacity': '1L', "colour": "blue"}
6box.save()

If we take a look, we can see that the attributes can be returned as a Python dictionary:

1In [12]: Product.objects.all()[0].attributes
2Out[12]: {'capacity': '1L', 'colour': 'blue'}

We can easily retrieve single values:

1In [15]: Product.objects.all()[0].attributes['capacity']
2Out[15]: '1L'

Let's add a second product - a mop:

1mop = Product()
2mop.name = 'Mop'
3mop.description = 'A mop'
4mop.price = 12.99
5mop.attributes = { 'colour': "red" }
6mop.save()

Now, we can filter out only the red items easily:

1In [2]: Product.objects.filter(attributes__contains={'colour': 'red'})
2Out[2]: [<Product: Mop>]

Here we search for items where the colour attribute is set to red, and we only get back the mop. Let's do the same for blue items:

1In [3]: Product.objects.filter(attributes__contains={'colour': 'blue'})
2Out[3]: [<Product: Box>]

Here it returns the box. Let's now search for an item with a capacity of 1L:

1In [4]: Product.objects.filter(attributes__contains={'capacity': '1L'})
2Out[4]: [<Product: Box>]

Only the box has the capacity attribute at all, and it's the only one returned. Let's see what happens when we search for an item with a capacity of 2L, which we know is not present:

1In [5]: Product.objects.filter(attributes__contains={'capacity': '2L'})
2Out[5]: []

No items returned, as expected. Let's look for any item with the capacity attribute:

1In [6]: Product.objects.filter(attributes__has_key='capacity')
2Out[6]: [<Product: Box>]

Again, it only returns the box, as that's the only one where that key exists. Note that all of this is tightly integrated with the existing API for the Django ORM. Let's add a third product, a food hamper:

1In [3]: hamper = Product()
2
3In [4]: hamper.name = 'Hamper'
4
5In [5]: hamper.description = 'A food hamper'
6
7In [6]: hamper.price = 19.99
8
9In [7]: hamper.attributes = {
10 ...: 'contents': 'ham, cheese, coffee',
11 ...: 'size': '90cmx60cm'
12 ...: }
13
14In [8]: hamper.save()

Next, let's return only those items that have a contents attribute that contains cheese:

1In [9]: Product.objects.filter(attributes__contents__contains='cheese')
2Out[9]: [<Product: Hamper>]

As you can see, the HStoreField type allows for quite complex queries, while allowing you to set arbitrary values for an individual item. This overcomes one of the biggest issues with relational databases - the inability to set arbitrary data. Previously, you might have to work around it in some fashion, such as creating a table containing attributes for individual items which had to be joined on the product table. This is very cumbersome and difficult to use, especially when you wanted to work with more than one product. With this approach, it's easy to filter products by multiple values in the HStore field, and you get back all of the attributes at once, as in this example:

1In [13]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')
2Out[13]: [<Product: Box>]
3In [14]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')[0].attributes
4Out[14]: {'capacity': '1L', 'colour': 'blue'}

Similar functionality is coming in a future version of MySQL, so it wouldn't be entirely surprising to see HStoreField become more generally available in Django in the near future. For now, this functionality is extremely useful and makes for a good reason to ditch MySQL in favour of PostgreSQL for your future Django apps.