Exploring the HStoreField in Django 1.8
Published by Matthew Daly at 1st August 2015 5:26 pm
One of the most interesting additions in Django 1.8 is the new Postgres-specific fields. I started using PostgreSQL in preference to MySQL for Django apps last year, and so I was interested in the additional functionality they offer.
By far the biggest deal out of all of these was the new HStoreField
type. PostgreSQL added a JSON data type a little while back, and HStoreField
allows you to use that field type. This is a really big deal because it allows you to store arbitrary data as JSON and query it. Previously, you could of course just store data as JSON in a text field, but that lacked the same ability to query it. This gives you many of the advantages of a NoSQL document database such as MongoDB in a relational database. For instance, you can store different products with different data about them, and crucially, query them by that data. Previously, the only way to add arbitrary product data and be able to query it was to have it in a separate table, and it was often cumbersome to join them when fetching multiple products.
Let's see a working example. We might be building an online store where products can have all kinds of arbitrary data stored about them. One product might be a plastic box, and you'd need to list the capacity as an additional attribute. Another product might be a pair of shoes, which have no capacity, but do have a size. It might be difficult to model this otherwise, but HStoreField
is perfect for this kind of data.
First, let's set up our database. I'll assume you already have PostgreSQL up and running via your package manager. First, we need to create our database:
$ createdb djangostore
Next, we need to create a new user for this database with superuser access:
$ createuser store -s -P
You'll be prompted for a password - I'm assuming this will just be password
here. Next, we need to connect to PostgreSQL using the psql
utility:
$ psql djangostore -U store -W
You'll be prompted for your new password. Next, run the following command:
1# CREATE EXTENSION hstore;2# GRANT ALL PRIVILEGES ON DATABASE djangostore TO store;3# \q
The first command installs the HStore extension. Next we make sure our new user has the privileges required on the new database:
We've now created our database and a user to interact with it. Next, we'll set up our Django install:
1$ cd Projects2$ mkdir djangostore3$ cd djangostore4$ pyvenv venv5$ source venv/bin/activate6$ pip install Django psycopg2 ipdb7$ django-admin.py startproject djangostore8$ python manage.py startapp store
I'm assuming here that you're using Python 3.4. On Ubuntu, getting it working is a bit more involved.
Next, open up djangostore/settings.py
and amend INSTALLED_APPS
to include the new app and the PostgreSQL-specific functionality:
1INSTALLED_APPS = (2 'django.contrib.admin',3 'django.contrib.auth',4 'django.contrib.contenttypes',5 'django.contrib.sessions',6 'django.contrib.messages',7 'django.contrib.staticfiles',8 'django.contrib.postgres',9 'store',10)
You'll also need to configure the database settings:
1DATABASES = {2 'default': {3 'ENGINE': 'django.db.backends.postgresql_psycopg2',4 'NAME': 'djangostore',5 'USER': 'store',6 'PASSWORD': 'password',7 'HOST': 'localhost',8 'PORT': '',9 }10}
We need to create an empty migration to use HStoreField
:
$ python manage.py makemigrations --empty store
This command should create the file store/migrations/0001_initial.py
. Open this up and edit it to look like this:
1# -*- coding: utf-8 -*-2from __future__ import unicode_literals34from django.db import models, migrations5from django.contrib.postgres.operations import HStoreExtension678class Migration(migrations.Migration):910 dependencies = [11 ]1213 operations = [14 HStoreExtension(),15 ]
This will make sure the HStore extension is installed. Next, let's run these migrations:
1$ python manage.py migrate2Operations to perform:3 Synchronize unmigrated apps: messages, staticfiles, postgres4 Apply all migrations: sessions, store, admin, auth, contenttypes5Synchronizing apps without migrations:6 Creating tables...7 Running deferred SQL...8 Installing custom SQL...9Running migrations:10 Rendering model states... DONE11 Applying contenttypes.0001_initial... OK12 Applying auth.0001_initial... OK13 Applying admin.0001_initial... OK14 Applying contenttypes.0002_remove_content_type_name... OK15 Applying auth.0002_alter_permission_name_max_length... OK16 Applying auth.0003_alter_user_email_max_length... OK17 Applying auth.0004_alter_user_username_opts... OK18 Applying auth.0005_alter_user_last_login_null... OK19 Applying auth.0006_require_contenttypes_0002... OK20 Applying sessions.0001_initial... OK21 Applying store.0001_initial... OK
Now, we're ready to start creating our Product
model. Open up store/models.py
and amend it as follows:
1from django.contrib.postgres.fields import HStoreField2from django.db import models34# Create your models here.5class Product(models.Model):6 created_at = models.DateTimeField(auto_now_add=True)7 updated_at = models.DateTimeField(auto_now=True)8 name = models.CharField(max_length=200)9 description = models.TextField()10 price = models.FloatField()11 attributes = HStoreField()1213 def __str__(self):14 return self.name
Note that HStoreField
is not part of the standard group of model fields, and needs to be imported from the Postgres-specific fields module. Next, let's create and run our migrations:
1$ python manage.py makemigrations2$ python manage.py migrate
We should now have a Product
model where the attributes
field can be any arbitrary data we want. Note that we installed ipdb
earlier - if you're not familiar with it, this is an improved Python debugger, and also pulls in ipython
, an improved Python shell, which Django will use if available.
Open up the Django shell:
$ python manage.py shell
Then, import the Product
model:
from store.models import Product
Let's create our first product - a plastic storage box:
1box = Product()2box.name = 'Box'3box.description = 'A big box'4box.price = 5.995box.attributes = { 'capacity': '1L', "colour": "blue"}6box.save()
If we take a look, we can see that the attributes can be returned as a Python dictionary:
1In [12]: Product.objects.all()[0].attributes2Out[12]: {'capacity': '1L', 'colour': 'blue'}
We can easily retrieve single values:
1In [15]: Product.objects.all()[0].attributes['capacity']2Out[15]: '1L'
Let's add a second product - a mop:
1mop = Product()2mop.name = 'Mop'3mop.description = 'A mop'4mop.price = 12.995mop.attributes = { 'colour': "red" }6mop.save()
Now, we can filter out only the red items easily:
1In [2]: Product.objects.filter(attributes__contains={'colour': 'red'})2Out[2]: [<Product: Mop>]
Here we search for items where the colour
attribute is set to red
, and we only get back the mop. Let's do the same for blue items:
1In [3]: Product.objects.filter(attributes__contains={'colour': 'blue'})2Out[3]: [<Product: Box>]
Here it returns the box. Let's now search for an item with a capacity of 1L:
1In [4]: Product.objects.filter(attributes__contains={'capacity': '1L'})2Out[4]: [<Product: Box>]
Only the box has the capacity attribute at all, and it's the only one returned. Let's see what happens when we search for an item with a capacity of 2L, which we know is not present:
1In [5]: Product.objects.filter(attributes__contains={'capacity': '2L'})2Out[5]: []
No items returned, as expected. Let's look for any item with the capacity
attribute:
1In [6]: Product.objects.filter(attributes__has_key='capacity')2Out[6]: [<Product: Box>]
Again, it only returns the box, as that's the only one where that key exists. Note that all of this is tightly integrated with the existing API for the Django ORM. Let's add a third product, a food hamper:
1In [3]: hamper = Product()23In [4]: hamper.name = 'Hamper'45In [5]: hamper.description = 'A food hamper'67In [6]: hamper.price = 19.9989In [7]: hamper.attributes = {10 ...: 'contents': 'ham, cheese, coffee',11 ...: 'size': '90cmx60cm'12 ...: }1314In [8]: hamper.save()
Next, let's return only those items that have a contents
attribute that contains cheese
:
1In [9]: Product.objects.filter(attributes__contents__contains='cheese')2Out[9]: [<Product: Hamper>]
As you can see, the HStoreField
type allows for quite complex queries, while allowing you to set arbitrary values for an individual item. This overcomes one of the biggest issues with relational databases - the inability to set arbitrary data. Previously, you might have to work around it in some fashion, such as creating a table containing attributes for individual items which had to be joined on the product table. This is very cumbersome and difficult to use, especially when you wanted to work with more than one product. With this approach, it's easy to filter products by multiple values in the HStore field, and you get back all of the attributes at once, as in this example:
1In [13]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')2Out[13]: [<Product: Box>]3In [14]: Product.objects.filter(attributes__capacity='1L', attributes__colour='blue')[0].attributes4Out[14]: {'capacity': '1L', 'colour': 'blue'}
Similar functionality is coming in a future version of MySQL, so it wouldn't be entirely surprising to see HStoreField
become more generally available in Django in the near future. For now, this functionality is extremely useful and makes for a good reason to ditch MySQL in favour of PostgreSQL for your future Django apps.