Converting a Django ForeignKey to a GenericForeignKey

I’m currently working on a Django project called the SWAP that contains lots of disease outbreak time series data. Previously, the model looked something like this (I’m using Django 1.9 with Python 3.4+):

from django.db import models
 
class Outbreak(models.Model):
    ...
 
class TimeStep(models.Model):
    outbreak = models.ForeignKey(Outbreak)
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count is not None else '[empty]')

We’re working on incorporating disease spread forecasts into the project. Forecasts actually require the same sort of time series infrastructure that outbreaks do. It seemed wasteful to create a new TimeStep class just for forecasts, so I started doing some research and quickly stumbled upon generic relations.

Django’s documentation on this topic is somewhat lacking, so it took a good amount of digging around to figure out how to structure things and migrate my existing models. The following resources were really useful in my search:

The last link was the most useful because it discussed how to create the migrations; however, it was written for South, which is now outdated because Django has shipped its own built-in migrations framework since version 1.7. I’m writing this post to describe how I migrated a model from a ForeignKey to a GenericForeignKey.

1. Add the necessary fields

Two primary fields are needed in the TimeStep class, content_type and object_id:

from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class TimeStep(models.Model):
    outbreak = models.ForeignKey(Outbreak)
 
    content_type = models.ForeignKey(ContentType, null=True, blank=True)
    object_id = models.PositiveIntegerField(null=True, blank=True)
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count is not None else '[empty]')

A GenericForeignKey is essentially a tuple, (content_type, object_id). The content type and object ID are all Django needs to perform a lookup. We’ll actually add the GenericForeignKey field later.
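To make the tuple idea concrete, here is a plain-Python analogy of that lookup. It is only a sketch: TABLES, the string content type names, and the sample rows are all made up for illustration, and none of this is how Django implements the field internally.

```python
from dataclasses import dataclass


@dataclass
class Outbreak:
    pk: int
    name: str


@dataclass
class ForecastSeries:
    pk: int
    name: str


# Hypothetical stand-in for database tables: content type name -> {primary key: row}.
TABLES = {
    "outbreak": {1: Outbreak(1, "sample outbreak")},
    "forecastseries": {1: ForecastSeries(1, "sample forecast")},
}


@dataclass
class TimeStep:
    content_type: str  # which "table" the target lives in
    object_id: int     # primary key of the target within that table
    count: int

    @property
    def content_object(self):
        # The (content_type, object_id) pair is enough to find the target row.
        return TABLES[self.content_type][self.object_id]


outbreak_step = TimeStep("outbreak", 1, count=5)
forecast_step = TimeStep("forecastseries", 1, count=7)
```

Both TimeStep instances use the same two fields, yet outbreak_step.content_object resolves to an Outbreak and forecast_step.content_object resolves to a ForecastSeries.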

Note that null=True, blank=True is added to the new fields temporarily so that the schema migration can add the columns without needing values for the existing rows. We’ll remove those options later.

Now, make and run the migration:

./manage.py makemigrations
./manage.py migrate

2. Populate the new fields using a data migration

The fields now exist, but they’re empty. We need to create a data migration. To do this, use the following command to create an empty migration, replacing swap with your app’s name:

./manage.py makemigrations --empty swap

For me, this created a new file under swap/migrations/ called 0039_auto_20160307_1408.py. The final migration looks like this:

# -*- coding: utf-8 -*-
# Generated by Django 1.9.2 on 2016-03-07 14:08
from __future__ import unicode_literals
 
from django.db import migrations
 
 
def migrate_foreign_key(apps, schema_editor):
    """
        Data migration to populate the GenericForeignKey fields.
    """
    TimeStep = apps.get_model('swap', 'TimeStep')
    ContentType = apps.get_model('contenttypes', 'ContentType')
 
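    # get_or_create rather than get: on a fresh database the ContentType row may not exist
    # yet, because content types are normally created only after migrations have run.
    outbreak_content_type, _ = ContentType.objects.get_or_create(app_label='swap', model='outbreak')
 
    for timestep in TimeStep.objects.all():
        timestep.content_type = outbreak_content_type
        # outbreak_id reads the stored key directly, avoiding one extra query per row
        timestep.object_id = timestep.outbreak_id
        timestep.save()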
 
 
class Migration(migrations.Migration):
 
    dependencies = [
        ('swap', '0038_auto_20160307_1408'),
    ]
 
    operations = [
        migrations.RunPython(migrate_foreign_key),
    ]

migrate_foreign_key is a very simple function: it sets the content_type of every TimeStep to Outbreak’s content type, then copies the primary key of each TimeStep’s outbreak into object_id.

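Note that the call to apps.get_model is necessary. You cannot just import your model and use it, because the imported class reflects the current state of models.py. As the official docs say, apps.get_model gives us the correct versioned model, that is, the model as it existed at this point in the migration history.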

Run the migration:

./manage.py migrate

3. Cleanup

Finally, I need to do 3 things:

  1. Remove the old ForeignKey
  2. Add the new GenericForeignKey
  3. Remove the temporary null=True and blank=True options

After I do these 3 things, my TimeStep model now looks like this:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class TimeStep(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count is not None else '[empty]')

Finally, make and run the migrations:

./manage.py makemigrations
./manage.py migrate

Running makemigrations may cause Django to prompt you for a default value for the content_type and object_id fields, because they are changing from nullable to non-nullable. If it does, there should be an option to ignore the prompt because the existing rows were already handled by a data migration; select that option.

I added a couple new methods to TimeStep to make querying easier:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
 
class Outbreak(models.Model):
    ...
 
class ForecastSeries(models.Model):
    ...
 
class TimeStep(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
 
    timestamp = models.DateTimeField()
    count = models.PositiveIntegerField(blank=True, null=True)  # we allow for a null blank count so that we can allow the analyst to specify the start/end of the interval
 
    class Meta:
        ordering = ['timestamp']
 
    def __str__(self):
        return '{} - {}'.format(self.timestamp, self.count if self.count is not None else '[empty]')
 
    @staticmethod
    def get_outbreak_timesteps(outbreak):
        """
            Get a particular Outbreak's time series. You *cannot* do `TimeStep.objects.filter(outbreak=o)` because this class
            uses a GenericForeignKey.
        """
        # Look the content type up at call time instead of in the class body, which would hit
        # the database on import. get_for_model caches its result, so repeated calls are cheap.
        outbreak_type = ContentType.objects.get_for_model(Outbreak)
        return TimeStep.objects.filter(content_type=outbreak_type, object_id=outbreak.pk)
 
    @staticmethod
    def get_forecast_series_timesteps(forecast_series):
        """
            Get a particular ForecastSeries' time series.
        """
        forecast_series_type = ContentType.objects.get_for_model(ForecastSeries)
        return TimeStep.objects.filter(content_type=forecast_series_type, object_id=forecast_series.pk)

Because I can no longer do TimeStep.objects.filter(outbreak=o), querying is a tad more complex, but it doesn’t have to be. These methods allow me to do TimeStep.get_outbreak_timesteps(o) or TimeStep.get_forecast_series_timesteps(f) to pull a particular model’s time series.

That’s it! Nice and easy. Let me know if you find this useful or have any questions.

ZIP codes shouldn’t be represented as integers!

This is just a quick post to point out that ZIP codes should never be stored or represented as integers; they should always be stored and represented as strings. Take the ZIP code 07755 (Oakhurst, NJ), for example. Any reasonable integer parser will read 07755 without complaint, but the value immediately becomes 7755, because leading zeros mean nothing in the context of integers. The problem doesn’t become apparent until it is time to output the ZIP code (for example, in a KML file). 7755 is not a valid ZIP code.
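The effect is easy to reproduce in plain Python. This is a small sketch; raw_zip stands in for a value read from a file or a form, and the variable names are made up for the example.

```python
# Hypothetical raw input, e.g. a value read from a CSV file.
raw_zip = "07755"

as_integer = int(raw_zip)  # the leading zero is lost here
as_string = raw_zip        # kept exactly as entered

printed_from_integer = str(as_integer)
printed_from_string = as_string

# If ZIP codes were already stored as integers, zero-padding can repair them
# (assumption: every stored value really is a five-digit ZIP code).
repaired = str(as_integer).zfill(5)
```

Here printed_from_integer is "7755", while printed_from_string and repaired are both "07755".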

A common place where this problem pops up is in web apps where user ZIP codes are stored. In Django, for example, you may be tempted to add the following field to a model:

zip = models.IntegerField(max_length=5)

While this will run (Django ignores max_length on an IntegerField, so the length isn’t even enforced), if you ever need to display user ZIP codes (e.g., to let a user view their current address), leading zeros won’t be displayed. Instead, the correct way to represent a ZIP code would be:

zip = models.CharField(max_length=5)

Now, leading zeros won’t get cut off.