Part 3: Advanced Commands and Data Migrations

Iteratively working on a migration

Sometimes, you’ll find that you’ve made model changes that need to be further refined. Say you define this model:

class Group(models.Model):
    name = models.TextField(verbose_name="Name")
    facebook_page__id = models.CharField(max_length=255)

and you’ve created and applied this migration:

./manage.py schemamigration southtut --auto
./manage.py migrate southtut

You then notice two things: first, name should really be a CharField, not a TextField; second, facebook_page__id contains a double underscore where there should be a single one. You can fix these issues in your model, and then run:

./manage.py schemamigration southtut --auto --update
 + Added model southtut.Group
Migration to be updated, 0026_auto__add_group, is already applied, rolling it back now...
previous_migration: 0025_auto__foo (applied: 2012-05-25 21:20:47)
Running migrations for southtut:
  - Migrating backwards to just after 0025_auto__foo.
  < southtut:0026_auto__add_group
Updated 0026_auto__add_group.py. You can now apply this migration with: ./manage.py migrate southtut

What happened here is that South removed the most recent migration (the one that created the model, mistakes included) and replaced it with a new migration that reflects the latest corrections made to the model.

It also noticed that the migration had already been applied, and automatically rolled it back for you. You can now apply the latest version of the migration to create the correct version of the model:

./manage.py migrate southtut

You may repeat this process as often as required to iron out any issues and arrive at the final database changes, which you can then publish neatly packed into a single migration.

Listing current migrations

It can be very useful to know which migrations you currently have applied, and which ones are available. For this reason, there’s ./manage.py migrate --list.

Run against our project from before, we get:

$ ./manage.py migrate --list

 (*) 0001_initial
 (*) 0002_auto__add_field_knight_dances_whenever_able
 (*) 0003_auto__add_field_knight_shrubberies
 (*) 0004_auto__add_unique_knight_name

The output has an asterisk (*) next to a migration name if it has been applied, and an empty space ( ) if not [1].

If you have a lot of apps or migrations, you can also specify an app name to show just the migrations from that app.

[1] An interesting side effect of this is that you can run the command ./manage.py migrate --list | grep -v "*" to see which migrations are unapplied and need running.
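The same filtering can be done in Python if you want to act on the result in a script. The following is a minimal sketch that only parses text in the format shown above; the function name unapplied_migrations and the sample listing (with two migrations marked unapplied) are made up for illustration and are not part of South.

```python
def unapplied_migrations(list_output):
    """Return the names of migrations marked "( )" rather than "(*)"."""
    unapplied = []
    for line in list_output.splitlines():
        stripped = line.strip()
        # Applied migrations start with "(*)", unapplied ones with "( )".
        if stripped.startswith("( )"):
            unapplied.append(stripped[3:].strip())
    return unapplied


# Hypothetical listing: the last two migrations have not been applied yet.
sample = """
 (*) 0001_initial
 (*) 0002_auto__add_field_knight_dances_whenever_able
 ( ) 0003_auto__add_field_knight_shrubberies
 ( ) 0004_auto__add_unique_knight_name
"""

print(unapplied_migrations(sample))
```

With the sample above, the two 0003 and 0004 entries are returned, in the order they were listed.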

Data migrations

The previous parts have only covered schema migrations - migrations which change the layout of your columns and indexes. There’s also another kind of migration, the so-called data migration.

Data migrations are used to change the data stored in your database to match a new schema, or feature. For example, if you’ve been storing passwords in plain text [2], and you’re moving to salted and hashed passwords, you might have these three steps (where each step corresponds to a migration):

  • Create two new columns, password_salt and password_hash (a schema migration).
  • Using the contents of the old password column, calculate salts and hashes for each user (a data migration).
  • Remove the old password column (a schema migration).

[2] If you’re actually storing passwords in plaintext, please convert. Now.

The first and last migrations you already know how to do: make the relevant changes in your models.py file, and run ./manage.py schemamigration --auto myapp. Remember that the migration adding the two new columns must be separate from, and run before, the one deleting the old column; otherwise the old column won’t be around for us to get data out of, and you’ll have lost all your users’ passwords [3].

[3] Always, always back up your database before doing any kind of potentially destructive migration. One time, it will go wrong.
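To make the middle step concrete, here is a rough sketch of the salt-and-hash calculation on its own, outside any migration. It assumes the scheme used in the example later in this part (an 8-letter random salt and a SHA-1 hex digest of salt plus password); the function names are invented for illustration, and hashlib stands in for the older sha module.

```python
import hashlib
import random
import string


def make_salt(length=8):
    """Return a random salt made of ASCII letters."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


def hash_password(salt, password):
    """Return the SHA-1 hex digest of the salt followed by the password."""
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


def check_password(salt, stored_hash, candidate):
    """Re-hash the candidate with the stored salt and compare digests."""
    return hash_password(salt, candidate) == stored_hash


salt = make_salt()
stored = hash_password(salt, "ihopetheycantseethis")
print(len(salt), len(stored))  # 8 40
print(check_password(salt, stored, "ihopetheycantseethis"))  # True
print(check_password(salt, stored, "fakepass"))  # False
```

The lengths 8 and 40 are why the two new columns can be short CharFields. Salted SHA-1 is kept here only to follow the tutorial; a real project would normally rely on a purpose-built password hashing function instead.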

Let’s follow a real example. Make a new app, and call it southtut2. Add it to INSTALLED_APPS, and then give it this model:

from django.db import models

class User(models.Model):

    username = models.CharField(max_length=255)
    password = models.CharField(max_length=60)
    name = models.TextField()

Make an initial migration for it, apply it, and then add a record:

$ ./manage.py schemamigration --initial southtut2
Creating migrations directory at '/home/andrew/Programs/litret/southtut2/migrations'...
Creating __init__.py in '/home/andrew/Programs/litret/southtut2/migrations'...
+ Added model southtut2.User
Created 0001_initial.py. You can now apply this migration with: ./manage.py migrate southtut2

$ ./manage.py migrate southtut2
Running migrations for southtut2:
 - Migrating forwards to 0001_initial.
 > southtut2:0001_initial
 - Loading initial data for southtut2.

$ ./manage.py shell
In [1]: from southtut2.models import User

In [2]: User.objects.create(username="andrew", password="ihopetheycantseethis", name="Andrew Godwin")
Out[2]: <User: User object>

In [3]: User.objects.get(id=1).password
Out[3]: u'ihopetheycantseethis'

As you can see, the password is clearly visible, which isn’t good. Let’s move to password hashing, while keeping everyone’s password valid. Firstly, modify the model so it looks like this:

from django.db import models
import hashlib

class User(models.Model):

    username = models.CharField(max_length=255)
    password = models.CharField(max_length=60)
    password_salt = models.CharField(max_length=8, null=True)
    password_hash = models.CharField(max_length=40, null=True)
    name = models.TextField()

    def check_password(self, password):
        # Hash the candidate with the stored salt and compare it with the stored SHA-1 hex digest.
        salted = (self.password_salt + password).encode("utf-8")
        return hashlib.sha1(salted).hexdigest() == self.password_hash

Make a schema migration that will create our two new columns (notice that they’ve both been added as null=True; once they have data, we’ll alter them to be null=False):

$ ./manage.py schemamigration southtut2 --auto
 + Added field password_salt on southtut2.User
 + Added field password_hash on southtut2.User
Created 0002_auto__add_field_user_password_salt__add_field_user_password_hash.py. You can now apply this migration with: ./manage.py migrate southtut2

Now, the second migration is more interesting. Firstly, we need to create a skeleton data migration (unlike schema migrations, South can’t write these for you):

$ ./manage.py datamigration southtut2 hash_passwords

If you open up the file, you’ll see that South has made the shell of a migration; the model definitions are there, the forwards() and backwards() functions are there, but there’s no code in either. We’ll write some code to port the passwords over in the forwards function:

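def forwards(self, orm):
    import hashlib, random, string
    for user in orm.User.objects.all():
        # Eight random ASCII letters, matching password_salt's max_length=8.
        user.password_salt = "".join([random.choice(string.ascii_letters) for i in range(8)])
        salted = (user.password_salt + user.password).encode("utf-8")
        user.password_hash = hashlib.sha1(salted).hexdigest()
        # Persist the new values; without this nothing is written to the database.
        user.save()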

Notice that we use orm.User to access the User model - this gives us the version of User from when this migration was created, so if we want to run the migration in future, it won’t get a completely different, new, User model.

If you want to access models from other apps in your data migration, use a syntax like orm['contenttypes.ContentType']. Models will be available if you can somehow get to them via ForeignKey or ManyToMany traversal from your app’s models; if you want to freeze other models, simply pass --freeze appname on the datamigration command line.

We should also raise an error in the backwards() method, since this process is by its very nature irreversible:

def backwards(self, orm):
    raise RuntimeError("Cannot reverse this migration.")

That looks good. Finally, remove the password field from your model, and run schemamigration one last time to make a migration to remove that field:

$ ./manage.py schemamigration southtut2 --auto
 ? The field 'User.password' does not have a default specified, yet is NOT NULL.
 ? Since you are adding or removing this field, you MUST specify a default
 ? value to use for existing rows. Would you like to:
 ?  1. Quit now, and add a default to the field in models.py
 ?  2. Specify a one-off value to use for existing columns now
 ? Please select a choice: 2
 ? Please enter Python code for your one-off default value.
 ? The datetime module is available, so you can do e.g.
 >>> ""
 - Deleted field password on southtut2.User
Created 0004_auto__del_field_user_password.py. You can now apply this migration with: ./manage.py migrate southtut2

Notice that South is asking for a default value for password; if you were to reverse this migration, South would try to re-add the password column, and thus needs either a default value or for the field to be null=True. Here, I’ve fed it the empty string, as that’s a reasonable default in this case.

Finally, let’s apply all three migrations:

$ ./manage.py migrate southtut2
Running migrations for southtut2:
 - Migrating forwards to 0004_auto__del_field_user_password.
 > southtut2:0002_auto__add_field_user_password_salt__add_field_user_password_hash
 > southtut2:0003_hash_passwords
 > southtut2:0004_auto__del_field_user_password
 - Loading initial data for southtut2.

Looks good - we’ve added the new columns, migrated the passwords over, and then deleted the old column. Let’s check our data was preserved:

$ ./manage.py shell
In [1]: from southtut2.models import User

In [2]: User.objects.get(id=1).check_password("ihopetheycantseethis")
Out[2]: True

In [3]: User.objects.get(id=1).check_password("fakepass")
Out[3]: False

That looks like a successful data migration!
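For intuition, the three migrations amount to roughly the following at the database level. This is only a sketch using Python's sqlite3 module against an in-memory database; it is not the SQL South generates, the table name southtut2_user is assumed from Django's usual app_model naming, and the final step rebuilds the table rather than dropping the column so that it works on older SQLite versions.

```python
import hashlib
import random
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE southtut2_user "
    "(id INTEGER PRIMARY KEY, username TEXT, password TEXT, name TEXT)"
)
conn.execute(
    "INSERT INTO southtut2_user (username, password, name) VALUES (?, ?, ?)",
    ("andrew", "ihopetheycantseethis", "Andrew Godwin"),
)

# Step 1 (schema): add the two new, nullable columns.
conn.execute("ALTER TABLE southtut2_user ADD COLUMN password_salt TEXT")
conn.execute("ALTER TABLE southtut2_user ADD COLUMN password_hash TEXT")

# Step 2 (data): fill them in while the old column still exists.
rows = conn.execute("SELECT id, password FROM southtut2_user").fetchall()
for user_id, password in rows:
    salt = "".join(random.choice(string.ascii_letters) for _ in range(8))
    digest = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    conn.execute(
        "UPDATE southtut2_user SET password_salt = ?, password_hash = ? WHERE id = ?",
        (salt, digest, user_id),
    )

# Step 3 (schema): remove the old column by rebuilding the table without it.
conn.execute(
    "CREATE TABLE southtut2_user_new (id INTEGER PRIMARY KEY, username TEXT, "
    "password_salt TEXT, password_hash TEXT, name TEXT)"
)
conn.execute(
    "INSERT INTO southtut2_user_new "
    "SELECT id, username, password_salt, password_hash, name FROM southtut2_user"
)
conn.execute("DROP TABLE southtut2_user")
conn.execute("ALTER TABLE southtut2_user_new RENAME TO southtut2_user")
conn.commit()

# The plain-text column is gone, and the stored hash still verifies.
columns = [row[1] for row in conn.execute("PRAGMA table_info(southtut2_user)")]
salt, digest = conn.execute(
    "SELECT password_salt, password_hash FROM southtut2_user WHERE id = 1"
).fetchone()
print(columns)
print(hashlib.sha1((salt + "ihopetheycantseethis").encode("utf-8")).hexdigest() == digest)
```

Swapping steps 2 and 3 would leave nothing to hash, which is why the column removal has to be its own, later migration.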

You can do a lot more with this inside a data migration; any model can be available to you. The only caveat is that you won’t have access to any custom methods or managers on your models, as they’re not preserved as part of the freezing process (there’s no way to do this generally); you’ll have to copy any code you want into the migration itself. Feel free to make them methods on the Migration class; South ignores everything apart from forwards and backwards.