I found solutions to a multiprocessing problem I was debugging in a Django management command. The command is a long-running service process that spawns child processes, all of which use the database backend in parallel.

Django's database API is thread-safe. However, when a process forks, the child inherits the parent's open database connection, which then ends up shared among uncooperating processes. SQL statements and returned result sets get mangled, unpredictably producing error messages like “Commands out of sync; you can't run this command now” and introducing inconsistencies in the database that cause errors later.
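
To make the sharing concrete, here is a minimal sketch using only the standard library, with a Unix socket pair standing in for the database connection. It assumes a POSIX system where os.fork() is available, and demo_shared_socket is a hypothetical name for illustration only. After the fork, both processes write into the very same socket, so the peer (the "server") receives a single merged stream.

```python
import os
import socket


def demo_shared_socket():
    """Show that a forked child writes into the parent's socket (POSIX only)."""
    # "ours" plays the role of the client's database connection,
    # "peer" plays the role of the server end.
    ours, peer = socket.socketpair()

    pid = os.fork()
    if pid == 0:
        # Child process: it inherited "ours" and can write to it directly.
        try:
            ours.sendall(b"child;")
        finally:
            os._exit(0)  # leave immediately, skipping the parent's code path

    # Parent process: wait for the child, then write on the same socket.
    os.waitpid(pid, 0)
    ours.sendall(b"parent;")

    # The server end sees both writes as one byte stream.
    expected_length = len(b"child;parent;")
    data = b""
    while len(data) < expected_length:
        data += peer.recv(1024)

    ours.close()
    peer.close()
    return data
```

Calling demo_shared_socket() returns b"child;parent;". The ordering is only deterministic here because the parent waits for the child before writing; with real concurrent queries the writes and the replies can interleave, which is the kind of mix-up behind the errors above.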

I tried wrapping every function that used the database in a multiprocessing.Lock, but this is error-prone, hard to test, lazy and ugly, and it didn't work.

I thought of 3 solutions:

  1. After forking, trick the ORM into thinking the database is closed by setting db.connection = None. Django will create a new connection as needed. The original connection stays open within the parent process.
  2. Wrap the subprocess' main function in the db.connection.temporary_connection() context manager.
  3. Call close() on the connection before forking. This creates unnecessary overhead in the main process, but the other two methods are undocumented, so this is the only one that is not hackish.

While testing, I discarded options 1 and 2. There is much more state that needs to be taken into account than just the connection object, and much of that state is backend-specific.

I ended up using the following code which works perfectly despite a very slight overhead:

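from django import db

if not p.is_alive():  # p is a multiprocessing.Process instance
    db.connections.close_all()  # close every connection before forking (Django 1.8)
    p.start()  # the child opens its own connection on first use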
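
The same pattern could be wrapped in a small helper so that every place that starts a child goes through it. This is only a sketch: start_with_fresh_connections and close_django_connections are hypothetical names, not part of Django, and the only Django call used is db.connections.close_all() from the snippet above. The closing step is passed in as a parameter so the helper can be exercised without a configured database.

```python
def close_django_connections():
    """Close every database connection Django currently holds open."""
    # Imported lazily so this module can be loaded without Django settings.
    from django import db

    db.connections.close_all()


def start_with_fresh_connections(process, close_connections=close_django_connections):
    """Start `process` unless it is already running.

    Connections are closed in the parent right before the fork, so the
    child has nothing to inherit and opens its own connection on demand.
    Returns True if the process was started, False if it was already alive.
    """
    if process.is_alive():
        return False
    close_connections()
    process.start()
    return True
```

In the management command this would replace the three-line snippet with start_with_fresh_connections(p), where p is a multiprocessing.Process instance.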

Relevant StackOverflow question: Django multiprocessing and database connections
