While working on the port of CubicWeb to the Windows platform, including supporting MS Sql Server as the database backend, I got bitten by a weird behavior of that database engine. When working with cubicweb, most administrations command are wrappped by the cubicweb-ctl utility and database backups are performed by running cubicweb-ctl db-dump <instancename>. If the instance uses PostgreSQL as the backend, this will call the pg_dump utility.

When porting to Sql Server, I could not find such a utility, but I found that Transact SQL has a BACKUP DATABASE command, so I was able to call it using Python's pyodbc module. I tested it interactively, and was satisfied with the result:

>>> from logilab.common.db import get_connection
>>> cnx = get_connection(driver='sqlserver2005', database='mydb', host='localhost', extra_args='autocommit;trusted_connection')
>>> cursor = cnx.cursor()
>>> cursor.execute('BACKUP DATABASE ? TO DISK = ?', ('mydb', 'C:\\Data\\mydb.dump'))
>>> cnx.close()

However, testing that very same code through cubicweb-ctl produced no file in C:\\Data\\. To make a (quite) long story short, the thing is that the BACKUP DATABASE command is asynchronous (or maybe the odbc driver is) and the call to cursor.execute(...) will return immediately, before the backup actually starts. When running interactively, by the time I got to type cnx.close() the backup was finished but when running in a function, the connection was closed before the backup started (which effectively killed the backup operation).

I worked around this by monitoring the size of the backup file in a loop and waiting until that size gets stable before closing the connection:

import os
import time
from logilab.common.db import get_connection

filename = 'c:\\data\\toto.dump'
dbname = 'mydb'
cnx = get_connection(driver='sqlserver2005',
cursor = cnx.cursor()
cursor.execute("BACKUP DATABASE ? TO DISK= ? ", (dbname, filename,))
prev_size = -1
err_count = 0
same_size_count = 0
while err_count < 10 and same_size_count < 10:
        size = os.path.getsize(filename)
        print 'file size', size
    except OSError, exc:
        err_count +=1
        print exc
    if size > prev_size:
        same_size_count = 0
        prev_size = size
       same_size_count += 1

I hope sharing this will save some people time...

Note: get_connection() comes from logilab.common.db which is a wrapper module which tries to simplify writing code for different database backends by handling once for all various idiosyncrasies. If you want pure pyodbc code, you can replace it with:

from pyodbc import connect
cnx = connect(driver='SQL Server Native Client 10.0',

The autocommit=True part is especially important, because BACKUP DATABASE will fail if run from within a transaction.

blog entry of