subscribe to this blog

CubicWeb Blog

News about the framework and its uses.

Documentation progress

2010/04/20 by Aurelien Campeas

As part of an effort to improve the documentation (see the cw_course version) a lot of chapters have been completed (and filled with real-world examples). Many more were updated and reorganized.

I won't list everything but here are the most important improvements:

picture under creative commons

Picture under Creative Commons, courtesy of digitalnoise.

  • The publishing process
  • Templates & the architecture of views
  • Primary views customizations (including use of the uicfg module)
  • Controllers
  • Hooks & Operations
  • Proper usage of the ORM
  • Unit tests
  • Breadcrumbs
  • URL rewrite
  • Using the CW javascript library

Last but not least, a whole new tutorial based on Sylvain's great series Building my photos Web site has been included. It covers some advanced topics such as Operations and sophisticated security settings.

The visual style has been enhanced a bit to have better readability.

As always, patches are welcome !

picture under Creative Commons, courtesy of digitalnoise


Building my photos web site with CubicWeb part II: security, testing and migration

2010/04/13 by Sylvain Thenault

This post will cover various topics:

  • configuring security
  • migrating an existing instance
  • writing some unit tests

Goal

Here are the read permissions I want:

  • folders, files, images and comments should have one of the following visibility rules:
    • 'public', everyone can see it
    • 'authenticated', only authenticated users can see it
    • 'restricted', only a subset of authenticated users can see it
  • managers (e.g. me) can see everything
  • only authenticated users can see people
  • everyone can see classifier entities (tag and zone)

Also, unless explicity specified, the visibility of an image should be the same as the visibility of its parent folder and the visibility of a comment should be the same as the one of the commented entity. If there is no parent entity, the default visibility is 'authenticated'.

Regarding write permissions, that's much easier:

  • the anonymous user can't write
  • authenticated users can only add comment
  • managers will add the remaining stuff

Now, let's implement that!

Proper security in CubicWeb is done at the schema level, so you don't have to bother with it in the views, for the users will only see what they have access to.

Step 1: adding permissions to the schema

In the schema, you can grant access according to groups or RQL expressions (users get access if the expression return some results). To implements the read security defined above, groups are not enough, we'll need to use RQL expressions. Here is the idea:

  • add a visibility attribute on folder, image and comment, with a vocabulary ('public', 'authenticated', 'restricted', 'parent')
  • add a may_be_read_by relation that links folder, image or comment to users,
  • add hooks to propagate permission changes.

So the first thing to do is to modify the schema.py of my cube to define these relations:

from yams.constraints import StaticVocabularyConstraint

class visibility(RelationDefinition):
    subject = ('Folder', 'File', 'Image', 'Comment')
    object = 'String'
    constraints = [StaticVocabularyConstraint(('public', 'authenticated',
                                               'restricted', 'parent'))]
    default = 'parent'
    cardinality = '11' # required

class may_be_read_by(RelationDefinition):
    subject = ('Folder', 'File', 'Image', 'Comment',)
    object = 'CWUser'

We can note the following points:

  • we've added a new visibility attribute to folder, file, image and comment using a RelationDefinition
  • cardinality = '11' means this attribute is required. This is usually hidden under the required argument given to the String constructor, but we can rely on this here (same thing for StaticVocabularyConstraint, which is usually hidden by the vocabulary argument)
  • the 'parent' possible value will be used for visibility propagation

Now, we should be able to define security rules in the schema, based on these new attribute and relation. Here is the code to add to schema.py:

from cubicweb.schema import ERQLExpression

VISIBILITY_PERMISSIONS = {
    'read':   ('managers',
               ERQLExpression('X visibility "public"'),
               ERQLExpression('X visibility "authenticated", U in_group G, G name "users"'),
               ERQLExpression('X may_be_read_by U')),
    'add':    ('managers',),
    'update': ('managers', 'owners',),
    'delete': ('managers', 'owners'),
    }
AUTH_ONLY_PERMISSIONS = {
        'read':   ('managers', 'users'),
        'add':    ('managers',),
        'update': ('managers', 'owners',),
        'delete': ('managers', 'owners'),
        }
CLASSIFIERS_PERMISSIONS = {
        'read':   ('managers', 'users', 'guests'),
        'add':    ('managers',),
        'update': ('managers', 'owners',),
        'delete': ('managers', 'owners'),
        }

from cubes.folder.schema import Folder
from cubes.file.schema import File, Image
from cubes.comment.schema import Comment
from cubes.person.schema import Person
from cubes.zone.schema import Zone
from cubes.tag.schema import Tag

Folder.__permissions__ = VISIBILITY_PERMISSIONS
File.__permissions__ = VISIBILITY_PERMISSIONS
Image.__permissions__ = VISIBILITY_PERMISSIONS
Comment.__permissions__ = VISIBILITY_PERMISSIONS.copy()
Comment.__permissions__['add'] = ('managers', 'users',)
Person.__permissions__ = AUTH_ONLY_PERMISSIONS
Zone.__permissions__ = CLASSIFIERS_PERMISSIONS
Tag.__permissions__ = CLASSIFIERS_PERMISSIONS

What's important in there:

  • VISIBILITY_PERMISSIONS provides read access to an entity:
    • if user is in the 'managers' group,
    • or if visibility attribute's value is 'public',
    • or if visibility attribute's value is 'authenticated' and user (designed by the 'U' variable in the expression) is in the 'users' group (all authenticated users are expected to be in this group)
    • or if user is linked to the entity (the 'X' variable) through the may_be_read_by permission
  • we modify permissions of the entity types we use by importing them and modifying their __permissions__ attribute
  • notice the .copy(): we only want to modify 'add' permission for Comment, not for all entity types using VISIBILITY_PERMISSIONS!
  • remaning parts of the security model is done using regular groups:
    • 'users' is the group to which all authenticated users will belong
    • 'guests' is the group of anonymous users

Step 2: security propagation in hooks

To fullfill our requirements, we have to implement:

Also, unless explicity specified, the visibility of an image should be the same as
the visibility of its parent folder and the visibility of a comment should be the same as the
one of the commented entity. If there is no parent entity, the default visibility is
'authenticated'.

This kind of 'active' rule will be done using CubicWeb's hook system. Hooks are triggered on database event such as addition of new entity or relation.

The tricky part of the requirement is in unless explicitly specified, notably because when the entity addition hook is executed, we don't know yet its 'parent' entity (eg folder of an image, image commented by a comment). To handle such things, CubicWeb provides Operation, which allow to schedule things to do at commit time.

In our case we will:

  • on entity creation, schedule an operation that will set default visibility
  • when a "parent" relation is added, propagate parent's visibility unless the child already has a visibility set

Here is the code in cube's hooks.py:

from cubicweb.selectors import implements
from cubicweb.server import hook

class SetVisibilityOp(hook.Operation):
    def precommit_event(self):
        for eid in self.session.transaction_data.pop('pending_visibility'):
            entity = self.session.entity_from_eid(eid)
            if entity.visibility == 'parent':
                entity.set_attributes(visibility=u'authenticated')

class SetVisibilityHook(hook.Hook):
    __regid__ = 'sytweb.setvisibility'
    __select__ = hook.Hook.__select__ & implements('Folder', 'File', 'Image', 'Comment')
    events = ('after_add_entity',)
    def __call__(self):
        hook.set_operation(self._cw, 'pending_visibility', self.entity.eid,
                           SetVisibilityOp)

class SetParentVisibilityHook(hook.Hook):
    __regid__ = 'sytweb.setparentvisibility'
    __select__ = hook.Hook.__select__ & hook.match_rtype('filed_under', 'comments')
    events = ('after_add_relation',)

    def __call__(self):
        parent = self._cw.entity_from_eid(self.eidto)
        child = self._cw.entity_from_eid(self.eidfrom)
        if child.visibility == 'parent':
            child.set_attributes(visibility=parent.visibility)

Remarks:

  • hooks are application objects, hence have selectors that should match entity or relation type to which the hook applies. To match relation type, we use the hook specific match_rtype selector.
  • usage of set_operation: instead of adding an operation for each added entity, set_operation allows to create a single one and to store the eids of the entities to be processed in the session transaction data. This is a good pratice to avoid heavy operations manipulation cost when creating a lot of entities in the same transaction.
  • the precommit_event method of the operation will be called at transaction's commit time.
  • in a hook, self._cw is the repository session, not a web request as usually in views
  • according to hook's event, you have access to different member on the hook instance. Here:
    • self.entity is the newly added entity on 'after_add_entity' events
    • self.eidfrom / self.eidto are the eid of the subject / object entity on 'after_add_relation' events (you may also get the relation type using self.rtype)

The 'parent' visibility value is used to tell "propagate using parent security" because we want that attribute to be required, so we can't use None value else we'll get an error before we get any chance to propagate...

Now, we also want to propagate the may_be_read_by relation. Fortunately, CubicWeb provides some base hook classes for such things, so we only have to add the following code to hooks.py:

# relations where the "parent" entity is the subject
S_RELS = set()
# relations where the "parent" entity is the object
O_RELS = set(('filed_under', 'comments',))

class AddEntitySecurityPropagationHook(hook.PropagateSubjectRelationHook):
    """propagate permissions when new entity are added"""
    __regid__ = 'sytweb.addentity_security_propagation'
    __select__ = (hook.PropagateSubjectRelationHook.__select__
                  & hook.match_rtype_sets(S_RELS, O_RELS))
    main_rtype = 'may_be_read_by'
    subject_relations = S_RELS
    object_relations = O_RELS

class AddPermissionSecurityPropagationHook(hook.PropagateSubjectRelationAddHook):
    __regid__ = 'sytweb.addperm_security_propagation'
    __select__ = (hook.PropagateSubjectRelationAddHook.__select__
                  & hook.match_rtype('may_be_read_by',))
    subject_relations = S_RELS
    object_relations = O_RELS

class DelPermissionSecurityPropagationHook(hook.PropagateSubjectRelationDelHook):
    __regid__ = 'sytweb.delperm_security_propagation'
    __select__ = (hook.PropagateSubjectRelationDelHook.__select__
                  & hook.match_rtype('may_be_read_by',))
    subject_relations = S_RELS
    object_relations = O_RELS
  • the AddEntitySecurityPropagationHook will propagate the relation when filed_under or comments relations are added
    • the S_RELS and O_RELS set as well as the match_rtype_sets selector are used here so that if my cube is used by another one, it'll be able to configure security propagation by simply adding relation to one of the two sets.
  • the two others will propagate permissions changes on parent entities to children entities

Step 3: testing our security

Security is tricky. Writing some tests for it is a very good idea. You should even write them first, as Test Driven Development recommends!

Here is a small test case that'll check the basis of our security model, in test/unittest_sytweb.py:

from cubicweb.devtools.testlib import CubicWebTC
from cubicweb import Binary

class SecurityTC(CubicWebTC):

    def test_visibility_propagation(self):
        # create a user for later security checks
        toto = self.create_user('toto')
        # init some data using the default manager connection
        req = self.request()
        folder = req.create_entity('Folder',
                                   name=u'restricted',
                                   visibility=u'restricted')
        photo1 = req.create_entity('Image',
                                   data_name=u'photo1.jpg',
                                   data=Binary('xxx'),
                                   filed_under=folder)
        self.commit()
        photo1.clear_all_caches() # good practice, avoid request cache effects
        # visibility propagation
        self.assertEquals(photo1.visibility, 'restricted')
        # unless explicitly specified
        photo2 = req.create_entity('Image',
                                   data_name=u'photo2.jpg',
                                   data=Binary('xxx'),
                                   visibility=u'public',
                                   filed_under=folder)
        self.commit()
        self.assertEquals(photo2.visibility, 'public')
        # test security
        self.login('toto')
        req = self.request()
        self.assertEquals(len(req.execute('Image X')), 1) # only the public one
        self.assertEquals(len(req.execute('Folder X')), 0) # restricted...
        # may_be_read_by propagation
        self.restore_connection()
        folder.set_relations(may_be_read_by=toto)
        self.commit()
        photo1.clear_all_caches()
        self.failUnless(photo1.may_be_read_by)
        # test security with permissions
        self.login('toto')
        req = self.request()
        self.assertEquals(len(req.execute('Image X')), 2) # now toto has access to photo2
        self.assertEquals(len(req.execute('Folder X')), 1) # and to restricted folder

if __name__ == '__main__':
    from logilab.common.testlib import unittest_main
    unittest_main()

It is not complete, but it shows most of the things you will want to do in tests: adding some content, creating users and connecting as them in the test, etc...

To run it type:

[syt@scorpius test]$ pytest unittest_sytweb.py
========================  unittest_sytweb.py  ========================
-> creating tables [....................]
-> inserting default user and default groups.
-> storing the schema in the database [....................]
-> database for instance data initialized.
.
----------------------------------------------------------------------
Ran 1 test in 22.547s

OK

The first execution is taking time, since it creates a sqlite database for the test instance. The second one will be much quicker:

[syt@scorpius test]$ pytest unittest_sytweb.py
========================  unittest_sytweb.py  ========================
.
----------------------------------------------------------------------
Ran 1 test in 2.662s

OK

If you do some changes in your schema, you'll have to force regeneration of that database. You do that by removing the tmpdb* files before running the test:

[syt@scorpius test]$ rm tmpdb*

BTW, pytest is a very convenient utilities to control test execution, from the logilab-common package.

Step 4: writing the migration script and migrating the instance

Prior to those changes, Iv'e created an instance, fed it with some data, so I don't want to create a new one, but to migrate the existing one. Let's see how to do that.

Migration commands should be put in the cube's migration directory, in a file named file:<X.Y.Z>_Any.py ('Any' being there mostly for historical reason).

Here I'll create a migration/0.2.0_Any.py file containing the following instructions:

add_relation_type('may_be_read_by')
add_relation_type('visibility')
sync_schema_props_perms()

Then I update the version number in cube's __pkginfo__.py to 0.2.0. And that's it! Those instructions will:

  • update the instance's schema by adding our two new relations and update the underlying database tables accordingly (the two first instructions)
  • update schema's permissions definition (the later instruction)

To migrate my instance I simply type:

[syt@scorpius ~]$ cubicweb-ctl upgrade sytweb

I will then be asked some questions to do the migration step by step. You should say YES when it asks if a backup of your database should be done, so you can get back to the initial state if anything goes wrong...

Conclusion

This is a somewhat long post that I bet you will have to read at least twice ;) There is a hell lot of information hidden in there... But that should start to give you an idea of CubicWeb's power...

See you next time for part III !


Building my photos web site with CubicWeb (Part I)

2010/04/01 by Sylvain Thenault

Desired features

  • photo gallery;
  • photo stored onto the fs and displayed through a web interface dynamically;
  • navigation through folder (album), tags, geographical zone, people on the picture... using facets;
  • advanced security (eg not everyone can see everything). More on this later.

Let's go then

Step 1: creating a new cube for my web site

One note about my development environment: I wanted to use packaged version of CubicWeb and cubes while keeping my cube in my user directory, let's say ~src/cubes. It can be done by setting the following environment variables:

CW_CUBES_PATH=~/src/cubes
CW_MODE=user

The new cube, holding custom code for this web site, can now be created using:

cubicweb-ctl newcube --directory=~/src/cubes sytweb

Step 2: pick building blocks into existing cubes

Almost everything I want to represent in my web-site is somewhat already modelized in existing cubes that I'll extend for my needs:

  • folder, containing Folder entity type, which will be used as both 'album' and a way to map file system folders. Entities are added to a given folder using the filed_under relation.
  • file, containing File and Image entity type, gallery view, and a file system import utility.
  • zone, containing the Zone entity type for hierarchical geographical zones. Entities (including sub-zones) are added to a given zone using the situated_in relation.
  • person, containing the Person entity type plus some basic views.
  • comment, providing a full commenting system allowing one to comment entity types supporting the comments relation by adding a Comment entity.
  • tag, providing a full tagging system as an easy and powerful way to classify entities supporting the tags relation by linking the to Tag entities. This will allow navigation into a large number of pictures.

Ok, now I'll tell my cube requires all this by editing cubes/sytweb/__pkginfo__.py:

__depends_cubes__ = {'file': '>= 1.2.0',
                     'folder': '>= 1.1.0',
                     'person': '>= 1.2.0',
                     'comment': '>= 1.2.0',
                     'tag': '>= 1.2.0',
                     'zone': None,
                     }
__depends__ = {'cubicweb': '>= 3.5.10',
               }
for key,value in __depends_cubes__.items():
    __depends__['cubicweb-'+key] = value
__use__ = tuple(__depends_cubes__)

Notice that you can express minimal version of the cube that should be used, None meaning whatever version available.

Step 3: glue everything together in my cube's schema

from yams.buildobjs import RelationDefinition

class comments(RelationDefinition):
    subject = 'Comment'
    object = ('File', 'Image')
    cardinality = '1*'
    composite = 'object'

class tags(RelationDefinition):
    subject = 'Tag'
    object = ('File', 'Image')

class filed_under(RelationDefinition):
    subject = ('File', 'Image')
    object = 'Folder'

class situated_in(RelationDefinition):
    subject = 'Image'
    object = 'Zone'

class displayed_on(RelationDefinition):
    subject = 'Person'
    object = 'Image'

This schema:

  • allows to comment and tag File and Image entity types by adding the comments and tags relations. This should be all we have to do for this feature since the related cubes provide 'pluggable section' which are automatically displayed in the primary view of entity types supporting the relation.
  • adds a situated_in relation definition so that image entities can be geolocalized.
  • add a new relation displayed_on relation telling who can be seen on a picture.

This schema will probably have to evolve as time goes (for security handling at least), but since the possibility to change and update the schema evolving is one of CubicWeb features (and goals), we won't worry and see that later when needed.

Step 4: creating the instance

Now that I have a schema, I want to create an instance of that new 'sytweb' cube, so I run:

cubicweb-ctl create sytweb sytweb_instance

hint: if you get an error while the database is initialized, you can avoid having to reanswer to questions by running

cubicweb-ctl db-create sytweb_instance

This will use your already configured instance and start directly from the database creation step, thus skipping questions asked by the 'create' command.

Once the instance and database are fully initialized, run

cubicweb-ctl start sytweb_instance

to start the instance, check you can connect on it, etc...

Next times

We will customize the index page, see security configuration, use the Bytes FileSystem Storage... Lots of cool stuff remaining :)

Next post : security, testing and migration


Fun with graphs in apycot

2010/03/24 by Arthur Lutz

Yesterday I had a little quick fun with apycot in the train, using the existing plots infrastructure I managed to quickly add a few graphs to the application. I only had an old dump of our apycot for mercurial (http://apycot.hg-scm.org/) so the timespan is not huge, but I like it anyway! Here are some dev screenshots while you wait for this feature your your application... The pylint grades where pretty constant so I'm not including that graph.

http://www.cubicweb.org/image/779761?vid=downloadhttp://www.cubicweb.org/image/779768?vid=download

Now, I have to make solid code and integrate it properly.


CubicWeb 3.7 released

2010/03/19

Hi there !

I'm pleased to announce the 3.7 release of CubicWeb, after a much shorter development cycle than for the 3.6...

But it still have some interesting changes:

  • NOW DEPENDS ON PYTHON 2.5
  • use the newly created logilab.database package (you'll have to install it as well as upgrade logilab.common and rql)
  • proper behaviour on the repository side of cubiweb:
    • dropped unsafe_execute, execute is now unsafe by default in hooks and operations. You can still explicitly control security using the enabled_secury context manager
    • proper transaction hooks control using the hooks_control context manager
  • started some transaction undo support (only undo of deletion supported right now)
  • various other bug fixes and improvments

Notice the 3.6 branch will still be maintained for some time.

Enjoy!


Continuous Integration platform for Mercurial with apycot

2010/03/15 by Arthur Lutz

Since the mercurial 1.5 sprint Pierre-Yves has been working on improving Continuous Integration for Mercurial. All developers are encouraged to run the test suites and code quality checkers but it's no always feasible to test every cases, different OS, different python versions, strange test dependencies, slow coverage run, etc. Moreover it's generally useful to keep track of the results of previous tests, especially for benchmarks.

At http://apycot.hg-scm.org/ you will find a production setup that now runs several variants of the tests-suite for all official repo and checks code style and documentation. Notification by email or RSS is available. For more details check out the FAQ.

apycot is open source and uses the cubicweb platform, if you want to set up one for your project, check out the step by step documentation.

http://www.cubicweb.org/image/749160?vid=download

CubicWeb 3.6 is (almost) out!

2010/02/10 by Sylvain Thenault

And that's great news, after several months of development (things started moving in the beginning of august 2009...), it should be available on our Debian repositories and ftp site in the next few hours.

So, we can say this release contains a (too) large set of improvements and refactorings. I'll talk about the most important ones here.

Appobject/Entity classes namespace cleanup

First of all, the namespace cleanup... 3.6 is a step towards cleaning the entity classes (hence more generally appobject), which are used for a lot of things, making it impossible to tell for sure what could be used or not as an attribute or relation name. We decided to declare identifiers starting with \_cw or cw\_ reserved for the core classes. A lot of methods have been deprecated to cleanup the base appobject class namespace. The remaining methods on entity classes will be removed in future version, by the introduction of an ORM for database related methods, and by the (most probable) introduction of ZCA adapters for other aspects. The most notable renaming are:

  • .req -> ._cw
  • .rset -> .cw_rset
  • .row -> .cw_row
  • .col -> .cw_col

This is probably what you'll see first when upgrading to 3.6: a huge stack of deprecation warnings on your screen :)

Another step towards a nice and powerful form system

  • cleaner reponsibilities separation between form, field and widget

  • fields and widgets are now responsible for handling POSTed values (the editcontroller was handling this, making things really unflexible). The editcontroller has been rewritten and now properly gets values from fields. Another benefit is that you can now easily have a widget handling multiple inputs (see the new datetime picker for instance, or the custom widget for Bookmark.path)

  • refactored automatic forms:

    • rewrite 'generic relations' as a field
    • inlined forms are now encapsulated into a field

    so you get much more control on these parts of automatic forms by using mechanism provided generally by fields

    • clearer form relations tags: removed autoform_is_inlined, more understandable autoform_field_section

Hooks refactoring

Hooks are now regular appobjects, with selectors (don't forget to reuse Hook.__select__, remember that !). They should simply implement __call__ with no argument (well, only self) and will get info previously passed as argument as instance attributes, according to the matching event.

Test API cleanup

EnvBasedTC, ControllerTC, WebTest, RepoBasedTC are all gone. Simply use CubicWebTC, with an unified API similar to what you use in cubicweb-ctl shell and in usual development.

The Bytes File System Storage

You can now specify a custom storage for attributes of entities stored in the system source. This mechanism is used to provide a way to store Bytes attributes (such as File.data for instance) as files on the file-system instead of BLOBs in the database. You can configure which attributes should use this storage for your instance and then everything is transparent.

Schema definition changes (yams 0.27)

In your schema definition file:

  • "symetric" should be correctly spelled "symmetric" :)
  • "permissions" was renamed to "__permissions__"

Also, permissions for relations are now supported per definition, not per type, at the cost of a visible impact when writing/reading the schema.

Note about backward compatibility

We worked hard to keep backward compatibility, but you shouldn't upgrade to 3.6 without checking that everything is fine... Check notably:

  • forms, if you're using custom forms by overriding internal methods
  • import for date functions from cubicweb.utils (they moved to logilab.common.date)

And also

CubicWeb 3.6 comes with a set of 37 cubes "3.6"-ready to avoid too much warnings!

Enjoy!


CubicWeb documentation mini-sprint report

2010/02/10 by Sylvain Thenault

We held a one day sprint last week in our Paris office, trying to improve CubicWeb's documentation.

There is a huge work to do on this, much more than we can do on a one day sprint, even with many people. But you have to begin with something :)

So, after a quick meeting to define priorities:

  • Stéphanie, Charles and later Sandrine (from her US home-office), began to add some documentation and screenshots to cubes. They started with the following cubes: addressbook, person, basket, tag, folder, forgotpwd, forge, tracker, vcsfile, keyword, blog and comment.
  • Julien explored sphinx abilities to build the index and extract docstrings. He applied this to improve the documentation of selectors.
  • Adrien (ach) and Celso, our friend from Mexico, tackled the task to improve the tutorial from a beginner's point of view.
  • Arthur added some pieces of documentation found in our intranet, mailing-list...
  • Pyves worked on a cubicweb-ctl command to generate schema images (png) for cubes, to include them in the cube's documentation.
  • Adrien (adim) and I helped the various teams.

Huum, I think I did not forgot anyone...

If there is still a lot to do (we need more doc sprints, stay tuned), this is really a nice start! This site should soon be updated to include more valuable cubes description and online documentation extracted from the contributed doc.


CubicWeb documentation sprint in feb. 2010

2010/01/22 by Nicolas Chauvat
http://farm4.static.flickr.com/3042/2871708248_950831962c_s.jpg

On February 2nd, 2010 Logilab will host in its head offices a one-day sprint dedicated to the improvement of the CubicWeb documentation.

Get in touch with Logilab if you want to participate in person or via the net: contact at logilab dot fr.

Photo by Adam Hyde from the FLOSS blog


MS SQL Server backuping gotcha

2010/01/19

While working on the port of CubicWeb to the Windows platform, including supporting MS Sql Server as the database backend, I got bitten by a weird behavior of that database engine. When working with cubicweb, most administrations command are wrappped by the cubicweb-ctl utility and database backups are performed by running cubicweb-ctl db-dump <instancename>. If the instance uses PostgreSQL as the backend, this will call the pg_dump utility.

When porting to Sql Server, I could not find such a utility, but I found that Transact SQL has a BACKUP DATABASE command, so I was able to call it using Python's pyodbc module. I tested it interactively, and was satisfied with the result:

>>> from logilab.common.db import get_connection
>>> cnx = get_connection(driver='sqlserver2005', database='mydb', host='localhost', extra_args='autocommit;trusted_connection')
>>> cursor = cnx.cursor()
>>> cursor.execute('BACKUP DATABASE ? TO DISK = ?', ('mydb', 'C:\\Data\\mydb.dump'))
>>> cnx.close()

However, testing that very same code through cubicweb-ctl produced no file in C:\\Data\\. To make a (quite) long story short, the thing is that the BACKUP DATABASE command is asynchronous (or maybe the odbc driver is) and the call to cursor.execute(...) will return immediately, before the backup actually starts. When running interactively, by the time I got to type cnx.close() the backup was finished but when running in a function, the connection was closed before the backup started (which effectively killed the backup operation).

I worked around this by monitoring the size of the backup file in a loop and waiting until that size gets stable before closing the connection:

import os
import time
from logilab.common.db import get_connection

filename = 'c:\\data\\toto.dump'
dbname = 'mydb'
cnx = get_connection(driver='sqlserver2005',
                     host='localhost',
                     database=dbname,
                     extra_args='autocommit;trusted_connection')
cursor = cnx.cursor()
cursor.execute("BACKUP DATABASE ? TO DISK= ? ", (dbname, filename,))
prev_size = -1
err_count = 0
same_size_count = 0
while err_count < 10 and same_size_count < 10:
    time.sleep(1)
    try:
        size = os.path.getsize(filename)
        print 'file size', size
    except OSError, exc:
        err_count +=1
        print exc
    if size > prev_size:
        same_size_count = 0
        prev_size = size
    else:
       same_size_count += 1
cnx.close()

I hope sharing this will save some people time...

Note: get_connection() comes from logilab.common.db which is a wrapper module which tries to simplify writing code for different database backends by handling once for all various idiosyncrasies. If you want pure pyodbc code, you can replace it with:

from pyodbc import connect
cnx = connect(driver='SQL Server Native Client 10.0',
              host='locahost',
              database=dbname,
              trusted_connection='yes',
              autocommit=True)

The autocommit=True part is especially important, because BACKUP DATABASE will fail if run from within a transaction.