Blog entries may 2012 [2]

What's new in CubicWeb 3.15

2012/05/14 by Sylvain Thenault

CubicWeb 3.15 introduces a bunch of new functionalities. In short (more details below):

  • ability to use ZMQ instead of Pyro to connect to repositories
  • ZMQ inter-instances messages bus
  • new LDAP source using the datafeed approach, much more flexible than the legacy 'ldapuser' source
  • full undo support

Plus some refactorings regarding Ajax function calls, WSGI, the registry, etc. Read more for the detail.

New functionalities

  • Add ZMQ server, based on the cutting edge ZMQ socket library. This allows to access distant instances, in a similar way as Pyro.
  • Publish/subscribe mechanism using ZMQ for communication among cubicweb instances. The new zmq-address-sub and zmq-address-pub configuration variables define where this communication occurs. As of this release this mechanism is used for entity cache invalidation.
  • Improved WSGI support. While there are still some caveats, most of the code which was twisted only is now generic and allows related functionalities to work with a WSGI front-end.
  • Full undo/transaction support: undo of modifications has finally been implemented, and the configuration simplified (basically you activate it or not on an instance basis).
  • Controlling HTTP status code returns is now much easier:
    • WebRequest now has a status_out attribute to control the response status ;
    • most web-side exceptions take an optional status argument.

API changes

  • The base registry implementation has been moved to a new logilab.common.registry module (see #1916014). This includes code from :

    • cubicweb.vreg (everything that was in there)
    • cw.appobject (base selectors and all).

    In the process, some renaming was done:

    • the top level registry is now RegistryStore (was VRegistry), but that should not impact CubicWeb client code;
    • former selectors functions are now known as "predicate", though you still use predicates to build an object'selector;
    • for consistency, the objectify_selector decorator has hence been renamed to objectify_predicate;
    • on the CubicWeb side, the selectors module has been renamed to predicates.

    Debugging refactoring dropped the need for the lltrace decorator. There should be full backward compat with proper deprecation warnings. Notice the yes predicate and objectify_predicate decorator, as well as the traced_selection function should now be imported from the logilab.common.registry module.

  • All login forms are now submitted to <app_root>/login. Redirection to requested page is now handled by the login controller (it was previously handled by the session manager).

  • Publisher.publish has been renamed to Publisher.handle_request. This method now contains a generic version of the logic previously handled by Twisted. Controller.publish is not affected.

Unintrusive API changes

  • New 'ldapfeed' source type, designed to replace 'ldapuser' source with data-feed (i.e. copy based) source ideas.
  • New 'zmqrql' source type, similar to 'pyrorql' but using ømq instead of Pyro.
  • A new registry called 'services' has appeared, where you can register server-side cubicweb.server.Service child classes. Their call method can be invoked from a web-side AppObject instance using the new self._cw.call_service method or a server-side one using self.session.call_service. This is a new way to call server-side methods, much cleaner than monkey patching the Repository class, which becomes a deprecated way to perform similar tasks.
  • a new ajaxfunction registry now hosts all remote functions (i.e. functions callable through the asyncRemoteExec JS api). A convenience ajaxfunc decorator will let you expose your python functions easily without all the appobject standard boilerplate. Backwards compatibility is preserved.
  • the 'json' controller is now deprecated in favor of the 'ajax' one.
  • WebRequest.build_url can now take a __secure__ argument. When True, cubicweb tries to generate an https url.

User interface changes

A new 'undohistory' view exposes the undoable transactions and gives access to undo some of them.


Thoughts on CubicWeb 4.0

2012/05/14 by Sylvain Thenault

This is a fairly technical post talking about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to go from here to there. Draft, really. But that will eventually turn into a nice roadmap hopefully.

The great simplification

Some parts of cubicweb are sometimes too hairy for different reasons (some good, most bad). This participates in the difficulty to get started quickly. The goal of CubicWeb 4.0 should be to make things simpler :

  • Fix some bad old design.
  • Stop reinventing the wheel and use widely used libraries in the Python Web World. This extends to benefitting from state of the art libraries to build nice and flexible UI such as Bootstrap, on top of the JQuery foundations (which could become as prominent as the Python standard library in CubicWeb, the development team should get ready for it).
  • If there is a best way to do something, just do it and refrain from providing configurability and options.

On the road to Bootstrap

First, a few simple things could be done to simplify the UI code:

  • drop xhtml support: always return text/html content type, stop bothering with this stillborn stuff and use html5
  • move away everything that should not be in the framework: calendar?, embedding, igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?, vcard, wdoc?, xbel, xmlrss?

Then we should probably move the default UI into some cubes (i.e. the content of cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this should also have the benefit of making clearer that this is the default way to build an (automatic) UI in CubicWeb, but one may use other, more usual, strategies (such as using a template language).

At a first glance, we should start with the following core cubes:

  • corelayout, the default interface layout and generic components. Modules to backport there: application (not an appobject yet), basetemplates, error, boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.
  • coreviews, the default generic views and forms. Modules to backport there: actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller, editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview, reledit, tabs.
  • corebackoffice, the concrete views for the default back-office that let you handle users, sources, debugging, etc. through the web. Modules to backport here: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress, management, schema, startup, workflow.
  • coreservices, the various services, not directly related to display of something. Modules to backport here: ajaxcontroller, apacherewrite, authentication, basecontrollers, csvexport, idownloadable, magicsearch, sessions, sparql, sessions, staticcontrollers, urlpublishing, urlrewrite.

This is a first draft that will need some adjustements. Some of the listed modules should be split (e.g. actions, boxes,) and their content moved to different core cubes. Also some modules in cubicweb.web packages may be moved to the relevant cube.

Each cube should provide an interface so that one could replace it with another one. For instance, move from the default coreviews and corelayout cube to bootstrap based ones. This should allow a nice migration path from the current UI to a Bootstrap based UI. Bootstrap should probably be introduced bottom-up: start using it for tables, lists, etc. then go up until the layout defined in the main template. The Orbui experience should greatly help us by pointing at hot spots that will have to be tackled, as well as by providing a nice code base from which we should start.

Regarding current implementation, we should take care that Contextual components are a powerful way to build "pluggable" UI, but we should probably add an intermediate layer that would make more obvious / explicit:

  • what the available components are
  • what the available slots are
  • which component should go in which slot when possible

Also at some point, we should take care to separate view's logic from HTML generation: our experience with client works shows that a common need is to use the logic but produce a different HTML. Though we should wait for more use of Bootstrap and related HTML simplification to see if the CSS power doesn't somewhat fulfill that need.

On the road to proper tasks management

The current looping task / repo thread mecanism is used for various sort of things and has several problems:

  • tasks don't behave similarly in a multi-instances configuration (some should be executed in a single instance, some in a subset); the tasks system has been originally written in a single instance context; as of today this is (sometimes) handled using configuration options (that will have to be properly set in each instance configuration file);
  • tasks is a repository only api but we also need web-side tasks;
  • there is probably some abuse of the system that may lead to unnecessary resources usage.

Analyzing a sample http://www.logilab.org/ instance, below are the running looping task by categories. Tasks that have to run on each web instance:

  • clean_sessions, automatically closes unused repository sessions. Notice cw.etwist.server also records a twisted task to clean web sessions. Some changes are imminent on this, they will be addressed in the upcoming refactoring session (that will become more and more necessary to move on several points listed here).
  • regular_preview_dir_cleanup (preview cube), cleanup files in the preview filesystem directory. Could be executed by a (some of the) web instance(s) provided that the preview directory is shared.

Tasks that should run on a single instance:

  • update_feeds, update copy based sources (e.g. datafeed, ldapfeed). Controlled by 'synchronize' source configuration (persistent source attribute that may be overridden by instance using CWSourceHostConfig entities)
  • expire_dataimports, delete CWDataImport entities older than an amount of time specified in the 'logs-lifetime' configuration option. Not controlled yet.
  • cleanup_auth_cookies (rememberme cube), delete CWAuthCookie entities whose life-time is exhausted. Not controlled yet.
  • cleaning_revocation_key (forgotpwd cube), delete Fpasswd entities with past revocation_date. Not controlled yet.
  • cleanup_plans (narval cube), delete Plan entities instance older than an amount of time specified in the configuration. If 'plan-cleanup-delay' is set to an empty value, the task isn't started.
  • refresh_local_repo_caches (vcsfile cube), pull or clone vcs repositories cache if the Repository entity ask to import_revision_content (hence web instance should have up to date cache to display files content) or if 'repository-import' configuration option is set to 'yes'; import vcs repository content as entities if 'repository-import' configuration option and it is coming from the system source.

Some deeper thinking is needed here so we can improve things. That includes thinking about:

  • the inter-instances messages bus based on zmq and introduced in 3.15,
  • the Celery project (http://celeryproject.org/), an asynchronous task queue, widely used and written in Python,

Remember the more cw independent the tasks are, the better it is. Though we still want an 'all-integrated' approach, e.g. not relying on external configuration of Unix specific tools such as CRON. Also we should see if a hard-dependency on Celery or a similar tool could be avoided, and if not if it should be considered as a problem (for devops).

On the road to an easier configuration

First, we should drop the different behaviour according to presence of a '.hg' in cubicweb's directory. It currently changes the location where cubicweb external resources (js, css, images, gettext catalogs) are searched for. Speaking of implementation:

  • shared_dir returns the cubicweb.web package path instead of the path to the shared cube,
  • i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to the shared/i18n cube,
  • migration_scripts_dir returns the cubicweb/misc/migration directory path instead of share/cubicweb/migration.

Moving web related objects as proposed in the Bootstrap section would resolve the problem for the content web/data and most of i18n (though some messages will remain and additional efforts will be needed here). By going further this way, we may also clean up some schema code by moving cubicweb/schemas and cubicweb/misc/migration to a cube (though only a small benefit is to be expected here).

We should also have fewer environment variables... Let's see what we have today:

  • CW_INSTANCES_DIR, where to look for instances configuration
  • CW_INSTANCES_DATA_DIR, where to look for instances persistent data files
  • CW_RUNTIME_DIR, where to look for instances run-time data files
  • CW_MODE, set to 'system' or 'user' will predefine above environment variables differently
  • CW_CUBES_PATH, additional directories where to look for cubes
  • CW_CUBES_DIR, location of the system 'cubes' directory
  • CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

  • CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to ~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
  • CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file options, with smart values generated at instance creation time;
  • the above change should make CW_MODE useless;
  • CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;
  • regarding CW_INSTALL_PREFIX, I'm lacking experience with non-hg-or-debian installations and don't know if this can be avoided or not.

Last but not least (for the moment), the 'web' / 'repo' / 'all-in-one' configurations, and the fact that the associated configuration file changes stinks. Ideas to stop doing this:

  • one configuration file per instance, with all options provided by installed parts of the framework used by the application.
  • activate 'services' (or not): web server, repository, zmq server, pyro server. Default services to be started are stored in the configuration file.

There is probably more that can be done here (less configuration options?), but that would already be a great step forward.

On the road to...

The following projects should be investigated to see if we could benefit from them:

Discussion

Remember the following goals: migration of legacy code should go smoothly. In a perfect world every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).

Please provide feedbacks:

  • do you think choices proposed above are good/bad choices? Why?
  • do you know some additional libraries that should be investigated?
  • do you have other changes in mind that could/should be done in cw 4.0?