Thoughts on CubicWeb 4.0

This is a fairly technical post talking about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to go from here to there. Draft, really. But that will eventually turn into a nice roadmap hopefully.

The great simplification

Some parts of cubicweb are sometimes too hairy for different reasons (some good, most bad). This participates in the difficulty to get started quickly. The goal of CubicWeb 4.0 should be to make things simpler :

  • Fix some bad old design.
  • Stop reinventing the wheel and use widely used libraries in the Python Web World. This extends to benefitting from state of the art libraries to build nice and flexible UI such as Bootstrap, on top of the JQuery foundations (which could become as prominent as the Python standard library in CubicWeb, the development team should get ready for it).
  • If there is a best way to do something, just do it and refrain from providing configurability and options.

On the road to Bootstrap

First, a few simple things could be done to simplify the UI code:

  • drop xhtml support: always return text/html content type, stop bothering with this stillborn stuff and use html5
  • move away everything that should not be in the framework: calendar?, embedding, igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?, vcard, wdoc?, xbel, xmlrss?

Then we should probably move the default UI into some cubes (i.e. the content of cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this should also have the benefit of making clearer that this is the default way to build an (automatic) UI in CubicWeb, but one may use other, more usual, strategies (such as using a template language).

At a first glance, we should start with the following core cubes:

  • corelayout, the default interface layout and generic components. Modules to backport there: application (not an appobject yet), basetemplates, error, boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.
  • coreviews, the default generic views and forms. Modules to backport there: actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller, editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview, reledit, tabs.
  • corebackoffice, the concrete views for the default back-office that let you handle users, sources, debugging, etc. through the web. Modules to backport here: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress, management, schema, startup, workflow.
  • coreservices, the various services, not directly related to display of something. Modules to backport here: ajaxcontroller, apacherewrite, authentication, basecontrollers, csvexport, idownloadable, magicsearch, sessions, sparql, sessions, staticcontrollers, urlpublishing, urlrewrite.

This is a first draft that will need some adjustements. Some of the listed modules should be split (e.g. actions, boxes,) and their content moved to different core cubes. Also some modules in cubicweb.web packages may be moved to the relevant cube.

Each cube should provide an interface so that one could replace it with another one. For instance, move from the default coreviews and corelayout cube to bootstrap based ones. This should allow a nice migration path from the current UI to a Bootstrap based UI. Bootstrap should probably be introduced bottom-up: start using it for tables, lists, etc. then go up until the layout defined in the main template. The Orbui experience should greatly help us by pointing at hot spots that will have to be tackled, as well as by providing a nice code base from which we should start.

Regarding current implementation, we should take care that Contextual components are a powerful way to build "pluggable" UI, but we should probably add an intermediate layer that would make more obvious / explicit:

  • what the available components are
  • what the available slots are
  • which component should go in which slot when possible

Also at some point, we should take care to separate view's logic from HTML generation: our experience with client works shows that a common need is to use the logic but produce a different HTML. Though we should wait for more use of Bootstrap and related HTML simplification to see if the CSS power doesn't somewhat fulfill that need.

On the road to proper tasks management

The current looping task / repo thread mecanism is used for various sort of things and has several problems:

  • tasks don't behave similarly in a multi-instances configuration (some should be executed in a single instance, some in a subset); the tasks system has been originally written in a single instance context; as of today this is (sometimes) handled using configuration options (that will have to be properly set in each instance configuration file);
  • tasks is a repository only api but we also need web-side tasks;
  • there is probably some abuse of the system that may lead to unnecessary resources usage.

Analyzing a sample http://www.logilab.org/ instance, below are the running looping task by categories. Tasks that have to run on each web instance:

  • clean_sessions, automatically closes unused repository sessions. Notice cw.etwist.server also records a twisted task to clean web sessions. Some changes are imminent on this, they will be addressed in the upcoming refactoring session (that will become more and more necessary to move on several points listed here).
  • regular_preview_dir_cleanup (preview cube), cleanup files in the preview filesystem directory. Could be executed by a (some of the) web instance(s) provided that the preview directory is shared.

Tasks that should run on a single instance:

  • update_feeds, update copy based sources (e.g. datafeed, ldapfeed). Controlled by 'synchronize' source configuration (persistent source attribute that may be overridden by instance using CWSourceHostConfig entities)
  • expire_dataimports, delete CWDataImport entities older than an amount of time specified in the 'logs-lifetime' configuration option. Not controlled yet.
  • cleanup_auth_cookies (rememberme cube), delete CWAuthCookie entities whose life-time is exhausted. Not controlled yet.
  • cleaning_revocation_key (forgotpwd cube), delete Fpasswd entities with past revocation_date. Not controlled yet.
  • cleanup_plans (narval cube), delete Plan entities instance older than an amount of time specified in the configuration. If 'plan-cleanup-delay' is set to an empty value, the task isn't started.
  • refresh_local_repo_caches (vcsfile cube), pull or clone vcs repositories cache if the Repository entity ask to import_revision_content (hence web instance should have up to date cache to display files content) or if 'repository-import' configuration option is set to 'yes'; import vcs repository content as entities if 'repository-import' configuration option and it is coming from the system source.

Some deeper thinking is needed here so we can improve things. That includes thinking about:

  • the inter-instances messages bus based on zmq and introduced in 3.15,
  • the Celery project (http://celeryproject.org/), an asynchronous task queue, widely used and written in Python,

Remember the more cw independent the tasks are, the better it is. Though we still want an 'all-integrated' approach, e.g. not relying on external configuration of Unix specific tools such as CRON. Also we should see if a hard-dependency on Celery or a similar tool could be avoided, and if not if it should be considered as a problem (for devops).

On the road to an easier configuration

First, we should drop the different behaviour according to presence of a '.hg' in cubicweb's directory. It currently changes the location where cubicweb external resources (js, css, images, gettext catalogs) are searched for. Speaking of implementation:

  • shared_dir returns the cubicweb.web package path instead of the path to the shared cube,
  • i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to the shared/i18n cube,
  • migration_scripts_dir returns the cubicweb/misc/migration directory path instead of share/cubicweb/migration.

Moving web related objects as proposed in the Bootstrap section would resolve the problem for the content web/data and most of i18n (though some messages will remain and additional efforts will be needed here). By going further this way, we may also clean up some schema code by moving cubicweb/schemas and cubicweb/misc/migration to a cube (though only a small benefit is to be expected here).

We should also have fewer environment variables... Let's see what we have today:

  • CW_INSTANCES_DIR, where to look for instances configuration
  • CW_INSTANCES_DATA_DIR, where to look for instances persistent data files
  • CW_RUNTIME_DIR, where to look for instances run-time data files
  • CW_MODE, set to 'system' or 'user' will predefine above environment variables differently
  • CW_CUBES_PATH, additional directories where to look for cubes
  • CW_CUBES_DIR, location of the system 'cubes' directory
  • CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

  • CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to ~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
  • CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file options, with smart values generated at instance creation time;
  • the above change should make CW_MODE useless;
  • CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;
  • regarding CW_INSTALL_PREFIX, I'm lacking experience with non-hg-or-debian installations and don't know if this can be avoided or not.

Last but not least (for the moment), the 'web' / 'repo' / 'all-in-one' configurations, and the fact that the associated configuration file changes stinks. Ideas to stop doing this:

  • one configuration file per instance, with all options provided by installed parts of the framework used by the application.
  • activate 'services' (or not): web server, repository, zmq server, pyro server. Default services to be started are stored in the configuration file.

There is probably more that can be done here (less configuration options?), but that would already be a great step forward.

On the road to...

The following projects should be investigated to see if we could benefit from them:

Discussion

Remember the following goals: migration of legacy code should go smoothly. In a perfect world every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).

Please provide feedbacks:

  • do you think choices proposed above are good/bad choices? Why?
  • do you know some additional libraries that should be investigated?
  • do you have other changes in mind that could/should be done in cw 4.0?