Blog entries

CubicWeb roadmap meeting on July 3rd, 2014

2014/06/26 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. The previous roadmap meeting was in May 2014.

Here is the report about the July 3rd, 2014 meeting. Christophe de Vienne (Unlish) and Dimitri Papadopoulos (CEA) joined us to express their concerns and discuss the future of CubicWeb.

Versions

Version 3.17

This version is stable but old, and maintenance will continue only as long as some customers are willing to pay for it (current is 3.17.15, with 3.17.16 in development).

Version 3.18

This version is stable and maintained (current is 3.18.5 with 3.18.6 in development).

Version 3.19

This version was published at the end of April and has now been tested on our internal servers. It includes support for Cross Origin Resource Sharing (CORS) and a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details, read the release notes or the list of tickets for CubicWeb 3.19.0. The current release is 3.19.2.

Version 3.20

This version is under development. It will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should still include the work done for CWEP-002 (computed attributes and relations).

For details, read the list of tickets for CubicWeb 3.20.0.

Version 3.21 (or maybe 4.0?)

Removal of the dbapi, merging of Connection and ClientConnection, CWEP-003 (adding a FROM clause to RQL).

Cubes

Cubes published over the past two months

New cubes

  • cubicweb-frbr: Cube providing a schema based on FRBR entities
  • cubicweb-clinipath
  • cubicweb-fastimport

CWEPs

Here is the status of open CubicWeb Evolution Proposals:

CWEP-0002 is only missing a bit of migration support, which should be finished soon for inclusion in 3.20.

CWEP-0003 has been reviewed and is waiting for a bit of reshaping that should occur soon. It is targeted for 3.21.

New CWEPs are expected to be written for clarifying the API of the _cw object, supporting persistent sessions and improving the performance of massive imports.

Work in progress

Design

The new logo is now published in the 3.19 line. David showed us his experiment modernizing a forge's UI with a bit of CSS. There is still some pressure on the Bootstrap side, though, as it still relies on heavy monkey-patching in the cubicweb-bootstrap cube.

Data import

Dimitri expressed his concerns about the lack of a proper data import API. We should soon have some feedback from Aurelien's cubicweb-fastimport experiment, which may answer Dimitri's needs. We roughly agreed that there are different needs (e.g. massive imports without consistency checks vs. smaller but still safe imports), that cubicweb.dataimport was an attempt to answer them all, and that cubicweb-dataio and cubicweb-fastimport are more specific responses. In the end, we may reasonably hope that a common API will emerge.

Removals

On his way to persistent sessions, Aurélien made huge progress toward silencing the warnings in the 3.19 tests. The dbapi has been removed and ClientConnection / Connection merged. We decided to take some time to think about recurring task management, as it is related to other tricky topics (application / instance configuration) and is not directly related to persistent sessions.

Rebasing on Pyramid

Last but not least, Christophe demonstrated that CubicWeb could basically live with Pyramid. This experiment will be pursued, as it looks very promising for getting the good parts of both frameworks.

Agenda

Logilab's next roadmap meeting will be held at the beginning of September 2014, and Christophe and Dimitri were invited.


Logilab's roadmap for CubicWeb on May 15th, 2014

2014/05/21 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the May 15th, 2014 meeting. The previous report posted to the blog was the March 2014 roadmap.

Versions

Version 3.17

This version is stable but old, and maintenance will continue only as long as some customers are willing to pay for it (current is 3.17.15).

Version 3.18

This version is stable and maintained (current is 3.18.4).

Version 3.19

This version was published at the end of April. It includes support for Cross Origin Resource Sharing (CORS) and a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read the release notes or the list of tickets for CubicWeb 3.19.0.

Version 3.20

This version is under development. It will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and relations) and the merging of Connection and ClientConnection if it happens to be simple enough to get done quickly (if removing the dbapi first would really help, this merge will wait for 3.21).

For details, read the list of tickets for CubicWeb 3.20.0.

Version 3.21 (or maybe 4.0?)

Removal of the dbapi and merging of CWEP-003 (adding a FROM clause to RQL).

Cubes

Here is a list of cubes that had versions published over the past two months: accidents, awstats, book, bootstrap, brainomics, cmt, collaboration, condor, container, dataio, expense, faq, file, forge, forum, genomics, geocoding, inlineedit, inventory, keyword, link, mailinglist, mediaplayer, medicalexp, nazcaui, ner, neuroimaging, newsaggregator, processing, questionnaire, rqlcontroller, semnews, signedrequest, squareui, task, testcard, timesheet, tracker, treeview, vcsfile, workorder.

Here are the new cubes we are pleased to announce:

rqlcontroller receives a list of RQL queries via a POST request and executes them. This is a way to build web services.
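As a rough sketch of what a client might send, the snippet below builds (but does not send) such a POST request with Python's standard library. The `rqlio/1.0` path, the JSON shape (a list of `[rql, args]` pairs) and the instance URL are assumptions made for illustration; check the cube's own documentation for the exact protocol and for how authentication is handled.

```python
import json
import urllib.request

# Hypothetical instance URL and endpoint path (assumptions, adjust as needed).
INSTANCE_URL = "http://localhost:8080"
ENDPOINT = INSTANCE_URL + "/rqlio/1.0"


def build_rql_request(queries):
    """Build a POST request whose body is a JSON list of [rql, args] pairs."""
    body = json.dumps([[rql, args] for rql, args in queries]).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_rql_request([
    ('INSERT Elephant E: E name %(name)s', {"name": "Babar"}),
    ('Any N WHERE E is Elephant, E name N', {}),
])
# urllib.request.urlopen(req) would actually send it to a running instance.
```

Batching several queries in one request is what makes such a generic endpoint usable as a web service: one HTTP round trip can carry a whole unit of work.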

wsme helps build a web service API on top of a CubicWeb database.

signedrequest is a simple token-based authentication system. It lets scripts or callback URLs access an instance without login/password information.
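The general idea behind signing requests with a shared token can be sketched as follows. This is a generic HMAC-SHA256 scheme written for illustration only: the fields that are signed, the hash function and the header carrying the signature in the actual signedrequest cube are not described here and may well differ. `sign_request`, `verify_request` and `TOKEN` are hypothetical names.

```python
import hashlib
import hmac


def sign_request(secret, method, url, body=b""):
    """Return a hex HMAC-SHA256 signature over the method, URL and body digest."""
    body_digest = hashlib.sha256(body).hexdigest()
    message = "\n".join([method.upper(), url, body_digest]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, url, body, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, url, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical shared token, known to both the script and the instance.
TOKEN = b"not-a-real-secret"
sig = sign_request(TOKEN, "POST", "http://localhost:8080/some/callback", b"{}")
```

The script never sends the token itself, only a signature that the server can recompute, which is why no login/password needs to travel with the request.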

relationwidget is a widget usable in forms to edit relationships between objects. It depends on CubicWeb 3.19.

searchui is an experiment on adding blocks to the list of facets that allow building complex RQL queries step by step by clicking with the mouse instead of directly writing the RQL with the keyboard.

ckan is using the REST API of a CKAN data portal to mirror its content.

CWEPs

Here is the status of open CubicWeb Evolution Proposals:

CWEP-0002 is now in good shape and the goal is to have it merged into 3.20. It lacks some documentation and a migration script.

CWEP-0003 has made good progress during the latest sprint, but will need a thorough review before being merged. It will probably not be ready for 3.20 and have to wait for 3.21.

New CWEPs are expected to be written for clarifying the API of the _cw object, supporting persistent sessions and improving the performance of massive imports.

Visual identity

CubicWeb has a new logo that will appear before the end of May on its revamped homepage at http://www.cubicweb.org

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of July 2014.


What's new in CubicWeb 3.19

2014/05/05 by Aurelien Campeas

New functionalities

  • implement Cross Origin Resource Sharing (CORS) (see #2491768)
  • system_source.create_eid can return a range of IDs, to reduce overhead of batch entity creation
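To see why returning a range of IDs helps, consider a toy allocator where every call stands for one round trip to the database. The `ToyEidSource` class and its `count` parameter are hypothetical; the real signature of `system_source.create_eid` is not given in these notes.

```python
class ToyEidSource:
    """Hypothetical stand-in for a source that hands out entity ids (eids)."""

    def __init__(self, first_eid=1):
        self._next_eid = first_eid
        self.calls = 0  # each call stands for one round trip to the database

    def create_eid(self, count=1):
        """Reserve `count` consecutive eids in a single call."""
        self.calls += 1
        first = self._next_eid
        self._next_eid += count
        return range(first, first + count)


source = ToyEidSource()
batch = source.create_eid(1000)  # one call for a whole batch of entities
single = source.create_eid()     # one-at-a-time allocation
```

Creating 1,000 entities then costs one allocation call instead of 1,000.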

Behaviour Changes

  • The anonymous property of Session and Connection is now computed from the related user login. If it matches the anonymous-user in the config, the connection is anonymous. Beware that the anonymous-user option is web-specific. Therefore, no session can be anonymous in a repository-only setup.
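The rule above can be restated as a small function. Here `config` is modelled as a plain dict and `is_anonymous` is a hypothetical helper written for illustration, not part of CubicWeb's API.

```python
def is_anonymous(login, config):
    """Return True if `login` is the configured anonymous user.

    `config` is a plain dict here; a missing "anonymous-user" key models a
    repository-only setup, where the web-specific option does not exist.
    """
    anon_login = config.get("anonymous-user")
    return anon_login is not None and login == anon_login
```

With no anonymous-user option, the function can never return True, which is the repository-only case described above.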

New Repository Access API

Connection replaces Session

A new explicit Connection object replaces Session as the main repository entry point. A Connection holds all the necessary methods to be used server-side (execute, commit, rollback, call_service, entity_from_eid, etc...). One obtains a new Connection object using session.new_cnx(). Connection objects need to have an explicit begin and end. Use them as a context manager to never miss an end:

with session.new_cnx() as cnx:
    cnx.execute('INSERT Elephant E: E name "Babar"')
    cnx.commit()
    cnx.execute('INSERT Elephant E: E name "Celeste"')
    cnx.commit()
# Once you get out of the "with" clause, the connection is closed.

Using the same Connection object in multiple threads will give you access to the same Transaction. However, Connection objects are not thread-safe, so doing this is at your own risk.

repository.internal_session is deprecated in favor of repository.internal_cnx. Note that internal connections are now safe by default, i.e. the integrity hooks are enabled.

Backward compatibility is preserved on Session.

dbapi vs repoapi

A new API has been introduced to replace the dbapi. It is called repoapi.

There are three relevant functions for now:

  • repoapi.get_repository returns a Repository object either from a URI when used as repoapi.get_repository(uri) or from a config when used as repoapi.get_repository(config=config).
  • repoapi.connect(repo, login, **credentials) returns a ClientConnection associated with the user identified by the credentials. The ClientConnection is associated with its own Session that is closed when the ClientConnection is closed. A ClientConnection is a Connection-like object to be used client side.
  • repoapi.anonymous_cnx(repo) returns a ClientConnection associated with the anonymous user if described in the config.

repoapi.ClientConnection replaces dbapi.Connection and company

On the client/web side, the Request is now using a repoapi.ClientConnection instead of a dbapi.Connection. The ClientConnection has multiple backward compatible methods to make it look like a dbapi.Cursor and dbapi.Connection.

Sessions used on the Web side are now the same as the ones used Server side. Some backward compatibility methods have been installed on the server side Session to ease the transition.

The authentication stack has been altered to use the repoapi instead of the dbapi. Cubes adding new elements to this stack are likely to break.

New API in tests

All current methods and attributes used to access the repo on CubicWebTC are deprecated. You may now use a RepoAccess object. A RepoAccess object is linked to a new Session for a specified user. It is able to create Connection, ClientConnection and web side requests linked to this session:

access = self.new_access('babar') # create a new RepoAccess for user babar
with access.repo_cnx() as cnx:
    # some work with server side cnx
    cnx.execute(...)
    cnx.commit()
    cnx.execute(...)
    cnx.commit()

with access.client_cnx() as cnx:
    # some work with client side cnx
    cnx.execute(...)
    cnx.commit()

with access.web_request(elephant='babar') as req:
    # some work with web request
    elephant_name = req.form['elephant']
    req.execute(...)
    req.cnx.commit()

By default testcase.admin_access contains a RepoAccess object for the default admin session.

API changes

  • RepositorySessionManager.postlogin is now called with two arguments, request and session. And this now happens before the session is linked to the request.
  • SessionManager and AuthenticationManager now take a repo object at initialization time instead of a vreg.
  • The async argument of _cw.call_service has been dropped. All calls are now synchronous. The zmq notification bus looks like a good replacement for most async use cases.
  • repo.stats() is now deprecated. The same information is available through a service (_cw.call_service('repo_stats')).
  • repo.gc_stats() is now deprecated. The same information is available through a service (_cw.call_service('repo_gc_stats')).
  • repo.register_user() is now deprecated. The functionality is now available through a service (_cw.call_service('register_user')).
  • request.set_session no longer takes an optional user argument.
  • CubicWebTC no longer has repo and cnx as class attributes. They are standard instance attributes, and the set_cnx and _init_repo class methods have become instance methods.
  • set_cnxset and free_cnxset are deprecated. The database connection acquisition and release cycle is now more transparent.
  • The implementation of cascading deletion when deleting composite entities has changed. This comes with a semantic change: merely deleting a composite relation no longer entails the deletion of the component side of the relation.
  • _cw.user_callback and _cw.user_rql_callback are deprecated. Users are encouraged to write an actual controller (e.g. using ajaxfunc) instead of storing a closure in the session data.
  • A new entity.cw_linkable_rql method provides the rql to fetch all entities that are already or may be related to the current entity using the given relation.

Deprecated Code Drops

  • The session.hijack_user mechanism has been dropped.
  • EtypeRestrictionComponent has been removed, its functionality has been replaced by facets a while ago.
  • the old multi-source support has been removed. Only copy-based sources remain, such as datafeed or ldapfeed.

Logilab's roadmap for CubicWeb on March 7th, 2014

2014/03/10 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Mar 7th, 2014 meeting. The previous report posted to the blog was the January 2014 roadmap.

Version 3.17

This version is stable but old, and maintenance will stop in a few weeks (current is 3.17.13 and 3.17.14 is upcoming).

Version 3.18

This version is stable and maintained (current is 3.18.3 and 3.18.4 is upcoming).

Version 3.19

This version is about to be published. It includes a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details, read the list of tickets for CubicWeb 3.19.0.

Version 3.20

This version will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and relations) and CWEP-003 (adding a FROM clause to RQL).

For details, read the list of tickets for CubicWeb 3.20.0.

Cubes

Here is a list of cubes that had versions published over the past two months: addressbook, awstats, blog, bootstrap, brainomics, comment, container, dataio, genomics, invoice, mediaplayer, medicalexp, neuroimaging, person, preview, questionnaire, securityprofile, simplefacet, squareui, tag, tracker, varnish, vcwiki, vtimeline.

Here is the new cube we are pleased to announce:

collaboration is a building block that reuses container and helps to define collaborative workflows where entities are cloned, modified and shared.

Our priorities for the next two months are collaboration and container, then narval/apycot, then mercurial-server, then rqlcontroller and signedrequest, then imagesearch.

Mid-term goals

The work done for CWEP-0002 (computed attributes and relations) is expected to land in CubicWeb 3.20.

The work done for CWEP-0003 (explicit data source federation using FROM in RQL) is expected to land in CubicWeb 3.20.

Tools to diagnose performance issues would be very useful. Maybe in 3.21?

Caching session data would help, and some work was done on this topic during the sprint in February. Maybe in 3.22?

WSGI has made progress lately, but still needs work. Maybe in 3.23?

RESTfulness is a goal. Maybe in 3.24?

Maybe 3.25 will in fact be 4.0?

Events

A spring sprint will take place in Logilab's offices in Paris from April 28th to 30th. We invite all the interested parties to join us there!

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of May 2014.


CubicWeb sprint / winter 2014

2014/02/12 by Nicolas Chauvat

This sprint took place at Logilab's offices in Paris on Feb 13/14. People from CEA, Unlish, Crealibre and Logilab teamed up to push CubicWeb forward.

We did not forget the priorities from the roadmap:

  • CubicWeb 3.17.13 and 3.18.3 were released, and CubicWeb 3.19 made progress
  • the branch about ComputedAttributes and ComputedRelations (CWEP-002) is ready to be merged,
  • the branch about the FROM clause (CWEP-003) made progress (the CWEP was reviewed and part of the resulting spec was implemented),
  • in order to reduce work in progress, the number of patches in state reviewed or pending-review was brought down from 302 to 243 (a reduction of 59, or about 20%, which is not bad).

CubicWeb using Postgresql at its best

2014/02/08 by Nicolas Chauvat

We had a chat today with a core contributor to Postgresql from whom we may buy consulting services in the future. We discussed how CubicWeb could get the best out of Postgresql:

  • making use of the LISTEN/NOTIFY mechanism built into PG could be useful (to warn the cache about modified items for example) and PgQ is its good friend;
  • views (materialized or not) are another way to implement computed attributes and relations (see CWEP number 002) and it could be that the Entities table is in fact a view of other tables;
  • implementing RQL as an in-database language could open the door to new things (there is PL/pgSQL, PL/Python, what if we had PL/RQL?);
  • Foreign Data Wrappers written with Multicorn would be another way to write data feeds (see LDAP integration for an example);
  • managing dates can be tricky when users reside in different timezones and UTC is important to keep in mind (unicode/str is a good analogy);
  • for transitive closures that are often needed when implementing access control policies with __permissions, Postgresql can go a long way with queries like "WITH ... (SELECT UNION ALL SELECT RETURNING *) UPDATE USING ...";
  • the fastest way to load tabular data that does not need too much pre-processing is to create a temporary table in memory, then COPY-FROM the data into that table, then index it, then write the transform and load step in SQL (maybe with PL/Python);
  • when executing more than 10 updates in a row, it is better to write into a temporary table in memory, then update the actual tables with UPDATE ... FROM (let's check if the psycopg driver does that when executemany is called);
  • reaching 10e8 rows in a table is, at the time of this writing, the stage when you should start monitoring your db seriously and start considering replication, partitioning and sharding;
  • full-text search is much better in Postgresql than the general public thinks it is and recent developments made it orders of magnitude faster than tools like Lucene or Solr and ElasticSearch;
  • when dealing with complex queries (searching graphs maybe), an option to consider is to implement a specific data type, use it in a materialized view and use GIN or GiST indexes over it;
  • for large scientific data sets, it could be interesting to link the numpy library into Postgresql and turn numpy arrays into a new data type;
  • Oh, and one last thing: the object-oriented tables of Postgresql are not such a great idea, unless you have a use case that fits them perfectly and does not hit their limitations (CubicWeb's is_instance_of does not seem to be one of these).
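The transitive-closure point can be illustrated with a recursive common table expression. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, since both accept WITH RECURSIVE; the `in_group` table and its contents are hypothetical, and unlike the query quoted above it only computes the closure without updating anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE in_group (member TEXT, grp TEXT);
    INSERT INTO in_group VALUES
        ('alice', 'devs'), ('devs', 'staff'), ('staff', 'everyone');
""")

# All groups alice belongs to, directly or through nested groups.
rows = conn.execute("""
    WITH RECURSIVE closure(grp) AS (
        SELECT grp FROM in_group WHERE member = ?
        UNION
        SELECT g.grp FROM in_group g JOIN closure c ON g.member = c.grp
    )
    SELECT grp FROM closure
""", ("alice",)).fetchall()

groups = sorted(r[0] for r in rows)  # → ['devs', 'everyone', 'staff']
```

Using UNION rather than UNION ALL discards rows already produced, so the recursion also terminates if the membership graph contains a cycle.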

Hopin' I got you thinkin' :)


CubicWeb sprints winter/spring 2014

2014/01/24 by David Douard

The Logilab team is pleased to announce two CubicWeb sprints to be held in its Paris offices in the upcoming months:

February 13/14th at Logilab in Paris

The agenda will cover the FROM clause, for which a CWEP is still awaited, and the RQL rewriter as described in CWEP-0002.

April 28/30th at Logilab in Paris

Agenda to be defined.

Join the party

All users and contributors of CubicWeb are invited to join the party. Just send an email to contact at Logilab.fr if you plan to come.


Logilab's roadmap for CubicWeb on January 9th, 2014

2014/01/14 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Jan 9th, 2014 meeting. The previous report posted to the blog was the November 2013 roadmap.

Version 3.17

This version is stable and maintained (current is 3.17.11 and 3.17.12 is upcoming).

Version 3.18

This version was released on Jan 10th. Read the release notes or the details of CubicWeb 3.18.0.

Version 3.19

This version includes a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4. It is currently the default development head in the repository and is expected to be released before the end of January.

For details, read the list of tickets for CubicWeb 3.19.0.

Version 3.20

This version will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

For details, read the list of tickets for CubicWeb 3.20.0.

Cubes

The current trend is to develop new features in dedicated cubes rather than to add more code to the core of CubicWeb. If you thought CubicWeb development was slowing down, you were mistaken: cubes are ramping up.

Here is a list of versions that were published in the past two months: timesheet, postgis, leaflet, bootstrap, worker, container, embed, geocoding, vcreview, trackervcs, vcsfile, zone, dataio, mercurial-server, queueing, questionnaire, genomics, medicalexp, neuroimaging, brainomics, elections.

Here are the new cubes we are pleased to announce:

Bootstrap works and we do not create a new application without it.

relationwidget provides a modal window to edit relations in forms (use uicfg to activate it).

resourcepicker provides a modal window to insert links to images and files into structured text.

rqlcontroller allows using the INSERT, DELETE and SET keywords when sending RQL queries over HTTP. It returns JSON. Get used to it and you may forget about asking for specific web services in your apps, since it is a generic web service.

imagesearch is an image gallery with facets. You may use it as a demo of a visual search tool.

Mid-term goals

A new repository was created to have all the CubicWeb Evolution Proposals in one place.

CWEP-0002 is a work in progress about computed relations and computed attributes, or maybe more. It will be a focus of the next sprint and is targeted at CubicWeb 3.20.

A new CWEP is expected about adding the FROM keyword to RQL to implement explicit data source federation. It will be a focus of the next sprint and is targeted at CubicWeb 3.21.

Tools to diagnose performance issues would be very useful. Maybe in 3.22?

Caching session data would help. Maybe in 3.23?

WSGI has made progress lately, but still needs work. Maybe in 3.24?

RESTfulness is a goal. Maybe in 3.25?

Maybe 3.26 will in fact be 4.0?

Events

A sprint will take place in Logilab's offices in Paris around mid-february or at the end of april. We invite all the interested parties to join us there!

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of March 2014.


What's new in CubicWeb 3.18

2014/01/10 by Aurelien Campeas

The migration script handles neither sqlite nor mysql instances.

New functionalities

  • add a security debugging tool (see #2920304)
  • introduce an add permission on attributes, to be interpreted at entity creation time only and allow the implementation of complex update rules that don't block entity creation (before that the update attribute permission was interpreted at entity creation and update time) (see #2965518)
  • the primary view display controller (uicfg) now has a set_fields_order method similar to the one available for forms
  • new method ResultSet.one(col=0) to retrieve a single entity and enforce the result has only one row (see #3352314)
  • new method RequestSessionBase.find to look for entities (see #3361290)
  • the embedded jQuery copy has been updated to version 1.10.2, and jQuery UI to version 1.10.3.
  • initial support for wsgi for the debug mode, available through the new wsgi cubicweb-ctl command, which can use either python's builtin wsgi server or the werkzeug module if present.
  • a rql-table directive is now available in ReST fields
  • cubicweb-ctl upgrade can now generate the static data resource directory directly, without a manual call to gen-static-datadir.
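The contract of ResultSet.one can be pictured with a hypothetical stand-in. In this sketch `rows` is a plain list of tuples, `one` returns the raw value in column `col` instead of an entity, and ValueError stands in for whatever exception CubicWeb actually raises; none of these details come from the release notes.

```python
class ResultSetSketch:
    """Hypothetical stand-in for a result set: rows is a list of tuples."""

    def __init__(self, rows):
        self.rows = rows

    def one(self, col=0):
        """Return the value in column `col`, enforcing exactly one row."""
        if len(self.rows) != 1:
            raise ValueError("expected exactly one row, got %d" % len(self.rows))
        return self.rows[0][col]


rs = ResultSetSketch([(42, "babar")])
first = rs.one()          # column 0 by default
name = rs.one(col=1)
```

The point of enforcing a single row is that a query expected to match one entity fails loudly when it matches none or several, instead of silently picking the first result.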

API changes

  • not really an API change, but the entity write permission checks are now systematically deferred to an operation, instead of a) trying in a hook and b) if it failed, retrying later in an operation
  • The default value storage for attributes is no longer String, but Bytes. This opens the road to storing arbitrary python objects, e.g. numpy arrays, and fixes a bug where default values whose truth value was False were not properly migrated.
  • symmetric relations are no longer handled by an RQL rewrite but by hooks (from the activeintegrity category); this may have some consequences for applications that do low-level database manipulations or at times disable (some) hooks.
  • unique together constraints (multi-column uniqueness constraints) get a name attribute that maps the CubicWeb constraint entities to the corresponding backend index.
  • BreadCrumbEntityVComponent's open_breadcrumbs method now includes the first breadcrumbs separator
  • entities can be compared for equality and hashed
  • the on_fire_transition predicate accepts a sequence of possible transition names
  • the GROUP_CONCAT rql aggregate function no longer repeats duplicate values, on the sqlite and postgresql backends
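At the SQL level, the difference resembles that between GROUP_CONCAT(x) and GROUP_CONCAT(DISTINCT x). The sketch below shows it with Python's sqlite3 module; the `tagged` table is hypothetical, and whether CubicWeb generates exactly this SQL is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tagged (entity INTEGER, tag TEXT);
    INSERT INTO tagged VALUES (1, 'python'), (1, 'python'), (1, 'web');
""")

# Plain aggregate: duplicate values are repeated.
with_dups = conn.execute(
    "SELECT GROUP_CONCAT(tag) FROM tagged").fetchone()[0]
# DISTINCT aggregate: each value appears once.
without_dups = conn.execute(
    "SELECT GROUP_CONCAT(DISTINCT tag) FROM tagged").fetchone()[0]
```

The concatenation order is not guaranteed by SQLite, so code consuming such values should not rely on it.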

Deprecation

  • pyrorql sources have been deprecated. Multisource will be fully dropped in the next version. If you are still using pyrorql, switch to datafeed NOW!
  • the old multi-source system
  • find_one_entity and find_entities in favor of find (see #3361290)
  • the TmpFileViewMixin and TmpPngView classes (see #3400448)

Deprecated Code Drops

  • ldapuser sources have been dropped; use ldapfeed now (see #2936496)
  • action GotRhythm was removed, make sure you do not import it in your cubes (even to unregister it) (see #3093362)
  • all 3.8 backward compat is gone
  • all 3.9 backward compat (including the javascript side) is gone
  • the twisted (web-only) instance type has been removed

For a complete list of tickets, read CubicWeb 3.18.0.


Logilab's roadmap for CubicWeb on November 8th, 2013

2013/11/11 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Nov 8th, 2013 meeting. The previous report posted to the blog was the September 2013 roadmap.

Version 3.17

This version is stable and maintained (cubicweb 3.17.11 is upcoming).

Version 3.18

This version was supposed to be released in september or october, but is stalled at the integration stage. All open tickets were moved to 3.19 and existing patches that are not ready to be merged will be more aggressively delayed to 3.19. The goal is to release 3.18 as soon as possible.

For details read list of tickets for CubicWeb 3.18.0.

Version 3.19

This version will probably be published early next year (read January or February 2014). It is planned to include a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read list of tickets for CubicWeb 3.19.0.

Squareui

Logilab is now developing all its new projects based on Squareui (and Bootstrap 3.0). Squareui can be considered a usable beta, but not feature-complete.

Logilab is looking for a UX designer to work on the general ergonomics of CubicWeb. Read the job offer.

Mid-term goals

The mid-term goals include better REST support (Representational State Transfer), complete WSGI (Python's Web Server Gateway Interface) and the FROM clause for RQL queries (to reinvent db federation outside of the core).

On the front-end side, it would be nice to be able to improve forms, maybe with client-side javascript and better support for a "json on server, js in browser" separation of concerns.

Cubes

A cube oauth was contributed in large part by Unlish, a startup that is using CubicWeb to implement its service.

A cube vcwiki is being developed by Logilab, to manage the content of a wiki with a version control system (built with the cube vcsfile).

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of January 2014.


Apache authentication

2013/10/10 by Dimitri Papadopoulos

An Apache front end might be useful, as Apache provides standard log files, monitoring or authentication. In our case, we have Apache authenticate users before they are cleared to access our CubicWeb application. Still, we would like user accounts to be managed within a CubicWeb instance, avoiding separate sets of identifiers, one for Apache and the other for CubicWeb.

We have to address two issues:

  • have Apache authenticate users against accounts in the CubicWeb database,
  • have CubicWeb trust Apache authentication.

Apache authentication against CubicWeb accounts

A possible solution would be to access the identifiers associated with a CubicWeb account at the SQL level, directly from the SQL database underneath a CubicWeb instance. The login and password can be found in the cw_login and cw_upassword columns of the cw_cwuser table. The benefit is that we can use existing Apache modules for authentication against SQL databases, typically mod_authn_dbd. On the other hand, this is highly dependent on the underlying SQL database.

Instead, we have chosen an alternative solution: directly accessing the CubicWeb repository. Since we need Python to access the repository, our sysadmins have deployed mod_python on our Apache server.

We wrote a Python authentication module that accesses the repository using ZMQ, so ZMQ needs to be enabled. To enable it, uncomment and complete the following line in all-in-one.conf:

zmq-repository-address=zmqpickle-tcp://localhost:8181

The Python authentication module looks like:

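from mod_python import apache
from cubicweb import dbapi
from cubicweb import AuthenticationError

def authenhandler(req):
    # get_basic_auth_pw() must be called first: it parses the
    # Authorization header and also fills in req.user
    pw = req.get_basic_auth_pw()
    user = req.user

    # Same address as zmq-repository-address in all-in-one.conf
    database = 'zmqpickle-tcp://localhost:8181'
    try:
        # Opening a repository connection validates the credentials
        cnx = dbapi.connect(database, login=user, password=pw)
    except AuthenticationError:
        return apache.HTTP_UNAUTHORIZED
    else:
        # Credentials are valid; the connection itself is not needed
        cnx.close()
        return apache.OK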

CubicWeb trusts Apache

Our sysadmins set up Apache to add x-remote-user to the HTTP headers forwarded to CubicWeb - more on the relevant Apache configuration in the next paragraph.

We then added the cubicweb-trustedauth cube to the dependencies of our CubicWeb application. We simply had to add the following to the __pkginfo__.py file of our application:

__depends__ = {
    'cubicweb': '>= 3.16.1',
    'cubicweb-trustedauth': None,
}

This cube makes CubicWeb trust the x-remote-user header sent by the Apache front end. CubicWeb then bypasses its own authentication mechanism, and users are logged directly into CubicWeb as the user whose login is identical to the Apache login.
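Because CubicWeb now takes the x-remote-user header at face value, the header should only be trusted when the request comes from the Apache front end; a client reaching CubicWeb directly could otherwise forge it. The following minimal sketch illustrates that check. It is not the cubicweb-trustedauth code. The function name, the header constant and the assumption that Apache proxies from 127.0.0.1 (the ProxyPass target in the configuration) are all hypothetical.

```python
from typing import Mapping, Optional

# Assumption: the Apache front end is the only client allowed to set the header.
TRUSTED_PROXIES = frozenset({"127.0.0.1"})
TRUSTED_HEADER = "x-remote-user"


def trusted_login(remote_addr: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the login asserted by the front end, or None if it cannot be trusted."""
    if remote_addr not in TRUSTED_PROXIES:
        # Requests that do not come from the proxy could carry a forged header.
        return None
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    login = normalized.get(TRUSTED_HEADER, "").strip()
    # An empty header means Apache did not authenticate anyone.
    return login or None
```

Binding the CubicWeb instance to the loopback interface only is a complementary, network-level way to obtain the same guarantee.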

Apache configuration and deployment

Our Apache configuration looks like:

<Location /apppath >
  AuthType Basic
  AuthName "Restricted Area"
  AuthBasicAuthoritative Off
  AuthUserFile /dev/null
  require valid-user

  PythonAuthenHandler cubicwebhandler

  RewriteEngine On
  RewriteCond %{REMOTE_USER} (.*)
  RewriteRule . - [E=RU:%1]
</Location>

RequestHeader set X-REMOTE-USER %{RU}e

ProxyPass          /apppath  http://127.0.0.1:8080
ProxyPassReverse   /apppath  http://127.0.0.1:8080

The CubicWeb application is accessed as http://ourserver/apppath/.

The Python authentication module is deployed as /usr/lib/python2.7/dist-packages/cubicwebhandler/handler.py, where cubicwebhandler is the value given to the PythonAuthenHandler directive in the Apache configuration.


Brainomics / CrEDIBLE conference report

2013/10/09 by Vincent Michel

CubicWeb and the Brainomics project were presented last week at the CrEDIBLE workshop (October 2-4, 2013, Sophia-Antipolis) on "Federating distributed and heterogeneous biomedical data and knowledge". We would like to thank the organizers for this nice opportunity to show the features of CubicWeb and Brainomics in the context of biomedical data.

Workshop highlights

  • A short presentation of SHI3LD, which defines data access based on conditions expressed as SPARQL ASK queries. The other part was a survey of open data licenses and of the (scarce) licenses expressed in RDF. Future work seems to be an interesting combination of SHI3LD and RDF-based licenses for data access.
  • MIDAS, open-source software for sharing medical data. This project could be an interesting source of inspiration for the file-sharing part of CubicWeb, even if the (in my opinion really complicated) case of large file downloads is not addressed for now.
  • Federated queries based on FedX: the optimization techniques based on source selection and exclusive groups seem a good approach for avoiding large data transfers and finding (sub-)optimal ways to join the different data sources. This should be taken into account in the future work on the "FROM" clause in CubicWeb.
  • WebPIE/QueryPIE: a map-reduce-based approach for large-scale reasoning.

CubicWeb and Brainomics

The slides of the presentation can be downloaded as a PDF or viewed on SlideShare.

Some people seemed confused about the RQL-to-SQL translation. It relies on a simple translation logic implemented in the rql2sql module. This is only an implementation trick, not so different from the one used in RDBMS-based triplestores that have to convert SPARQL into SQL.
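To give an idea of what such a translation involves, here is a toy sketch that turns simple RQL-like triple patterns into a parameterized SQL query. It is a deliberate simplification, not the rql2sql algorithm. The function name and the pattern representation are hypothetical. Every variable is assumed to have a single known entity type stored in a table cw_<type>, and every relation is assumed to be inlined as a cw_<relation> foreign-key column. A real translator also has to deal with non-inlined relations, several candidate types per variable, negation, aggregates and subqueries.

```python
from typing import Dict, List, Tuple, Union

Comparison = Tuple[str, object]                 # e.g. (">", 100000)
Pattern = Tuple[str, str, Union[str, Comparison]]
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def rql_to_sql(selected: str, types: Dict[str, str],
               patterns: List[Pattern]) -> Tuple[str, list]:
    """Translate simplified triple patterns into (sql, params).

    Hypothetical model: type T lives in table cw_<t>, attributes are
    cw_<attr> columns and every relation is an inlined cw_<rel> column.
    """
    alias = {var: "_" + var for var in types}
    tables = [f"cw_{etype.lower()} AS {alias[var]}" for var, etype in types.items()]
    where, params = [], []
    for subj, rel, obj in patterns:
        column = f"{alias[subj]}.cw_{rel}"
        if isinstance(obj, tuple):
            # Attribute restriction: the value goes into a query parameter.
            op, value = obj
            if op not in ALLOWED_OPERATORS:
                raise ValueError(f"unsupported operator {op!r}")
            where.append(f"{column} {op} %s")
            params.append(value)
        else:
            # Relation to another variable: join on that variable's eid.
            where.append(f"{column} = {alias[obj]}.cw_eid")
    sql = f"SELECT {alias[selected]}.cw_eid FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params
```

Each variable becomes a table alias, each relation between variables becomes a join condition, and each attribute restriction becomes a parameterized WHERE clause.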

RQL inference: there is no magic behind the RQL inference process. Triplestores store RDF triples that carry their own schema, so they cannot easily know the full data model without looking at all the triples. RQL, on the other hand, relies on a relational database whose data model is fixed at any given moment, which allows inference and simple checks. For instance, the question "all the cities of Île-de-France with more than 100 000 inhabitants" is expressed in RQL as:

Any X WHERE X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"

and in SPARQL:

PREFIX db-owl: <http://dbpedia.org/ontology/>
select ?ville where {
?ville db-owl:region <http://fr.dbpedia.org/resource/Île-de-France> .
?ville db-owl:populationTotal ?population .
FILTER (?population > 100000)
}

Besides the fact that RQL is less verbose than SPARQL (syntax matters), the simplicity of RQL comes from the fact that it can automatically infer (similarly to SPARQL) that if X is related to Y by the region relation and has a population attribute, it should be a city. If City and District both have the region relation and a population attribute, RQL inference fetches them both transparently; otherwise one can be specific by using the is relation:

Any X WHERE X is City, X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"
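Since the schema is known in advance, this inference boils down to set operations over the data model rather than a scan of the data. A minimal sketch, assuming a hypothetical schema that maps each entity type to the relations and attributes it can be the subject of. The City, District, Region and Person types and the candidate_types function are illustrative, not taken from an actual CubicWeb schema.

```python
from typing import Dict, Optional, Set

# Hypothetical schema: entity type -> relations/attributes it can be the subject of.
SCHEMA: Dict[str, Set[str]] = {
    "City": {"name", "region", "population"},
    "District": {"name", "region", "population"},
    "Region": {"name", "uri"},
    "Person": {"name", "birthplace"},
}


def candidate_types(schema: Dict[str, Set[str]], relations: Set[str],
                    restrict_to: Optional[str] = None) -> Set[str]:
    """Return the entity types a variable may have, given the relations used on it."""
    # A type qualifies if it supports every relation the query applies to the variable.
    candidates = {etype for etype, rels in schema.items() if relations <= rels}
    if restrict_to is not None:
        # An explicit 'X is City' restriction keeps only that type.
        candidates &= {restrict_to}
    return candidates
```

With this schema, a variable used with both region and population may be a City or a District, and adding X is City narrows it down to City.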

RQL also allows subqueries, union, full-text search, stored procedures, ... (see the doc).

These really interesting discussions convinced us that we should write a journal paper detailing the theoretical and technical concepts behind RQL and the YAMS schema.


Logilab will be at the Toulouse Métropole Open Data Barcamp tomorrow

2013/10/08 by Sylvain Thenault

Meet us tomorrow at the Cantine in Toulouse, where several people from Logilab will attend the open data barcamp organized by Toulouse Métropole.

More information on barcamp.org. We'll probably talk about how CubicWeb manages to import large amounts of open data for reuse.


Logilab's roadmap for CubicWeb on September 6th, 2013

2013/09/17 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Sept 6th, 2013 meeting. The previous report posted to the blog was the February 2013 roadmap.

Version 3.17

This version is now stable and maintained (release 3.17.7 is upcoming). It added a couple of features and focused on putting CW on a diet by extracting some functionalities provided by the core into external cubes: sioc, embed, massmailing, geocoding, etc.

For details read what's new in CubicWeb 3.17.

Version 3.18

This version is now frozen and will be published as soon as all the patches are tested and merged. Since Logilab has a lot of client work until the end of the year, the community should feel free to help (as usual) if it wants this version to be released sooner rather than later.

This version will remove the ldapuser source that is replaced by ldapfeed, implement Cross Origin Resource Sharing, drop some very old compatibility code, deprecate the old version of the multi-source system and provide various other features and bugfixes.

For details read list of tickets for CubicWeb 3.18.0.

Version 3.19

This version will probably be published early next year (read January or February 2014) unless someone who is not working at Logilab takes responsibility for its release.

It should include the heavy refactoring work done by Pierre-Yves and Sylvain over the past year, that modifies sessions and sources to lay the path for CubicWeb 4.

For details read list of tickets for CubicWeb 3.19.0 or take a look at this head.

Squareui

Since Orbui changes the on-screen organization of the default user interface, it was decided to share the low-level Bootstrap-related views that could be shared, and to build a SquareUI cube that would conform to the design choices of the default UI.

Logilab is now developing all its new projects based on Squareui 0.2. Read about it in the mailing list archives.

Mid-term goals

The mid-term goals include better REST support (Representational State Transfer), complete WSGI (Python's Web Server Gateway Interface) and the FROM clause for RQL queries (to reinvent db federation outside of the core).

Cubes

Our current plan is to extract as much as possible to cubes. We started CubicWeb many years ago with the Python motto "batteries included", but have since realized that having too much in the core contributes to making CubicWeb difficult to learn.

Since we would very much like the community to grow, we are now aiming for something more balanced, like Mercurial does. The core is designed such that most features can be developed as an extension. Once they are stable, popular extensions can be moved to the main library that is distributed with the core, and be activated with a switch in the configuration file.

Several cubes are under active development: oauth, signedrequest, dataio, etc.

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of November 2013.


Brainomics - A management system for exploring and merging heterogeneous brain mapping data

2013/09/12 by Arthur Lutz

At OHBM 2013, the 19th Annual Meeting of the Organization for Human Brain Mapping, Logilab presented a poster explaining the work done using CubicWeb on brain imaging and genetics data, in collaboration with INRIA, INSERM and the CEA, during the Brainomics project co-financed by the Agence nationale de la Recherche.

You can download this poster and try the demo online.


What's new in CubicWeb 3.17

2013/06/21 by Aurelien Campeas

New functionalities

  • Add a command to compare the database schema with the file-system schema (see #464991)
  • Add CubicWebRequestBase.content with the content of the HTTP request (see #2742453)
  • Add a bookmark directive to ReST rendering (see #2545595)
  • Allow user-defined final types (see #124342)

API changes

  • Drop typed_eid() in favour of int() (see #2742462)
  • The SIOC views and adapters have been removed from CubicWeb and moved to the sioc cube.
  • The web page embedding views and adapters have been removed from CubicWeb and moved to the embed cube.
  • The email sending views and controllers have been removed from CubicWeb and moved to the massmailing cube.
  • RenderAndSendNotificationView is deprecated in favor of ActualNotificationOp; the new operation uses the more efficient data idiom.
  • Looping tasks can now have an interval <= 0. A negative interval disables the looping task entirely.
  • We now serve html instead of xhtml. (see #2065651)

Deprecation

  • ldapuser has been deprecated. It will be removed in a future version. If you are still using ldapuser switch to ldapfeed NOW!
  • hijack_user has been deprecated. It will be dropped soon.

Deprecated Code Drops

  • The progress views and adapters have been removed from CubicWeb. These classes were deprecated since 3.14.0. They are still available in the iprogress cube.
  • The part of the API deprecated since 3.7 was dropped.

We're going to PGDay France, the PostgreSQL Community conference

2013/06/11 by Arthur Lutz

A few people from the CubicWeb team are going to attend the French PostgreSQL community conference in Nantes (France) on June 13th.

We're excited to learn more about the following topics, which are relevant to CubicWeb's development and features:

Obviously we'll pay attention to all the talks during the day. If you're attending, we hope to see you there.


OpenData meets the Semantic Web at WOD2013

2013/06/10 by Arthur Lutz

A few of us from Logilab attended the 2nd International Workshop on Open Data (WOD) on June 3rd.

Although the main focus was an academic take on OpenData, a lot of talks were related to the Semantic Web technologies and especially LinkedData.

The full program (and papers) is on the following website. Here is a quick review of the things we thought worth sharing.

  • privacy-oriented ontologies: http://l2tap.org/
  • interesting automation that suggests alignments when initial data is uploaded to an open data website
  • some open data platforms have built-in APIs to retrieve files; one example is Socrata: http://dev.socrata.com/
  • some work is being done to scale the processing of linked data in the cloud (did you know that ready-to-use datasets, DBpedia for example, are available in the Amazon cloud?)
  • the data stored in Wikipedia can be a good source of vocabulary for certain machine learning tasks (and, in the future, the Wikidata project)
  • there is an RDF extension to Google Refine (or OpenRefine), but we haven't managed to get it working out of the box
  • WebSmatch uses morphological operators (erosion / dilation) to identify grids and zones in Excel spreadsheets and then aligns column data on known reference values (e.g. country lists).

We naturally enjoyed the presentation made by Romain Wenz about http://data.bnf.fr with the unavoidable mention of Victor Hugo (and CubicWeb).

Thanks to the organizers of the conference and to the French National Library for hosting the event.


data.bnf.fr gets the Stanford Prize for Innovation in Research Libraries

2013/03/01 by Nicolas Chauvat

data.bnf.fr and Gallica were just awarded the Stanford Prize for Innovation in Research Libraries 2013. The CubicWeb community is very pleased to see that data.bnf.fr, which is built with CubicWeb, is being recognized at the top international level as leading innovation in its domain! Read the comments of the judges for more details.


CubicWeb at Data Tuesday on Feb 26th 2013

2013/02/15 by Nicolas Chauvat

CubicWeb was showcased at Data Tuesday on Feb 26th 2013. The other presentations were interesting, especially shacache.org, the soon-to-be-launched OpenMeteoData and the very useful scikit-learn.


CubicWeb rewarded at Dataconnexions 2013

2013/02/06 by Nicolas Chauvat

CubicWeb received an award yesterday at the ceremony of the Dataconnexions 2013 contest.

Dataconnexions is a contest organized by Etalab, the French government body in charge of data.gouv.fr, which catalogs the open data published by the French administration.

Congratulations to all the developers and users of CubicWeb and welcome to the people who will join the CW community thanks to the media coverage we are now experiencing.

Read the press release and the slides.


Logilab's roadmap for CubicWeb as of February 2013

2013/02/04 by Nicolas Chauvat

The Logilab team now holds a roadmap meeting every two months to plan its CubicWeb development effort. Here are the decisions that were taken on Feb 1st, 2013.

Version 3.17

This version should be published before the end of March and will finish all the things that are work in progress. It will include:

  • the refactoring necessary to introduce persistent sessions,
  • the shrinking of web/views: everything that does not deserve its own cube (like sioc, embed, geocoding, etc) will go into a cube named legacyui (this will open the door to squareui),
  • stop serving pages with "content-type: application/xhtml",
  • handling PostgreSQL schemas (will require a new version of logilab.database),
  • a new logo.

Squareui

Once the legacyui cube has been extracted (in version 3.17), it will be possible to move forward swiftly with squareui. Due to its other duties, the core CW team cannot be expected to develop squareui. Interested people will be in charge, and ideally the squareui cube could be released when CubicWeb 3.17 is published.

Cleaning up the backlog

The lead CW developers will spend about 20% of their time cleaning up the ticket backlog at the forge (900 open tickets and 50 in progress!).

The first step will be to reduce the number of tickets "in progress", then to organize the open tickets and merge the duplicates.

Version 3.18

This version is due at the end of May 2013. It will include:

  • persisting sessions,
  • WSGI,
  • RESTfulness: support for HTTP verbs PUT / DELETE, enforcement of the semantics of GET / POST (may be difficult to maintain backward-compatibility)

Mid-term goals

The mid-term goals are:

  • possibility to add new base types (Array, HStore, Geometry, TSVector, etc.) that would use extensions from the SQL backend

  • FROM clause in RQL queries

  • websockets

  • defining attribute on relations and defining "virtual" relations or rules:

    class Contribution(EntityType):
        author = SubjectRelation('Person', cardinality='1*', inlined=True)
        book = SubjectRelation('Book', cardinality='1*', inlined=True)
        role = SubjectRelation('Role', cardinality='1*', inlined=True)
    
    preface_writer = VirtualRelation('C is Contribution, C author S, C book O, '
                                     'C role R, R name "preface writer"')
    

    And:

    Any P WHERE B is Book, P preface_writer B
    

    Will we need a materialized view in the database, a standard relation maintained by hooks, or on-the-fly RQL rewriting? Time will tell.

  • cards with logic (mustache js templates for example)

  • CoffeeScript? Brython? JavaScript? prototype something with CubicDB + a web service that outputs JSON + a user interface in full JavaScript

  • package Cubic(Web)DB and CubicWeb separately?

  • think about the overall architecture (using WSGI, persistent sessions, etc.), and find solutions that fit a distributed architecture (look at paste.deploy, circus, etc.)

  • clean up the JavaScript in web/data/*.js

  • configurable metadata, managing the size of the entities table

  • more SPARQL

  • namespaces for the data models of the cubes
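To make the semantics of the proposed preface_writer rule concrete, here is a sketch computing its (subject, object) pairs in plain Python. It is only an illustration. The Contribution dataclass and the preface_writer function are hypothetical, identifiers are plain strings, and the Role entity is reduced to its name.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Contribution:
    author: str  # Person identifier (subject of the virtual relation)
    book: str    # Book identifier (object of the virtual relation)
    role: str    # name of the Role entity, simplified to a string


def preface_writer(contributions: Iterable[Contribution]) -> Set[Tuple[str, str]]:
    """Compute the (Person, Book) pairs matched by the rule
    'C is Contribution, C author S, C book O, C role R, R name "preface writer"'.
    """
    return {(c.author, c.book) for c in contributions if c.role == "preface writer"}
```

Whether such a set is stored (materialized view, relation maintained by hooks) or recomputed at query time (RQL rewriting) is precisely the open question raised in the list above.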

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of April 2013.


What's new in CubicWeb 3.16

2013/01/23 by Aurelien Campeas

New functionalities

  • Add a new dataimport store (SQLGenObjectStore). This store enables a fast import of data (entity creation, link creation) in CubicWeb, by directly flushing information in SQL. This may only be used with PostgreSQL, as it requires the 'COPY FROM' command.
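The speed-up comes from PostgreSQL's COPY FROM, which loads many rows in a single statement instead of issuing one INSERT per row. The sketch below shows the idea; it is not SQLGenObjectStore's code. copy_buffer and bulk_insert are hypothetical helpers. bulk_insert assumes a psycopg2 cursor, whose copy_from(file, table, columns=...) method streams a file-like object in COPY text format (tab-separated fields, \N for NULL).

```python
import io
from typing import Iterable, Optional, Sequence


def _escape(value: Optional[object]) -> str:
    """Encode one value in PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # default NULL marker
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def copy_buffer(rows: Iterable[Sequence[Optional[object]]]) -> io.StringIO:
    """Serialize rows as tab-separated, newline-terminated COPY input."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_escape(v) for v in row) + "\n")
    buf.seek(0)
    return buf


def bulk_insert(cursor, table: str, columns: Sequence[str],
                rows: Iterable[Sequence[Optional[object]]]) -> None:
    """Load all rows with a single COPY ... FROM STDIN (psycopg2 cursor assumed)."""
    cursor.copy_from(copy_buffer(rows), table, columns=list(columns))
```

Building the whole buffer in memory keeps the sketch short; a real importer would stream large datasets in chunks.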

API changes

  • ORM: set_attributes and set_relations are unified (and deprecated) in favor of cw_set, which works in all cases.

  • db-api/configuration: all the external repository connection information is now in a URL (see #2521848), which allows dropping the specific options for the Pyro name server host, group, etc. and fixes the broken ZMQ source. Configuration-related changes:

    • Dropped 'pyro-ns-host', 'pyro-instance-id', 'pyro-ns-group' from the client side configuration, in favor of 'repository-uri'. NO MIGRATION IS DONE, supposing there is no web-only configuration in the wild.
    • Stop discovering the connection method through repo_method class attribute of the configuration, varying according to the configuration class. This is a first step on the way to a simpler configuration handling.

    DB-API related changes:

    • Stop indicating the connection method using ConnectionProperties.
    • Drop the _cnxtype attribute from Connection and cnxtype from Session. The former is replaced by an is_repo_in_memory property and the latter is totally useless.
    • Turn repo_connect into _repo_connect to mark it as a private function.
    • Deprecate in_memory_cnx which becomes useless, use _repo_connect instead if necessary.
  • the "tcp://" URI scheme used for ZMQ communications (in a way reminiscent of Pyro) is now named "zmqpickle-tcp://", so as to make room for future zmq-based lightweight communications (without pickling Python objects).

  • Request.base_url gets a secure=True optional parameter that yields an https url if possible, allowing hook-generated content to send secure urls (e.g. when sending mail notifications)

  • Dataimport ucsvreader gets a new boolean ignore_errors parameter.
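Carrying all connection information in one URI means the transport can be selected from the scheme alone. A minimal sketch of that parsing step with the standard library, assuming a hypothetical parse_repository_uri helper and supporting only the zmqpickle-tcp scheme named above (this is not CubicWeb's own parsing code):

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Illustrative subset: only the scheme mentioned in these release notes.
SUPPORTED_SCHEMES = {"zmqpickle-tcp"}


def parse_repository_uri(uri: str) -> Tuple[str, Optional[str], Optional[int], str]:
    """Split a repository URI into (scheme, host, port, path).

    The scheme selects the transport; host and port replace the separate
    configuration options that used to describe the connection.
    """
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported repository scheme: {parts.scheme!r}")
    return parts.scheme, parts.hostname, parts.port, parts.path.lstrip("/")
```

Adding a transport then amounts to registering one more scheme rather than introducing new configuration options.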

Unintrusive API changes

  • Drop of cubicweb.web.uicfg.AutoformSectionRelationTags.bw_tag_map, deprecated since 3.6.

User interface changes

  • The RQL search bar now has some auto-completion support: relation types or entity types can be suggested while typing. It is an awesome improvement over the previous behaviour!
  • The action box associated with table views (from tableview.py) has been transformed into a nice-looking series of small tabs; it means that the possible actions are immediately visible and need not be discovered by clicking on an almost invisible icon on the upper right.
  • The uicfg module has moved to web/views/ and ui configuration objects are now selectable. This will reduce the amount of subclassing and whole methods replacement usually needed to customize the ui behaviour in many cases.
  • Remove changelog view, as neither cubicweb nor known cubes/applications were properly feeding related files.

Other changes

  • 'pyrorql' sources will be automatically updated to use a URL to locate the source rather than configuration options. 'zmqrql' sources were broken before this change, so no upgrade is needed...
  • Debugging filters for Hooks and Operations have been added.
  • Some cubicweb-ctl commands used to show the output of msgcat and msgfmt; they don't anymore.

December 2012 CubicWeb Sprint Report

2012/12/21 by Nicolas Chauvat

For two days, on December 13th and 14th, 2012, ten hackers gathered at Logilab to improve the user interface of CubicWeb. This hackathon was initiated by Crealibre. About a year ago, they started the Orbui project, a new user interface for CubicWeb based on the Bootstrap HTML/CSS framework.

Several projects at Logilab and Crealibre proved that Orbui was heading in the right direction, but that it had to fight with the default user interface of CubicWeb. Orbui makes different design and ergonomic choices and needs a different HTML/CSS structure and different JavaScript components.

Sylvain published a roadmap back in May with a section titled "on the road to Bootstrap". After more than half a day of heated debate on the first day, it was decided to follow the direction he pointed to. We started extracting the default user interface from CubicWeb and turning it into a set of cubes:

  • cubicweb-legacyui: css, views and templates extracted from CubicWeb 3.16, so as to provide full backward compatibility
  • cubicweb-bootstrap: empty cube with only bootstrap version 2.2.2 in data/
  • cubicweb-squareui: bootstrapified version of legacyui (slightly altered to benefit from the bootstrap css without breaking backward compatibility too hard)

At the end of the sprint, one could run add_cube('squareui') on an existing application and keep it usable... and get "some kind of responsiveness" for free, thus proving that we were on the right track.

A lot of work is still ahead of us, but we have moved a few steps forward towards the goal of making it easier to implement different UIs on top of CubicWeb 3.17.

For the curious, here is what the skeleton of legacyui.views.maintemplate (aka cw.web.views.maintemplate) looks like:

<body> (MainTemplate.template_body_header)
  <table id="header"> (HTMLPageHeader.main_header)
    for header in self.headers:
       <td id="header-{left,center,right}">
           render selected components(ctxcomponents, header-{left,center,right})
       </td>
  </table>
  <div id="stateheader"> HTMLPageHeader.call
     <div class="stateMessage"> HTMLPageHeader.state_header
  </div>
  <div id="page"> MainTemplate.template_body_header
    <table id="mainLayout"> MainTemplate.template_body_header
      if boxes (selected components(ctxcomponents, left): MainTemplate.nav_column
        <td id="navColumnLeft">
          <div class="navboxes">
             render boxes
          </div>
        </td>
      <td id="contentColumn"> MainTemplate.template_body_header
         render selected components(rqlinput)
         render selected components(applmessages)
         if navtop (selected components(ctxcomponents, navtop): HTMLContentHeader.call
           <div id="contentheader">
             render components
           </div>
           <div class='clear'/>
         <div id="pageContent"> MainTemplate.call
           if vtitle:
              <div class="vtitle" />
           if etypenavigation:
              render etypenavigation
           view pagination
           <div id="contentmain">
              render view
           </div>
           view pagination
         </div>
         if navbottom (selected components(ctxcomponents, navbottom): HTMLContentFooter.call
           <div id="contentfooter">
             render components
           </div>
      </td>
      if boxes (selected components(ctxcomponents, right): MainTemplate.nav_column
        <div id="navColumnRight">
          <div class="navboxes">
             render boxes
          </div>
    </table>
  </div>
  <div id="footer"> HTMLPageFooter.call
     render actions selected (actions, 'footer')
  </div>
</body>

and here is what the skeleton from squareui.views.maintemplate looks like:

<body>
<div class="container-fluid">
  <div id="header" class="row-fluid">
    <!-- .header -->
  </div>
  <div class="row-fluid">
    <div id="navColumnLeft" class="span3">
      <!-- .leftcolumn -->
    </div>
    <div id="contentColumn" class="span6">
      <!-- .contentcol -->
      <div class="row-fluid">
        <div id="contentheader" class="span12">
          <!-- .contentheader -->
        </div>
      </div>
      <div class="row-fluid">
        <div id="contentmain" class="span12">
          <!-- .contentmain -->
        </div>
      </div>
      <div class="row-fluid">
        <div id="contentfooter" class="span12">
          <!-- .contentfooter -->
        </div>
      </div>
    </div>
    <div id="navColumnRight" class="span3">
      <!-- .rightcolumn -->
    </div>
  </div>
  <div id="footer" class="row-fluid">
    <!-- .footer -->
  </div>
</div>
</body>

Stay tuned for the updates on this (important) topic!


Application to the dataconnexions#2 contest

2012/12/20 by Nicolas Chauvat

On behalf of the community of CubicWeb users and developers, I have just submitted the following application to the dataconnexions#2 contest.

1. Project description questionnaire

Project title

CubicWeb: a free software development platform for the semantic web

Chosen contest category

Choose among: General public / Professional / Public interest / Mobility and territories

Public interest (?)

What problem are you trying to solve?

Describe the problem(s) your project tries to solve and its (their) importance: market size, potential frequency of use, population concerned, possible public-service benefits, etc. (maximum 1000 characters).

The advent of the semantic web and of Open Data requires suitable tools for developing data-centric applications.

These tools must make it easy to import data, to link data coming from disjoint sources, to republish it, and to query and visualize it.

Ideally, these tools should use and respect the open standards of the Internet, in order to simplify communication and exchange, but also to ease development for multiple devices (computer, tablet, smartphone).

How are you trying to solve it?

Describe your product, service or visualization in its current form and, where applicable, after any future developments you are considering. Specify the public dataset(s) you use for this purpose (maximum 1000 characters).

CubicWeb is a free software development platform for the semantic web.

CubicWeb lets developers focus on the specifics of their application instead of having to reinvent the essential building blocks for importing, merging, publishing, querying and visualizing data.

CubicWeb is free software, developed openly on the Internet by a small but already international community. CubicWeb is available under the LGPL license, complies with W3C standards (RDF, SPARQL, HTML5, CSS3, Responsive Design) and natively handles several data models that act as de facto standards (FOAF, SIOC, DOAP, etc.).

What is your business model?

Describe the business model of your project, that is, the conditions for its sustainability and development: business plan and sales projections in the case of an entrepreneurial project; objectives, key funders and stakeholders in the case of a civic project (maximum 1000 characters).

Several commercial companies today rely on CubicWeb to sell IT services. The goal of this community is to grow in order to reach a wider audience and to share more widely the costs of maintaining and developing the CubicWeb platform.

CubicWeb users currently include the Bibliothèque nationale de France, EDF, GDF-Suez, the Commissariat à l'Energie Atomique, the Centre National d'Etudes Spatiales, the Institut Radioprotection et Sûreté Nucléaire, INRIA, medical research laboratories and IT companies.

What is the current status of your project?

Describe the milestones you have reached, the resources mobilized, the indicators and metrics already established, etc. (maximum 1000 characters).

The CubicWeb project grew out of an R&D effort started in 2001 by Logilab, whose goal was to build a tool for developing data-centric applications that complied with the semantic web standards then being drafted at the W3C.

Since 2008, CubicWeb has been free software developed openly on the Internet.

Who is supporting you in this project?

Describe the team working with you on your project (if any), your skills, experience and achievements, as well as any partners supporting you (maximum 1000 characters).

N/A.

How can DataConnexions help you?

Provide any additional details you would like to give about your project, and explain how DataConnexions can help sustain its development (maximum 1000 characters).

Several commercial companies today rely on CubicWeb to sell IT services. Industrial uses of CubicWeb are varied and involve important, even critical, applications.

CubicWeb is a little-known tool and its community is still small today, despite its solid references and the recent enthusiasm for Open Data.

DataConnexions could be a platform and a showcase allowing CubicWeb to attract new application developers who would rather benefit from the experience accumulated in this free tool than rediscover and overcome, one by one, the pitfalls encountered during the ten years it took to build it.

The aim of this application is therefore to grow the community of CubicWeb users and contributors.

2. Presentation video

Link to download a video describing the Project and its features, lasting at most 3 minutes

It is the project that is judged, not the quality of the video. The video must show the project's features. Candidates are encouraged to make a screen capture or a "screencast" (for example with tools such as CamStudio, Jing or Screenr).

Demonstration of CubicWeb being used to import and visualize the list of French railway stations downloaded from data.gouv.fr. Stations are selected with the faceted filter and displayed on an OpenStreetMap background map, then exported to RDF, JSON and CSV.

CubicWeb is a free software development platform for the semantic web that lets developers focus on the specifics of their application instead of having to reinvent the essential building blocks for importing, merging, publishing, querying and visualizing data.

Link to the video on YouTube. Mirror of the video on vimeo.com.

3. Online access to the project

Link giving access to the Project, or to the compiled and interpretable code of the Project

For example: a URL for viewing or, where applicable, downloading the application, together with instructions if necessary. The application must be easy to install and easy to demonstrate on its target platform.

http://www.cubicweb.org

4. Supports de communication

Description Non Confidentielle

Décrivez le Projet dans des termes compatibles avec une diffusion au grand public : non confidentiels, compréhensibles par le plus grand nombre, et mettant en avant l’intérêt du projet (maximum 1000 signes).

cf "comment tentez-vous de le résoudre"

Elément visuel de description

Lien vers un élément visuel décrivant et mettant en valeur le projet et ses fonctionnalités (capture d’écran, page d’accueil, schéma de description).

/file/2544364?vid=download

Logo du projet

Lien vers le logo du projet.

/file/2544362?vid=download

Links roundup from dotjs.eu

2012/12/05 by Arthur Lutz

A few people from Logilab attended the dotjs conference in Paris last week. The conference wasn't exactly what we expected: we were hoping for more technical talks. Nevertheless, some of the things we saw were quite interesting, and some of them could be relevant to CubicWeb.

http://www.cubicweb.org/file/2532779?vid=download

Here is a raw roundup of links collected last Friday:


CubicWeb sprint in Paris - 2012/12/13-14

2012/11/11 by Nicolas Chauvat

Topics

To be decided. Some possible topics are:

  • Work on the CubicWeb front end: anything related to TheMainTemplate, the primary view, reledit, table handling, etc.
  • Share the evolution of the OrbUI project and integrate it further into CubicWeb
  • Things to do for HTML5 and Bootstrap integration
  • Work on ideas from Thoughts on CubicWeb 4
  • ...

Other ideas are welcome; please bring them up on cubicweb@lists.cubicweb.org

Location

This sprint will take place in December 2012, from Thursday the 13th to Friday the 14th. You are more than welcome to come along, help out and contribute. An introduction is planned for newcomers.

Network resources will be available for those bringing laptops.

Address: 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab" (Google Maps)

Metro: Glacière

Contact: http://www.logilab.fr/contact

Dates: 13/12/2012 to 14/12/2012

Participants

  • Celso Flores (Crealibre - Mexico)
  • Carine Fourrier (Crealibre - Mexico)
  • ...

Building your URLs in cubicweb

2012/09/25 by Stéphane Bugat


Aim

In CubicWeb, you often have to build URLs that redirect the current view to a specific entity view or allow the execution of a given action. Moreover, you often also want to fall back to the previous view once the specific action or edit is done, or to redirect to another entity's specific view.

To do so, CubicWeb provides a set of powerful tools. However, as there is often more than one way to do it, this blog entry aims to help you choose the preferred way.

Tools at your disposal

The universal URL builder: build_url()

build_url is accessible in any context. For instance, when rendering a given entity view you can call self._cw.build_url to build your URLs easily, which is the most common case. In class methods (for instance, when declaring the rendering methods of an EntityTableView), you can access it through the context of the instantiated appobjects that are usually given as arguments, e.g. entity._cw.build_url. For test purposes you can also call session.build_url in cubicweb shells.

build_url basically takes an optional first argument, the path, relative to the base URL of the site, and arbitrary named arguments that will be encoded as URL parameters. Unless you want to direct to a custom controller, or to match a URL rewrite rule, you don't have to specify the path.

The extra parameters given to build_url will vary according to your needs; however, the most common arguments understood by default cubicweb views are the following:

  • vid: the __regid__ of the view to build;
  • rql: the RQL query used to retrieve the data to which the view should be applied;
  • eid: the identifier of an entity, to use instead of rql when the view applies to a single entity (the most common case);
  • __message: an information message to display inside the view;
  • __linkto: in an entity creation URL, sets some specific relations between the new entity and an existing one;
  • __redirectpath: the path of the entity to redirect to;
  • __redirectvid: the view id of the redirection.

__redirectvid and __redirectpath are used to control redirection after posting a form, and are described in more detail in the cubicweb documentation, in the chapter on the edit controller (http://docs.cubicweb.org/devweb/edition/editcontroller.html).
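The encoding itself is done inside CubicWeb, but the idea of "a path plus named arguments turned into URL parameters" can be sketched in plain Python. The build_url_sketch helper below is hypothetical (it is not CubicWeb's build_url); defaulting the path to 'view' is an assumption of the sketch, and parameters are sorted only to make the output predictable.

```python
from urllib.parse import urlencode


def build_url_sketch(base_url, path='view', **params):
    """Hypothetical stand-in for build_url (not CubicWeb's code).

    Joins a path to the site's base URL and encodes every named argument
    as a URL parameter. Defaulting the path to 'view' is an assumption of
    this sketch.
    """
    url = base_url.rstrip('/') + '/' + path.lstrip('/')
    if params:
        # sorted() only makes the parameter order predictable here
        url += '?' + urlencode(sorted(params.items()))
    return url


print(build_url_sketch('http://example.org/', vid='outofcontext', eid=1234))
# → http://example.org/view?eid=1234&vid=outofcontext
print(build_url_sketch('http://example.org', 'add/Person', __message='saved ok'))
# → http://example.org/add/Person?__message=saved+ok
```

Note how a value such as a message is URL-quoted (the space becomes "+"), which is why you can pass plain strings as arguments.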

Exploring entities associated URLs

Generally, an entity has two important methods that retrieve its absolute or relative URL:

  • entity.rest_path() will return something like <type>/<eid>, where <type> corresponds to the entity type and <eid> to the entity eid;
  • entity.absolute_url() will return the full URL of the entity, http://<baseurl>/<type>/<eid>. If you want to access a specific view of the entity, just pass the vid='myviewid' argument. You can give arbitrary arguments to this method; they will be encoded as URL parameters.
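To make the relationship between the two methods concrete, here is a toy class, ToyEntity (hypothetical, not a CubicWeb class), that mimics the shapes described above. It assumes the entity type appears lowercased in the path; check the actual rest_path() output of your instance rather than relying on this detail.

```python
from urllib.parse import urlencode


class ToyEntity:
    """Hypothetical stand-in mimicking rest_path() and absolute_url()."""

    def __init__(self, base_url, etype, eid):
        self.base_url = base_url.rstrip('/')
        self.etype = etype
        self.eid = eid

    def rest_path(self):
        # <type>/<eid>; lowercasing the type is an assumption of this sketch
        return '%s/%s' % (self.etype.lower(), self.eid)

    def absolute_url(self, **params):
        # full URL: <baseurl>/<type>/<eid>, extra arguments as URL parameters
        url = '%s/%s' % (self.base_url, self.rest_path())
        if params:
            url += '?' + urlencode(sorted(params.items()))
        return url


entity = ToyEntity('http://example.org/', 'Person', 1234)
print(entity.rest_path())                       # → person/1234
print(entity.absolute_url(vid='outofcontext'))
# → http://example.org/person/1234?vid=outofcontext
```

The relative form is what you typically pass as __redirectpath, while the absolute form is what you put in a link.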

Getting a proper RQL

Passing the rql to the build_url method requires a proper RQL expression. For this, there is a convenience method, printable_rql(), available on the result sets (rset) returned by RQL queries. It lets you apply a view to the same result set as the one currently processed, simply by using rql=self.cw_rset.printable_rql().
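Why a "printable" query? A result set usually comes from a query with substitution arguments (%(eid)s and the like), while a URL parameter has to carry one self-contained string. The hypothetical printable_rql_sketch below only illustrates that idea by inlining the arguments; it assumes numbers can be inlined as-is and strings double-quoted, and the real method may quote values differently.

```python
def printable_rql_sketch(rql, args):
    """Hypothetical illustration of what printable_rql() is for: inline
    the query arguments so the whole query fits in one 'rql' parameter.

    Assumption of this sketch: numbers are inlined as-is and strings are
    wrapped in double quotes (with inner quotes escaped).
    """
    inlined = {}
    for key, value in args.items():
        if isinstance(value, str):
            inlined[key] = '"%s"' % value.replace('"', '\\"')
        else:
            inlined[key] = str(value)
    return rql % inlined


print(printable_rql_sketch('Any X WHERE X in_contact_with Y, Y eid %(eid)s',
                           {'eid': 42}))
# → Any X WHERE X in_contact_with Y, Y eid 42
```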

Getting URLs from the current view

There are several ways to get the URL of the current view, the canonical one being self._cw.relative_path(includeparams=True), which returns the path of the current view relative to the base URL of the site (to get the absolute URL instead, use self._cw.url(), which also takes an includeparams argument controlling whether the parameters are included).

You can also retrieve the values given to individual parameters using self._cw.form, e.g.:

  • self._cw.form.get('vid', '') will return only the view id;
  • self._cw.form.get('rql', '') will return only the RQL;
  • self._cw.form.get('__redirectvid', '') will return the redirection view if defined;
  • self._cw.form.get('__redirectpath', '') will return the redirection path if defined.
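On the receiving side, self._cw.form behaves essentially like a dictionary of decoded request parameters. The hypothetical form_from_url helper below reproduces that with the standard library, so the .get() calls above can be tried outside an instance; keeping only the last value of a repeated parameter is a simplification of this sketch, not a description of CubicWeb's behavior.

```python
from urllib.parse import parse_qsl, urlsplit


def form_from_url(url):
    """Hypothetical stand-in for self._cw.form: decode a URL's query
    string into a dict. Repeated parameters keep only their last value,
    a simplification of this sketch."""
    return dict(parse_qsl(urlsplit(url).query))


form = form_from_url('http://example.org/view?vid=list'
                     '&rql=Any+X+WHERE+X+is+Person&__redirectvid=primary')
print(form.get('vid', ''))                    # → list
print(form.get('rql', ''))                    # → Any X WHERE X is Person
print(repr(form.get('__redirectpath', '')))   # → ''
```

Using .get() with a default, as in the list above, avoids a KeyError when a parameter such as __redirectpath was not provided.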

How to redirect to non-entity view?

This case often appears when you want to create a link to a startup view or a controller. In the first case, you simply build your URL like this:

self._cw.build_url('view', vid='my_view_id')

The latter case appears when you want to call a controller directly without having to define a form in your view. This can happen, for instance, when you want to create a URL that will set a relation between two objects without any confirmation step. The URL is built like this:

self._cw.build_url('my_controller_id', arg1=value1, arg2=value2, ...)

Any extra arguments passed to the build_url method will be available in the controller as key/value pairs of the self._cw.form dictionary. This is especially useful when you want to define some kind of hidden attributes but there is no form to put them into.

And, last but not least, a convenient way to get the root URL of the instance:

self._cw.base_url()

Some concrete cases

Get the URL of the outofcontext view of an entity:

link = entity.absolute_url(vid='outofcontext')

Create a link to a given controller then fall back to the current view:

  • In your entity view:
# xml_escape is importable from logilab.mtconverter
self.w(u'<a href="%s">Click me</a>' % xml_escape(
        self._cw.build_url('mycontrollerid',
                arg1=value1, arg2=value2,
                rql=self.cw_rset.printable_rql(),
                __redirectvid=self._cw.form.get('vid', ''))))
  • In your controller:
# Redirect is importable from cubicweb.web
def publish(self, rset=None):
    value1, value2 = self._cw.form['arg1'], self._cw.form['arg2']
    # do some stuff with value1 and value2 here...
    raise Redirect(self._cw.build_url(rql=self._cw.form['rql'],
                                      vid=self._cw.form['__redirectvid'],
                                      __message=self._cw._('your message')))

Create a link to add a given entity and relate this entity to the current one with a relation 'child_of', then go back to the current entity's view:

entity = self.cw_rset.get_entity(0,0)
self.w(u'<a href="%s">Click me</a>' % xml_escape(
        self._cw.build_url('add/Mychildentity',
                __linkto='child_of:%s:object' % entity.eid,
                __redirectpath=entity.rest_path(),
                __redirectvid=self._cw.form.get('vid', ''))))

Same example, but we assume we are in an entity view applied to a multi-entity rset, and we want to come back to this view afterwards:

entity = self.cw_rset.get_entity(0,0)
self.w(u'<a href="%s">Click me</a>' % xml_escape(
        self._cw.build_url('add/Mychildentity',
                rql=self.cw_rset.printable_rql(),
                __linkto='child_of:%s:object' % entity.eid,
                __redirectvid=self._cw.form.get('vid', ''))))

Create links to all 'menuactions' in a view:

actions = self._cw.vreg['actions'].possible_actions(self._cw, rset=self.cw_rset)
action_links = [unicode(self.action_link(x)) for x in actions.get('menuactions', ())]
self.w(u'  |  '.join(action_links))

How to create your own forms and controllers?

2012/09/05 by Stéphane Bugat

Aim

Sometimes you need to associate your own specific form and controller with a given view. In this blog entry, we will see how this can be done in CubicWeb, using a concrete case.

The case

Let's suppose you're working on a social network project where you have to develop friend-of-a-friend (foaf) relationships between persons. For that purpose, we use the cubicweb-person cube and create relations between persons in our schema, such as X in_contact_with Y:

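from yams.buildobjs import RelationDefinition

class in_contact_with(RelationDefinition):
    subject = 'Person'
    object = 'Person'
    cardinality = '**'   # any number of contacts on both sides
    symmetric = True     # X in_contact_with Y implies Y in_contact_with X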
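Declaring the relation symmetric means that "X in_contact_with Y" and "Y in_contact_with X" describe the same link, and the '**' cardinality lets each person have any number of contacts. The toy SymmetricRelation class below (plain Python, hypothetical, not how CubicWeb stores relations) models that behavior, which is presumably what lets the later query and deletion ignore the direction in which a contact was first recorded.

```python
class SymmetricRelation:
    """Toy model of a symmetric '**' relation such as in_contact_with:
    a link between x and y is stored once, without any direction."""

    def __init__(self):
        self._pairs = set()

    def add(self, x, y):
        # frozenset makes (x, y) and (y, x) the same link
        self._pairs.add(frozenset((x, y)))

    def remove(self, x, y):
        self._pairs.discard(frozenset((x, y)))

    def related(self, x):
        # contacts of x, whichever side x was on when the link was added
        return {other for pair in self._pairs if x in pair
                for other in pair if other != x}


contacts = SymmetricRelation()
contacts.add('alice', 'bob')
contacts.add('carol', 'alice')
print(sorted(contacts.related('alice')))   # → ['bob', 'carol']
contacts.remove('bob', 'alice')            # direction does not matter
print(sorted(contacts.related('alice')))   # → ['carol']
```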

We will also assume that a given Person corresponds to a unique CWUser through the relation is_user.

Although it may not seem obvious at first, we would like any connected person to be able to disconnect from another person at any time. For that, we will create a table view that displays the list of connected users, with a custom column giving the ability to "disconnect" from each person.

Before actually disconnecting from a given person, we would also like to have a confirmation form.

How to proceed

The following steps were defined to address the above issue:

  1. Define a "contact view" that will display the list of known contacts of the connected user;
  2. In this contact view, allow the user to click on a specific contact in order to remove them;
  3. Create a deletion confirmation view that will contain:
    • A form holding the buttons to confirm or cancel the deletion;
    • A controller responsible for the actual deletion or its cancellation.

The contact view

Rendering a table view of connected persons

To display the list of persons connected to the current person, and also to add custom columns that do not refer specifically to attributes of a given entity, the best choice is to use EntityTableView (see here for more information):

from logilab.mtconverter import xml_escape
from cubicweb.predicates import is_instance
from cubicweb.web.views.tableview import (EntityTableView,
        MainEntityColRenderer, RelatedEntityColRenderer, EntityTableColRenderer)

class ContactView(EntityTableView):
    __regid__ = 'contacts_tableview'
    __select__ = is_instance('Person')
    columns = ['person', 'firstname', 'surname', 'email', 'phone', 'remove']
    layout_args = {'display_filter': 'top', 'add_view_actions': None}

    def cell_remove(w, entity):
        """link to the suppression of the relation between both contacts"""
        icon_url = entity._cw.data_url('img/user_delete.png')
        action_url = entity._cw.build_url(eid=entity.eid,
                vid='suppress_contact_view',
                __redirectpath=entity._cw.relative_path(),
                __redirectvid=entity._cw.form.get('__redirectvid', ''))
        # translate at rendering time, then escape for the HTML attribute
        title = xml_escape(entity._cw._('remove from contacts'))
        w(u'<a href="%(actionurl)s" title="%(title)s">'
          u'<img alt="%(title)s" src="%(url)s" /></a>'
          % {'actionurl': xml_escape(action_url),
             'title': title,
             'url': xml_escape(icon_url)})

    column_renderers = {
        'person': MainEntityColRenderer(),
        'email': RelatedEntityColRenderer(
            getrelated=lambda x: x.primary_email and x.primary_email[0] or None),
        'phone': RelatedEntityColRenderer(
            getrelated=lambda x: x.phone and x.phone[0] or None),
        'remove': EntityTableColRenderer(
            renderfunc=cell_remove,
            header=''),
        }

A few explanations about the above view:

  • By default, the columns attribute contains a list of displayable attributes of the entity. If one element of the list does not correspond to an attribute, which is the case for 'remove' here, it must have a rendering function defined in the column_renderers dictionary.
  • When the column refers to an attribute of a related entity, we can simply use the RelatedEntityColRenderer renderer, as is the case for the email and phone columns.
  • As for the 'remove' column, we render a clickable image in the cell_remove function. Here we have chosen an icon from famfamfam silk that we put in our data/ directory, but feel free to choose a predefined icon from the cubicweb shared data directory.
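The rule from the first bullet can be expressed as a small function. resolve_columns is a hypothetical helper written for illustration, not EntityTableView's code; in this toy model an explicit renderer takes precedence over an attribute of the same name, which is an assumption.

```python
def resolve_columns(columns, attributes, column_renderers):
    """Hypothetical model of the column rule: a column is rendered by an
    explicit renderer if one is declared, otherwise as an entity
    attribute; a column that is neither is a configuration error."""
    plan = {}
    for col in columns:
        if col in column_renderers:
            plan[col] = 'renderer'
        elif col in attributes:
            plan[col] = 'attribute'
        else:
            raise ValueError('column %r needs an entry in column_renderers' % col)
    return plan


plan = resolve_columns(
    ['person', 'firstname', 'surname', 'email', 'phone', 'remove'],
    attributes={'firstname', 'surname'},
    column_renderers={'person', 'email', 'phone', 'remove'})
print(plan['firstname'], plan['remove'])   # → attribute renderer
```

Forgetting the 'remove' entry in column_renderers would make the sketch raise a ValueError, mirroring the requirement stated in the first bullet.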

The URL associated with each image has to point to a specific action allowing the user to remove the selected person from their contacts. It is built using the convenience function build_url() (here entity._cw.build_url, since cell_remove receives the entity rather than the view). The target view, 'suppress_contact_view', will be defined later on. The eid argument passed refers to the contact person the user wants to remove.

Calling the contact view

The above view has to be called with an rset corresponding to the list of known contacts of the connected user. In our case, we have defined a StartupView for contact management, and added the following piece of code to its call method:

# the Person linked to the connected CWUser through is_user
person = self._cw.user.related('is_user', 'object').get_entity(0, 0)
rset = self._cw.execute(
        'Any X WHERE X is Person, X in_contact_with Y, '
        'Y eid %(eid)s', {'eid': person.eid})
self.w(u'<h3>%s %s</h3>' % (self._cw._('Number of contacts in my network:'),
                            len(rset)))
if rset:
    self.wview('contacts_tableview', rset)

The Person corresponding to the connected user is retrieved using the related method and the is_user relation. The contact table view is then displayed inside the parent StartupView.

Creation of the deletion confirmation view

Defining the confirmation view for contact deletion

The corresponding view is a simple View subclass that displays a confirmation message and the related buttons. It could be defined as follows:

from logilab.mtconverter import xml_escape
from cubicweb.view import View

class SuppressContactView(View):
    __regid__ = 'suppress_contact_view'

    def cell_call(self, row, col):
        # the contact to remove is the entity the view is applied to
        entity = self.cw_rset.get_entity(row, col)
        msg = self._cw._('Are you sure you want to remove %(name)s from your contacts?')
        self.w(u'<p>%s</p>' % xml_escape(msg % {'name': entity.dc_long_title()}))
        form = self._cw.vreg['forms'].select('suppress_contact_form',
                self._cw, rset=self.cw_rset)
        # pass both Person eids to the controller as hidden fields
        form.add_hidden(u'eidto', entity.eid)
        form.add_hidden(u'eidfrom', self._cw.user.related('is_user',
            'object').get_entity(0, 0).eid)
        form.render(w=self.w)

Inside the cell_call() method of this view, we will have to render a form which aims at displaying both buttons (confirm deletion or cancel deletion). This form will be described later on.

The Person contact to remove is easily retrieved from cw_rset. The Person corresponding to the connected user is, here again, retrieved through the is_user relation. To make both of them available to the controller, we add them to the form after instantiating it, using the convenience method add_hidden(key, val).

Defining the deletion form

As mentioned previously, the deletion form is only there to hold the two buttons used to confirm or cancel the deletion. Both buttons are declared through the form_buttons attribute of the form, which inherits from forms.FieldsForm:

from cubicweb.web import formwidgets as fw, stdmsgs
from cubicweb.web.views import forms

class SuppressContactForm(forms.FieldsForm):
    __regid__ = 'suppress_contact_form'
    domid = 'delete_contact_form'
    form_renderer_id = 'base'

    @property
    def action(self):
        # the form is posted to our custom controller
        return self._cw.build_url('suppress_contact_controller')

    form_buttons = [
            fw.Button(stdmsgs.BUTTON_DELETE, cwaction='delete'),
            fw.Button(stdmsgs.BUTTON_CANCEL, cwaction='cancel')]

Specifying a domid gives your form a specific DOM identifier, and the controller defined by the action property will be called without any ambiguity. The form_renderer_id is specified here to avoid displaying additional information that makes no sense in this context.

Defining the controller

The custom controller inherits from the Controller class in cubicweb.web.controller. The controller declaration uses the same domid as the calling form, as mentioned previously. The related actions are described in the publish() method of the controller:

from cubicweb.web import Redirect
from cubicweb.web.controller import Controller

class SuppressContactController(Controller):
    __regid__ = 'suppress_contact_controller'
    domid = 'delete_contact_form'

    def publish(self, rset=None):
        if '__action_cancel' in self._cw.form:
            msg = self._cw._('Deletion canceled')
            raise Redirect(self._cw.build_url(
                vid='contact_management_view',
                __message=msg))
        elif '__action_delete' in self._cw.form:
            xid = int(self._cw.form['eidfrom'])   # the connected user's Person
            yid = int(self._cw.form['eidto'])     # the contact to remove
            dead_contact = self._cw.entity_from_eid(yid)
            self._cw.execute(
                    'DELETE X in_contact_with Y'
                    '  WHERE X eid %(xid)s, Y eid %(yid)s',
                    {'xid': xid, 'yid': yid})
            msg = self._cw._('%s removed from your contacts') % \
                dead_contact.dc_long_title()
            raise Redirect(self._cw.build_url(
                vid='contact_management_view',
                __message=msg))

The user's choice is retrieved by testing whether '__action_<action>', where <action> is the cwaction given in the button declaration, is among the form keys. On cancel, we simply redirect to the contact management view with a message saying that the deletion has been cancelled. On confirmation, the Person eids of both the connected user and the contact to remove are read from the hidden form fields.

The deletion is performed using an RQL query on the in_contact_with relation. We then redirect to the contact management view, this time with a message confirming that the contact link has been deleted.
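Since the controller mostly branches on form keys, its decision logic can be pulled out into a pure function and tested without a running instance. parse_contact_action is a hypothetical helper; it assumes the form behaves like a plain dict with string values, as request parameters usually do, hence the int() conversions.

```python
def parse_contact_action(form):
    """Hypothetical helper: decide what the confirmation form asked for.

    Looks for the '__action_<cwaction>' key added by the clicked button
    and returns (action, eidfrom, eidto); the eids are only read, and
    converted from strings, for a deletion."""
    if '__action_cancel' in form:
        return ('cancel', None, None)
    if '__action_delete' in form:
        return ('delete', int(form['eidfrom']), int(form['eidto']))
    return ('unknown', None, None)


print(parse_contact_action({'__action_delete': 'Delete',
                            'eidfrom': '12', 'eidto': '34'}))
# → ('delete', 12, 34)
```

Keeping publish() as a thin layer around such a function makes the redirect and RQL parts the only code that needs an instance to be exercised.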


Logilab at the LawFactory

2012/07/16 by Vincent Michel

We have been playing with political data for a while, using CubicWeb to store and query various sets of open data (e.g. NosDeputes, data.gouv.fr), and testing different visualization tools. In particular, we have extended our News Analysis prototype (see the presentation we made last year at EuroSciPy) in order to use these political datasets as a reference for the named entity extraction part. Last week's conference "The Law Factory" at Sciences Po was a really nice opportunity to meet people with similar interests in open data for political science, and to find out which questions we should be asking our data! Check out our presentation and a few screencasts (no sound):

Comments are welcome !

Interesting things seen at #OLPC

Among the different things we saw, we would like to highlight:

  • Law is Code (http://gitorious.org/law-is-code/) - This project by the Regards Citoyens team aims at analysing laws and amendments by extracting information from the French National Assembly website, and by pushing the contributions of the members of parliament to a given law into a git repository. If we can find the time, we'll turn that into a Mercurial repository and integrate it into the application described above using cubicweb-vcsfile.
http://www.cubicweb.org/file/2423768?vid=download
  • Neither national website (Assemblée Nationale, Sénat) allows (yet...) getting data any other way than by parsing the sites. However, the people involved seem to be aware of open data issues, and this may change in the coming months. In particular, the Sénat uses two databases (Basile and Ameli), and opening them to the public could be really interesting.
  • Various projects about African parliaments can be found on the following website: http://www.parliaments.info
  • Check out ICTparliament, which provides tools to analyse and share data from many different parliaments.

On Saturday, at La Cantine Numérique, the discussions focused on the possibilities of sharing tools and on possible collaborations. I think this is the crucial point: how can people share tools and use them efficiently without being IT experts?

How does this inspire us for CubicWeb?

With this in mind, we are thinking about some evolutions of CubicWeb that could fulfill (part of) these requirements:

  • easier installation, especially on Windows, and easier PostgreSQL configuration. This could perhaps be done by providing a graphical interface for creating and managing the instances and the databases.
  • a graphical tool for schema construction. Even though building a data model in CubicWeb is quite simple and relies on straightforward Python syntax, it could be interesting to provide a graphical tool for adding, removing or modifying entity types in the schema, as well as their attributes and relations.
  • easier ways to import data. This point is not trivial, and we don't want to develop a specific language for defining import rules that would cover 80% of the cases but be painful to extend to the 20% of exotic ones. We would rather develop some helpers to ease the writing of import scripts in Python, and provide some CubicWeb instances already filled with open databases.

Demo of CubicWeb as a follow up

As a follow-up to the conference, we are opening a demo site using CubicWeb to expose data from the past legislative and presidential elections (2002, 2007, 2012).

https://www.cubicweb.org/file/2425136?&vid=download

The data used is published under Licence Ouverte / Open Licence by http://data.gouv.fr.

This demo site allows you to explore the data in depth, with different visualizations and complex queries. Again, comments are welcome, especially if you want to retrieve some information but don't know how to! This demo site will probably evolve over the next weeks, and we will use it to test different cubes that we have been building.

PS: We are sorry we cannot open the news aggregator prototype for now, as there are still licensing issues concerning the reusability of the different news sources we get articles from.


What's new in CubicWeb 3.15

2012/05/14 by Sylvain Thenault

CubicWeb 3.15 introduces a bunch of new functionalities. In short (more details below):

  • ability to use ZMQ instead of Pyro to connect to repositories
  • ZMQ inter-instances messages bus
  • new LDAP source using the datafeed approach, much more flexible than the legacy 'ldapuser' source
  • full undo support

Plus some refactorings regarding Ajax function calls, WSGI, the registry, etc. Read on for the details.

New functionalities

  • Add a ZMQ server, based on the cutting-edge ZMQ socket library. This allows access to remote instances, in a similar way to Pyro.
  • Publish/subscribe mechanism using ZMQ for communication among cubicweb instances. The new zmq-address-sub and zmq-address-pub configuration variables define where this communication occurs. As of this release this mechanism is used for entity cache invalidation.
  • Improved WSGI support. While there are still some caveats, most of the code that was Twisted-only is now generic and allows the related functionalities to work with a WSGI front-end.
  • Full undo/transaction support: undo of modifications has finally been implemented, and the configuration simplified (basically you activate it or not on an instance basis).
  • Controlling HTTP status code returns is now much easier:
    • WebRequest now has a status_out attribute to control the response status;
    • most web-side exceptions take an optional status argument.

API changes

  • The base registry implementation has been moved to a new logilab.common.registry module (see #1916014). This includes code from:

    • cubicweb.vreg (everything that was in there)
    • cw.appobject (base selectors and all).

    In the process, some renaming was done:

    • the top level registry is now RegistryStore (was VRegistry), but that should not impact CubicWeb client code;
    • former selector functions are now known as "predicates", though you still use predicates to build an object's selector;
    • for consistency, the objectify_selector decorator has hence been renamed to objectify_predicate;
    • on the CubicWeb side, the selectors module has been renamed to predicates.

    The debugging refactoring dropped the need for the lltrace decorator. There should be full backward compatibility, with proper deprecation warnings. Notice that the yes predicate and the objectify_predicate decorator, as well as the traced_selection function, should now be imported from the logilab.common.registry module.

  • All login forms are now submitted to <app_root>/login. Redirection to requested page is now handled by the login controller (it was previously handled by the session manager).

  • Publisher.publish has been renamed to Publisher.handle_request. This method now contains a generic version of the logic previously handled by Twisted. Controller.publish is not affected.

Unintrusive API changes

  • New 'ldapfeed' source type, designed to replace 'ldapuser' source with data-feed (i.e. copy based) source ideas.
  • New 'zmqrql' source type, similar to 'pyrorql' but using ømq instead of Pyro.
  • A new registry called 'services' has appeared, where you can register server-side cubicweb.server.Service child classes. Their call method can be invoked from a web-side AppObject instance using the new self._cw.call_service method or a server-side one using self.session.call_service. This is a new way to call server-side methods, much cleaner than monkey patching the Repository class, which becomes a deprecated way to perform similar tasks.
  • A new ajaxfunction registry now hosts all remote functions (i.e. functions callable through the asyncRemoteExec JS API). A convenience ajaxfunc decorator lets you expose your Python functions easily, without the standard appobject boilerplate. Backward compatibility is preserved.
  • The 'json' controller is now deprecated in favor of the 'ajax' one.
  • WebRequest.build_url can now take a __secure__ argument. When True, cubicweb tries to generate an https url.
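The services and ajaxfunction registries share one idea: callables are recorded under an identifier and looked up by that identifier when called. The toy toy_register decorator and toy_call_service function below illustrate only that pattern; they are hypothetical stand-ins, not CubicWeb's ajaxfunc decorator or its call_service method, whose signatures and behavior may differ.

```python
TOY_REGISTRY = {}


def toy_register(regid):
    """Toy decorator: record a function under an identifier, sparing the
    caller any class boilerplate (illustration only)."""
    def decorator(func):
        TOY_REGISTRY[regid] = func
        return func
    return decorator


def toy_call_service(regid, **kwargs):
    """Toy lookup-and-call by identifier (illustration only)."""
    try:
        func = TOY_REGISTRY[regid]
    except KeyError:
        raise LookupError('nothing registered as %r' % regid) from None
    return func(**kwargs)


@toy_register('count_contacts')
def count_contacts(contacts):
    # a trivial "service" used only for this example
    return len(contacts)


print(toy_call_service('count_contacts', contacts=['bob', 'carol']))  # → 2
```

The caller only needs the identifier, which is what makes this approach cleaner than monkey patching a class to add methods.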

User interface changes

A new 'undohistory' view exposes the undoable transactions and gives access to undo some of them.


Thoughts on CubicWeb 4.0

2012/05/14 by Sylvain Thenault

This is a fairly technical post about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to get from here to there. A draft, really, but it will hopefully turn into a nice roadmap eventually.

The great simplification

Some parts of CubicWeb are too hairy, for various reasons (some good, most bad). This makes it harder to get started quickly. The goal of CubicWeb 4.0 should be to make things simpler:

  • Fix some bad old design.
  • Stop reinventing the wheel and use libraries that are widely used in the Python web world. This extends to benefiting from state-of-the-art libraries, such as Bootstrap, to build nice and flexible UIs on top of the jQuery foundations (jQuery could become as prominent in CubicWeb as the Python standard library, and the development team should get ready for it).
  • If there is a best way to do something, just do it and refrain from providing configurability and options.

On the road to Bootstrap

First, a few simple things could be done to simplify the UI code:

  • drop XHTML support: always return the text/html content type, stop bothering with this stillborn stuff and use HTML5
  • move away everything that should not be in the framework: calendar?, embedding, igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?, vcard, wdoc?, xbel, xmlrss?

Then we should probably move the default UI into some cubes (i.e. the content of cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this should also have the benefit of making clearer that this is the default way to build an (automatic) UI in CubicWeb, but one may use other, more usual, strategies (such as using a template language).

At first glance, we should start with the following core cubes:

  • corelayout, the default interface layout and generic components. Modules to backport there: application (not an appobject yet), basetemplates, error, boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.
  • coreviews, the default generic views and forms. Modules to backport there: actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller, editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview, reledit, tabs.
  • corebackoffice, the concrete views for the default back-office that let you handle users, sources, debugging, etc. through the web. Modules to backport here: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress, management, schema, startup, workflow.
  • coreservices, the various services not directly related to displaying something. Modules to backport here: ajaxcontroller, apacherewrite, authentication, basecontrollers, csvexport, idownloadable, magicsearch, sessions, sparql, staticcontrollers, urlpublishing, urlrewrite.

This is a first draft that will need some adjustments. Some of the listed modules should be split (e.g. actions, boxes) and their content moved to different core cubes. Also, some modules in the cubicweb.web package may be moved to the relevant cube.

Each cube should provide an interface so that one could replace it with another one. For instance, one could move from the default coreviews and corelayout cubes to Bootstrap-based ones. This should allow a nice migration path from the current UI to a Bootstrap-based UI. Bootstrap should probably be introduced bottom-up: start using it for tables, lists, etc., then go up to the layout defined in the main template. The OrbUI experience should greatly help us, by pointing at hot spots that will have to be tackled as well as by providing a nice code base to start from.

Regarding the current implementation, we should keep in mind that contextual components are a powerful way to build a "pluggable" UI, but we should probably add an intermediate layer that would make more obvious / explicit:

  • what the available components are
  • what the available slots are
  • which component should go in which slot when possible
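One possible shape for such an intermediate layer is sketched below. SlotRegistry and the component and slot names are hypothetical, invented for this illustration; the class simply records which slots exist, which components are available, and where each component goes, and it refuses a component aimed at an unknown slot.

```python
from collections import defaultdict


class SlotRegistry:
    """Hypothetical explicit layer between components and the layout:
    lists the available slots and components, and where each one goes."""

    def __init__(self, slots):
        self.slots = list(slots)
        self.components = {}               # component name -> slot
        self._placement = defaultdict(list)

    def register(self, name, slot):
        # fail early instead of silently rendering nothing
        if slot not in self.slots:
            raise KeyError('unknown slot %r' % slot)
        self.components[name] = slot
        self._placement[slot].append(name)

    def components_in(self, slot):
        return list(self._placement[slot])


reg = SlotRegistry(['header', 'left', 'footer'])
reg.register('breadcrumbs', 'header')
reg.register('facets', 'left')
print(reg.components_in('left'))     # → ['facets']
print(reg.components_in('footer'))   # → []
```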

Also, at some point, we should separate the views' logic from HTML generation: our experience with client work shows that a common need is to reuse the logic but produce different HTML. However, we should wait for more use of Bootstrap and the related HTML simplification to see whether the power of CSS does not partly fulfill that need.

On the road to proper tasks management

The current looping task / repo thread mechanism is used for various sorts of things and has several problems:

  • tasks don't behave consistently in a multi-instance configuration (some should be executed in a single instance, some in a subset); the tasks system was originally written for a single-instance context, and as of today this is (sometimes) handled using configuration options (which have to be properly set in each instance's configuration file);
  • tasks are a repository-only API, but we also need web-side tasks;
  • there is probably some abuse of the system that may lead to unnecessary resource usage.

Analyzing a sample http://www.logilab.org/ instance, here are the running looping tasks, by category. Tasks that have to run on each web instance:

  • clean_sessions, automatically closes unused repository sessions. Notice that cw.etwist.server also records a twisted task to clean web sessions. Changes on this are imminent; they will be addressed in the upcoming refactoring session (which is becoming more and more necessary to move on with several points listed here).
  • regular_preview_dir_cleanup (preview cube), cleans up files in the preview filesystem directory. Could be executed by one (or some) of the web instances, provided that the preview directory is shared.

Tasks that should run on a single instance:

  • update_feeds, updates copy-based sources (e.g. datafeed, ldapfeed). Controlled by the 'synchronize' source configuration (a persistent source attribute that may be overridden per instance using CWSourceHostConfig entities).
  • expire_dataimports, deletes CWDataImport entities older than the amount of time specified in the 'logs-lifetime' configuration option. Not controlled yet.
  • cleanup_auth_cookies (rememberme cube), deletes CWAuthCookie entities whose lifetime is exhausted. Not controlled yet.
  • cleaning_revocation_key (forgotpwd cube), deletes Fpasswd entities with a past revocation_date. Not controlled yet.
  • cleanup_plans (narval cube), deletes Plan entities older than the amount of time specified in the configuration. If 'plan-cleanup-delay' is set to an empty value, the task isn't started.
  • refresh_local_repo_caches (vcsfile cube), pulls or clones the VCS repository caches if the Repository entity asks to import_revision_content (hence web instances should have an up-to-date cache to display file contents) or if the 'repository-import' configuration option is set to 'yes'; imports VCS repository content as entities if the 'repository-import' configuration option is set and the repository comes from the system source.
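One way to make the "each instance" versus "single instance" split above explicit, instead of scattering it across configuration options, would be to declare it on each task. The sketch below is hypothetical: `LoopingTask`, `run_on` and `tasks_to_start` are illustrative names, not CubicWeb API, and how an instance learns that it is the "master" is left open.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical task descriptor: CubicWeb has no such class; it only shows
# declaring *where* a task must run next to *what* it does.
@dataclass
class LoopingTask:
    name: str
    interval: float              # seconds between two runs
    func: Callable[[], None]
    run_on: str = "each"         # "each" web instance, or a "single" instance


def tasks_to_start(tasks: List[LoopingTask], is_master: bool) -> List[LoopingTask]:
    """Select the tasks this instance should start.

    Every instance starts the "each" tasks; only the instance designated as
    master (however that gets configured) also starts the "single" ones.
    """
    return [t for t in tasks
            if t.run_on == "each" or (is_master and t.run_on == "single")]


def noop() -> None:
    pass


TASKS = [
    LoopingTask("clean_sessions", 60, noop, run_on="each"),
    LoopingTask("update_feeds", 300, noop, run_on="single"),
    LoopingTask("expire_dataimports", 3600, noop, run_on="single"),
]

print([t.name for t in tasks_to_start(TASKS, is_master=False)])
```

A declaration like this would also make it easy to list, per instance, which tasks are expected to run.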

Some deeper thinking is needed here so we can improve things. That includes thinking about:

  • the inter-instances messages bus based on zmq and introduced in 3.15,
  • the Celery project (http://celeryproject.org/), an asynchronous task queue, widely used and written in Python.

Remember: the more CubicWeb-independent the tasks are, the better. We still want an 'all-integrated' approach, though, e.g. not relying on external configuration of Unix-specific tools such as CRON. We should also see whether a hard dependency on Celery or a similar tool could be avoided and, if not, whether it should be considered a problem (for devops).

On the road to an easier configuration

First, we should drop the behaviour that differs depending on the presence of a '.hg' directory in cubicweb's directory. It currently changes the location where cubicweb external resources (js, css, images, gettext catalogs) are searched for. Speaking of implementation:

  • shared_dir returns the cubicweb.web package path instead of the path to the shared cube,
  • i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to the shared/i18n cube,
  • migration_scripts_dir returns the cubicweb/misc/migration directory path instead of share/cubicweb/migration.

Moving web-related objects as proposed in the Bootstrap section would resolve the problem for the content of web/data and most of i18n (though some messages will remain and additional efforts will be needed here). Going further in this direction, we may also clean up some schema code by moving cubicweb/schemas and cubicweb/misc/migration to a cube (though only a small benefit is to be expected here).

We should also have fewer environment variables... Let's see what we have today:

  • CW_INSTANCES_DIR, where to look for instances configuration
  • CW_INSTANCES_DATA_DIR, where to look for instances persistent data files
  • CW_RUNTIME_DIR, where to look for instances run-time data files
  • CW_MODE, when set to 'system' or 'user', predefines the above environment variables differently
  • CW_CUBES_PATH, additional directories where to look for cubes
  • CW_CUBES_DIR, location of the system 'cubes' directory
  • CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

  • CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to ~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
  • CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file options, with smart values generated at instance creation time;
  • the above change should make CW_MODE useless;
  • CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;
  • regarding CW_INSTALL_PREFIX, I'm lacking experience with non-hg-or-debian installations and don't know if this can be avoided or not.
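The proposed CW_INSTANCES_PATH default can be expressed in a few lines. This is a sketch of the proposal, not existing CubicWeb code; it assumes CW_INSTANCES_PATH would be a list of directories separated by os.pathsep (like PATH), which the proposal does not specify.

```python
import os
from typing import List, Mapping, Optional


def instances_path(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Directories to search for instance configurations (proposed behaviour).

    Assumption: CW_INSTANCES_PATH is an os.pathsep-separated list, like PATH.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CW_INSTANCES_PATH")
    if value:
        # Skip empty entries produced by doubled or trailing separators
        return [os.path.expanduser(p) for p in value.split(os.pathsep) if p]
    user_dir = os.path.expanduser("~/etc/cubicweb.d")
    if os.path.isdir(user_dir):
        return [user_dir]
    return ["/etc/cubicweb.d"]  # system-wide default on Unix platforms
```

With nothing set, a user who has ~/etc/cubicweb.d gets it, and everybody else falls back to /etc/cubicweb.d, which would remove the need for CW_MODE.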

Last but not least (for the moment), the separate 'web' / 'repo' / 'all-in-one' configurations, each with its own variant of the configuration file, stink. Ideas to stop doing this:

  • one configuration file per instance, with all options provided by installed parts of the framework used by the application.
  • activate 'services' (or not): web server, repository, zmq server, pyro server. Default services to be started are stored in the configuration file.
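As an illustration of "one configuration file, services activated or not", here is a sketch using the standard configparser module. The [services] section and its start option are hypothetical names chosen for the example; CubicWeb has no such options today.

```python
import configparser

# Hypothetical single configuration file: section and option names are
# illustrative only, not existing CubicWeb options.
EXAMPLE = """
[services]
# services started by default; may be overridden on the command line
start = web, repository, zmq

[web]
port = 8080
"""

KNOWN_SERVICES = {"web", "repository", "zmq", "pyro"}


def services_to_start(config_text, override=None):
    """Return the services to start: the override if given, else the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if override is not None:
        names = list(override)
    else:
        raw = parser.get("services", "start")
        names = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = set(names) - KNOWN_SERVICES
    if unknown:
        raise ValueError("unknown services: %s" % ", ".join(sorted(unknown)))
    return names


print(services_to_start(EXAMPLE))           # ['web', 'repository', 'zmq']
print(services_to_start(EXAMPLE, ["web"]))  # ['web']
```

The same file would then hold every option, and a web-only instance is just a different list of services rather than a different file.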

There is probably more that can be done here (fewer configuration options?), but that would already be a great step forward.

On the road to...

The following projects should be investigated to see if we could benefit from them:

Discussion

Remember the following goal: migration of legacy code should go smoothly. In a perfect world, every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).

Please provide feedback:

  • do you think choices proposed above are good/bad choices? Why?
  • do you know some additional libraries that should be investigated?
  • do you have other changes in mind that could/should be done in cw 4.0?

Follow up of IRI conference about Museums and the Web #museoweb

2012/04/12 by Arthur Lutz

I attended a conference organised by IRI as part of a series about "Muséologie, muséographie et nouvelles formes d’adresse au public" (museology, museography and new ways of addressing the public; hashtag #museoweb). This particular occurrence was about "Le Web devient audiovisuel" (the web becomes audiovisual). Here are a few notes and links we gathered. The event was organised by Alexandre Monnin @aamonnz.

http://polemictweet.com/2011-2012-museo-audiovisuel/images/slide4_museo_fr.png

Yves Raimond from the BBC

Yves Raimond @moustaki made a presentation about his work at the BBC on semantic web technologies and speech recognition over large quantities of digitized archives. Parts of the BBC web sites use semantic web data as their database and create mashups with external data sources (musicbrainz, dbpedia, wikipedia). For example, Tom Waits has an HTML web page: http://www.bbc.co.uk/music/artists/c3aeb863-7b26-4388-94e8-5a240f2be21b and adding .rdf at the end of the URL gives its RDF version: http://www.bbc.co.uk/music/artists/c3aeb863-7b26-4388-94e8-5a240f2be21b.rdf

He also introduced ABC-IP (the Automatic Broadcast Content Interlinking Project) and the Kiwi-API project, which uses CMU Sphinx on Amazon Web Services to process large quantities of archives. A screenshot of Kiwi-API is shown on the BBC R&D blog. The code should be open-sourced soon and should appear on the BBC R&D github page.

Following his presentation, someone asked whether using Wikipedia content on an institutional web site would be possible in France. I pointed to the use of Wikipedia on http://data.bnf.fr, for example at the bottom of the Victor Hugo page.

Raphaël Troncy about Media Fragments

Raphaël Troncy @rtroncy made a presentation about "Media Fragments", which will enable sharing parts of a video on the web. Two major features: sharing specific extracts, and optimizing bandwidth use when streaming an extract (useful for mobile devices, for example). It is a W3C working draft: http://www.w3.org/TR/media-frags-reqs/. Here are a few links to demos and players:

Part of the presentation was about the ACAV project, done jointly with Dailymotion: http://www.capdigital.com/projet-acav/

The slides of his presentation are available here: http://www.slideshare.net/troncy/addressing-and-annotating-multimedia-fragments

IRI presentation

Vincent Puig @vincentpuig and Raphaël Velt @raphv presented various projects led by IRI:

http://www.iri.centrepompidou.fr/wp-content/themes/IRI-Theme/images/logo-iri-petit_fr_fr.png

Final words

The technologies seen during this conference are often related to semantic web technologies or at least web standards. Some of the visualizations are quite impressive and could mean new uses of the Web and an inspiration for CubicWeb projects.

A few of the people present at the conference will be attending or presenting talks at SemWeb.Pro, which will take place in Paris on the 2nd and 3rd of May 2012.


Undoing changes in CubicWeb

2012/02/29 by Anthony Truchet

Many desktop applications let the user undo recent changes; a similar undo feature has now been integrated into the CubicWeb framework.

Because a semantic web application and a common desktop application are not the same thing at all, especially as far as undoing is concerned, we will first describe what the undo feature currently covers.

What's undoing in a CubicWeb application

A CubicWeb application acts upon an Entity-Relationship model, described by a schema. This ensures some data integrity properties. It also implies that changes are grouped into transactions: to ensure data integrity, a transaction is either applied completely or not at all. What appears to the user as a simple atomic action can actually consist of several actions for the framework. The end user has no need to know the details of all the actions in those transactions: only the so-called public actions appear in the description of an undoable transaction.

Let's take a simple example: posting a "comment" on a blog entry creates the comment entity itself and the link to the blog entry.
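The all-or-nothing behaviour and the notion of public actions can be illustrated with a toy, in-memory sketch. It is not CubicWeb's implementation: `Store`, `Transaction` and the `comments` relation name are illustrative assumptions. Each action is recorded with its inverse, undo replays the inverses in reverse order, and a failure restores the state captured before undoing started.

```python
import copy


class Store:
    """Toy entity/relation store (hypothetical), only used to illustrate undo."""

    def __init__(self):
        self.entities = {}        # eid -> attributes
        self.relations = set()    # (subject eid, relation type, object eid)
        self.next_eid = 1


class Transaction:
    """Records each action with its inverse so the whole group can be undone."""

    def __init__(self, store):
        self.store = store
        self.actions = []         # (public?, description, inverse callable)

    def create_entity(self, etype, **attrs):
        eid = self.store.next_eid
        self.store.next_eid += 1
        self.store.entities[eid] = dict(attrs, etype=etype)
        self.actions.append((True, "create %s" % etype,
                             lambda: self.store.entities.pop(eid)))
        return eid

    def add_relation(self, subj, rtype, obj, public=False):
        self.store.relations.add((subj, rtype, obj))
        self.actions.append((public, "link %s" % rtype,
                             lambda: self.store.relations.remove((subj, rtype, obj))))

    def description(self):
        # Only public actions are shown to the end user
        return [desc for public, desc, _ in self.actions if public]

    def undo(self):
        # All or nothing: snapshot first, restore it if any inverse fails
        snapshot = (copy.deepcopy(self.store.entities), set(self.store.relations))
        try:
            for _, _, inverse in reversed(self.actions):
                inverse()
        except Exception:
            self.store.entities, self.store.relations = snapshot
            raise


store = Store()
entry = Transaction(store).create_entity("BlogEntry", title="Hello")
tx = Transaction(store)
comment = tx.create_entity("Comment", content="Nice post")
tx.add_relation(comment, "comments", entry)
print(tx.description())   # ['create Comment']
tx.undo()
print(store.entities)     # {1: {'title': 'Hello', 'etype': 'BlogEntry'}}
print(store.relations)    # set()
```

The user sees a single public action ("create Comment"), while undoing also removes the non-public link to the blog entry.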

The undo feature for CubicWeb end-users

For now, once the feature has been activated in the instance configuration file with the option undo-support=yes, there are two ways to access it. Immediately after doing something, an undo link appears in the "creation" message.

Screenshot of the undo link in the message

Otherwise, one can access the undo-history view at any time from the start-up page.

Screenshot of the undo link in the message

This view shows the transactions, and each provides its own undo link. Only the transactions the user has permissions to see and undo will be shown.

Screenshot of the undo link in the message

If the user attempts to undo a transaction which can't be undone or whose undoing fails, then a message will explain the situation and no partial undoing will be left behind.

What's next

The undo feature is functional, but the interface and configuration options are quite limited. One major planned improvement is to let users filter which transactions or actions they see in the undo-history view. Another critical improvement would be to selectively enable the undo feature on parts of the entity-relationship schema, to avoid storing too much data and reduce the underlying overhead.

Feedback on this undo feature for specific CubicWeb applications is welcome. More detailed information regarding the undo feature will be published in the CubicWeb book when the patches make it through the review process.


CubicWeb Sprint report for the "ZMQ" team

2012/02/27 by Julien Cristau

There has been a growing interest in ZMQ in the past months, due to its ability to deal efficiently with message passing while being light and robust. We have worked on introducing ZMQ in the CubicWeb framework for various uses:

  • As a replacement/alternative to the Pyro source, which is used to connect to distant instances. ZMQ may be used as a lighter and more efficient alternative to Pyro. The main idea here is to use the send_pyobj/recv_pyobj API of PyZMQ (the Python wrapper of ZMQ) to execute methods on the distant Repository in a way that is totally transparent for CubicWeb.
http://www.cubicweb.org/file/2219158?vid=download
  • As a JSON server. ZMQ could be used to serve data to any client sending requests through ZMQ: the request is just an RQL string, and the response is the result set formatted as JSON.
  • As the building block for a simple notification (publish/subscribe) system between CubicWeb instances. A component can register its interest in a particular topic, and receive a callback whenever a corresponding message is received. At this point, this mechanism is used in CubicWeb to notify other instances that they should invalidate their caches when an entity is deleted.
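PyZMQ's send_pyobj/recv_pyobj pickle a Python object on one side and unpickle it on the other, which is what makes the remote calls in the first item transparent. The sketch below shows the idea: a hypothetical `RepositoryProxy` turns any method call into a pickled (method, args, kwargs) message, and `serve()` runs it on the real object. To stay self-contained, the ZMQ socket is replaced by a plain function call; with PyZMQ, the client would send the request with send_pyobj() on a REQ socket and read the reply with recv_pyobj(). `FakeRepository` is a stand-in, not CubicWeb's Repository.

```python
import pickle


class FakeRepository:
    """Stand-in for the distant Repository (hypothetical)."""

    def execute(self, rql, args=None):
        return [["result of", rql]]


def serve(repo, request: bytes) -> bytes:
    """Server side: unpickle (method, args, kwargs), call it, pickle the outcome."""
    method, args, kwargs = pickle.loads(request)
    try:
        return pickle.dumps(("ok", getattr(repo, method)(*args, **kwargs)))
    except Exception as exc:
        return pickle.dumps(("error", exc))


class RepositoryProxy:
    """Client side: any attribute access becomes a remote method call."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> reply bytes

    def __getattr__(self, method):
        def call(*args, **kwargs):
            reply = self._transport(pickle.dumps((method, args, kwargs)))
            status, value = pickle.loads(reply)
            if status == "error":
                raise value
            return value
        return call


repo = FakeRepository()
proxy = RepositoryProxy(lambda request: serve(repo, request))
print(proxy.execute("Any X WHERE X is CWUser"))
```

Since pickle can execute arbitrary code when loading, this kind of transport should only be used between trusted instances.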

CubicWeb Sprint report for the "WSGI" team

2012/02/20 by Pierre-Yves David

CubicWeb has had WSGI support for several years, but this support was incomplete.

The WSGI team was in charge of turning WSGI support into a full-featured backend that could replace Twisted in real production scenarios.

Because we only had first-class support for Twisted, some of the CubicWeb logic related to HTTP handling was implemented on the Twisted side with Twisted concepts. Our first task was to move this logic into CubicWeb itself. The handling of HTTP status codes in our responses was improved in the process.

Our second task was to focus on the "non-HTTP" part of CubicWeb (because the repository also manages background tasks). The development mode for WSGI is now able to handle and run such tasks. For this purpose, we have begun a process that aims to remove server-related code from the repository object.

We also tested several WSGI middlewares. One of the most promising is Firepython, which integrates Python logging and debugging features with Firebug. The werkzeug debugger seems neat too.
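For readers new to the term, a WSGI middleware is simply a WSGI application that wraps another one. The sketch below shows the pattern with the standard library only; `TimingMiddleware` and the X-Elapsed header are illustrative, not code from Firepython or werkzeug, and `hello_app` stands in for the CubicWeb application.

```python
import time
from wsgiref.util import setup_testing_defaults


class TimingMiddleware:
    """Generic WSGI middleware (a sketch, not Firepython or werkzeug code).

    It wraps any WSGI application and adds an X-Elapsed response header holding
    the seconds spent before the application called start_response.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        started = time.perf_counter()

        def timed_start_response(status, headers, exc_info=None):
            elapsed = time.perf_counter() - started
            headers = list(headers) + [("X-Elapsed", "%.6f" % elapsed)]
            return start_response(status, headers, exc_info)

        return self.app(environ, timed_start_response)


def hello_app(environ, start_response):
    """Trivial WSGI application standing in for the CubicWeb one."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Exercise the stack without a server: build a test environ and capture headers
environ = {}
setup_testing_defaults(environ)
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"], captured["headers"] = status, dict(headers)
    return lambda data: None  # WSGI write() callable, unused here


body = b"".join(TimingMiddleware(hello_app)(environ, fake_start_response))
print(captured["status"], body, "X-Elapsed" in captured["headers"])
```

Because middlewares only rely on the WSGI calling convention, they can be stacked in front of any WSGI backend, which is what makes them reusable across frameworks.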

http://www.cubicweb.org/file/2194267?vid=download

All these improvements open the road to a simple and efficient multi-process architecture in CubicWeb.


CubicWeb Sprint report for the "Benchmarks" team

2012/02/17 by Arthur Lutz

One team during the CubicWeb sprint looked at issues around monitoring benchmark values for CubicWeb development. This is a huge task, so we tried to stay focused on a few aspects:

  • production response times (using tools such as smokeping and munin)
  • response times of test executions in continuous integration tests
  • response times of test instances running in continuous integration

We looked at using time.clock() instead of time.time() in the xunit files that report test results, so as to be a bit more independent of the machine's load (but time spent in subprocesses won't be counted).
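time.clock() has since been removed from Python (in 3.8). Its modern counterparts are time.process_time(), the CPU time of the current process (children excluded, hence the caveat about subprocesses), and time.perf_counter(), a high-resolution wall clock. The sketch below, with a hypothetical `measure` helper, shows why CPU time is less sensitive to machine load: time spent waiting (here, sleeping) counts as wall time but hardly as CPU time.

```python
import time


def measure(func, *args):
    """Return (wall-clock seconds, CPU seconds of this process) for func(*args)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    func(*args)
    return time.perf_counter() - wall0, time.process_time() - cpu0


def busy(n):
    return sum(i * i for i in range(n))


def idle(seconds):
    time.sleep(seconds)


print(measure(busy, 10**6))  # on an idle machine, CPU time is close to wall time
print(measure(idle, 0.2))    # wall time about 0.2 s, CPU time near zero
```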

Graphing test times in hudson/jenkins already exists (/job/PROJECT/BUILDID/testReport/history/?), and times can also be graphed by TestClass and by individual test. What is missing so far is a specific dashboard where one could select the significant graphs to look at.

By the end of the first day, we had a "lorem ipsum" test instance created on the fly on each hudson/jenkins build, with a jmeter bench running on it and its results processed by the performance plugin.

http://www.cubicweb.org/file/2184036?vid=download

By the end of the second day, we had some visualisation of existing data collected by apycot, using the jqplot javascript visualisation library (cubicweb-jqplot):

http://www.cubicweb.org/file/2184035?vid=download

By the end of the sprint, we got patches submitted for the following cubes :

  • apycot
  • cubicweb-jqplot
  • the original jqplot library (update : patch accepted a few days later)

In the last hour of the sprint, since we had a "lorem ipsum" test application running each time the tests went through continuous integration, we hacked up a proof of concept to get automatic screenshots of this temporary test application. So far, we get screenshots for Firefox only, but this opens up possibilities for other browsers. Inspiration could be drawn from https://browsershots.org/


"Data Fast-food": quick interactive exploratory processing and visualization of complex datasets with CubicWeb

2012/01/19 by Vincent Michel

With the emergence of the semantic web in the past few years, and the increasing number of high-quality open datasets (cf. the LOD diagram), there is a growing interest in frameworks that make it possible to store/query/process/mine/visualize large datasets.

We have seen in previous blog posts how CubicWeb may be used as an efficient knowledge management system for various types of data, and how it may be used to perform complex queries. In this post, we will see, using Geonames data, how CubicWeb may perform simple or complex data mining and machine learning procedures on data, using the datamining cube. This cube adds powerful tools to CubicWeb that make it easy to interactively process and visualize datasets.

At this point, it is not meant to be used on massive datasets, as it is not fully optimized yet. If you try to perform a TF-IDF (term frequency–inverse document frequency) with a hierarchical clustering on the full dbpedia abstracts dataset, be prepared to wait. But it is a promising way to enrich the user experience while playing with different datasets, for quick interactive exploratory datamining (what I've called the "Data fast-food"). This cube is based on the scikit-learn toolbox, which has recently gained huge popularity in the machine learning and Python community. The release of this cube makes CubicWeb much more attractive for data management.

The Datamining cube

For a given query, similarly to SQL, CubicWeb returns a result set. This result set may be presented by a view to display a table, a map, a graph, etc (see documentation and previous blog posts).

The datamining cube introduces the possibility to process the result set before presenting it, for example to apply machine learning algorithms to cluster the data.

The datamining cube is based on two concepts:

  • the concept of processor: basically, a processor transforms a result set into a numpy array, given some criteria defining the mathematical processing, and the columns/rows of the result set to be taken into account. The numpy array is a versatile structure that is widely used for numerical computation, so it can be efficiently used with any kind of datamining algorithm. Note that, in our context of knowledge management, it is more convenient to return a numpy array with additional meta-information, such as indices or labels, the result being stored in what we call a cw-array. Meta-information may be useful for display, but is not compulsory.
  • the concept of array-view: "views" are basic components of CubicWeb, and distinguishing between querying and displaying the data is key in this framework. So, many different views can be applied to a given result set. In the datamining cube, we simply overload the basic view of CubicWeb so that it works with cw-arrays instead of result sets. These array-views are associated with some machine learning or datamining processes. For example, one can apply the k-means (clustering process) view on a given cw-array.
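To make the processor and cw-array concepts concrete, here is a minimal sketch. It uses plain Python lists instead of a numpy array so that it runs without dependencies, and `CWArray` and `attributes_as_float` are hypothetical names, not the cube's actual classes. The result set is modelled as a list of rows plus column names.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class CWArray:
    """Hypothetical cw-array: numerical rows plus optional display labels."""
    data: List[List[float]]
    row_labels: List[Any] = field(default_factory=list)
    col_labels: List[str] = field(default_factory=list)


def attributes_as_float(rows: Sequence[Sequence[Any]], columns: Sequence[str],
                        label_column: Optional[str] = None) -> CWArray:
    """Keep the columns whose values are all numbers and convert them to floats."""
    numeric = [j for j in range(len(columns))
               if all(isinstance(row[j], (int, float)) for row in rows)]
    data = [[float(row[j]) for j in numeric] for row in rows]
    if label_column is not None:
        # Meta-information for display, e.g. the location name
        labels = [row[list(columns).index(label_column)] for row in rows]
    else:
        labels = list(range(len(rows)))
    return CWArray(data, labels, [columns[j] for j in numeric])


result_set = [("Paris", 2.35, 48.85), ("Lyon", 4.83, 45.76)]
arr = attributes_as_float(result_set, ["name", "longitude", "latitude"], "name")
print(arr.data, arr.row_labels, arr.col_labels)
```

Keeping the labels next to the numbers is what lets an array-view show "Paris" on a plot while the algorithm only sees floats.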

A very important feature is that the processor and the array-view are called directly through the URL using the two related parameters arid (for ARray ID) and vid (for View ID, standard in CubicWeb).
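Since everything goes through the URL, such a request can be built programmatically. This sketch uses the standard urllib.parse.urlencode so that spaces and commas in the RQL are escaped; `array_view_url` is a hypothetical helper and http://localhost:8080 is a placeholder for your instance.

```python
from urllib.parse import urlencode


def array_view_url(base_url: str, rql: str, vid: str, arid: str) -> str:
    """URL asking an instance to run `rql`, process it with `arid` and render it with `vid`."""
    return base_url.rstrip("/") + "/?" + urlencode({"rql": rql, "vid": vid, "arid": arid})


url = array_view_url("http://localhost:8080",
                     "Any LO, LA WHERE X longitude LO, X latitude LA",
                     "protovis-hist", "attr-asfloat")
print(url)
```

Switching from a histogram to a scatter plot is then only a matter of changing the vid argument.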

http://www.cubicweb.org/file/2154793?vid=download

Processors

We give some examples of basic processors that may be found in the datamining cube:

  • AttributesAsFloatArrayProcessor (arid='attr-asfloat'): This processor turns all Int, BigInt and Float attributes in the result set to floats, and returns the corresponding array. The number of rows is equal to the number of rows in the result set, and the number of columns is equal to the number of convertible attributes in the result set.
  • EntityAsFloatArrayProcessor (arid='entity-asfloat'): This processor performs similarly to the AttributesAsFloatArrayProcessor, but keeps the reference to the entities used to create the numpy-array. Thus, this information could be used for display (map, label, ...).
  • AttributesAsTokenArrayProcessor (arid='attr-astoken'): This processor turns all String attributes in the result set into a numpy array, based on a word n-gram analysis. This may be used to tokenize a set of strings.
  • PivotTableCountArrayProcessor (arid='pivot-table-count'): This processor is used to create a pivot table, with a count function. Other functions, such as sum or product also exist. This may be used to create some spreadsheet-like views.
  • UndirectedRelationArrayProcessor (arid='undirected-rel'): This processor creates a binary numpy array of dimension (nb_entities, nb_entities) that represents the relations (or correlations) between entities. This may be used for graph-based visualisation.
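The pivot-table-count processor is the least self-explanatory of the list. The sketch below shows the computation it describes, counting each (row key, column key) pair, using plain lists rather than a numpy array; `pivot_table_count` is a hypothetical function, not the cube's code. PPL (populated place) and MT (mountain) are Geonames feature codes, used here as sample column keys.

```python
from collections import Counter
from typing import Any, List, Sequence, Tuple


def pivot_table_count(rows: Sequence[Tuple[Any, Any]]) -> Tuple[List[Any], List[Any], List[List[int]]]:
    """Count occurrences of each (row key, column key) pair, spreadsheet-style.

    Returns the sorted row keys, the sorted column keys and the count matrix.
    """
    counts = Counter(rows)
    row_keys = sorted({r for r, _ in rows})
    col_keys = sorted({c for _, c in rows})
    # Counter returns 0 for pairs that never occur
    matrix = [[counts[(r, c)] for c in col_keys] for r in row_keys]
    return row_keys, col_keys, matrix


rows = [("France", "PPL"), ("France", "PPL"), ("France", "MT"), ("Italy", "PPL")]
print(pivot_table_count(rows))
# (['France', 'Italy'], ['MT', 'PPL'], [[1, 2], [0, 1]])
```

Replacing the count by a sum or a product over a third column gives the other pivot functions mentioned above.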

We are also planning to extend the concept of processor to sparse matrix (scipy.sparse), in order to deal with very high dimensional data.

Array Views

Most of the array views found in the datamining cube are used for simple visualization. We used HTML-based templates and the Protovis Javascript library.

We will not detail all the views, but rather show some examples. Read the reference documentation for a complete and detailed description.

Examples on numerical data

Histogram

The request:

Any LO, LA WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
X country C, NOT X elevation NULL, C name "France"

that may be translated as:

All couples (latitude, longitude) of the locations in France, with an elevation not null

and, using vid=protovis-hist and arid=attr-asfloat

http://www.cubicweb.org/file/2154795?vid=download

Scatter plot

Using the notion of view, we can display the same result set differently, for example as a scatter plot (vid=protovis-scatterplot).

http://www.cubicweb.org/file/2156233?vid=download

Another example with the request:

Any P, E WHERE X is Location, X elevation E, X elevation >1, X population P,
X population >10, X country CO, CO name "France"

that may be translated as:

All couples (population, elevation) of locations in France,
with a population higher than 10 (inhabitants), and an elevation higher than 1 (meter)

and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat)

http://www.cubicweb.org/file/2154802?vid=download

If a third column is given in the result set (and thus in the numpy array), it will be encoded in the size/color of each dot of the scatter plot. For example with the request:

Any LO, LA, E WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
X country C, NOT X elevation NULL, X elevation E, C name "France"

that may be translated as:

All tuples (latitude, longitude, elevation) of the locations in France, with an elevation not null

and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat), we can visualize the elevation on a map, encoded in size/color

http://www.cubicweb.org/file/2154805?vid=download
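Encoding the third column in the size of the dots amounts to scaling its values into a range of dot sizes. The sketch below assumes a simple linear min-max scaling; the Protovis view may use a different mapping (and colour as well as size), and `scale_third_column` with its size bounds is hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_third_column(rows: Sequence[Tuple[float, float, float]],
                       min_size: float = 2.0,
                       max_size: float = 12.0) -> List[Tuple[float, float, float]]:
    """Map the third column of (x, y, value) rows linearly onto [min_size, max_size]."""
    values = [value for _, _, value in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values are equal
    return [(x, y, min_size + (value - low) / span * (max_size - min_size))
            for x, y, value in rows]


# Illustrative (longitude, latitude, elevation) triples, not real Geonames data
print(scale_third_column([(0.0, 0.0, 0.0), (1.0, 1.0, 50.0), (2.0, 2.0, 100.0)]))
# [(0.0, 0.0, 2.0), (1.0, 1.0, 7.0), (2.0, 2.0, 12.0)]
```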

Another example with the request:

Any LO, LA LIMIT 50000 WHERE X is Location, X population  >1000, X latitude LA, X longitude LO,
X country CO, CO name "France"

that may be translated as:

At most 50000 couples (latitude, longitude) of locations in France, with a population higher than 1000 (inhabitants)
http://www.cubicweb.org/file/2156095?vid=download

There are also AreaChart and LineArray views, among others.

Examples on relational data

Relational Matrix (undirected graph)

The request:

Any X,Y WHERE X continent CO, CO name "North America", X neighbour_of Y

that may be translated as:

All neighbour countries in North America

and using the vid='protovis-binarymap' and arid='undirected-rel'

http://www.cubicweb.org/file/2154796?vid=download

Relational Matrix (directed graph)

If we do not want a symmetric matrix, i.e. if we want to keep the direction of a link (X,Y is not the same relation as Y,X), we can use the directed-rel array processor. For example, with the following request:

Any X,Y LIMIT 20 WHERE X continent Y

that may be translated as:

20 countries and their continent

and using the vid='protovis-binarymap' and arid='directed-rel'
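The only difference between the undirected-rel and directed-rel processors is whether a pair (X, Y) also sets the (Y, X) cell. Here is a dependency-free sketch of both, building the binary (nb_entities, nb_entities) matrix from the pairs of a two-column result set; `relation_matrix` is a hypothetical name, and plain lists replace the numpy array.

```python
from typing import Hashable, List, Sequence, Tuple


def relation_matrix(pairs: Sequence[Tuple[Hashable, Hashable]],
                    directed: bool) -> Tuple[List[Hashable], List[List[int]]]:
    """Binary (n, n) matrix over the entities appearing in `pairs`.

    Undirected: (X, Y) also sets (Y, X), so the matrix is symmetric.
    Directed: only matrix[X][Y] is set.
    """
    entities = sorted({e for pair in pairs for e in pair}, key=str)
    index = {e: i for i, e in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for x, y in pairs:
        matrix[index[x]][index[y]] = 1
        if not directed:
            matrix[index[y]][index[x]] = 1
    return entities, matrix


pairs = [("France", "Europe"), ("Canada", "North America")]
entities, directed_m = relation_matrix(pairs, directed=True)
_, undirected_m = relation_matrix(pairs, directed=False)
print(entities)
print(directed_m)
print(undirected_m)
```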

http://www.cubicweb.org/file/2154797?vid=download

Force directed graph

For a dynamic representation of relations, we can use a force directed graph. The request:

Any X,Y WHERE X neighbour_of Y

that may be translated as:

All neighbour countries in the World.

and using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph, with small independent components (e.g. UK and Ireland)

http://www.cubicweb.org/file/2154800?vid=download

Again, a third column in the result set could be used to encode some labeling information, for example the continent.

The request:

Any X,Y,CO WHERE X neighbour_of Y, X continent CO

that may be translated as:

All neighbour countries in the World, and their corresponding continent.

and again, using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph with the continents encoded in color (Americas in green, Africa in dark blue, ...)

http://www.cubicweb.org/file/2154801?vid=download

Dendrogram

For hierarchical information, one can use the Dendrogram view. For example, with the request:

Any X,Y WHERE X continent Y

that may be translated as:

All couples (country, continent) in the World

and using vid='protovis-dendrogram' and arid='directed-rel', we have the following dendrogram (we only show a part due to lack of space)

http://www.cubicweb.org/file/2154806?vid=download

Unsupervised Learning

We have also developed some machine learning views for unsupervised learning. This is more a proof of concept than a fully optimized development, but we can already do some cool stuff. Each machine learning process is referenced by a mlid. For example, with the request:

Any LO, LA WHERE X is Location, X elevation E, X elevation >1, X latitude LA, X longitude LO,
X country CO, CO name "France"

that may be translated as:

All couples (latitude, longitude) of the locations in France, with an elevation higher than 1

and using vid='protovis-scatterplot' arid='attr-asfloat' and mlid='kmeans', we can construct a scatter plot of all couples of latitude and longitude in France, and create 10 clusters using the kmeans clustering. The labeling information is thus encoded in color/size:

http://www.cubicweb.org/file/2154804?vid=download
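The cube relies on scikit-learn for this (with scikit-learn, something like KMeans(n_clusters=10).fit_predict(X) from sklearn.cluster). To show what the k-means step itself does, here is a plain-Python sketch of Lloyd's algorithm on (longitude, latitude) couples. It is a swapped-in illustration, not the cube's code, and `kmeans` is a hypothetical function.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: return a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # k input points chosen at random
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance
        labels = [min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                              + (p[1] - centers[c][1]) ** 2)
                  for p in points]
        # Update step: move each center to the mean of its points
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments will not change any more
        centers = new_centers
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(points, 2))
```

The labels are what the scatter-plot view then encodes in colour and size.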

Download

Finally, we have also implemented a download view, based on the pickle of the numpy array. It is thus possible to access any data remotely from a Python shell and process it as you want. Changing the request can be done very easily by changing the rql parameter in the URL. For example:

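import pickle, urllib.parse, urllib.request
params = urllib.parse.urlencode({'rql': 'my request', 'vid': 'array-numpy', 'arid': 'attr-asfloat'})  # escapes the RQL
data = pickle.load(urllib.request.urlopen('http://mydomain/?' + params))  # only unpickle data from a trusted instance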

CubicWeb sprint in Paris - 2012/02/07-10

2011/12/21 by Nicolas Chauvat

Topics

To be decided. Some possible topics are :

  • optimization (still)
  • porting cubicweb to python3
  • porting cubicweb to pypy
  • persistent sessions
  • finish twisted / wsgi refactoring
  • inter-instance communication bus
  • use subprocesses to handle datafeeds
  • developing more debug-tools (debug console, view profiling, etc.)
  • pluggable / unpluggable external sources (as needed for the cubipedia and semantic family)
  • client-side only applications (javascript + http)
  • mercurial storage backend: see this thread of the mailing list
  • mercurial-server integration: see this email to the mailing list

Other ideas are welcome, please bring them up on cubicweb@lists.cubicweb.org

Location

This sprint will take place in February 2012, from Tuesday the 7th to Friday the 10th. You are more than welcome to come along, help out and contribute. An introduction is planned for newcomers.

Network resources will be available for those bringing laptops.

Address : 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab" (googlemap)

Metro : Glacière

Contact : http://www.logilab.fr/contact

Dates : 07/02/2012 to 10/02/2012


Geonames in CubicWeb !

2011/12/14 by Vincent Michel

CubicWeb is a semantic web framework written in Python that has been successfully used in large-scale projects, such as data.bnf.fr (the French National Library's open data) or Collections des musées de Haute-Normandie (museums of Haute-Normandie).

CubicWeb provides a high-level query language, called RQL, operating over a relational database (PostgreSQL in our case), and makes it possible to quickly instantiate an entity-relationship data model. By separating the query and the display of data into two distinct steps, it provides powerful means for data retrieval and processing.

In this blog post, we will demonstrate some of these capabilities on the Geonames data.

Geonames

Geonames is an open-source compilation of geographical data from various sources:

"...The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge..." (http://www.geonames.org)

The data is available as a dump containing different CSV files:

  • allCountries: main file containing information about 8,000,000 places in the world. We won't detail the various attributes of each location, but we will focus on some important properties, such as population and elevation. Moreover, admin_code_1 and admin_code_2 will be used to link the different locations to the corresponding AdministrativeRegion, and feature_code will be used to link the data to the corresponding type.
  • admin1CodesASCII.txt and admin2Codes.txt detail the different administrative regions, that are parts of the world such as region (Ile-de-France), department (Department of Yvelines), US counties...
  • featureCodes.txt details the different types of location that may be found in the data, such as forest(s), first-order administrative division, aqueduct, research institute, ...
  • timeZones.txt, countryInfo.txt, iso-languagecodes.txt are additional files providing information about timezones, countries and languages. They will be included in our CubicWeb database but won't be explained in more detail here.

The Geonames website also provides some ways to browse the data: by Countries, by Largest Cities, by Highest mountains, by postal codes, etc. We will see that CubicWeb could be used to automatically create such ways of browsing data while allowing far deeper queries. There are two main challenges when dealing with such data:

  • the number of entries: with 8,000,000 placenames, we have to use efficient tools for storing and querying them.
  • the structure of the data: the different types of entries are separated in different files, but should be merged for efficient queries (i.e. we have to rebuild the different links between entities, e.g. Location to Country or Location to AdministrativeRegion).

Data model

With CubicWeb, the data model of the application is written in Python. It defines different entity classes with their attributes, as well as the relationships between the different entity classes. Here is a sample of the schema.py that we have used for Geonames data:

from yams.buildobjs import EntityType, String, Int, Float, SubjectRelation

class Location(EntityType):
    name = String(maxsize=1024, indexed=True)
    uri = String(unique=True, indexed=True)
    geonameid = Int(indexed=True)
    latitude = Float(indexed=True)
    longitude = Float(indexed=True)
    # '?*': a Location has at most one target; a target may relate to many Locations
    feature_code = SubjectRelation('FeatureCode', cardinality='?*', inlined=True)
    country = SubjectRelation('Country', cardinality='?*', inlined=True)
    main_administrative_region = SubjectRelation('AdministrativeRegion',
                                                 cardinality='?*', inlined=True)
    timezone = SubjectRelation('TimeZone', cardinality='?*', inlined=True)
    ...

This indicates that the main Location class has a name (string), a uri (string), a geonameid (integer), a latitude and a longitude (both floats), and relations to other entity classes such as FeatureCode (through the feature_code relation), Country (through the country relation) or AdministrativeRegion (through the main_administrative_region relation).

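The cardinality of each relation is defined in the classical RDBMS way, with one character for each end of the relation: * means any number, + means one or more, ? means zero or one and 1 means one and only one. The first character applies to the subject and the second to the object, so '?*' on country means that a Location has at most one Country, while a Country may be the country of any number of Locations.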
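To make the two-character notation concrete, here is a small helper that spells out what a cardinality string such as '?*' means for each end of a relation. The `MEANING` table and `explain_cardinality` function are hypothetical, written for this illustration only; they are not part of yams or CubicWeb.

```python
# Hypothetical helper, not part of yams or CubicWeb: it only spells out
# what a two-character cardinality string means.
MEANING = {
    "?": "zero or one",
    "1": "exactly one",
    "*": "any number of",
    "+": "one or more",
}


def explain_cardinality(subject: str, rtype: str, obj: str, card: str) -> str:
    """Describe both ends of `subject rtype obj` for a string such as '?*'."""
    if len(card) != 2 or not all(c in MEANING for c in card):
        raise ValueError(f"invalid cardinality: {card!r}")
    subject_side, object_side = card  # first char: subject end, second: object end
    return (f"each {subject} has {MEANING[subject_side]} {obj}(s) through {rtype}; "
            f"each {obj} is the {rtype} of {MEANING[object_side]} {subject}(s)")


print(explain_cardinality("Location", "country", "Country", "?*"))
```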

Below is a visualisation of the schema (available at the /schema relative URL):

http://www.cubicweb.org/file/2124618?vid=download

Import

The data contained in the CSV files could be pushed and stored without any processing, but it is interesting to reconstruct the relations that may exist between different entities and entity classes, so that queries will be easier and faster.
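Rebuilding these links mostly amounts to replacing the codes found on each line by the identifier (eid) of the target entity, using in-memory lookup tables built from the smaller files first. In Geonames, first-order administrative regions are keyed by the country code and the admin1 code joined with a dot. The sketch below assumes that eids have already been allocated for countries, regions and feature codes; the dictionaries, their contents and `resolve_links` are illustrative names, not CubicWeb's import API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables: code -> eid of the entity already created for it.
# In practice they would be filled from countryInfo.txt, admin1CodesASCII.txt
# and featureCodes.txt before the (much larger) allCountries.txt is read.
COUNTRY_EIDS: Dict[str, int] = {"FR": 101}
ADMIN1_EIDS: Dict[str, int] = {"FR.11": 202}
FEATURE_EIDS: Dict[str, int] = {"P.PPL": 303}


def resolve_links(country_code: str, admin1_code: str, feature_code: str
                  ) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return the eids to store in the inlined country,
    main_administrative_region and feature_code columns of a Location.
    None is acceptable since these relations have '?' cardinality."""
    country = COUNTRY_EIDS.get(country_code)
    # admin1 codes are only unique within a country, hence the composite key
    region = (ADMIN1_EIDS.get(f"{country_code}.{admin1_code}")
              if admin1_code else None)
    feature = FEATURE_EIDS.get(feature_code)
    return country, region, feature
```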

Executing the import procedure took us 80 minutes on regular hardware, which seems very reasonable given the amount of data (~7,000,000 entities, 920MB for the allCountries.txt file), and the fact that we are also constructing many indexes (on attributes or on relations) to improve the queries. This import procedure uses some low-level SQL commands to load the data into the underlying relational database.
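The import script itself is not reproduced here. As an illustration of why low-level SQL helps, the following sketch uses Python's built-in sqlite3 module as a stand-in for the real database: rows are inserted in batches with `executemany` inside a single transaction, and secondary indexes are created only once the data is loaded, which is usually cheaper than maintaining them row by row. On PostgreSQL, COPY would be the natural equivalent. The `batches` and `bulk_load` functions, the index names and the reduced column set are hypothetical; only the cw_ naming follows CubicWeb's convention.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, float, float, int]  # eid, name, latitude, longitude, population


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Row], size: int = 10000) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cw_Location ("
        " cw_eid INTEGER PRIMARY KEY, cw_name TEXT,"
        " cw_latitude REAL, cw_longitude REAL, cw_population INTEGER)"
    )
    with conn:  # one transaction for the whole load, committed once at the end
        for batch in batches(rows, size):
            conn.executemany("INSERT INTO cw_Location VALUES (?, ?, ?, ?, ?)", batch)
    # Build secondary indexes after loading rather than during it.
    conn.execute("CREATE INDEX IF NOT EXISTS loc_name_idx ON cw_Location(cw_name)")
    conn.execute("CREATE INDEX IF NOT EXISTS loc_pop_idx ON cw_Location(cw_population)")
    conn.commit()


conn = sqlite3.connect(":memory:")
bulk_load(conn, ((i, f"place{i}", 0.0, 0.0, i * 10) for i in range(25000)), size=10000)
```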

Queries and views

As stated before, queries are performed in CubicWeb using RQL (Relational Query Language), which is similar to SPARQL but with a syntax closer to SQL. This language may be used to query the concepts directly, while abstracting away the physical structure of the underlying database. For example, one can use the following request:

Any X LIMIT 10 WHERE X is Location, X population > 1000000,
    X country C, C name "France"

that means:

Give me 10 locations that have a population greater than 1000000, and that are in a country named "France"

The corresponding SQL query is:

SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
WHERE _X.cw_population>1000000
      AND _X.cw_country=_C.cw_eid AND _C.cw_name='France'
LIMIT 10
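To check what the translated SQL does, the sketch below runs the same statement with Python's sqlite3 module against two tiny tables using CubicWeb's naming (cw_Country, cw_Location and the inlined cw_country column). The rows are invented for the example, and 'France' is passed as a bound parameter rather than written as a literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cw_Country (cw_eid INTEGER PRIMARY KEY, cw_name TEXT);
    CREATE TABLE cw_Location (cw_eid INTEGER PRIMARY KEY, cw_name TEXT,
                              cw_population INTEGER,
                              cw_country INTEGER REFERENCES cw_Country(cw_eid));
    -- Invented sample rows, for illustration only.
    INSERT INTO cw_Country VALUES (1, 'France'), (2, 'Belgium');
    INSERT INTO cw_Location VALUES
        (10, 'Big French city', 2000000, 1),
        (11, 'Small French town', 5000, 1),
        (12, 'Big Belgian city', 1100000, 2);
""")

# The statement from the text, with the country name as a parameter.
QUERY = """
    SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
    WHERE _X.cw_population > 1000000
      AND _X.cw_country = _C.cw_eid AND _C.cw_name = ?
    LIMIT 10
"""
eids = [row[0] for row in conn.execute(QUERY, ("France",))]
print(eids)  # [10]
```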

We can see that RQL is higher-level than SQL and abstracts the details of the tables and the joins.

A query returns a result set (a list of results), which can be displayed using views. A key feature of CubicWeb is the separation between querying the data and displaying the results: one can query some data and visualise the results in the standard web interface, download them in different formats (JSON, RDF, CSV, ...), or display them with a specific view written in Python.
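The separation between a result set and the views that render it can be illustrated without CubicWeb: below, one list of rows is rendered by two interchangeable functions picked by an identifier, much as the vid URL parameter picks a CubicWeb view. The `VIEWS` registry, the view functions and the sample rows are hypothetical; in CubicWeb, views are classes selected from its registry.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Sequence

ResultSet = List[Sequence[object]]


def csv_view(rset: ResultSet, headers: Sequence[str]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rset)
    return buf.getvalue()


def json_view(rset: ResultSet, headers: Sequence[str]) -> str:
    return json.dumps([dict(zip(headers, row)) for row in rset])


# Hypothetical registry: view identifier -> rendering function.
VIEWS: Dict[str, Callable[[ResultSet, Sequence[str]], str]] = {
    "csv": csv_view,
    "json": json_view,
}


def render(rset: ResultSet, headers: Sequence[str], vid: str) -> str:
    """Render the same result set with the view chosen by `vid`."""
    return VIEWS[vid](rset, headers)


rset = [("Alpha", 1200000), ("Beta", 1050000)]  # invented rows
```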

In particular, we will use the mapstraction.map view, which is based on the Mapstraction and OpenLayers libraries, to display information on maps using data from OpenStreetMap. This view relies on a CubicWeb feature called adapters. An adapter adapts a class of entity to some interface, so that views can rely on interfaces instead of types and display entities with different attributes and relations. In our case, the IGeocodableAdapter returns a latitude and a longitude for a given class of entity (here the mapping is trivial, but there are more complex cases... :) ):

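from cubicweb.predicates import is_instance  # cubicweb.selectors in older releases
from cubicweb.view import EntityAdapter

class IGeocodableAdapter(EntityAdapter):
    """Expose a Location's coordinates through the IGeocodable interface."""
    __regid__ = 'IGeocodable'
    __select__ = is_instance('Location')

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude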
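The benefit of an adapter shows up when entity types store their coordinates differently. The plain-Python sketch below (no CubicWeb involved; the `Building` type, both adapter classes and the `ADAPTERS` dictionary are hypothetical) gives two entity types the same latitude/longitude interface, so that a map view only ever talks to the adapter. In CubicWeb, a view would obtain the adapter with entity.cw_adapt_to('IGeocodable'), and the registry's selectors play the role of the dictionary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Location:            # stores coordinates directly, like the Geonames schema
    name: str
    latitude: float
    longitude: float


@dataclass
class Building:            # hypothetical type storing "lat,lon" in one string
    name: str
    position: str


class LocationGeocodable:
    def __init__(self, entity: Location):
        self.entity = entity

    @property
    def latitude(self) -> float:
        return self.entity.latitude

    @property
    def longitude(self) -> float:
        return self.entity.longitude


class BuildingGeocodable:
    def __init__(self, entity: Building):
        self.entity = entity

    @property
    def latitude(self) -> float:
        return float(self.entity.position.split(",")[0])

    @property
    def longitude(self) -> float:
        return float(self.entity.position.split(",")[1])


ADAPTERS = {Location: LocationGeocodable, Building: BuildingGeocodable}


def map_points(entities) -> List[Tuple[str, float, float]]:
    """A 'map view' that relies only on the adapter interface."""
    points = []
    for entity in entities:
        adapter = ADAPTERS[type(entity)](entity)
        points.append((entity.name, adapter.latitude, adapter.longitude))
    return points
```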

We will give some results of queries and views later. Note that the screenshots below were taken without any modification of the standard web interface of CubicWeb. It is possible to write specific views and to define a specific CSS, but we only wanted to show how CubicWeb handles such data out of the box. The default web template of CubicWeb is in fact sufficient for what we want to do, as it dynamically creates web pages showing attributes and relations, as well as some specific forms and JavaScript applets adapted directly to the data (e.g. map-based tools). Last but not least, both the query and the view can be given in the URL, which opens a world of new possibilities to the user:

http://baseurl:port/?rql=The query that I want&vid=Identifier-of-the-view
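Because an RQL query contains spaces, commas and quotes, it has to be URL-encoded before being put in the rql parameter. A minimal sketch with the standard library; the base URL http://localhost:8080/ and the `query_url` function are placeholders, and the vid value is the map view discussed in this post.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://localhost:8080/"  # placeholder for your own instance


def query_url(rql: str, vid: str = "") -> str:
    """Build a URL that runs `rql` and renders it with view `vid`."""
    params = {"rql": rql}
    if vid:
        params["vid"] = vid
    return BASE_URL + "?" + urlencode(params)  # spaces, quotes, commas are escaped


url = query_url('Any X LIMIT 10 WHERE X is Location, X population > 1000000, '
                'X country C, C name "France"', vid="mapstraction.map")
# Decoding the query string gives the original parameters back.
params = parse_qs(urlparse(url).query)
```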

Facets

We will not go into too much detail about facets, but let's just say that this feature may be used to define filtering axes on the data, and thus to post-filter a result set. In this example, we have defined four different facets: on the population, on the elevation, on the feature_code and on the main_administrative_region. We will see illustrations of these facets below.

We give here an example of the definition of a Facet:

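from cubicweb.predicates import is_instance  # cubicweb.selectors in older releases
from cubicweb.web import facet

class LocationPopulationFacet(facet.RangeFacet):
    __regid__ = 'population-facet'
    __select__ = is_instance('Location')
    order = 2
    rtype = 'population'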

where __select__ defines which class(es) of entities are targeted by this facet, order defines the order of display of the different facets, and rtype defines the target attribute/relation that will be used for filtering.
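Conceptually, a range facet such as the one above looks at the values of its rtype in the current result set, offers their minimum and maximum as bounds, and keeps only the entities within the range chosen by the user. The plain-Python sketch below shows that behaviour on invented entities; `range_bounds` and `apply_range` are hypothetical names, and CubicWeb itself applies facets by adding restrictions to the RQL query rather than by filtering rows in Python.

```python
from typing import Any, Dict, List, Optional, Tuple

Entity = Dict[str, Any]


def range_bounds(rset: List[Entity], rtype: str) -> Optional[Tuple[Any, Any]]:
    """Bounds offered by the facet: min and max of `rtype`, ignoring missing values."""
    values = [e[rtype] for e in rset if e.get(rtype) is not None]
    return (min(values), max(values)) if values else None


def apply_range(rset: List[Entity], rtype: str, low: Any, high: Any) -> List[Entity]:
    """Post-filter: keep entities whose `rtype` lies within [low, high]."""
    return [e for e in rset
            if e.get(rtype) is not None and low <= e[rtype] <= high]


# Invented entities, for illustration only.
rset = [
    {"name": "A", "population": 1200000},
    {"name": "B", "population": 5000000},
    {"name": "C", "population": None},
]
```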

Geonames in CubicWeb

The main page of the Geonames application is shown in the screenshot below. It provides general information on the database, in particular the number of entities in the different classes:

  • 7,984,330 locations.
  • 59,201 administrative regions (e.g. regions, counties, departments...)
  • 7,766 languages.
  • 656 features (e.g. types of location).
  • 410 time zones.
  • 252 countries.
  • 7 continents.
http://www.cubicweb.org/file/2124617?vid=download

Simple query

We will first illustrate the possibilities of CubicWeb with the simple query detailed above (which can be pasted directly into the URL):

Any X LIMIT 10 WHERE X is Location, X population > 1000000,
    X country C, C name "France"

We obtain the following page:

http://www.cubicweb.org/file/2124615?vid=download

This is the standard view of CubicWeb for displaying results. We can see (right box) that we obtain 10 locations that are indeed located in France, with a population of more than 1,000,000 inhabitants. The left box shows the search panel that could be used to launch queries, and the facet filters that may be used for filtering results, e.g. we may ask to keep only results with a population greater than 4,767,709 inhabitants within the previous results:

http://www.cubicweb.org/file/2124616?vid=download

and we now obtain only 4 results. Notice also that the facets are linked: restricting the result set with the population facet also restricts the values offered by the other facets.
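This linkage follows from computing every facet over the same, already filtered result set: once the population facet removes entities, the values offered by the other facets are recomputed from what remains. A small sketch with invented records; `facet_values` and `filter_min` are hypothetical helpers, not CubicWeb functions.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]

rset: List[Row] = [  # invented records
    {"name": "A", "population": 1200000, "feature_code": "P.PPLA"},
    {"name": "B", "population": 5000000, "feature_code": "P.PPLC"},
    {"name": "C", "population": 1500000, "feature_code": "P.PPLA2"},
]


def facet_values(rows: List[Row], rtype: str) -> Counter:
    """Values a facet on `rtype` would offer, with their counts."""
    return Counter(row[rtype] for row in rows)


def filter_min(rows: List[Row], rtype: str, minimum: int) -> List[Row]:
    """What a range facet does once its lower bound is moved."""
    return [row for row in rows if row[rtype] >= minimum]


before = facet_values(rset, "feature_code")
after = facet_values(filter_min(rset, "population", 1400000), "feature_code")
# "P.PPLA" is no longer offered: its only location was filtered out.
```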

Simple query (but with more information!)

Let's say that we now want more information about the results obtained previously (for example the exact population, the elevation and the name). This is really simple! We just have to ask for it in the RQL query (the variable names N, P and E are, of course, arbitrary):

Any N, P, E LIMIT 10 WHERE X is Location,
    X population P, X population > 1000000,
    X elevation E, X name N, X country C, C name "France"
http://www.cubicweb.org/file/2124619?vid=download

The empty column for the elevation simply means that we have no elevation information for these locations.

We can see that fetching particular information could not be simpler! With more complex queries, we can access a wealth of information from the Geonames database:

Any N,E,LA,LO ORDERBY E DESC LIMIT 10  WHERE X is Location,
      X latitude LA, X longitude LO,
      X elevation E, NOT X elevation NULL, X name N,
      X country C, C name "France"

which means:

Give me the 10 highest locations (the 10 first when sorting by decreasing elevation) with their name, elevation, latitude and longitude that are in a country named "France"
http://www.cubicweb.org/file/2124626?vid=download
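In plain Python terms, the NOT X elevation NULL, ORDERBY E DESC and LIMIT 10 clauses of this query amount to discarding locations without an elevation, sorting the rest by decreasing elevation and keeping the first ten. The sketch below makes that explicit on invented records; `highest` is a hypothetical helper, and `heapq.nlargest` avoids sorting the whole list when only the top entries are needed.

```python
import heapq
from typing import Any, Dict, List

Record = Dict[str, Any]


def highest(locations: List[Record], n: int = 10) -> List[Record]:
    """Top-n locations by decreasing elevation, ignoring unknown elevations."""
    known = (loc for loc in locations if loc["elevation"] is not None)  # NOT X elevation NULL
    return heapq.nlargest(n, known, key=lambda loc: loc["elevation"])   # ORDERBY E DESC LIMIT n


locations = [  # invented records
    {"name": "peak A", "elevation": 4000.0},
    {"name": "village B", "elevation": None},
    {"name": "hill C", "elevation": 350.0},
    {"name": "peak D", "elevation": 3200.0},
]
print([loc["name"] for loc in highest(locations, n=2)])  # ['peak A', 'peak D']
```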

We can now use another view on the same request, e.g. on a map (view mapstraction.map):

Any X ORDERBY E DESC LIMIT 10  WHERE X is Location,
       X latitude LA, X longitude LO, X elevation E,
       NOT X elevation NULL, X country C, C name "France"
http://www.cubicweb.org/file/2124631?vid=download

And now, we can add the fact that we want more results (20), and that the location should have a non-null population:

Any N, E, P, LA, LO ORDERBY E DESC LIMIT 20  WHERE X is Location,
       X latitude LA, X longitude LO,
       X elevation E, NOT X elevation NULL, X population P,
       X population > 0, X name N, X country C, C name "France"
http://www.cubicweb.org/file/2124632?vid=download

... and on a map ...

http://www.cubicweb.org/file/2124633?vid=download

Conclusion

In this blog post, we have seen how CubicWeb can be used to store and query complex data, while providing (among other things) Web-based views for data visualisation. It allows the user to query data directly within the URL and may be used to interact with and explore the data in depth. In a future blog post, we will give more complex queries to show the full possibilities of the system.