subscribe to this blog

CubicWeb Blog

News about the framework and its uses.

Logilab's roadmap for CubicWeb on March 7th, 2014

2014/03/10 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Mar 7th, 2014 meeting. The previous report posted to the blog was the january 2014 roadmap.

Version 3.17

This version is stable but old and maintainance will stop in a few weeks (current is 3.17.13 and 3.17.14 is upcoming).

Version 3.18

This version is stable and maintained (current is 3.18.3 and 3.18.4 is upcoming).

Version 3.19

This version is about to be published. It includes a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read list of tickets for CubicWeb 3.19.0.

Version 3.20

This version will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and relations) and CWEP-003 (adding a FROM clause to RQL).

For details read list of tickets for CubicWeb 3.20.0.

Cubes

Here is a list of cubes that had versions published over the past two months: addressbook, awstats, blog, bootstrap, brainomics, comment, container, dataio, genomics, invoice, mediaplayer, medicalexp, neuroimaginge, person, preview, questionnaire, securityprofile, simplefacet, squareui, tag, tracker, varnish, vcwiki, vtimeline.

Here are a the new cubes we are pleased to announce:

collaboration is a building block that reuses container and helps to define collaborative workflows where entities are cloned, modified and shared.

Our priorities for the next two months are collaboration and container, then narval/apycot, then mercurial-server, then rqlcontroller and signedrequest, then imagesearch.

Mid-term goals

The work done for CWEP-0002 (computed attributes and relations) is expected to land in CubicWeb 3.20.

The work done for CWEP-0003 (explicit data source federation using FROM in RQL) is expected to land in CubicWeb 3.20.

Tools to diagnose performance issues would be very useful. Maybe in 3.21 ?

Caching session data would help and some work was done on this topic during the sprint in february. Maybe in 3.22 ?

WSGI has made progress lately, but still needs work. Maybe in 3.23 ?

RESTfulness is a goal. Maybe in 3.24 ?

Maybe 3.25 will be in fact 4.0 ?

Events

A spring sprint will take place in Logilab's offices in Paris from April 28th to 30th. We invite all the interested parties to join us there!

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of may 2014.


CubicWeb sprint / winter 2014

2014/02/12 by Nicolas Chauvat

This sprint took place at Logilab's offices in Paris on Feb 13/14. People from CEA, Unlish, Crealibre and Logilab teamed up to push CubicWeb forward.

We did not forget the priorities from the roadmap:

  • CubicWeb 3.17.13 and 3.18.3 were released, and CubicWeb 3.19 made progress
  • the branch about ComputedAttributes and ComputedRelations (CWEP-002) is ready to be merged,
  • the branch about the FROM clause (CWEP-003) made progress (the CWEP was reviewed and part of the resulting spec was implemented),
  • in order to reduce work in progress, the number of patches in state reviewed or pending-review was brought down to 243 (from 302, that is 60 or 20%, which is not bad).

CubicWeb using Postgresql at its best

2014/02/08 by Nicolas Chauvat

We had a chat today with a core contributor to Postgresql from whom we may buy consulting services in the future. We discussed how CubicWeb could get the best out of Postgresql:

  • making use of the LISTEN/NOTIFY mechanism built into PG could be useful (to warn the cache about modified items for example) and PgQ is its good friend;
  • views (materialized or not) are another way to implement computed attributes and relations (see CWEP number 002) and it could be that the Entities table is in fact a view of other tables;
  • implementing RQL as an in-database language could open the door to new things (there is PL/pgSQL, PL/Python, what if we had PL/RQL?);
  • Foreign Data Wrappers written with Multicorn would be another way to write data feeds (see LDAP integration for an example);
  • managing dates can be tricky when users reside in different timezones and UTC is important to keep in mind (unicode/str is a good analogy);
  • for transitive closures that are often needed when implementing access control policies with __permissions, Postgresql can go a long way with queries like "WITH ... (SELECT UNION ALL SELECT RETURNING *) UPDATE USING ...";
  • the fastest way to load tabular data that does not need too much pre-processing is to create a temporary table in memory, then COPY-FROM the data into that table, then index it, then write the transform and load step in SQL (maybe with PL/Python);
  • when executing more than 10 updates in a row, it is better to write into a temporary table in memory, then update the actual tables with UPDATE USING (let's check if the psycopg driver does that when executemany is called);
  • reaching 10e8 rows in a table is at the time of this writing the stage when you should start monitoring your db seriously and start considering replication, partition and sharding.
  • full-text search is much better in Postgresql than the general public thinks it is and recent developments made it orders of magnitude faster than tools like Lucene or Solr and ElasticSearch;
  • when dealing with complex queries (searching graphs maybe), an option to consider is to implement a specific data type, use it into a materialized view and use GIN or GIST indexes over it;
  • for large scientific data sets, it could be interesting to link the numpy library into Postgresql and turn numpy arrays into a new data type;
  • Oh, and one last thing: the object-oriented tables of Postgresql are not such a great idea, unless you have a use case that fits them perfectly and does not hit their limitations (CubicWeb's is_instance_of does not seem to be one of these).

Hopin' I got you thinkin' :)

http://developer.postgresql.org/~josh/graphics/logos/elephant.png

Cubicweb sprints winter/spring 2014

2014/01/24 by David Douard

The Logilab team is pleased to announce two Cubicweb sprints to be held in its Paris offices in the upcoming months:

February 13/14th at Logilab in Paris

The agenda would be the FROM clause for which a CWEP is still awaited, and the RQL rewriter according to the CWEP02.

April 28/30th at Logilab in Paris

Agenda to be defined.

Join the party

All users and contributors of CubicWeb are invited to join the party. Just send an email to contact at Logilab.fr if you plan to come.

http://farm1.static.flickr.com/183/419945378_4ead41a76d_m.jpg

Logilab's roadmap for CubicWeb on January 9th, 2014

2014/01/14 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Jan 9th, 2014 meeting. The previous report posted to the blog was the november 2013 roadmap.

Version 3.17

This version is stable and maintained (current is 3.17.11 and 3.17.12 is upcoming).

Version 3.18

This version was released on Jan 10th. Read the release notes or the details of CubicWeb 3.18.0.

Version 3.19

This version includes a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4. It is currently the default development head in the repository and is expected to be released before the end of january.

For details read list of tickets for CubicWeb 3.19.0.

Version 3.20

This version will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

For details read list of tickets for CubicWeb 3.20.0.

Cubes

The current trend is to develop more and more new features in dedicated cubes than to add more code to the core of CubicWeb. If you thought CubicWeb development was slowing down, you made a mistake, because cubes are ramping up.

Here is a list of versions that were published in the past two months: timesheet, postgis, leaflet, bootstrap, worker, container, embed, geocoding, vcreview, trackervcs, vcsfile, zone, dataio, mercurial-server, queueing, questionnaire, genomics, medicalexp, neuroimaging, brainomics, elections.

Here are a the new cubes we are pleased to announce:

Bootstrap works and we do not create a new application without it.

relationwidget provides a modal window to edit relations in forms (use uicfg to activate it).

resourcepicker provides a modal window to insert links to images and files into structured text.

rqlcontroller allows to use the INSERT, DELETE and SET keywords when sending RQL queries over HTTP. It returns JSON. Get used to it and you may forget about asking for specific web services in your apps, for it is a generic web service.

imagesearch is an image gallery with facets. You may use it as a demo of a visual search tool.

Mid-term goals

A new repository was created to have all the CubicWeb Evolution Proposals in one place.

CWEP-0002 is a work in progress about computed relations and computed attributes, or maybe more. It will be a focus of the next sprint and is targeted at CubicWeb 3.20.

A new CWEP is expected about the adding FROM keyword to RQL to implement explicit data source federation. It will be a focus of the next sprint and is targeted at CubicWeb 3.21.

Tools to diagnose performance issues would be very useful. Maybe in 3.22 ?

Caching session data would help. Maybe in 3.23 ?

WSGI has made progress lately, but still needs work. Maybe in 3.24 ?

RESTfulness is a goal. Maybe in 3.25 ?

Maybe 3.26 will be in fact 4.0 ?

Events

A sprint will take place in Logilab's offices in Paris around mid-february or at the end of april. We invite all the interested parties to join us there!

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of march 2014.


What's new in CubicWeb 3.18

2014/01/10 by Aurelien Campeas

The migration script does not handle sqlite nor mysql instances.

New functionalities

  • add a security debugging tool (see #2920304)
  • introduce an add permission on attributes, to be interpreted at entity creation time only and allow the implementation of complex update rules that don't block entity creation (before that the update attribute permission was interpreted at entity creation and update time) (see #2965518)
  • the primary view display controller (uicfg) now has a set_fields_order method similar to the one available for forms
  • new method ResultSet.one(col=0) to retrieve a single entity and enforce the result has only one row (see #3352314)
  • new method RequestSessionBase.find to look for entities (see #3361290)
  • the embedded jQuery copy has been updated to version 1.10.2, and jQuery UI to version 1.10.3.
  • initial support for wsgi for the debug mode, available through the new wsgi cubicweb-ctl command, which can use either python's builtin wsgi server or the werkzeug module if present.
  • a rql-table directive is now available in ReST fields
  • cubicweb-ctl upgrade can now generate the static data resource directory directly, without a manual call to gen-static-datadir.

API changes

  • not really an API change, but the entity write permission checks are now systematically deferred to an operation, instead of a) trying in a hook and b) if it failed, retrying later in an operation
  • The default value storage for attributes is no longer String, but Bytes. This opens the road to storing arbitrary python objects, e.g. numpy arrays, and fixes a bug where default values whose truth value was False were not properly migrated.
  • symmetric relations are no more handled by an rql rewrite but are now handled with hooks (from the activeintegrity category); this may have some consequences for applications that do low-level database manipulations or at times disable (some) hooks.
  • unique together constraints (multi-columns unicity constraints) get a name attribute that maps the CubicWeb contraint entities to the corresponding backend index.
  • BreadCrumbEntityVComponent's open_breadcrumbs method now includes the first breadcrumbs separator
  • entities can be compared for equality and hashed
  • the on_fire_transition predicate accepts a sequence of possible transition names
  • the GROUP_CONCAT rql aggregate function no longer repeats duplicate values, on the sqlite and postgresql backends

Deprecation

  • pyrorql sources have been deprecated. Multisource will be fully dropped in the next version. If you are still using pyrorql, switch to datafeed NOW!
  • the old multi-source system
  • find_one_entity and find_entities in favor of find (see #3361290)
  • the TmpFileViewMixin and TmpPngView classes (see #3400448)

Deprecated Code Drops

  • ldapuser have been dropped; use ldapfeed now (see #2936496)
  • action GotRhythm was removed, make sure you do not import it in your cubes (even to unregister it) (see #3093362)
  • all 3.8 backward compat is gone
  • all 3.9 backward compat (including the javascript side) is gone
  • the twisted (web-only) instance type has been removed

For a complete list of tickets, read CubicWeb 3.18.0.


Logilab's roadmap for CubicWeb on November 8th, 2013

2013/11/11 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Nov 8th, 2013 meeting. The previous report posted to the blog was the september 2013 roadmap.

Version 3.17

This version is stable and maintained (cubicweb 3.17.11 is upcoming).

Version 3.18

This version was supposed to be released in september or october, but is stalled at the integration stage. All open tickets were moved to 3.19 and existing patches that are not ready to be merged will be more aggressively delayed to 3.19. The goal is to release 3.18 as soon as possible.

For details read list of tickets for CubicWeb 3.18.0.

Version 3.19

This version will probably be published early next year (read january or february 2014). it is planned to include a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read list of tickets for CubicWeb 3.19.0.

Squareui

Logilab is now developping all its new projects based on Squareui (and Bootstrap 3.0). Squareui can be considered as a usable beta, but not as feature-complete.

Logilab is looking for a UX designer to work on the general ergonomy of CubicWeb. Read the job offer.

Mid-term goals

The mid-term goals include better REST support (Representational State Transfer), complete WSGI (Python's Web Server Gateway Interface) and the FROM clause for RQL queries (to reinvent db federation outside of the core).

On the front-end side, it would be nice to be able to improve forms, maybe with client-side javascript and better support for a "json on server, js in browser" separation of concerns.

Cubes

A cube oauth was contributed in large part by Unlish, a startup that is using CubicWeb to implement its service.

A cube vcwiki is being developed by Logilab, to manage the content of a wiki with a version control system (built with the cube vcsfile).

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of january 2014.


Apache authentication

2013/10/10 by Dimitri Papadopoulos

An Apache front end might be useful, as Apache provides standard log files, monitoring or authentication. In our case, we have Apache authenticate users before they are cleared to access our CubicWeb application. Still, we would like user accounts to be managed within a CubicWeb instance, avoiding separate sets of identifiers, one for Apache and the other for CubicWeb.

We have to address two issues:

  • have Apache authenticate users against accounts in the CubicWeb database,
  • have CubicWeb trust Apache authentication.

Apache authentication against CubicWeb accounts

A possible solution would be to access the identifiers associated to a CubicWeb account at the SQL level, directly from the SQL database underneath a CubicWeb instance. The login password can be found in the cw_login and cw_upassword columns of the cw_cwuser table. The benefit is that we can use existing Apache modules for authentication against SQL databases, typically mod_authn_dbd. On the other hand this is highly dependant on the underlying SQL database.

Instead we have chosen an alternate solution, directly accessing the CubicWeb repository. Since we need Python to access the repository, our sysasdmins have deployed mod_python on our Apache server.

We wrote a Python authentication module that accesses the repository using ZMQ. Thus ZMQ needs be enabled. To enable ZMQ uncomment and complete the following line in all-in-one.conf:

zmq-repository-address=zmqpickle-tcp://localhost:8181

The Python authentication module looks like:

from mod_python import apache
from cubicweb import dbapi
from cubicweb import AuthenticationError

def authenhandler(req):
    pw = req.get_basic_auth_pw()
    user = req.user

    database = 'zmqpickle-tcp://localhost:8181'
    try:
        cnx = dbapi.connect(database, login=user, password=pw)
    except AuthenticationError:
        return apache.HTTP_UNAUTHORIZED
    else:
        cnx.close()
        return apache.OK

CubicWeb trusts Apache

Our sysadmins set up Apache to add x-remote-user to the HTTP headers forwarded to CubicWeb - more on the relevant Apache configuration in the next paragraph.

We then add the cubicweb-trustedauth cube to the dependencies of our CubicWeb application. We simply had to add to the __pkginfo__.py file of our CubicWeb application:

__depends__ =  {
    'cubicweb': '>= 3.16.1',
    'cubicweb-trustedauth': None,
}

This cube gets CubicWeb to trust the x-remote-user header sent by the Apache front end. CubicWeb bypasses its own authentication mechanism. Users are directly logged into CubicWeb as the user with a login identical to the Apache login.

Apache configuration and deployment

Our Apache configuration looks like:

<Location /apppath >
  AuthType Basic
  AuthName "Restricted Area"
  AuthBasicAuthoritative Off
  AuthUserFile /dev/null
  require valid-user

  PythonAuthenHandler cubicwebhandler

  RewriteEngine On
  RewriteCond %{REMOTE_USER} (.*)
  RewriteRule . - [E=RU:%1]
<Location /apppath >

RequestHeader set X-REMOTE-USER %{RU}e

ProxyPass          /apppath  http://127.0.0.1:8080
ProxyPassReverse   /apppath  http://127.0.0.1:8080

The CubicWeb application is accessed as http://ourserver/apppath/.

The Python authentication module is deployed as /usr/lib/python2.7/dist-packages/cubicwebhandler/handler.py where cubicwebhandler is the attribute associated to PythonAuthenHandler in the Apache configuration.


Brainomics / CrEDIBLE conference report

2013/10/09 by Vincent Michel

Cubicweb and the Brainomics project were presented last week at the CrEDIBLE workshop (October 2-4, 2013, Sophia-Antipolis) on "Federating distributed and heterogeneous biomedical data and knowledge". We would like to thank the organizers for this nice opportunity to show the features of CubicWeb and Brainomics in the context of biomedical data.

http://credible.i3s.unice.fr/lib/tpl/credible/images/credible.png

Workshop highlights

  • A short presentation of SHI3LD that defines data access based on conditions that are based on ASK request. The other part was a state of the art of Open data license, and the (poor) existence of licenses expressed in RDF. Future work seems to be an interesting combination of both SHI3LD and RDF-based licenses for data access.
  • MIDAS, an open-source software for sharing medical data. This project could be an interesting source of inspiration for the file sharing part of CubicWeb, even if the (really complicated in my opinion) case of large files downloads is not addressed for now.
  • Federated queries based on FedX - the optimization techniques based on source selection & exclusive groups seems a good approach for avoiding large data transfers and finding some (sub-)optimal ways to join the different data sources. This should be taken into account in the future work on the "FROM" clause in CubicWeb.
  • WebPIE/QueryPIE: a map-reduce-based approach for large-scale reasoning.

CubicWeb and Brainomics

The slides of the presentation can be download as a PDF or viewed on slideshare.

Some people seem confused on the RQL to SQL translation. This relies on a simple translation logic that is implemented in the rql2sql file. This is only an implementation trick, not so different from the one used in RDBMS-based triplestores that have to convert SPARQL into SQL.

RQL inference : there is no magic behind the RQL inference process. As opposed to triplestores that store RDF triples that contain their own schema, and thus cannot easily know the full data model in these triples without looking at all the triples, RQL relies on a relational database with an fixed (at a given moment) data model, thus allowing inference and simple checks. In particular, in this example, we want All the Cities of `Île de France` with more than 100 000 inhabitants ?, which is expressed in RQL:

Any X WHERE X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"

and SPARQL:

select ?ville where {
?ville db-owl:region <http://fr.dbpedia.org/resource/Île-de-France> .
?ville db-owl:populationTotal ?population .
FILTER (?population > 100000)
}

Beside the fact that RQL is less verbose that SPARQL (syntax matters), the simplicity of RQL relies on the fact that it can automatically infer (similarly to SPARQL) that if X is related to Y by the region relation and has a population attribute, it should be a city. If city and district both have the region relation and a population attribute, the RQL inference allows to fetch them both transparently, otherwise one can be specific by using the is relation:

Any X WHERE X is City, X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"

RQL also allows subqueries, union, full-text search, stored procedures, ... (see the doc).

These really interesting discussions convinced us that we should write a journal paper for detailing the theoretical and technical concepts behind RQL and the YAMS schema.


Logilab will be in Toulouse métropole Open Data Barcamp tomorrow

2013/10/08 by Sylvain Thenault

Meet us tomorrow at the Toulouse's Cantine where several people from Logilab will be there for the open data barcamp organized by Toulouse Metropole.

More infos on barcamp.org. We'll probably talk abouthow CubicWeb manages to import large amounts of open-data to reuse.