
CubicWeb Blog

News about the framework and its uses.

Apache authentication

2013/10/10 by Dimitri Papadopoulos

An Apache front end can be useful, as Apache provides standard log files, monitoring and authentication. In our case, we have Apache authenticate users before they are cleared to access our CubicWeb application. Still, we would like user accounts to be managed within a CubicWeb instance, avoiding separate sets of identifiers, one for Apache and the other for CubicWeb.

We have to address two issues:

  • have Apache authenticate users against accounts in the CubicWeb database,
  • have CubicWeb trust Apache authentication.

Apache authentication against CubicWeb accounts

A possible solution would be to access the identifiers associated with a CubicWeb account at the SQL level, directly from the SQL database underneath a CubicWeb instance. The login and password can be found in the cw_login and cw_upassword columns of the cw_cwuser table. The benefit is that we can use existing Apache modules for authentication against SQL databases, typically mod_authn_dbd. On the other hand, this is highly dependent on the underlying SQL database.

Instead, we have chosen an alternative solution: directly accessing the CubicWeb repository. Since we need Python to access the repository, our sysadmins have deployed mod_python on our Apache server.

We wrote a Python authentication module that accesses the repository using ZMQ, so ZMQ needs to be enabled. To enable it, uncomment and complete the following line in all-in-one.conf:

zmq-repository-address=zmqpickle-tcp://localhost:8181

The Python authentication module looks like:

from mod_python import apache
from cubicweb import dbapi
from cubicweb import AuthenticationError

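def authenhandler(req):
    # get_basic_auth_pw() must be called first: it is what sets req.user
    pw = req.get_basic_auth_pw()
    user = req.user

    # must match zmq-repository-address in all-in-one.conf
    database = 'zmqpickle-tcp://localhost:8181'
    try:
        # a successful connection means the credentials are valid
        cnx = dbapi.connect(database, login=user, password=pw)
    except AuthenticationError:
        return apache.HTTP_UNAUTHORIZED
    else:
        # we only needed to check the credentials, not to keep a session
        cnx.close()
        return apache.OK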
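For reference, with AuthType Basic the browser sends the credentials base64-encoded in the Authorization header, and mod_python exposes the decoded parts as req.user and req.get_basic_auth_pw(). The standalone Python 3 sketch below shows that decoding step, which can be handy when testing the handler logic outside Apache; parse_basic_auth is a hypothetical helper, not part of mod_python or CubicWeb.

```python
import base64


def parse_basic_auth(header_value):
    """Split an HTTP Basic 'Authorization' header value into (login, password).

    Hypothetical helper: returns None when the header is missing, malformed,
    or uses another authentication scheme.
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    # The login cannot contain ':' but the password may, so split only once.
    login, sep, password = decoded.partition(":")
    if not sep:
        return None
    return login, password
```

Splitting on the first colon only matters in practice: Basic authentication forbids colons in the user name but not in the password.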

CubicWeb trusts Apache

Our sysadmins set up Apache to add an x-remote-user header to the HTTP requests forwarded to CubicWeb (more on the relevant Apache configuration in the next section).

We then added the cubicweb-trustedauth cube to the dependencies of our CubicWeb application. We simply had to add the following to the __pkginfo__.py file of our CubicWeb application:

__depends__ = {
    'cubicweb': '>= 3.16.1',
    'cubicweb-trustedauth': None,
}

This cube gets CubicWeb to trust the x-remote-user header sent by the Apache front end: CubicWeb bypasses its own authentication mechanism, and users are directly logged into CubicWeb as the user whose login is identical to the Apache login.
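The general idea behind trusted-header authentication can be sketched as follows. This is not the code of cubicweb-trustedauth: trusted_login and TRUSTED_PROXIES are hypothetical names, and the sketch assumes Apache runs on the same host as CubicWeb, as in the ProxyPass configuration shown in the next section. It illustrates why such a setup is only safe when CubicWeb is reachable exclusively through the front end: any client talking to CubicWeb directly could otherwise forge the header.

```python
# Assumption: the Apache front end connects from the local host only.
TRUSTED_PROXIES = {"127.0.0.1", "::1"}


def trusted_login(headers, remote_addr, trusted_proxies=TRUSTED_PROXIES):
    """Return the login asserted by the front end, or None if it must be ignored.

    headers: mapping of HTTP header names to values (hypothetical input format).
    remote_addr: IP address of the peer that sent the request.
    """
    if remote_addr not in trusted_proxies:
        # Anyone else could forge the header, so never trust it.
        return None
    # HTTP header names are case-insensitive (Apache sends X-REMOTE-USER).
    for name, value in headers.items():
        if name.lower() == "x-remote-user":
            value = value.strip()
            return value or None
    return None
```

If a login is returned, the application then looks up the account with that exact login, which is the behaviour described above.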

Apache configuration and deployment

Our Apache configuration looks like:

<Location /apppath >
  AuthType Basic
  AuthName "Restricted Area"
  AuthBasicAuthoritative Off
  AuthUserFile /dev/null
  require valid-user

  PythonAuthenHandler cubicwebhandler

  RewriteEngine On
  RewriteCond %{REMOTE_USER} (.*)
  RewriteRule . - [E=RU:%1]
</Location>

RequestHeader set X-REMOTE-USER %{RU}e

ProxyPass          /apppath  http://127.0.0.1:8080
ProxyPassReverse   /apppath  http://127.0.0.1:8080

The CubicWeb application is accessed as http://ourserver/apppath/.

The Python authentication module is deployed as /usr/lib/python2.7/dist-packages/cubicwebhandler/handler.py, where cubicwebhandler is the value given to the PythonAuthenHandler directive in the Apache configuration.


Brainomics / CrEDIBLE conference report

2013/10/09 by Vincent Michel

CubicWeb and the Brainomics project were presented last week at the CrEDIBLE workshop (October 2-4, 2013, Sophia-Antipolis) on "Federating distributed and heterogeneous biomedical data and knowledge". We would like to thank the organizers for this nice opportunity to show the features of CubicWeb and Brainomics in the context of biomedical data.


Workshop highlights

  • A short presentation of SHI3LD, which defines data access conditions based on SPARQL ASK queries. The other part was a state of the art of open data licenses, and of the (scarce) availability of licenses expressed in RDF. Future work seems to be an interesting combination of both SHI3LD and RDF-based licenses for data access.
  • MIDAS, an open-source platform for sharing medical data. This project could be an interesting source of inspiration for the file-sharing part of CubicWeb, even if the (really complicated, in my opinion) case of large file downloads is not addressed for now.
  • Federated queries based on FedX: the optimization techniques based on source selection and exclusive groups seem a good approach for avoiding large data transfers and finding (sub-)optimal ways to join the different data sources. This should be taken into account in future work on the "FROM" clause in CubicWeb.
  • WebPIE/QueryPIE: a map-reduce-based approach for large-scale reasoning.
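To make the FedX idea concrete: source selection first determines, for each triple pattern, which endpoints may hold matching triples (FedX does this with cached SPARQL ASK queries); patterns whose single relevant source is the same endpoint are then bundled into an exclusive group and shipped to that endpoint as one subquery, so the join happens remotely. The simplified sketch below assumes source selection has already been done and is given as a dict; exclusive_groups is a hypothetical name, not FedX's API.

```python
from collections import defaultdict


def exclusive_groups(relevant_sources):
    """Group triple patterns that only one endpoint can answer.

    relevant_sources maps each triple pattern to the set of endpoints that may
    answer it (the result of source selection).
    Returns (groups, remaining): groups maps an endpoint to the patterns sent
    to it together as one subquery; remaining lists the patterns the federator
    has to evaluate and join itself.
    """
    by_source = defaultdict(list)
    remaining = []
    for pattern, sources in relevant_sources.items():
        if len(sources) == 1:
            (source,) = sources
            by_source[source].append(pattern)
        else:
            remaining.append(pattern)
    # A group only saves transfers when it bundles at least two patterns.
    groups = {src: pats for src, pats in by_source.items() if len(pats) > 1}
    for pats in by_source.values():
        if len(pats) == 1:
            remaining.extend(pats)
    return groups, remaining
```

For example, two patterns that only a drug database can answer end up in a single subquery for that endpoint, instead of two separate requests whose results would be joined locally.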

CubicWeb and Brainomics

The slides of the presentation can be downloaded as a PDF or viewed on SlideShare.

Some people seemed confused by the RQL to SQL translation. It relies on a simple translation logic implemented in the rql2sql module. This is only an implementation trick, not so different from the one used in RDBMS-based triplestores that have to convert SPARQL into SQL.
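A toy version of such a translation fits in a few lines. The sketch below is not rql2sql: it takes an already parsed query whose variable types have already been resolved, and it assumes a simplified layout modelled on CubicWeb's naming (one cw_<type> table per entity type with cw_eid and cw_<attribute> columns, and one <relation>_relation table with eid_from/eid_to columns per relation). The real translator handles many more cases, such as relations stored directly as columns. to_sql and its argument format are hypothetical.

```python
def to_sql(selected, types, attributes, relations):
    """Translate a tiny, pre-parsed RQL-like query into parameterized SQL.

    selected:   variable to return, e.g. "X"
    types:      {variable: entity type}, e.g. {"X": "City", "Y": "Region"}
    attributes: list of (variable, attribute, operator, value)
    relations:  list of (subject variable, relation type, object variable)
    """
    # One table alias per variable, named after its entity type.
    tables = ["cw_%s AS _%s" % (etype.lower(), var) for var, etype in types.items()]
    where = []
    # Each relation becomes a join through its relation table.
    for i, (subj, rtype, obj) in enumerate(relations):
        alias = "rel_%s%d" % (rtype, i)
        tables.append("%s_relation AS %s" % (rtype, alias))
        where.append("%s.eid_from = _%s.cw_eid" % (alias, subj))
        where.append("%s.eid_to = _%s.cw_eid" % (alias, obj))
    # Attribute restrictions become column comparisons with bound parameters.
    params = []
    for var, attr, op, value in attributes:
        where.append("_%s.cw_%s %s %%s" % (var, attr, op))
        params.append(value)
    sql = "SELECT _%s.cw_eid FROM %s" % (selected, ", ".join(tables))
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params
```

Applied to the cities query discussed below (with X typed as City and Y as Region), this yields a three-table join with two bound parameters, which is the kind of SQL a relational triplestore also has to produce from SPARQL.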

RQL inference: there is no magic behind the RQL inference process. Unlike triplestores, which store RDF triples that carry their own schema and thus cannot easily know the full data model without looking at all the triples, RQL relies on a relational database with a data model that is fixed (at a given moment), which allows inference and simple checks. For instance, to ask for all the cities of Île-de-France with more than 100 000 inhabitants, we write in RQL:

Any X WHERE X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"

and in SPARQL:

PREFIX db-owl: <http://dbpedia.org/ontology/>
select ?ville where {
  ?ville db-owl:region <http://fr.dbpedia.org/resource/Île-de-France> .
  ?ville db-owl:populationTotal ?population .
  FILTER (?population > 100000)
}

Besides the fact that RQL is less verbose than SPARQL (syntax matters), the simplicity of RQL relies on the fact that it can automatically infer (similarly to SPARQL) that if X is related to Y by the region relation and has a population attribute, it should be a city. If city and district both have the region relation and a population attribute, RQL inference fetches them both transparently; alternatively, one can be specific by using the is relation:

Any X WHERE X is City, X region Y, X population > 100000,
            Y uri "http://fr.dbpedia.org/resource/Île-de-France"
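Because the schema is known up front, this kind of inference boils down to a set computation: keep the entity types that support every relation and attribute used on a variable, then apply an explicit is restriction if there is one. The toy sketch below uses a hypothetical schema; SCHEMA and possible_types are illustrative names, not CubicWeb's API, and the sketch ignores relation direction and the types on the other end of each relation.

```python
# Hypothetical toy schema: entity type -> relations/attributes it has as subject.
SCHEMA = {
    "City": {"region", "population", "name"},
    "District": {"region", "population"},
    "Region": {"uri", "name"},
    "Person": {"name", "birthplace"},
}


def possible_types(used, is_type=None, schema=SCHEMA):
    """Return the entity types compatible with all relations used on a variable.

    used: set of relation/attribute names the variable is the subject of.
    is_type: optional explicit restriction, as with 'X is City' in RQL.
    """
    candidates = [etype for etype, rels in schema.items() if used <= rels]
    if is_type is not None:
        candidates = [etype for etype in candidates if etype == is_type]
    return sorted(candidates)
```

With this schema, possible_types({"region", "population"}) returns both City and District, mirroring the first query, while adding is_type="City" narrows it to City, as in the second one. An empty result is what lets such a system reject a query that cannot match anything.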

RQL also allows subqueries, union, full-text search, stored procedures, ... (see the doc).

These really interesting discussions convinced us that we should write a journal paper detailing the theoretical and technical concepts behind RQL and the YAMS schema.


Logilab will be at the Toulouse Métropole Open Data Barcamp tomorrow

2013/10/08 by Sylvain Thenault

Meet us tomorrow at the Cantine in Toulouse, where several people from Logilab will attend the open data barcamp organized by Toulouse Métropole.

More info on barcamp.org. We'll probably talk about how CubicWeb manages to import large amounts of open data for reuse.


Logilab's roadmap for CubicWeb on September 6th, 2013

2013/09/17 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Sept 6th, 2013 meeting. The previous report posted to the blog was the February 2013 roadmap.

Version 3.17

This version is now stable and maintained (release 3.17.7 is upcoming). It added a couple of features and focused on putting CW on a diet by extracting some functionalities provided by the core into external cubes: sioc, embed, massmailing, geocoding, etc.

For details, read what's new in CubicWeb 3.17.

Version 3.18

This version is now frozen and will be published as soon as all the patches are tested and merged. Since we have a lot of work for clients at Logilab until the end of the year, the community should feel free to help (as usual) if it wants this version to be released sooner rather than later.

This version will remove the ldapuser source that is replaced by ldapfeed, implement Cross Origin Resource Sharing, drop some very old compatibility code, deprecate the old version of the multi-source system and provide various other features and bugfixes.

For details, read the list of tickets for CubicWeb 3.18.0.

Version 3.19

This version will probably be published early next year (read: January or February 2014) unless someone who is not working at Logilab takes responsibility for its release.

It should include the heavy refactoring work done by Pierre-Yves and Sylvain over the past year, which modifies sessions and sources to pave the way for CubicWeb 4.

For details, read the list of tickets for CubicWeb 3.19.0 or take a look at this head.

Squareui

Since Orbui changes the organization of the default user interface on screen, it was decided to extract the low-level Bootstrap-related views that could be shared, and to build a SquareUI cube that would conform to the design choices of the default UI.

Logilab is now developing all its new projects based on Squareui 0.2. Read about it on the mailing list archives.

Mid-term goals

The mid-term goals include better REST (Representational State Transfer) support, complete WSGI (Python's Web Server Gateway Interface) support and the FROM clause for RQL queries (to reinvent db federation outside of the core).

Cubes

Our current plan is to extract as much as possible to cubes. We started CubicWeb many years ago with the Python motto "batteries included", but have since realized that having too much in the core contributes to making CubicWeb difficult to learn.

Since we would very much like the community to grow, we are now aiming for something more balanced, like Mercurial does. The core is designed such that most features can be developed as an extension. Once they are stable, popular extensions can be moved to the main library that is distributed with the core, and be activated with a switch in the configuration file.

Several cubes are under active development: oauth, signedrequest, dataio, etc.

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of November 2013.


Brainomics - A management system for exploring and merging heterogeneous brain mapping data

2013/09/12 by Arthur Lutz

At OHBM 2013, the 19th Annual Meeting of the Organization for Human Brain Mapping, Logilab presented a poster which explains the work done using CubicWeb on brain imaging and genetics data in collaboration with INRIA, INSERM and the CEA during the Brainomics project co-financed by Agence nationale de la Recherche.


You can download this poster and try the demo online.


What's new in CubicWeb 3.17

2013/06/21 by Aurelien Campeas


New functionalities

  • Add a command to compare the database schema and the file-system schema (see #464991)
  • Add CubicWebRequestBase.content with the content of the HTTP request (see #2742453)
  • Add a bookmark directive to ReST rendering (see #2545595)
  • Allow user-defined final types (see #124342)

API changes

  • Drop typed_eid() in favour of int() (see #2742462)
  • The SIOC views and adapters have been removed from CubicWeb and moved to the sioc cube.
  • The web page embedding views and adapters have been removed from CubicWeb and moved to the embed cube.
  • The email sending views and controllers have been removed from CubicWeb and moved to the massmailing cube.
  • RenderAndSendNotificationView is deprecated in favor of ActualNotificationOp; the new operation uses the more efficient data idiom.
  • Looping tasks can now have an interval <= 0. A negative interval disables the looping task entirely.
  • We now serve HTML instead of XHTML. (see #2065651)
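The "data idiom" mentioned above refers to registering a single operation per transaction and appending data to it, instead of creating one operation per modified entity, so that the work is done once when the transaction commits. The pure-Python sketch below only illustrates that pattern; NotificationDataOp, its transaction identifiers and its commit method are hypothetical and do not reproduce CubicWeb's actual operation API.

```python
class NotificationDataOp:
    """Hypothetical operation batching notifications for one transaction."""

    # One live operation per transaction id.
    _instances = {}

    def __init__(self, txid):
        self.txid = txid
        self.data = []

    @classmethod
    def get_instance(cls, txid):
        # Reuse the transaction's operation, creating it on first use.
        if txid not in cls._instances:
            cls._instances[txid] = cls(txid)
        return cls._instances[txid]

    def add_data(self, item):
        self.data.append(item)

    def commit(self):
        # Process all collected items in one pass, then forget the operation.
        del type(self)._instances[self.txid]
        return "sent %d notifications" % len(self.data)
```

Three entity changes in the same transaction thus share one operation and one processing step, instead of triggering three separate renderings.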

Deprecation

  • ldapuser has been deprecated. It will be removed in a future version. If you are still using ldapuser, switch to ldapfeed NOW!
  • hijack_user has been deprecated. It will be dropped soon.

Deprecated Code Drops

  • The progress views and adapters have been removed from CubicWeb. These classes were deprecated since 3.14.0. They are still available in the iprogress cube.
  • The part of the API deprecated since 3.7 was dropped.

We're going to PGDay France, the PostgreSQL community conference

2013/06/11 by Arthur Lutz

A few people from the CubicWeb team are going to attend the French PostgreSQL community conference in Nantes (France) on the 13th of June.


We're excited to learn more about the following topics that are relevant to CubicWeb's development and features:


Obviously we'll pay attention to all the talks during the day. If you're attending, we hope to see you there.


OpenData meets the Semantic Web at WOD2013

2013/06/10 by Arthur Lutz

With a few people from Logilab we went to the 2nd International Workshop on Open Data (WOD), on the 3rd of June.

Although the main focus was an academic take on OpenData, a lot of talks were related to the Semantic Web technologies and especially LinkedData.

The full program (and papers) is on the following website. Here is a quick review of the things we thought worth sharing.

  • privacy-oriented ontologies: http://l2tap.org/
  • interesting automations done to suggest alignments when initial data is uploaded to an opendata website
  • some opendata platforms have built-in APIs to get files, one example is Socrata: http://dev.socrata.com/
  • some work is being done to scale processing of linked data in the cloud (did you know you could access readily available datasets in the Amazon cloud? DBpedia, for example)
  • the data stored in Wikipedia can be a good source of vocabulary for certain machine learning tasks (and in the future, the Wikidata project)
  • there is an RDF extension to Google Refine (or OpenRefine), but we haven't managed to get it working out of the box.
  • WebSmatch uses morphological operators (erosion / dilation) to identify grids and zones in Excel spreadsheets and then aligns column data on known reference values (e.g. country lists).

We naturally enjoyed the presentation made by Romain Wenz about http://data.bnf.fr with the unavoidable mention of Victor Hugo (and CubicWeb).

Thanks to the organizers of the conference and to the French National Library for hosting the event.


data.bnf.fr gets the Stanford Prize for Innovation in Research Libraries

2013/03/01 by Nicolas Chauvat

data.bnf.fr and Gallica were just awarded the Stanford Prize for Innovation in Research Libraries 2013. The CubicWeb community is very pleased to see that data.bnf.fr, which is built with CubicWeb, is being recognized at the top international level as leading innovation in its domain! Read the comments of the judges for more details.


CubicWeb at Data Tuesday on Feb 26th 2013

2013/02/15 by Nicolas Chauvat

CubicWeb was showcased at Data Tuesday on Feb 26th 2013. The other presentations were interesting, especially shacache.org, the soon-to-be-launched OpenMeteoData and the very useful scikit-learn.