
CubicWeb Blog

News about the framework and its uses.

  • "Data Fast-food": quick interactive exploratory processing and visualization of complex datasets with CubicWeb

    2012/01/19 by Vincent Michel

    With the emergence of the semantic web in the past few years, and the increasing number of high-quality open datasets (cf. the LOD diagram), there is a growing interest in frameworks that make it possible to store, query, process, mine and visualize large datasets.

    We have seen in previous blog posts how CubicWeb may be used as an efficient knowledge management system for various types of data, and how it may be used to perform complex queries. In this post, we will see, using Geonames data, how CubicWeb may perform simple or complex data mining and machine learning procedures on data, using the datamining cube. This cube adds powerful tools to CubicWeb that make it easy to interactively process and visualize datasets.

    At this point, it is not meant to be used on massive datasets, as it is not fully optimized yet. If you try to perform a TF-IDF (term frequency–inverse document frequency) weighting followed by a hierarchical clustering on the full DBpedia abstracts dataset, be prepared to wait. But it is a promising way to enrich the user experience while playing with different datasets, for quick, interactive, exploratory data mining (what I've called "Data fast-food"). This cube is based on the scikit-learn toolbox, which has recently gained huge popularity in the machine learning and Python community. The release of this cube makes CubicWeb considerably more attractive for data management.

    The Datamining cube

    For a given query, similarly to SQL, CubicWeb returns a result set. This result set may be presented by a view to display a table, a map, a graph, etc. (see the documentation and previous blog posts).

    The datamining cube introduces the possibility to process the result set before presenting it, for example to apply machine learning algorithms to cluster the data.

    The datamining cube is based on two concepts:

    • the concept of processor: basically, a processor transforms a result set into a numpy array, given some criteria defining the mathematical processing and the columns/rows of the result set to be taken into account. The numpy array is a versatile structure that is widely used for numerical computation, so it can be fed efficiently to any kind of data mining algorithm. Note that, in our context of knowledge management, it is more convenient to return a numpy array with additional meta-information, such as indices or labels; the result is stored in what we call a cw-array. Meta-information may be useful for display, but is not compulsory.
    • the concept of array-view: "views" are basic components of CubicWeb, where distinguishing the querying of data from its display is key. So, many different views can be applied to a given result set. In the datamining cube, we simply overload the basic view of CubicWeb, so that it works with cw-arrays instead of result sets. These array-views are associated with machine learning or data mining processes. For example, one can apply the k-means (clustering) view to a given cw-array.

    A very important feature is that the processor and the array-view are called directly through the URL using the two related parameters arid (for ARray ID) and vid (for View ID, standard in CubicWeb).
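    To make the arid/vid mechanism concrete, here is a minimal sketch, not the cube's actual code, of how a URL can select a processor and then an array-view. The plain-dict registries, the `text-summary` view and the `execute_rql` callable are hypothetical stand-ins; in CubicWeb, processors and views are looked up through its own registry system.

```python
from urllib.parse import parse_qs, urlsplit

import numpy as np

# Hypothetical registries (CubicWeb uses its own registry machinery instead).
PROCESSORS = {}
VIEWS = {}


def processor(arid):
    def register(func):
        PROCESSORS[arid] = func
        return func
    return register


def view(vid):
    def register(func):
        VIEWS[vid] = func
        return func
    return register


@processor('attr-asfloat')
def attributes_as_float(rows):
    # Turn every row of the result set into a row of floats.
    return np.array(rows, dtype=float)


@view('text-summary')
def text_summary(array):
    # A toy array-view: it only sees the array, never the query.
    return 'shape=%s mean=%s' % (array.shape, array.mean(axis=0).tolist())


def render(url, execute_rql):
    """Run the rql parameter, then apply the processor (arid) and the view (vid)."""
    params = parse_qs(urlsplit(url).query)
    rows = execute_rql(params['rql'][0])
    array = PROCESSORS[params['arid'][0]](rows)
    return VIEWS[params['vid'][0]](array)
```

    Keeping the processor and the view as two independent lookups is what lets the same arid be combined with any vid, and vice versa.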


    We give some examples of basic processors that may be found in the datamining cube:

    • AttributesAsFloatArrayProcessor (arid='attr-asfloat'): This processor converts all Int, BigInt and Float attributes in the result set to floats, and returns the corresponding array. The number of rows is equal to the number of rows in the result set, and the number of columns is equal to the number of convertible attributes in the result set.
    • EntityAsFloatArrayProcessor (arid='entity-asfloat'): This processor behaves like the AttributesAsFloatArrayProcessor, but keeps a reference to the entities used to create the numpy array. This information can then be used for display (map, label, ...).
    • AttributesAsTokenArrayProcessor (arid='attr-astoken'): This processor turns all String attributes in the result set into a numpy array, based on a word n-gram analysis. This may be used to tokenize a set of strings.
    • PivotTableCountArrayProcessor (arid='pivot-table-count'): This processor is used to create a pivot table with a count function. Other functions, such as sum or product, also exist. This may be used to create spreadsheet-like views.
    • UndirectedRelationArrayProcessor (arid='undirected-rel'): This processor creates a binary numpy array of dimension (nb_entities, nb_entities) that represents the relations (or co-relations) between entities. This may be used for graph-based visualization.
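    As a rough illustration of what two of these processors compute, here is a sketch in plain NumPy. The `CWArray` class and both functions are hypothetical simplifications, assuming a non-empty result set given as a list of row tuples; the real processors work on CubicWeb result sets.

```python
import numpy as np


class CWArray:
    """Hypothetical cw-array: a NumPy array plus optional labels."""

    def __init__(self, data, row_labels=None, col_labels=None):
        self.data = data
        self.row_labels = row_labels
        self.col_labels = col_labels


def _is_number(value):
    # bool is a subclass of int in Python, but it is not a numeric attribute here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def attributes_as_float(rows):
    """Keep the columns whose values are all numbers and convert them to floats."""
    numeric = [j for j in range(len(rows[0]))
               if all(_is_number(row[j]) for row in rows)]
    data = np.array([[float(row[j]) for j in numeric] for row in rows])
    # Keep the original column indices as meta-information.
    return CWArray(data, col_labels=numeric)


def pivot_table_count(rows):
    """Count how often each (first column, second column) combination occurs."""
    row_keys = sorted({row[0] for row in rows})
    col_keys = sorted({row[1] for row in rows})
    row_index = {key: i for i, key in enumerate(row_keys)}
    col_index = {key: j for j, key in enumerate(col_keys)}
    counts = np.zeros((len(row_keys), len(col_keys)), dtype=int)
    for row in rows:
        counts[row_index[row[0]], col_index[row[1]]] += 1
    return CWArray(counts, row_labels=row_keys, col_labels=col_keys)
```

    The string column of a (name, longitude, latitude, population) result set is dropped by `attributes_as_float`, which matches the "number of convertible attributes" rule described above.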

    We are also planning to extend the concept of processor to sparse matrices (scipy.sparse), in order to deal with very high-dimensional data.

    Array Views

    Most of the array views found in the datamining cube are used for simple visualization. We used HTML-based templates and the Protovis JavaScript library.

    We will not detail all the views, but rather show some examples. Read the reference documentation for a complete and detailed description.

    Examples on numerical data


    The request:

    Any LO, LA WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
    X country C, NOT X elevation NULL, C name "France"

    that may be translated as:

    All (longitude, latitude) pairs of the locations in France with a non-null elevation

    and, using vid=protovis-hist and arid=attr-asfloat, we obtain a histogram of these values.

    Scatter plot

    Using the notion of view, we can display the same result set differently, for example as a scatter plot (vid=protovis-scatterplot).

    Another example with the request:

    Any P, E WHERE X is Location, X elevation E, X elevation >1, X population P,
    X population >10, X country CO, CO name "France"

    that may be translated as:

    All (population, elevation) pairs of locations in France,
    with a population higher than 10 (inhabitants) and an elevation higher than 1 (meter)

    and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat)

    If a third column is given in the result set (and thus in the numpy array), it will be encoded in the size/color of each dot of the scatter plot. For example with the request:

    Any LO, LA, E WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
    X country C, NOT X elevation NULL, X elevation E, C name "France"

    that may be translated as:

    All (longitude, latitude, elevation) tuples of the locations in France with a non-null elevation

    and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat), we can visualize the elevation on a map, encoded in size/color
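    One plausible way to turn that third column into dot sizes is a min-max scaling. This is an assumption for illustration only; we do not claim it is the exact scale the protovis-scatterplot view uses, and `scatter_encoding` and its size bounds are hypothetical.

```python
import numpy as np


def scatter_encoding(array, min_size=2.0, max_size=20.0):
    """Split an (n, 2) or (n, 3) array into x, y and one size per dot."""
    x, y = array[:, 0], array[:, 1]
    if array.shape[1] < 3:
        # No third column: every dot gets the same size.
        return x, y, np.full(len(x), min_size)
    z = array[:, 2]
    span = z.max() - z.min()
    # Min-max scaling; a constant column maps every dot to the smallest size.
    scaled = (z - z.min()) / span if span else np.zeros_like(z)
    return x, y, min_size + scaled * (max_size - min_size)
```

    The lowest elevation gets the smallest dot and the highest gets the largest; a color scale could be derived from `scaled` in the same way.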

    Another example with the request:

    Any LO, LA LIMIT 50000 WHERE X is Location, X population  >1000, X latitude LA, X longitude LO,
    X country CO, CO name "France"

    that may be translated as:

    All (longitude, latitude) pairs of at most 50,000 locations in France with a population higher than 1000 (inhabitants)

    There are also other views, such as an AreaChart view and a LineArray view.

    Examples on relational data

    Relational Matrix (undirected graph)

    The request:

    Any X,Y WHERE X continent CO, CO name "North America", X neighbour_of Y

    that may be translated as:

    All neighbour countries in North America

    and using the vid='protovis-binarymap' and arid='undirected-rel'

    Relational Matrix (directed graph)

    If we do not want a symmetric matrix, i.e. if we want to keep the direction of a link ((X,Y) is not the same relation as (Y,X)), we can use the directed-rel array processor. For example, with the following request:

    Any X,Y LIMIT 20 WHERE X continent Y

    that may be translated as:

    20 countries and their continent

    and using the vid='protovis-binarymap' and arid='directed-rel'
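    The only difference between the two processors is whether a link is mirrored. A minimal sketch, assuming the result set is a list of (X, Y) pairs and using a hypothetical `relation_matrix` helper (the actual cube may, for instance, order rows and columns differently):

```python
import numpy as np


def relation_matrix(pairs, directed=False):
    """Binary (nb_entities, nb_entities) matrix built from (X, Y) pairs."""
    entities = sorted({e for pair in pairs for e in pair})
    index = {e: i for i, e in enumerate(entities)}
    matrix = np.zeros((len(entities), len(entities)), dtype=np.int8)
    for x, y in pairs:
        matrix[index[x], index[y]] = 1
        if not directed:
            # Undirected: forget the direction by mirroring the link.
            matrix[index[y], index[x]] = 1
    return entities, matrix
```

    With `directed=False` the matrix is always symmetric, which is what the undirected-rel binary map displays for the neighbour_of relation.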

    Force directed graph

    For a dynamic representation of relations, we can use a force directed graph. The request:

    Any X,Y WHERE X neighbour_of Y

    that may be translated as:

    All neighbour countries in the World.

    and using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph, with small independent components (e.g. UK and Ireland)
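    Those small independent components are the connected components of the neighbour graph. As a side sketch (not part of the cube), a breadth-first search over the (X, Y) pairs recovers them; `connected_components` is a hypothetical helper name and the pairs below are only sample data.

```python
from collections import defaultdict, deque


def connected_components(pairs):
    """Group entities linked (directly or not) by (X, Y) pairs."""
    graph = defaultdict(set)
    for x, y in pairs:
        graph[x].add(y)
        graph[y].add(x)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search from an entity not visited yet.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


print(connected_components([('France', 'Spain'), ('Spain', 'Portugal'),
                            ('United Kingdom', 'Ireland')]))
```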

    Again, a third column in the result set could be used to encode some labeling information, for example the continent.

    The request:

    Any X,Y,CO WHERE X neighbour_of Y, X continent CO

    that may be translated as:

    All neighbour countries in the World, and their corresponding continent.

    and again, using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph with the continents encoded in color (Americas in green, Africa in dark blue, ...)


    For hierarchical information, one can use the Dendrogram view. For example, with the request:

    Any X,Y WHERE X continent Y

    that may be translated as:

    All (country, continent) pairs in the World

    and using vid='protovis-dendrogram' and arid='directed-rel', we have the following dendrogram (we only show a part due to lack of space)

    Unsupervised Learning

    We have also developed some machine learning views for unsupervised learning. This is more a proof of concept than a fully optimized development, but we can already do some cool stuff. Each machine learning process is referenced by an mlid. For example, with the request:

    Any LO, LA WHERE X is Location, X elevation E, X elevation >1, X latitude LA, X longitude LO,
    X country CO, CO name "France"

    that may be translated as:

    All (longitude, latitude) pairs of the locations in France with an elevation higher than 1

    and using vid='protovis-scatterplot', arid='attr-asfloat' and mlid='kmeans', we can construct a scatter plot of all latitude/longitude pairs in France, and create 10 clusters using k-means clustering. The cluster labels are then encoded in color/size.
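    The cube relies on scikit-learn for clustering. To show what the k-means step does, here is a self-contained sketch of Lloyd's algorithm in NumPy instead; it is a swapped-in illustration, not the cube's implementation, and the `kmeans` function and its parameters are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on an (n, d) array: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        distances = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

    Called as `kmeans(lon_lat_array, 10)` on the (longitude, latitude) array, `labels` plays the role of the per-dot cluster index that the view encodes in color/size.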


    Finally, we have also implemented a download view, based on pickling the numpy array. It is thus possible to access any data remotely from a Python shell and process it as you wish. Changing the request is as easy as changing the rql parameter in the URL. For example:

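    import pickle, urllib
    # urlencode escapes the spaces and punctuation of the RQL request
    params = urllib.urlencode({'rql': 'my request', 'vid': 'array-numpy', 'arid': 'attr-asfloat'})
    data = pickle.loads(urllib.urlopen('http://mydomain/?' + params).read())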
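    For readers on Python 3, where `urllib` was split into submodules, a roughly equivalent sketch follows. It assumes, as described above, that the array-numpy view returns the raw pickle bytes; `array_url`, `load_array` and `fetch_array` are hypothetical helper names.

```python
import pickle
from urllib.parse import urlencode
from urllib.request import urlopen


def array_url(base_url, rql, vid='array-numpy', arid='attr-asfloat'):
    # urlencode escapes the spaces and commas of the RQL request.
    return base_url + '?' + urlencode({'rql': rql, 'vid': vid, 'arid': arid})


def load_array(payload):
    # pickle.loads can execute code embedded in the payload: trusted servers only.
    return pickle.loads(payload)


def fetch_array(url):
    with urlopen(url) as response:
        return load_array(response.read())
```

    For instance, `fetch_array(array_url('http://mydomain/', 'Any LO, LA WHERE X longitude LO, X latitude LA'))` would return the array built by the attr-asfloat processor.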

  • CubicWeb sprint in Paris - 2012/02/07-10

    2011/12/21 by Nicolas Chauvat


    To be decided. Some possible topics are:

    • optimization (still)
    • porting CubicWeb to Python 3
    • porting CubicWeb to PyPy
    • persistent sessions
    • finishing the Twisted / WSGI refactoring
    • an inter-instance communication bus
    • using subprocesses to handle datafeeds
    • developing more debugging tools (debug console, view profiling, etc.)
    • pluggable / unpluggable external sources (as needed for the cubipedia and semantic family)
    • client-side-only applications (JavaScript + HTTP)
    • a Mercurial storage backend: see this thread of the mailing list
    • mercurial-server integration: see this email to the mailing list

    Other ideas are welcome; please bring them up on


    This sprint will take place in February 2012, from Tuesday the 7th to Friday the 10th. You are more than welcome to come along, help out and contribute. An introduction is planned for newcomers.

    Network resources will be available for those bringing laptops.

    Address: 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab".

    Metro: Glacière

    Contact:

    Dates: 07/02/2012 to 10/02/2012

  • Geonames in CubicWeb!

    2011/12/14 by Vincent Michel

    CubicWeb is a semantic web framework written in Python that has been successfully used in large-scale projects, such as the French National Library's open data portal or Collections des musées de Haute-Normandie (museums of Haute-Normandie).

    CubicWeb provides a high-level query language, called RQL, operating over a relational database (PostgreSQL in our case), and makes it possible to quickly instantiate an entity-relationship data model. By separating querying and displaying data into two distinct steps, it provides powerful means for data retrieval and processing.

    In this blog post, we will demonstrate some of these capabilities on the Geonames data.


    Geonames is an open-source compilation of geographical data from various sources:

    "...The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge..."

    The data is available as a dump containing different CSV files:

    • allCountries: main file containing information about 8,000,000 places in the world. We won't detail the various attributes of each location, but we will focus on some important properties, such as population and elevation. Moreover, admin_code_1 and admin_code_2 will be used to link the different locations to the corresponding AdministrativeRegion, and feature_code will be used to link the data to the corresponding type.
    • admin1CodesASCII.txt and admin2Codes.txt detail the different administrative regions, that are parts of the world such as region (Ile-de-France), department (Department of Yvelines), US counties...
    • featureCodes.txt details the different types of location that may be found in the data, such as forest(s), first-order administrative division, aqueduct, research institute, ...
    • timeZones.txt, countryInfo.txt and iso-languagecodes.txt are additional files providing information about timezones, countries and languages. They will be included in our CubicWeb database but won't be explained in more detail here.
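    To give an idea of what reading allCountries involves, here is a minimal parsing sketch. It assumes the tab-separated, 19-column layout described in the Geonames readme (population at position 14, elevation at 15, timezone at 17, counting from 0); `read_locations` and the `COLUMNS` mapping are hypothetical, not the import code we actually ran.

```python
import csv

# Column positions, assuming the layout documented in the Geonames readme.
COLUMNS = {'geonameid': 0, 'name': 1, 'latitude': 4, 'longitude': 5,
           'feature_class': 6, 'feature_code': 7, 'country_code': 8,
           'admin1_code': 10, 'admin2_code': 11, 'population': 14,
           'elevation': 15, 'timezone': 17}


def read_locations(lines):
    """Yield one dict per allCountries line, with typed numeric fields."""
    # Geonames files use no quoting: names may contain quote characters.
    reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        record = {key: row[pos] for key, pos in COLUMNS.items()}
        record['geonameid'] = int(record['geonameid'])
        record['latitude'] = float(record['latitude'])
        record['longitude'] = float(record['longitude'])
        record['population'] = int(record['population'] or 0)
        # An empty elevation means "unknown", not 0 meters.
        record['elevation'] = int(record['elevation']) if record['elevation'] else None
        yield record
```

    Keeping unknown elevations as `None` rather than 0 is what later makes requests such as `NOT X elevation NULL` meaningful.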

    The Geonames website also provides some ways to browse the data: by Countries, by Largest Cities, by Highest mountains, by postal codes, etc. We will see that CubicWeb can be used to automatically create such ways of browsing data while allowing far deeper queries. There are two main challenges when dealing with such data:

    • the number of entries: with 8,000,000 placenames, we have to use efficient tools for storing and querying them.
    • the structure of the data: the different types of entries are separated in different files, but should be merged for efficient queries (i.e. we have to rebuild the different links between entities, e.g. Location to Country or Location to AdministrativeRegion).

    Data model

    With CubicWeb, the data model of the application is written in Python. It defines different entity classes with their attributes, as well as the relationships between the different entity classes. Here is a sample of the schema that we have used for Geonames data:

    from yams.buildobjs import EntityType, SubjectRelation, String, Int, Float

    class Location(EntityType):
        name = String(maxsize=1024, indexed=True)
        uri = String(unique=True, indexed=True)
        geonameid = Int(indexed=True)
        latitude = Float(indexed=True)
        longitude = Float(indexed=True)
        feature_code = SubjectRelation('FeatureCode', cardinality='?*', inlined=True)
        country = SubjectRelation('Country', cardinality='?*', inlined=True)
        main_administrative_region = SubjectRelation('AdministrativeRegion',
                                                     cardinality='?*', inlined=True)
        timezone = SubjectRelation('TimeZone', cardinality='?*', inlined=True)

    This indicates that the main Location class has a name attribute (string), a uri (string), a geonameid (integer), a latitude and a longitude (both floats), and some relations to other entity classes such as FeatureCode (the relation is named feature_code), Country (the relation is named country), or AdministrativeRegion (the relation is named main_administrative_region).

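    The cardinality of each relation is defined in the classic RDBMS way, where * means any number, ? means zero or one and 1 means one and only one. The first character applies to the subject and the second to the object: with '?*', a Location has at most one country, while a Country may be related to any number of locations.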
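    Reading a cardinality string as (subject side, object side), the constraint can be checked mechanically. A small sketch with a hypothetical `violations` helper; the '+' character (one or more), which the yams schema language also accepts, is included for completeness.

```python
from collections import Counter

# How many related entities each cardinality character allows (None = no limit).
BOUNDS = {'1': (1, 1), '?': (0, 1), '+': (1, None), '*': (0, None)}


def violations(cardinality, subjects, objects, pairs):
    """Return (entity, count) pairs breaking a two-character cardinality."""
    subject_card, object_card = cardinality
    per_subject = Counter(s for s, _ in pairs)  # objects linked to each subject
    per_object = Counter(o for _, o in pairs)   # subjects linked to each object
    errors = []
    for card, entities, counts in ((subject_card, subjects, per_subject),
                                   (object_card, objects, per_object)):
        low, high = BOUNDS[card]
        for entity in entities:
            n = counts[entity]
            if n < low or (high is not None and n > high):
                errors.append((entity, n))
    return errors
```

    With '?*', a location without any country is fine, but a location linked to two countries is reported.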

    We give below a visualisation of the schema (obtained using the /schema relative url)


    The data contained in the CSV files could be pushed and stored without any processing, but it is interesting to reconstruct the relations that may exist between different entities and entity classes, so that queries will be easier and faster.

    Executing the import procedure took us 80 minutes on regular hardware, which seems very reasonable given the amount of data (~7,000,000 entities, 920MB for the allCountries.txt file), and the fact that we are also constructing many indexes (on attributes or on relations) to improve the queries. This import procedure uses some low-level SQL commands to load the data into the underlying relational database.

    Queries and views

    As stated before, queries are performed in CubicWeb using RQL (Relational Query Language), which is similar to SPARQL, but with a syntax that is closer to SQL. This language may be used to query the concepts directly while abstracting away the physical structure of the underlying database. For example, one can use the following request:

    Any X LIMIT 10 WHERE X is Location, X population > 1000000,
        X country C, C name "France"

    that means:

    Give me 10 locations that have a population greater than 1000000, and that are in a country named "France"

    The corresponding SQL query is:

    SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
    WHERE _X.cw_population>1000000
          AND _X.cw_country=_C.cw_eid AND _C.cw_name='France'
    LIMIT 10

    We can see that RQL is higher-level than SQL and abstracts the details of the tables and the joins.
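    To see that join at work outside CubicWeb, here is a toy reproduction with Python's built-in sqlite3 module. The tables mimic the cw_-prefixed names of the SQL above, with the inlined country relation stored as a cw_country column; the rows are made-up sample data and `france_big_locations` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE cw_Country (cw_eid INTEGER PRIMARY KEY, cw_name TEXT);
    CREATE TABLE cw_Location (cw_eid INTEGER PRIMARY KEY, cw_name TEXT,
                              cw_population INTEGER, cw_country INTEGER);
    INSERT INTO cw_Country VALUES (1, 'France'), (2, 'Italy');
    INSERT INTO cw_Location VALUES (10, 'Big French city', 2000000, 1),
                                   (11, 'Big Italian city', 2500000, 2),
                                   (12, 'Small French town', 5000, 1);
''')


def france_big_locations(conn, limit=10):
    # Same joins as the generated SQL, with sqlite3 "?" placeholders.
    cursor = conn.execute(
        'SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X '
        'WHERE _X.cw_population > ? AND _X.cw_country = _C.cw_eid '
        'AND _C.cw_name = ? LIMIT ?',
        (1000000, 'France', limit))
    return [eid for (eid,) in cursor]


print(france_big_locations(conn))
```

    Because country is inlined, the relation is a plain foreign-key column, so the RQL request translates into a single join rather than a join through a separate relation table.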

    A query returns a result set (a list of results), that can be displayed using views. A main feature of CubicWeb is to separate the two steps of querying the data and displaying the results. One can query some data and visualize the results in the standard web framework, download them in different formats (JSON, RDF, CSV,...), or display them in some specific view developed in Python.

    In particular, we will use a map view, which is based on the Mapstraction and OpenLayers libraries, to display information on maps using data from OpenStreetMap. This view uses a CubicWeb feature called adapters. An adapter adapts a class of entity to some interface, hence views can rely on interfaces instead of types and be able to display entities with different attributes and relations. In our case, the IGeocodableAdapter returns a latitude and a longitude for a given class of entity (here, the mapping is trivial, but there are more complex cases... :) ):

    from cubicweb.view import EntityAdapter
    from cubicweb.selectors import is_instance

    class IGeocodableAdapter(EntityAdapter):
        __regid__ = 'IGeocodable'
        __select__ = is_instance('Location')

        def latitude(self):
            return self.entity.latitude

        def longitude(self):
            return self.entity.longitude
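    The benefit of adapters is that the view only talks to the interface. Here is a framework-free sketch of that idea; all classes are hypothetical, and the `CountryGeocodable` adapter (placing a country at its capital) is an invented example of a less trivial mapping, not something the Geonames application defines.

```python
class Location:
    def __init__(self, name, latitude, longitude):
        self.name, self.latitude, self.longitude = name, latitude, longitude


class Country:
    def __init__(self, name, capital):
        self.name, self.capital = name, capital


class LocationGeocodable:
    # Trivial mapping, like the IGeocodableAdapter above.
    def __init__(self, entity):
        self.entity = entity

    def latitude(self):
        return self.entity.latitude

    def longitude(self):
        return self.entity.longitude


class CountryGeocodable:
    # Hypothetical, less trivial mapping: a country is placed at its capital.
    def __init__(self, entity):
        self.entity = entity

    def latitude(self):
        return self.entity.capital.latitude

    def longitude(self):
        return self.entity.capital.longitude


ADAPTERS = {Location: LocationGeocodable, Country: CountryGeocodable}


def map_markers(entities):
    """A toy 'map view' that only knows the geocodable interface."""
    markers = []
    for entity in entities:
        geo = ADAPTERS[type(entity)](entity)
        markers.append((entity.name, geo.latitude(), geo.longitude()))
    return markers
```

    `map_markers` never inspects attributes directly, so supporting a new entity type only means registering a new adapter.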

    We will give some results of queries and views later. It is important to notice that the following screenshots are taken without any modification of the standard web interface of CubicWeb. It is possible to write specific views and to define a specific CSS, but we only wanted to show how CubicWeb can handle such data out of the box. Indeed, the default web template of CubicWeb is sufficient for what we want to do, as it dynamically creates web pages showing attributes and relations, as well as some specific forms and JavaScript applets adapted directly to the data (e.g. map-based tools). Last but not least, the query and the view can be defined within the URL, which opens a world of new possibilities to the user:

    http://baseurl:port/?rql=The query that I want&vid=Identifier-of-the-view


    We will not go into too much detail about facets, but let's just say that this feature may be used to define filtering axes on the data, and thus to post-filter a result set. In this example, we have defined four different facets: on the population, on the elevation, on the feature_code and on the main_administrative_region. We will see illustrations of these facets below.

    We give here an example of the definition of a Facet:

    from cubicweb.selectors import is_instance
    from cubicweb.web import facet

    class LocationPopulationFacet(facet.RangeFacet):
        __regid__ = 'population-facet'
        __select__ = is_instance('Location')
        order = 2
        rtype = 'population'

    where __select__ defines which class(es) of entities are targeted by this facet, order defines the order of display of the different facets, and rtype defines the target attribute/relation that will be used for filtering.

    Geonames in CubicWeb

    The main page of the Geonames application is illustrated in the screenshot below. It provides general information on the database, in particular the number of entities in the different classes:

    • 7,984,330 locations.
    • 59,201 administrative regions (e.g. regions, counties, departments...)
    • 7,766 languages.
    • 656 features (e.g. types of location).
    • 410 time zones.
    • 252 countries.
    • 7 continents.

    Simple query

    We will first illustrate the possibilities of CubicWeb with the simple query that we detailed before (which could be pasted directly into the URL...):

    Any X LIMIT 10 WHERE X is Location, X population > 1000000,
        X country C, C name "France"

    We obtain the following page:

    This is the standard view of CubicWeb for displaying results. We can see (right box) that we obtain 10 locations that are indeed located in France, with a population of more than 1,000,000 inhabitants. The left box shows the search panel, which can be used to launch queries, and the facet filters, which may be used to filter results; e.g. we may ask to keep only the results with a population greater than 4,767,709 inhabitants among the previous results:

    and we now obtain only 4 results. We can also notice that the facets are linked: by restricting the result set using the population facet, the other facets have also narrowed their choices.
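    Facet linking boils down to recomputing each facet's choices from the filtered result set. A minimal sketch, assuming rows are plain dictionaries; `range_filter`, `range_bounds` and `facet_values` are hypothetical helpers, not the cubicweb.web.facet API, and the rows are made up.

```python
def range_filter(rows, attr, low=None, high=None):
    """Keep rows whose attribute lies within [low, high] (None = unbounded)."""
    return [r for r in rows
            if r[attr] is not None
            and (low is None or r[attr] >= low)
            and (high is None or r[attr] <= high)]


def range_bounds(rows, attr):
    """Bounds a range facet (e.g. a slider) would offer for the current rows."""
    values = [r[attr] for r in rows if r[attr] is not None]
    return (min(values), max(values)) if values else (None, None)


def facet_values(rows, attr):
    """Choices still offered by a list facet, given the current rows."""
    return sorted({r[attr] for r in rows if r[attr] is not None})


ROWS = [
    {'name': 'A', 'population': 5000000, 'feature_code': 'PPLC'},
    {'name': 'B', 'population': 1200000, 'feature_code': 'PPLA'},
    {'name': 'C', 'population': 6000000, 'feature_code': 'ADM2'},
]
kept = range_filter(ROWS, 'population', low=4767709)
print(facet_values(ROWS, 'feature_code'), facet_values(kept, 'feature_code'))
```

    Restricting the population range drops 'PPLA' from the feature_code choices, which is the narrowing effect visible in the left box.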

    Simple query (but with more information !)

    Let's say that we now want more information about the results that we have obtained previously (for example the exact population, the elevation and the name). This is really simple! We just have to ask for what we want within the RQL query (of course, the variable names N, P, E could be almost anything...):

    Any N, P, E LIMIT 10 WHERE X is Location,
        X population P, X population > 1000000,
        X elevation E, X name N, X country C, C name "France"

    The empty column for the elevation simply means that we don't have any information about elevation.

    Anyway, we can see that fetching particular information could not be simpler! Indeed, with more complex queries, we can access a wealth of information from the Geonames database:

    Any N,E,LA,LO ORDERBY E DESC LIMIT 10  WHERE X is Location,
          X latitude LA, X longitude LO,
          X elevation E, NOT X elevation NULL, X name N,
          X country C, C name "France"

    which means:

    Give me the 10 highest locations (the 10 first when sorting by decreasing elevation) with their name, elevation, latitude and longitude that are in a country named "France"

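    We can now display the same locations with another view, e.g. on a map. Since the map view works on Location entities through the IGeocodable adapter, the request selects X itself rather than its attributes: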

    Any X ORDERBY E DESC LIMIT 10  WHERE X is Location,
           X latitude LA, X longitude LO, X elevation E,
           NOT X elevation NULL, X country C, C name "France"

    And now, we can add the fact that we want more results (20), and that the location should have a non-null population:

    Any N, E, P, LA, LO ORDERBY E DESC LIMIT 20  WHERE X is Location,
           X latitude LA, X longitude LO,
           X elevation E, NOT X elevation NULL, X population P,
           X population > 0, X name N, X country C, C name "France"

    ... and on a map ...


    In this blog post, we have seen how CubicWeb can be used to store and query complex data, while providing (among other things...) web-based views for data visualization. It allows the user to query data directly within the URL, and may be used to interact with and explore the data in depth. In a future blog post, we will give more complex queries to show the full possibilities of the system.

  • Importing thousands of entities into CubicWeb within a few seconds with dataimport

    2011/12/09 by Adrien Di Mascio

    In most CubicWeb projects I've been working on, there always comes a time when I need to import legacy data into the new application. CubicWeb provides Store and Controller objects in the dataimport module. I won't talk here about the recommended general procedure described in the module's docstring (I find it a bit convoluted for simple cases), but I will focus on Store objects. Store objects in this module are more or less a thin layer around session objects: they provide high-level helpers such as create_entity() and relate(), and keep track of what was inserted, which errors occurred, etc.

    In a recent project, I had to create a fair amount (a few million) of simple entities (strings, integers, floats and dates) and relations. The default object store (i.e. cubicweb.dataimport.RQLObjectStore) is painfully slow, the reason being all the integrity / security / metadata hooks that are constantly selected and executed. For large imports, dataimport also provides cubicweb.dataimport.NoHookRQLObjectStore. This store bypasses all hooks and uses the underlying system source primitives directly, making it around twice as fast as the standard store. The problem is that we're still executing each SQL query sequentially, and we're talking here about millions of INSERT / UPDATE queries.

    My idea was to create my own ObjectStore class, inheriting from NoHookRQLObjectStore, that would try to use executemany or even copy_from when possible [1]. It is actually not hard to make groups of similar SQL queries, since create_entity() generates the same query for a given entity type and set of attribute names. For instance:

    create_entity('Person', firstname='John', surname='Doe')
    create_entity('Person', firstname='Tim', surname='BL')

    will generate the following sql queries:

    INSERT INTO cw_Person ( cw_cwuri, cw_eid, cw_modification_date,
                            cw_creation_date, cw_firstname, cw_surname )
           VALUES ( %(cw_cwuri)s, %(cw_eid)s, %(cw_modification_date)s,
                    %(cw_creation_date)s, %(cw_firstname)s, %(cw_surname)s )
    INSERT INTO cw_Person ( cw_cwuri, cw_eid, cw_modification_date,
                            cw_creation_date, cw_firstname, cw_surname )
           VALUES ( %(cw_cwuri)s, %(cw_eid)s, %(cw_modification_date)s,
                    %(cw_creation_date)s, %(cw_firstname)s, %(cw_surname)s )

    The only thing that will differ is the actual data inserted. Well ... ahem ... CubicWeb actually also generates a "few" extra sql queries to insert metadata for each entity:

    INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)
    INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)
    INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)
    INSERT INTO owned_by_relation ( eid_to, eid_from ) VALUES ( %(eid_to)s, %(eid_from)s )
    INSERT INTO created_by_relation ( eid_to, eid_from ) VALUES ( %(eid_to)s, %(eid_from)s )

    Those extra queries are exactly the same for each entity inserted, whatever the entity type is, hence the craving for executemany or copy_from. Grouping SQL queries together is not that hard [2] but has a drawback: as you don't have an intermediate state (the data is actually inserted only at the very end of the process), you lose the ability to query your database to fetch the entities you've just created during the import.
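    The grouping idea can be sketched as a recorder that collects (sql, params) pairs and later replays each distinct statement once with executemany. This is a hypothetical illustration, not the store described here (whose code is not published yet); it uses sqlite3 so that it runs anywhere, hence the :name placeholders instead of psycopg2's %(name)s.

```python
import sqlite3


class BatchingRecorder:
    """Record SQL statements instead of running them, then replay them grouped."""

    def __init__(self):
        # statement text -> list of parameter dicts, in first-seen order
        self.pending = {}

    def execute(self, sql, params):
        self.pending.setdefault(sql, []).append(params)

    def flush(self, cursor):
        # One executemany() per distinct statement instead of one execute() per row.
        for sql, rows in self.pending.items():
            cursor.executemany(sql, rows)
        self.pending.clear()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE cw_Person (cw_eid INTEGER, cw_firstname TEXT, cw_surname TEXT)')
recorder = BatchingRecorder()
insert = ('INSERT INTO cw_Person (cw_eid, cw_firstname, cw_surname) '
          'VALUES (:cw_eid, :cw_firstname, :cw_surname)')
recorder.execute(insert, {'cw_eid': 1, 'cw_firstname': 'John', 'cw_surname': 'Doe'})
recorder.execute(insert, {'cw_eid': 2, 'cw_firstname': 'Tim', 'cw_surname': 'BL'})
# Nothing is in the database yet: this is the "no intermediate state" drawback.
recorder.flush(conn.cursor())
conn.commit()
print(conn.execute('SELECT COUNT(*) FROM cw_Person').fetchone()[0])
```

    Grouping by statement text also reorders inserts (all rows of the first statement seen are sent first), which matters if the database checks foreign keys immediately.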

    Now, a few benchmarks ...

    To create those benchmarks, I decided to use the workorder cube, which is simple yet complete enough: it provides only two entity types (WorkOrder and Order), a relation between them (Order split_into WorkOrder) and uses different kinds of attributes (String, Date, Float).

    Once the cube was instantiated, I ran the following script to populate the database with my 3 different stores:

    import sys
    from datetime import date
    from random import choice
    from itertools import count
    from logilab.common.decorators import timed
    from cubicweb import cwconfig
    from cubicweb.dbapi import in_memory_repo_cnx

    def workorders_data(n, seq=count()):
        # seq is shared across calls so that generated titles stay unique
        for i in xrange(n):
            yield {'title': u'wo-title%s' % next(seq), 'description': u'foo',
                   'begin_date': date.today(), 'end_date': date.today()}

    def orders_data(n, seq=count()):
        for i in xrange(n):
            yield {'title': u'o-title%s' % next(seq), 'date': date.today(),
                   'budget': 0.8}

    def split_into(orders, workorders):
        # attach each workorder to a randomly chosen parent order
        for workorder in workorders:
            yield choice(orders), workorder

    def initial_state(session, etype):
        return session.execute('Any S WHERE S is State, WF initial_state S, '
                               'WF workflow_of ET, ET name %(etn)s', {'etn': etype})[0][0]

    @timed  # reports clock and wall time for each call
    def populate(store, nb_workorders, nb_orders, set_state=False):
        orders = [store.create_entity('Order', **attrs)
                  for attrs in orders_data(nb_orders)]
        workorders = [store.create_entity('WorkOrder', **attrs)
                      for attrs in workorders_data(nb_workorders)]
        ## in_state is set by a hook, so NoHookRQLObjectStore will need
        ## to set the relation manually
        if set_state:
            order_state = initial_state(store.session, 'Order')
            workorder_state = initial_state(store.session, 'WorkOrder')
            for order in orders:
                store.relate(order.eid, 'in_state', order_state)
            for workorder in workorders:
                store.relate(workorder.eid, 'in_state', workorder_state)
        for order, workorder in split_into(orders, workorders):
            store.relate(order.eid, 'split_into', workorder.eid)

    if __name__ == '__main__':
        config = cwconfig.instance_configuration(sys.argv[1])
        nb_orders = int(sys.argv[2])
        nb_workorders = int(sys.argv[3])
        repo, cnx = in_memory_repo_cnx(config, login='admin', password='admin')
        session = repo._get_session(cnx.sessionid)
        from cubicweb.dataimport import RQLObjectStore, NoHookRQLObjectStore
        # CopyFromRQLObjectStore is the custom store discussed in this post; its code
        # is not published yet (see [2]), so this module name is only a placeholder
        from copyfromstore import CopyFromRQLObjectStore
        print 'testing RQLObjectStore'
        store = RQLObjectStore(session)
        populate(store, nb_workorders, nb_orders)
        print 'testing NoHookRQLObjectStore'
        store = NoHookRQLObjectStore(session)
        populate(store, nb_workorders, nb_orders, set_state=True)
        print 'testing CopyFromRQLObjectStore'
        store = CopyFromRQLObjectStore(session)
        populate(store, nb_workorders, nb_orders, set_state=True)

    I ran the script and asked it to create 100 Order entities and 1000 WorkOrder entities, and to link each created WorkOrder to a parent Order:

    adim@esope:~/tmp/bench_cwdi$ python bench_cwdi 100 1000
    testing RQLObjectStore
    populate clock: 24.590000000 / time: 46.169721127
    testing NoHookRQLObjectStore
    populate clock: 8.100000000 / time: 25.712352991
    testing CopyFromRQLObjectStore
    populate clock: 0.830000000 / time: 1.180006981

    My interpretation of the above times is :

    • The clock value indicates the time spent on the CubicWeb server side (i.e. hooks and data pre/postprocessing around SQL queries). The time value should be the sum of the clock time and the time spent in PostgreSQL.
    • RQLObjectStore is slow ;-). Nothing new here, but the clock/time ratio shows that we spend a lot of time on the Python side (i.e. in hooks, as mentioned earlier) and a fair amount of time in PostgreSQL.
    • NoHookRQLObjectStore drastically reduces the time spent on the Python side, while the time spent in PostgreSQL remains about the same as for RQLObjectStore. This is not surprising, since the queries performed are the same in both cases.
    • CopyFromRQLObjectStore seems blazingly fast in comparison (inserting a few thousand rows in PostgreSQL with a COPY FROM statement is not a problem). And... yes, I checked that the data was actually inserted, and I even ran a cubicweb-ctl db-check on the instance afterwards.
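    The clock/time pair printed by the benchmark can be reproduced with a small timing decorator. This is a minimal sketch assuming a Python 3 port (the benchmark above is Python 2, and its own timing code is not shown): time.process_time() counts only the CPU time of the Python process, so work done by the separate PostgreSQL server process shows up in the wall-clock value but not in the CPU value. The decorator name timed is hypothetical.

```python
import functools
import time


def timed(func):
    """Hypothetical helper: print CPU ("clock") and wall ("time") durations.

    CPU time only covers the Python process (hooks, pre/postprocessing);
    time spent waiting for the database server only counts as wall time.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu_start = time.process_time()
        wall_start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            cpu = time.process_time() - cpu_start
            wall = time.perf_counter() - wall_start
            wrapper.last_timings = (cpu, wall)
            print('%s clock: %.9f / time: %.9f' % (func.__name__, cpu, wall))
    wrapper.last_timings = None
    return wrapper


@timed
def populate(n):
    # Stand-in workload: a little CPU work plus a sleep mimicking I/O waits.
    total = sum(range(n))
    time.sleep(0.05)
    return total
```

    Calling populate(1000) prints a line in the same format as the benchmark output, with a wall time exceeding the CPU time by roughly the 0.05 s spent sleeping, just as time spent inside PostgreSQL would.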

    This probably opens new perspectives for massive data imports, since the client API remains the same as before for the programmer. It is still a bit experimental and can only be used for "dummy", brute-force import scenarios where you can preprocess your data in Python before updating the database, but it is probably worth having such a store in the dataimport module.

    [1]The idea is to promote an executemany('INSERT INTO ...', data) statement into a COPY FROM whenever possible (i.e. simple data types, easy enough to escape). In that case, the underlying database and python modules have to provide support for this functionality. For the record, the psycopg2 module exposes a copy_from() method and soon logilab-database will provide an additional high-level helper for this functionality (see this ticket).
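    The data side of that promotion can be sketched as follows, assuming PostgreSQL's default text COPY format (tab-separated columns, \N for NULL, backslash escapes): the rows that would have been passed to executemany() are serialized into an in-memory file that psycopg2's cursor.copy_from() can read. The helper names and the table/column names in the final comment are hypothetical; this is not the code used for the benchmark.

```python
import io


def _escape(value):
    """Serialize one value in PostgreSQL's text COPY format."""
    if value is None:
        return '\\N'  # default NULL marker for COPY ... FROM
    text = str(value)
    # Escape backslashes first, then the delimiter and line breaks.
    return (text.replace('\\', '\\\\')
                .replace('\t', '\\t')
                .replace('\n', '\\n')
                .replace('\r', '\\r'))


def rows_to_copy_buffer(rows):
    """Turn the `data` of an executemany() call into a file-like COPY input."""
    buf = io.StringIO()
    for row in rows:
        buf.write('\t'.join(_escape(v) for v in row))
        buf.write('\n')
    buf.seek(0)
    return buf

# With a live psycopg2 cursor, the equivalent of
#   cursor.executemany('INSERT INTO cw_order (cw_eid, cw_name) VALUES (%s, %s)', data)
# would become (table and column names are illustrative only):
#   cursor.copy_from(rows_to_copy_buffer(data), 'cw_order', columns=('cw_eid', 'cw_name'))
```

    Only simple values (numbers, strings, None) are handled, which matches the footnote's restriction to data types that are easy enough to escape.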
    [2]The code will be posted later or even integrated into CubicWeb at some point. For now, it requires a bit of monkey patching around one or two methods in the source so that SQL is not executed but just recorded for later executions.
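    The "record now, execute later" idea from footnote [2] can be illustrated without monkey patching CubicWeb itself. The sketch below wraps a DB-API cursor instead (the class name RecordingCursor is hypothetical): executemany() calls are buffered per SQL statement, and flush() replays each statement once with all of its accumulated rows, which is the point where a COPY FROM could be substituted. SQLite is used only so that the example runs anywhere.

```python
import sqlite3
from collections import OrderedDict


class RecordingCursor:
    """Hypothetical wrapper: record executemany() calls, replay them later.

    Rows are grouped per statement in first-seen order, so statements that
    depend on each other (e.g. foreign keys) must be recorded in dependency
    order.
    """

    def __init__(self, cursor):
        self._cursor = cursor
        self._pending = OrderedDict()  # SQL text -> list of parameter tuples

    def executemany(self, sql, seq_of_params):
        self._pending.setdefault(sql, []).extend(seq_of_params)

    def execute(self, sql, params=()):
        # A single statement is recorded as a one-row batch.
        self.executemany(sql, [params])

    def flush(self):
        """Run each recorded statement once with all of its rows."""
        for sql, rows in self._pending.items():
            self._cursor.executemany(sql, rows)
        self._pending.clear()


def demo():
    """Usage example against an in-memory SQLite database."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE workorder (eid INTEGER, title TEXT)')
    rec = RecordingCursor(conn.cursor())
    rec.executemany('INSERT INTO workorder VALUES (?, ?)', [(1, 'a'), (2, 'b')])
    rec.execute('INSERT INTO workorder VALUES (?, ?)', (3, 'c'))
    before = conn.execute('SELECT COUNT(*) FROM workorder').fetchone()[0]
    rec.flush()
    after = conn.execute('SELECT COUNT(*) FROM workorder').fetchone()[0]
    return before, after
```

    demo() reports no rows before flush() and all three rows afterwards, since nothing reaches the database until the recorded batches are replayed.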

  • Reusing OpenData from with CubicWeb in 2 hours

    2011/12/07 by Vincent Michel is great news for the OpenData movement!

    Two days ago, the French government released thousands of data sets on under an open licensing scheme that allows people to access and play with them. Thanks to the CubicWeb semantic web framework, it took us only a couple of hours to put some of that open data to good use. Here is how we mapped the French railway system.

    Train stations in French Brittany

    Source Datasets

    We used two of the datasets available on

    • Train stations: description of the 6442 train stations in France, including their name, type and geographic coordinates. Here is a sample of the file:

      441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358
      441000;Montreuil-sur-Ille;Desserte Voyageur-Infrastructure;48,3072;-1,6741
    • LevelCrossings: description of the 18159 level crossings on French railways, including their type and location. Here is a sample of the file:

      558000;PN privé pour voitures avec barrières sans passage piétons accolé;48,05865;1,60697
      395000;PN privé pour voitures avec barrières avec passage piétons accolé public;;48,82544;1,65795
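    Both samples are semicolon-separated and use a decimal comma for coordinates. A minimal parsing sketch, assuming the column order seen in the samples (code;name;type;lat;lon for stations, code;type;...;lat;lon for level crossings) and reading the coordinates from the last two fields, which also copes with the empty field in the second level-crossing sample. The function names are hypothetical; the post's own import code is not shown.

```python
import csv


def parse_coord(value):
    """Convert a decimal-comma number such as '48,23955' to a float."""
    return float(value.replace(',', '.'))


def parse_stations(lines):
    """Yield (code, name, station_type, latitude, longitude) tuples.

    Assumes the column order of the sample: code;name;type;lat;lon.
    """
    for row in csv.reader(lines, delimiter=';'):
        if row:
            yield (row[0], row[1], row[2],
                   parse_coord(row[-2]), parse_coord(row[-1]))


def parse_level_crossings(lines):
    """Yield (code, crossing_type, latitude, longitude) tuples.

    Coordinates are read from the last two fields, so stray empty
    fields (such as ';;' in the sample) are ignored.
    """
    for row in csv.reader(lines, delimiter=';'):
        if row:
            yield (row[0], row[1],
                   parse_coord(row[-2]), parse_coord(row[-1]))
```

    Both functions accept any iterable of lines, such as a file opened with newline=''; the files' character encoding is not stated in the post and would have to be checked.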

    Data Model

    Given the above datasets, we wrote the following data model to store the data in CubicWeb:

    from yams.buildobjs import EntityType, String, Float, SubjectRelation

    class Location(EntityType):
        name = String(indexed=True)
        latitude = Float(indexed=True)
        longitude = Float(indexed=True)
        # '?*': a Location has at most one FeatureType, a FeatureType many Locations
        feature_type = SubjectRelation('FeatureType', cardinality='?*')
        # '1*': exactly one source per Location, stored inline in its table
        data_source = SubjectRelation('DataGovSource', cardinality='1*', inlined=True)

    class FeatureType(EntityType):
        name = String(indexed=True)

    class DataGovSource(EntityType):
        name = String(indexed=True)
        description = String()
        uri = String(indexed=True)
        icon = String()

    The Location entity type is used for both train stations and level crossings. It has a name (text information), a latitude and a longitude (numeric information), and it is linked to at most one FeatureType (cardinality '?*') and to exactly one DataGovSource (cardinality '1*'). The FeatureType entity type stores the type of train station or level crossing and is defined by a name (text information). The DataGovSource entity type is defined by a name, a description, an icon and a uri used to link back to the source data on

    Schema of the data model

    Data Import

    We had to write a few lines of code to benefit from the massive data import feature of CubicWeb before we could load the content of the CSV files with a single command:

    $ cubicweb-ctl import-datagov-location datagov_geo gare.csv-fr.CSV  --source-type=gare
    $ cubicweb-ctl import-datagov-location datagov_geo passage_a_niveau.csv-fr.CSV  --source-type=passage

    In less than a minute, the import was completed and we had:

    • 2 DataGovSource objects, corresponding to the two data sets,
    • 24 FeatureType objects, corresponding to the different types of locations that exist (e.g. Non exploitée, Desserte Voyageur, PN public isolé pour piétons avec portillons or PN public pour voitures avec barrières gardé avec passage piétons accolé manoeuvré à distance),
    • 24601 Location objects (6442 train stations plus 18159 level crossings).
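    The import command itself is not shown, but the counts above imply that each distinct type name becomes a single FeatureType shared by many Locations. The sketch below illustrates that deduplication with a hypothetical in-memory FakeStore: only the relate(subject_eid, rtype, object_eid) call shape is borrowed from the benchmark earlier on this page, while create_entity() returning an eid is an assumption made for the example, not a statement about CubicWeb's store API.

```python
import itertools


class FakeStore:
    """Hypothetical in-memory stand-in for an import store."""

    def __init__(self):
        self._eids = itertools.count(1)
        self.entities = {}   # eid -> (etype, attributes)
        self.relations = []  # (subject eid, relation type, object eid)

    def create_entity(self, etype, **attrs):
        eid = next(self._eids)
        self.entities[eid] = (etype, attrs)
        return eid

    def relate(self, subj, rtype, obj):
        self.relations.append((subj, rtype, obj))


def import_locations(store, rows, source_eid):
    """Create one Location per row and one FeatureType per distinct type name.

    `rows` yields (name, feature_type_name, latitude, longitude) tuples.
    """
    feature_types = {}  # type name -> eid, so each FeatureType is created once
    for name, ftype, lat, lon in rows:
        if ftype not in feature_types:
            feature_types[ftype] = store.create_entity('FeatureType', name=ftype)
        eid = store.create_entity('Location', name=name,
                                  latitude=lat, longitude=lon)
        store.relate(eid, 'feature_type', feature_types[ftype])
        store.relate(eid, 'data_source', source_eid)
    return feature_types
```

    Keeping the name-to-eid cache in a plain dict avoids one lookup query per row, which matters when tens of thousands of rows share a couple dozen type names.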

    Data visualization

    CubicWeb allows you to build complex applications by assembling existing components (called cubes). Here we used a cube that wraps the Mapstraction and the OpenLayers libraries to display information on maps using data from OpenStreetMap.

    In order for the Location type defined in the data model to be displayable on a map, it is sufficient to write the following adapter:

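    from cubicweb.view import EntityAdapter
    from cubicweb.selectors import is_instance

    class IGeocodableAdapter(EntityAdapter):
        """Expose a Location's coordinates to the map views."""
        __regid__ = 'IGeocodable'
        __select__ = is_instance('Location')

        def latitude(self):
            return self.entity.latitude

        def longitude(self):
            return self.entity.longitude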

    That was it for the development part! The next step was to use the application to browse the structure of the French train network on the map.

    Train stations in use:

    Train stations not in use:

    Zooming in on some parts of the map, for example Brittany, reveals more details, and clicking on the train icons gives more information on the corresponding Location.

    Train stations in use:

    Train stations not in use:

    Since CubicWeb separates querying the data and displaying the result of a query, we can switch the view to display the same data in tables or to export it back to a CSV file.

    Querying Data

    CubicWeb implements a query language, RQL, very similar to SPARQL, which makes the data available without the need to learn a specific API.

    • Example 1: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%miny"

      This query returns all the Locations whose name ends with "miny". There is only one, the Firminy train station.
    • Example 2: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%ny"

      This query returns all the Locations whose name ends with "ny": 112 train stations.
    • Example 3: http://some.url.demo/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8

      This query returns all the Locations that have a latitude between 47.6 and 47.8 and a longitude between -1.9 and -1.8.

      We obtain 11 Locations (9 level crossings and 2 train stations), which we can map using the view described previously.
    • Example 4: http://domainname:8080/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8, X feature_type F, F name "Desserte Voyageur"

      This limits the previous result set to train stations used for passenger service.
    • Example 5: http://domainname:8080/?rql=Any X WHERE X feature_type F, F name "PN public pour voitures sans barrières sans SAL"&

      Finally, one can map all the level crossings for vehicles without barriers (there are 3704).
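    Since a query is just a URL parameter, shareable links like the ones above can be generated programmatically. A minimal sketch, assuming only what the examples show (an rql parameter, plus the optional vid parameter that selects the view); the helper names are hypothetical and the map view's identifier is not given in the post, so it is left to the caller. urlencode() takes care of escaping spaces, quotes and percent signs.

```python
from urllib.parse import urlencode


def bbox_rql(lat_min, lat_max, lon_min, lon_max, feature_type=None):
    """Build an RQL query like Examples 3 and 4 for a lat/lon bounding box."""
    rql = ('Any X WHERE X latitude < %s, X latitude > %s, '
           'X longitude > %s, X longitude < %s'
           % (lat_max, lat_min, lon_min, lon_max))
    if feature_type is not None:
        # Naive quoting: assumes the type name contains no double quote.
        rql += ', X feature_type F, F name "%s"' % feature_type
    return rql


def rql_url(base_url, rql, vid=None):
    """Return a shareable URL; `vid` optionally selects the view to use."""
    params = {'rql': rql}
    if vid is not None:
        params['vid'] = vid
    return base_url + '?' + urlencode(params)
```

    For instance, rql_url('http://some.url.demo/', bbox_rql(47.6, 47.8, -1.9, -1.8)) reproduces Example 3 with its query properly URL-encoded.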

    As you could see in the last URL, the map view was chosen directly with the vid parameter, meaning that the URL is shareable and can easily be embedded in a blog post with an iframe, for example.

    Data sharing

    The result of a query can also be "displayed" in RDF, thus allowing users to download a semantic version of the information, without having to do the preprocessing themselves:

    <rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
      <rdf:type rdf:resource=""/>
      <cw:cw_source rdf:resource="http://some.url.demo/"/>
      <cw:longitude rdf:datatype="">-1.89599</cw:longitude>
      <cw:latitude rdf:datatype="">47.67778</cw:latitude>
      <cw:feature_type rdf:resource="http://some.url.demo/7222"/>
      <cw:data_source rdf:resource="http://some.url.demo/7206"/>
    </rdf:Description>


    For someone who knows the CubicWeb framework, a couple of hours is enough to create a CubicWeb application that stores, displays, queries and shares data downloaded from

    The full source code for the above will be released before the end of the week.

    If you want to see more of CubicWeb in action, browse or learn how to develop your own application at

  • Roundup of "Powered by Cubicweb" websites

    2011/11/15 by Arthur Lutz

    Here is an (incomplete) list of public websites powered by CubicWeb. A lot of CubicWeb technology is used for private web applications in large companies that we cannot list here.

    Demos are listed here :

    You can also find a list of the companies providing services for Cubicweb (with a few extra examples) :

  • What's new in CubicWeb 3.14?

    2011/11/10 by Sylvain Thenault

    The development of CubicWeb 3.14 was rather long and included a lot of API changes, detailed here. As usual, backward compatibility is provided for public APIs.

    Please note this release depends on yams 0.34 (which is incompatible with prior cubicweb releases regarding instance re-creation).

    API changes

    • Entity.fetch_rql: the restriction argument has been deprecated and should be replaced with a call to the new Entity.fetch_rqlst method; take the returned value (an RQL Select node) and use the RQL syntax tree API to include the above-mentioned restrictions.

      Backward compat is kept with proper warning.

    • Entity.fetch_order and Entity.fetch_unrelated_order class methods have been replaced by Entity.cw_fetch_order and Entity.cw_fetch_unrelated_order with a different prototype:

      • instead of taking (attr, var) as two string arguments, they now take (select, attr, var) where select is the RQL syntax tree being constructed and var the variable node.
      • instead of returning some string to be inserted in the 'ORDERBY' clause, they have to modify the syntax tree.

      Backward compat is kept with proper warning, except if:

      • a custom order method returns something other than a variable name with or without the sorting order (e.g. cases where you sort on the value of a registered procedure, as was done in the tracker for instance). In such cases, an error is logged telling that this sorting is ignored until the API is upgraded.
      • client code uses direct access to one of those methods on an entity (no code known to do that).
    • Entity._rest_attr_info class method has been renamed to Entity.cw_rest_attr_info

      No backward compat since this is a protected method and no code is known to use it outside cubicweb itself.

    • AnyEntity.linked_to has been removed as part of a refactoring of this functionality (linking an entity to another one at creation time). It was replaced by an EntityFieldsForm.linked_to property.

      In the same refactoring, cubicweb.web.formfield.relvoc_linkedto, cubicweb.web.formfield.relvoc_init and cubicweb.web.formfield.relvoc_unrelated were removed and replaced by RelationField methods with the same names, that take a form as a parameter.

      No backward compatibility yet. It's still time to cry for it. Cubes known to be affected: tracker, vcsfile, vcreview.

    • The CWPermission entity type and its associated require_permission relation type (abstract) and require_group relation definitions have been moved to a new localperms cube. Some functions from the cubicweb.schemas package as well as some views were moved too. This makes cubicweb itself smaller, while all the local permissions material now lives in a single, documented place.

      Backward compat is kept for existing instances, though you should have installed the localperms cube. A proper error should be displayed when trying to migrate an instance that uses CWPermission to 3.14 without the new cube installed. For new instances / tests, you should add a dependency on the new cube in cubes using this feature, along with a dependency on cubicweb >= 3.14.

    • jQuery has been updated to 1.6.4 and jquery-tablesorter to 2.0.5. No backward compat issue known.

    • Table views refactoring: new RsetTableView and EntityTableView, as well as a rewritten and enhanced version of PyValTableView on the same bases, with logic moved to some column renderers and a layout. Those should be well documented and deprecate the former TableView, EntityAttributesTableView and CellView, which are however kept for backward compat, with some warnings that unfortunately may not be very clear (you may see your own table view subclass name here, which doesn't make the problem that clear). Notice that _cw.view('table', rset, **kwargs) will be routed to the new RsetTableView or to the old TableView depending on the given extra arguments. See #1986413.

    • display_name doesn't call .lower() anymore. This may lead to changes in your user interface: upper and lower case versions of entity type names now use different msgids, as this is the only proper way to handle this in some languages.

    • IEditControlAdapter has been deprecated in favor of EditController overloading, which was made easier by adding dedicated selectors called match_edited_type and match_form_id.

    • Pre 3.6 API backward compat has been dropped, though data migration compatibility has been kept. You may have to fix errors due to old API usage in your instance before being able to run the migration, but then you should be able to upgrade even a pre 3.6 database.

    • Deprecated cubicweb.web.views.iprogress in favor of new iprogress cube.

    • Deprecated cubicweb.web.views.flot in favor of new jqplot cube.

    Unintrusive API changes

    • Refactored properties forms (e.g. user preferences and site wide properties) as well as pagination components to ease overriding.

    • New cubicweb.web.uihelper module with high-level helpers for uicfg.

    • New anonymized_request decorator to temporarily run code as an anonymous user, whatever the currently logged in user.

    • New 'verbatimattr' attribute view.

    • New facet and form widget for Integer attributes used to store a binary mask.

    • New js_href function to generate proper JavaScript hrefs.

    • match_kwargs and match_form_params selectors both accept a new once_is_enough argument.

    • printable_value is now a method of request, and may be given a dict of formatters to use.

    • [Rset]TableView allows setting None in 'headers', meaning the label should be fetched from the result set as done by default.

    • Field vocabulary computation on entity creation now takes __linkto information into account.

    • Started a cubicweb.pylintext pylint plugin to help pylint analyzing cubes: you should now use

      pylint --load-plugins=cubicweb.pylintext

      to analyse your cubicweb code.


    User interface changes

    • Datafeed sources now present a history of the latest imports' logs, including global status and debug/info/warning/error messages issued during imports. Import logs older than a configurable amount of time are automatically deleted.
    • Breadcrumbs component is properly kept when creating an entity with '__linkto'.
    • Users and groups management now really lives up to its name (i.e. it includes groups management).
    • New 'jsonp' controller with 'jsonexport' and 'ejsonexport' views.


    • Added option 'resources-concat' to make javascript/css files concatenation optional, making JS debugging a lot easier when needed.

    As usual, 3.14 also includes a bunch of other minor changes and bug fixes, though this time an effort has been made to list every API change and new API here. Please download and install CubicWeb 3.14 and report any problem on the tracker and/or the mailing-list!


  • Ensure that 2 boolean attributes of an entity never have the same value


    I want to implement an entity with 2 boolean attributes, and a requirement is that these two attributes never have the same boolean value (think of some kind of radio buttons).

    Let's start with a simple schema example:

    # in
    from yams.buildobjs import EntityType, Boolean

    class MyEntity(EntityType):
        use_option1 = Boolean(required=True, default=True)
        use_option2 = Boolean(required=True, default=False)

    This way, new entities conform to the spec by default.

    To do this, you need two things:

    • a constraint in the entity schema which fails if both attributes have the same value
    • a hook which will toggle the other attribute when one attribute is changed.

    RQL constraints are generally meant to be used on relations, but you can use them on attributes too. Simply use 'S' to denote the entity, and write the constraint normally. You need the same constraint on both attributes, because a constraint is only evaluated when the attribute it is attached to is modified.

    # in
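    from yams.buildobjs import EntityType, Boolean
    from cubicweb.schema import RQLConstraint

    class MyEntity(EntityType):
        # The same constraint on both attributes, since it is only checked
        # when the attribute carrying it is modified.
        use_option1 = Boolean(required=True, default=True,
                              constraints=[
                                  RQLConstraint('S use_option1 O1, S use_option2 != O1')])
        use_option2 = Boolean(required=True, default=False,
                              constraints=[
                                  RQLConstraint('S use_option1 O1, S use_option2 != O1')])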

    With this update, it is no longer possible to have both options set to True or both set to False (you will get a ValidationError). A nice addition is to have the other option updated automatically when one of the two attributes is changed, so that you don't have to take care of this when editing the entity in the web interface (which you cannot do anyway if you are using reledit, for instance).

    A nice way of writing the hook is to use Python's sets to avoid tedious logic code:

    from cubicweb.server.hook import Hook
    from cubicweb.selectors import is_instance

    class RadioButtonUpdateHook(Hook):
        '''ensure use_option1 = not use_option2 (and conversely)'''
        __regid__ = 'mycube.radiobuttonhook'
        events = ('before_update_entity', 'before_add_entity')
        __select__ = Hook.__select__ & is_instance('MyEntity')
        # we prebuild the set of boolean attribute names
        _flag_attributes = set(('use_option1', 'use_option2'))

        def __call__(self):
            entity = self.entity
            edited = set(entity.cw_edited)
            attributes = self._flag_attributes
            if attributes.issubset(edited):
                # both were changed, let the integrity hooks do their job
                return
            if not attributes & edited:
                # none of our attributes were changed, do nothing
                return
            # find which attribute was modified
            modified_set = attributes & edited
            # find the name of the other attribute
            to_change = (attributes - modified_set).pop()
            modified_name = modified_set.pop()
            # set the value of that attribute
            entity.cw_edited[to_change] = not entity.cw_edited[modified_name]
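    Because the body of the hook only manipulates attribute names and values, its set logic can be unit-tested outside CubicWeb. A minimal sketch, assuming a plain dict stands in for entity.cw_edited; toggle_flags is a hypothetical helper, not part of CubicWeb.

```python
FLAG_ATTRIBUTES = frozenset(('use_option1', 'use_option2'))


def toggle_flags(edited, flags=FLAG_ATTRIBUTES):
    """Mirror the hook: if exactly one flag was edited, set the other one
    to its negation. `edited` is a plain dict standing in for cw_edited."""
    modified = flags & set(edited)
    if len(modified) != 1:
        # Both edited (left to the constraint) or none edited: nothing to do.
        return edited
    (modified_name,) = modified
    (other,) = flags - modified
    edited[other] = not edited[modified_name]
    return edited
```

    Unpacking the one-element sets with (name,) = ... replaces the two pop() calls and fails loudly if the set ever holds more than one name.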

    That's it!

  • What's new in CubicWeb 3.13?

    2011/07/21 by Sylvain Thenault

    CubicWeb 3.13 has been developed for a while and includes some cool stuff:

    • generate and handle Apache's modconcat compatible URLs, to minimize the number of HTTP requests necessary to retrieve JS and CSS files, along with a new cubicweb-ctl command to generate a static 'data' directory that can be served by a front-end instead of CubicWeb
    • major facet enhancements:
      • nicer layout and visual feedback when filtering is in-progress
      • new RQLPathFacet to easily express new filters that are more than one hop away from the filtered entities
      • a more flexible API, usable in cases where it wasn't previously possible
    • some form handling refactorings and cleanups, notably introduction of a new method to process posted content, and updated documentation
    • support for new base types: BigInt, TZDateTime and TZTime (the latter two were actually introduced in 3.12)
    • write queries optimization, and several RQL fixes on complex queries (e.g. using HAVING, sub-queries...), as well as new support for CAST() function and REGEXP operator
    • datafeed source and default CubicWeb xml parsers:
      • refactored into smaller and overridable chunks
      • easier to configure
      • make it work

    As usual, the 3.13 also includes a bunch of other minor enhancements, refactorings and bug fixes. Please download and install CubicWeb 3.13 and report any problem on the tracker and/or the mailing-list!


  • CubicWeb sprint in Paris / Need for Speed

    2011/03/22 by Adrien Di Mascio

    Logilab is hosting a three-day CubicWeb sprint in our Paris offices.

    The general focus will be on speed:

    • on the cubicweb-server side: improve performance of massive insertions / deletions
    • on the cubicweb-client side: cache implementation, HTTP server, massive parallel usage, etc.

    This sprint will take place in April 2011, from Tuesday the 26th to Thursday the 28th. You are more than welcome to come along, help out and contribute, but unlike previous sprints, at least basic knowledge of CubicWeb will be required for participants since no introduction is planned.

    Network resources will be available for those bringing laptops.

    Address: 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab" (googlemap)

    Metro: Glacière

    Contact:

    Dates: 26/04/2011 to 28/04/2011
