Blog entries

  • Google Maps and CubicWeb

    2009/03/09 by Adrien Di Mascio

    CubicWeb ships with a view called 'gmap-view'. The question is: how do you use it?

    First, and unsurprisingly, you have to generate an API key to be able to use Google Maps on your server (make sure your usage conforms to the terms defined by Google).

    Now, let's say you have defined the following schema:

    class Company(EntityType):
        name = String(required=True, maxsize=64)
        # ... some other attributes ...
        latitude = Float(required=True)
        longitude = Float(required=True)
    class Employee(EntityType):
        # ... some attributes ...
        works_for = SubjectRelation('Company', cardinality='1*')

    Suppose you'd like to display companies on a map, and you've also got some nice icons that you'd like to use as markers. First, define those three icons as external resources. You can do that by editing your CUBE/data/external_resources file:


    We're nearly done, now. We just have to make our entity class implement the cubicweb.interfaces.IGeocodable interface. Here's an example:

    from cubicweb.entities import AnyEntity
    from cubicweb.interfaces import IGeocodable
    class Company(AnyEntity):
        id = 'Company' # this must match the type as defined in your schema
        __implements__ = AnyEntity.__implements__ + (IGeocodable,)
        def size(self):
            # number of employees working for this company
            rset = self.req.execute('Any COUNT(E) WHERE E works_for C, C eid %(c)s',
                                    {'c': self.eid})
            return rset[0][0]
        # this is a method of IGeocodable
        def marker_icon(self):
            size = self.size()
            if size < 20:
                return self.req.external_resource('SMALL_MARKER_ICON')
            elif size < 500:
                return self.req.external_resource('MEDIUM_MARKER_ICON')
            else:
                return self.req.external_resource('BIG_MARKER_ICON')

    That's it, you can now call the gmap-view on a resultset containing companies:

    rset = self.req.execute('Any C WHERE C is Company')
    self.wview(rset, 'gmap-view', gmap_key=YOUR_API_KEY)

    Further configuration is possible, especially to control the size of the map or the default zoom level.

    To be fair, I must say that in a real-life cube, chances are you won't be able to specify latitude and longitude directly and that you'll only have an address. This is slightly more complex since you'll need to query a geocoding service (the Google one for instance) to transform your address into latitude/longitude. This will typically be done in a hook.

    Here is a screenshot of Google Maps on a production site, the museums in Normandy:
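    As a rough sketch of that idea, written in plain Python rather than against CubicWeb's hook API, the function below fills in missing coordinates from an address using a geocoder passed as an argument. The names fill_coordinates and fake_geocoder, the dictionary layout and the sample address are all assumptions made for illustration; a real cube would call an actual geocoding service from a hook.

```python
def fill_coordinates(company, geocode):
    """Set latitude/longitude on `company` (a dict) from its address.

    `geocode` is any callable mapping an address string to a
    (latitude, longitude) tuple, or None when the address is unknown.
    """
    if company.get('latitude') is not None and company.get('longitude') is not None:
        return company  # coordinates already known, nothing to do
    result = geocode(company['address'])
    if result is None:
        raise ValueError('could not geocode %r' % company['address'])
    company['latitude'], company['longitude'] = result
    return company


def fake_geocoder(address):
    # Stand-in for a real geocoding service (made-up data).
    known = {'1 Example Street': (10.0, 20.0)}
    return known.get(address)


company = fill_coordinates({'name': 'ACME', 'address': '1 Example Street'},
                           fake_geocoder)
print(company['latitude'], company['longitude'])
```

    Passing the geocoder as a parameter keeps the logic testable without any network access.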

  • Django, lessons learned in the world of startup companies

    2010/06/02 by Sandrine Ribeau

    I went to the BayPIGgies meeting last Thursday. The talk of this session was led by the chief software architect of RubberCan, Barnaby Bienkowski. The idea was to explain why Django turns out to be the choice a lot of startups make when building their web applications.

    Government 2.0

    The fact that Django is recommended by the Sunlight Foundation is important. This foundation is a non-partisan, non-profit organization based in Washington, DC that focuses on the digitization of government data and the creation of tools and Web sites to make that data easily accessible for all citizens. This is part of what is called Government 2.0, a neologism for attempts to apply the social networking and integration advantages of Web 2.0 to the practice of government (see E-Government).

    It looks like the Sunlight Foundation recommends Django because it comes from the publishing industry. I am not sure what is so special about this, but I wish I could get more details on it, so please add your comments below.

    Since the CubicWeb community is still small, we are not yet recommended by such a large foundation, but we'll make more effort to talk about it and try to expand our community.


    These days, geo-localization is a big deal in most applications. On that matter, what Django has to offer is GeoDjango, which recently became part of the Django core. It is integrated with the ORM and has pre-generated SQL queries, but it is not optimized. It uses PostGIS, which adds support for geographic objects to the PostgreSQL object-relational database. GeoDjango strives to make it as simple as possible to create geographic web applications, like location-based services. Some of the features it provides are:

    • Extensions to Django’s ORM for the querying and manipulation of spatial data
    • Editing of geometry fields inside the administration panels
    • Loosely-coupled, high-level Python interfaces for GIS geometry operations and data formats.

    OpenStreetMap is used for the backend. It provides geographic data for any part of the world. This is a nice feature and we should consider it for CubicWeb. What we provide so far is an IGeocodable interface with the related views gmap-view, gmap-bubble, geocoding-json and gmap-legend. We do not query this data yet; we simply render it nicely in a Google Map. You can find the details on how to use it here.

    Online stores

    Numerous web applications are not only service or data providers, they also sell something. Satchmo is the Django tool to easily build online stores. It provides a shopping cart framework with checkout using different payment modules such as TrustCommerce, CyberSource, PayPal, Google Checkout or Protx.

    CubicWeb does not provide a component for building an online store; it's not yet a domain we have worked on. But I'd like to talk a bit about the cube cubicweb-shoppingcart. This cube defines shopping items and a shopping cart, and lets you add items to the cart. It defines the types of shopping items, and only those can be added to the shopping cart. Whereas Satchmo requires you to define categories and add items within a category, cubicweb-shoppingcart does not force you to define categories. Creating shopping items is the only thing you need to do. That makes this component usable for more than online stores. For example, we used this cube to manage Euroscipy registration fees, reusing the generic schema of a "virtual" shopping cart and its related resources (web widgets, validation hook, ...).

    Re-usable components

    Pinax gets good overall feedback as it provides basic components for blogging, tagging, registration, notification and so on. But one point that was raised is the difficulty of customizing Pinax components. It seems easy to write your own version of a Pinax component, but integrating it is a pain. All the components are tightly coupled, and customizing one is very likely to affect the others.

    This last point is a big disadvantage. Why? Well, as a developer there is always something that you need to adjust to fit your needs, so customizing components is something you cannot avoid while developing your web application. Something I'd like to point out about CubicWeb is how simple it is to re-use existing components, which are independent of each other. This is as easy as Python inheritance. And with its VRegistry, selectors and application objects (see The VRegistry, selectors and application objects for more details), customization is well integrated into the framework.

    Assembling cubes and functionalities is very easy as well. Let's think of an example. We have three cubes: cubicweb-book, cubicweb-tag and cubicweb-comment. Cubicweb-book defines the Book entity type. Cubicweb-tag defines Tag entities and the ability to tag other entity types. Cubicweb-comment defines the Comment entity type and the ability to comment on other entity types. What if we want to create an application in which we can tag and comment on Books? Well, this is done with the following schema definition, where we explicitly define the relations between the Book, Tag and Comment entity types:

    from yams.buildobjs import RelationDefinition
    class comments(RelationDefinition):
        subject = 'Comment'
        object = 'Book'
        cardinality = '1*'
        composite = 'subject'
    class tag(RelationDefinition):
        subject = 'Tag'
        object = 'Book'
        cardinality = '**'


    Although forms are easy in Django, there is no way to add inline entities, at least for now (see this proposition), as easily as in CubicWeb (see HTML form construction for more details). That is very neat when you create/edit related entities. Plus, since CubicWeb 3.6, forms are much easier to handle, and we still put a lot of effort into making them simpler.

    So, yes, overall Django is selected as the best compromise, but for the reasons I listed, CubicWeb should be considered.

    Watch out Django, we are getting on your way ;)

  • Reusing OpenData from with CubicWeb in 2 hours

    2011/12/07 by Vincent Michel is great news for the OpenData movement!

    Two days ago, the French government released thousands of data sets on under an open licensing scheme that allows people to access and play with them. Thanks to the CubicWeb semantic web framework, it took us only a couple of hours to put some of that open data to good use. Here is how we mapped the French railway system.

    Train stations in French Brittany

    Source Datasets

    We used two of the datasets available on

    • Train stations: description of the 6442 train stations in France, including their name, type and geographic coordinates. Here is a sample of the file:

      441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358
      441000;Montreuil-sur-Ille;Desserte Voyageur-Infrastructure;48,3072;-1,6741
    • Level crossings: description of the 18159 level crossings on French railways, including their type and location. Here is a sample of the file:

      558000;PN privé pour voitures avec barrières sans passage piétons accolé;48,05865;1,60697
      395000;PN privé pour voitures avec barrières avec passage piétons accolé public;;48,82544;1,65795
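    Both files use semicolons as separators and French-style decimal commas, so the coordinates need a small conversion before they can be stored as floats. Here is a minimal sketch of that parsing step for the train station sample above. The column names (code, name, feature_type, ...) are assumptions based on the description of the file, and parse_stations is a hypothetical helper, not the import code we actually used.

```python
import csv
import io

SAMPLE = (
    "441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358\n"
    "441000;Montreuil-sur-Ille;Desserte Voyageur-Infrastructure;48,3072;-1,6741\n"
)


def parse_stations(text):
    """Yield one dict per train station line."""
    reader = csv.reader(io.StringIO(text), delimiter=';')
    for code, name, feature_type, lat, lon in reader:
        yield {
            'code': code,
            'name': name,
            'feature_type': feature_type,
            # French-formatted numbers use a decimal comma
            'latitude': float(lat.replace(',', '.')),
            'longitude': float(lon.replace(',', '.')),
        }


stations = list(parse_stations(SAMPLE))
print(stations[0]['name'], stations[0]['latitude'], stations[0]['longitude'])
```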

    Data Model

    Given the above datasets, we wrote the following data model to store the data in CubicWeb:

    class Location(EntityType):
        name = String(indexed=True)
        latitude = Float(indexed=True)
        longitude = Float(indexed=True)
        feature_type = SubjectRelation('FeatureType', cardinality='?*')
        data_source = SubjectRelation('DataGovSource', cardinality='1*', inlined=True)
    class FeatureType(EntityType):
        name = String(indexed=True)
    class DataGovSource(EntityType):
        name = String(indexed=True)
        description = String()
        uri = String(indexed=True)
        icon = String()

    The Location object is used for both train stations and level crossings. It has a name (text information), a latitude and a longitude (numeric information); it can be linked to at most one FeatureType object and is linked to exactly one DataGovSource. The FeatureType object is used to store the type of train station or level crossing and is defined by a name (text information). The DataGovSource object is defined by a name, a description and a uri used to link back to the source data on

    Schema of the data model

    Data Import

    We had to write a few lines of code to benefit from the massive data import feature of CubicWeb before we could load the content of the CSV files with a single command:

    $ cubicweb-ctl import-datagov-location datagov_geo gare.csv-fr.CSV  --source-type=gare
    $ cubicweb-ctl import-datagov-location datagov_geo passage_a_niveau.csv-fr.CSV  --source-type=passage

    In less than a minute, the import was completed and we had:

    • 2 DataGovSource objects, corresponding to the two data sets,
    • 24 FeatureType objects, corresponding to the different types of locations that exist (e.g. Non exploitée, Desserte Voyageur, PN public isolé pour piétons avec portillons or PN public pour voitures avec barrières gardé avec passage piétons accolé manoeuvré à distance),
    • 24601 Locations, corresponding to the different train stations and level crossings.

    Data visualization

    CubicWeb makes it possible to build complex applications by assembling existing components (called cubes). Here we used a cube that wraps the Mapstraction and OpenLayers libraries to display information on maps using data from OpenStreetMap.

    In order for the Location type defined in the data model to be displayable on a map, it is sufficient to write the following adapter:

    class IGeocodableAdapter(EntityAdapter):
        __regid__ = 'IGeocodable'
        __select__ = is_instance('Location')
        def latitude(self):
            return self.entity.latitude
        def longitude(self):
            return self.entity.longitude

    That was it for the development part! The next step was to use the application to browse the structure of the French train network on the map.

    Train stations in use:

    Train stations not in use:

    Zooming in on some parts of the map, for example Brittany, we get to see more details, and clicking on the train icons gives more information on the corresponding Location.

    Train stations in use:

    Train stations not in use:

    Since CubicWeb separates querying the data and displaying the result of a query, we can switch the view to display the same data in tables or to export it back to a CSV file.

    Querying Data

    CubicWeb implements a query language very similar to SPARQL, which makes the data available without the need to learn a specific API.

    • Example 1: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%miny"

      This request gives all the Locations with a name that ends with "miny". It returns only one element, the Firminy train station.
    • Example 2: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%ny"

      This request gives all the Locations with a name that ends with "ny", and returns 112 train stations.
    • Example 3: http://some.url.demo/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8

      This request gives all the Locations that have a latitude between 47.6 and 47.8, and a longitude between -1.9 and -1.8.

      We obtain 11 Locations (9 level crossings and 2 train stations). We can map them using the view that we described previously.
    • Example 4: http://domainname:8080/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8, X feature_type F, F name "Desserte Voyageur"

      This limits the previous result set to train stations that are used for passenger service:
    • Example 5: http://domainname:8080/?rql=Any X WHERE X feature_type F, F name "PN public pour voitures sans barrières sans SAL"&

      Finally, one can map all the level crossings for vehicles without barriers (there are 3704):

    As you could see in the last URL, the map view was chosen directly with the vid parameter, meaning that the URL is shareable and can easily be included in a blog with an iframe, for example.
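    Since the query contains spaces, quotes and percent signs, it is safer to build such URLs programmatically than by hand. The snippet below is a small sketch using only the Python standard library. The helper name rql_url and the view identifier 'some-map-view' are placeholders introduced for this example; only the rql and vid parameter names come from the URLs above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def rql_url(base_url, rql, vid=None):
    """Return a URL that runs `rql` and, optionally, selects a view."""
    params = {'rql': rql}
    if vid is not None:
        params['vid'] = vid
    # urlencode escapes spaces, quotes and '%' so the URL stays valid
    return base_url.rstrip('/') + '/?' + urlencode(params)


url = rql_url('http://some.url.demo',
              'Any X WHERE X is Location, X name LIKE "%miny"',
              vid='some-map-view')  # placeholder view identifier
print(url)

# Decoding the query string gives back the original RQL
query = parse_qs(urlsplit(url).query)
print(query['rql'][0])
```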

    Data sharing

    The result of a query can also be "displayed" in RDF, thus allowing users to download a semantic version of the information, without having to do the preprocessing themselves:

    <rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
      <rdf:type rdf:resource=""/>
      <cw:cw_source rdf:resource="http://some.url.demo/"/>
      <cw:longitude rdf:datatype="">-1.89599</cw:longitude>
      <cw:latitude rdf:datatype="">47.67778</cw:latitude>
      <cw:feature_type rdf:resource="http://some.url.demo/7222"/>
      <cw:data_source rdf:resource="http://some.url.demo/7206"/>
    </rdf:Description>


    For someone who knows the CubicWeb framework, a couple of hours are enough to create a CubicWeb application that stores, displays, queries and shares data downloaded from

    The full source code for the above will be released before the end of the week.

    If you want to see more of CubicWeb in action, browse or learn how to develop your own application at

  • INSEE, XML and RDF

    2009/07/07 by Nicolas Chauvat

    I discovered that the French Institute for Statistics and Economic Studies (INSEE) has published part of its data as XML and RDF:

    We will try to put that data to good use.

  • Geonames in CubicWeb !

    2011/12/14 by Vincent Michel

    CubicWeb is a semantic web framework written in Python that has been successfully used in large-scale projects, such as (French National Library's opendata) or Collections des musées de Haute-Normandie (museums of Haute-Normandie).

    CubicWeb provides a high-level query language, called RQL, operating over a relational database (PostgreSQL in our case), and makes it possible to quickly instantiate an entity-relationship data model. By separating the query and the display of data into two distinct steps, it provides powerful means for data retrieval and processing.

    In this blog post, we will demonstrate some of these capabilities on the Geonames data.


    Geonames is an open-source compilation of geographical data from various sources:

    "...The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge..." (

    The data is available as a dump containing different CSV files:

    • allCountries: main file containing information about 8,000,000 places in the world. We won't detail the various attributes of each location, but we will focus on some important properties, such as population and elevation. Moreover, admin_code_1 and admin_code_2 will be used to link the different locations to the corresponding AdministrativeRegion, and feature_code will be used to link the data to the corresponding type.
    • admin1CodesASCII.txt and admin2Codes.txt detail the different administrative regions, that are parts of the world such as region (Ile-de-France), department (Department of Yvelines), US counties...
    • featureCodes.txt details the different types of location that may be found in the data, such as forest(s), first-order administrative division, aqueduct, research institute, ...
    • timeZones.txt, countryInfo.txt, iso-languagecodes.txt are additional files providing information about timezones, countries and languages. They will be included in our CubicWeb database but won't be explained in more detail here.

    The Geonames website also provides some ways to browse the data: by Countries, by Largest Cities, by Highest mountains, by postal codes, etc. We will see that CubicWeb could be used to automatically create such ways of browsing data while allowing far deeper queries. There are two main challenges when dealing with such data:

    • the number of entries: with 8,000,000 placenames, we have to use efficient tools for storing and querying them.
    • the structure of the data: the different types of entries are separated in different files, but should be merged for efficient queries (i.e. we have to rebuild the different links between entities, e.g. Location to Country or Location to AdministrativeRegion).

    Data model

    With CubicWeb, the data model of the application is written in Python. It defines different entity classes with their attributes, as well as the relationships between the different entity classes. Here is a sample of the schema that we have used for Geonames data:

    class Location(EntityType):
        name = String(maxsize=1024, indexed=True)
        uri = String(unique=True, indexed=True)
        geonameid = Int(indexed=True)
        latitude = Float(indexed=True)
        longitude = Float(indexed=True)
        feature_code = SubjectRelation('FeatureCode', cardinality='?*', inlined=True)
        country = SubjectRelation('Country', cardinality='?*', inlined=True)
        main_administrative_region = SubjectRelation('AdministrativeRegion',
                                  cardinality='?*', inlined=True)
        timezone = SubjectRelation('TimeZone', cardinality='?*', inlined=True)

    This indicates that the main Location class has a name attribute (string), a uri (string), a geonameid (integer), a latitude and a longitude (both floats), and some relations to other entity classes such as FeatureCode (the relation is named feature_code), Country (the relation is named country), or AdministrativeRegion (the relation is named main_administrative_region).

    The cardinality of each relation is defined much as in a classical RDBMS schema: * means any number, ? means zero or one and 1 means one and only one.
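    To make the notation concrete, here is a small plain-Python sketch that checks a number of related entities against these symbols. It is only an illustration of the notation, not CubicWeb code: check_cardinality and check_relation are hypothetical helpers, and the reading of the two-character form (first character for the subject side, second for the object side) is stated here as our understanding of the convention.

```python
def check_cardinality(symbol, count):
    """Return True if `count` related entities satisfy `symbol`."""
    if symbol == '1':
        return count == 1
    if symbol == '?':
        return count in (0, 1)
    if symbol == '*':
        return count >= 0
    raise ValueError('unknown cardinality symbol: %r' % symbol)


def check_relation(cardinality, n_objects, n_subjects):
    """Check a two-character cardinality such as '?*'.

    The first character constrains how many objects one subject may have,
    the second how many subjects one object may have.
    """
    subject_card, object_card = cardinality
    return (check_cardinality(subject_card, n_objects)
            and check_cardinality(object_card, n_subjects))


# A Location with no country, in a relation declared with cardinality '?*'
print(check_relation('?*', 0, 5))
# A Location linked to two countries violates the '?' on the subject side
print(check_relation('?*', 2, 5))
```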

    We give below a visualisation of the schema (obtained using the /schema relative url)


    The data contained in the CSV files could be pushed and stored without any processing, but it is interesting to reconstruct the relations that may exist between different entities and entity classes, so that queries will be easier and faster.

    Executing the import procedure took us 80 minutes on regular hardware, which seems very reasonable given the amount of data (~7,000,000 entities, 920MB for the allCountries.txt file), and the fact that we are also constructing many indexes (on attributes or on relations) to improve the queries. This import procedure uses some low-level SQL commands to load the data into the underlying relational database.

    Queries and views

    As stated before, queries are performed in CubicWeb using RQL (Relational Query Language), which is similar to SPARQL, but with a syntax that is closer to SQL. This language can be used to query the concepts directly while abstracting the physical structure of the underlying database. For example, one can use the following request:

    Any X LIMIT 10 WHERE X is Location, X population > 1000000,
        X country C, C name "France"

    that means:

    Give me 10 locations that have a population greater than 1000000, and that are in a country named "France"

    The corresponding SQL query is:

    SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
    WHERE _X.cw_population>1000000
          AND _X.cw_country=_C.cw_eid AND _C.cw_name="France"
    LIMIT 10

    We can see that RQL is higher-level than SQL and abstracts the details of the tables and the joins.
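    To see the join at work outside of CubicWeb, the sketch below runs the same kind of SQL against an in-memory SQLite database. The two toy tables merely reuse the table and column names from the query above, and the rows are made up for the example; this is not how the real database is created or populated.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE cw_Country (cw_eid INTEGER PRIMARY KEY, cw_name TEXT);
    CREATE TABLE cw_Location (cw_eid INTEGER PRIMARY KEY, cw_name TEXT,
                              cw_population INTEGER, cw_country INTEGER);
""")
# Made-up rows: two countries and three locations
conn.executemany('INSERT INTO cw_Country VALUES (?, ?)',
                 [(1, 'France'), (2, 'Spain')])
conn.executemany('INSERT INTO cw_Location VALUES (?, ?, ?, ?)',
                 [(10, 'Bigtown', 2000000, 1),
                  (11, 'Smallville', 5000, 1),
                  (12, 'Otherbig', 3000000, 2)])

# Same shape as the generated SQL, with the literals passed as parameters
rows = conn.execute("""
    SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
    WHERE _X.cw_population > ?
          AND _X.cw_country = _C.cw_eid AND _C.cw_name = ?
    LIMIT 10
""", (1000000, 'France')).fetchall()
print(rows)
```

    Only the location that is both large enough and attached to the country named "France" comes back, which is exactly what the RQL expresses in one line.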

    A query returns a result set (a list of results), that can be displayed using views. A main feature of CubicWeb is to separate the two steps of querying the data and displaying the results. One can query some data and visualize the results in the standard web framework, download them in different formats (JSON, RDF, CSV,...), or display them in some specific view developed in Python.

    In particular, we will use a view based on the Mapstraction and OpenLayers libraries to display information on maps using data from OpenStreetMap. This view uses a feature of CubicWeb called adapters. An adapter adapts a class of entity to some interface, hence views can rely on interfaces instead of types and are able to display entities with different attributes and relations. In our case, the IGeocodableAdapter returns a latitude and a longitude for a given class of entity (here, the mapping is trivial, but there are more complex cases... :) ):

    class IGeocodableAdapter(EntityAdapter):
        __regid__ = 'IGeocodable'
        __select__ = is_instance('Location')
        def latitude(self):
            return self.entity.latitude
        def longitude(self):
            return self.entity.longitude
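    The benefit of going through an adapter is easier to see with two entity types that store their position differently. The following is a framework-free sketch of the pattern: the Museum class, the adapter classes, the GEO_ADAPTERS registry and the coordinates are all invented for the illustration, and CubicWeb's own registry and selectors are richer than this dictionary lookup.

```python
class Location:
    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude


class Museum:
    """A type that stores its position differently (hypothetical)."""
    def __init__(self, name, position):
        self.name = name
        self.position = position  # (latitude, longitude) tuple


class LocationGeoAdapter:
    def __init__(self, entity):
        self.entity = entity

    def coordinates(self):
        return self.entity.latitude, self.entity.longitude


class MuseumGeoAdapter:
    def __init__(self, entity):
        self.entity = entity

    def coordinates(self):
        return self.entity.position


# Registry mapping an entity class to its adapter
GEO_ADAPTERS = {Location: LocationGeoAdapter, Museum: MuseumGeoAdapter}


def map_markers(entities):
    """A 'view' that only relies on the adapter interface."""
    return [(e.name,) + tuple(GEO_ADAPTERS[type(e)](e).coordinates())
            for e in entities]


markers = map_markers([Location('station', 48.0, -1.5),
                       Museum('museum', (49.0, 1.0))])
print(markers)
```

    The view code never looks at the entity type, so supporting a new type only requires a new adapter.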

    We will give some results of queries and views later. It is important to notice that the following screenshots are taken without any modification of the standard web interface of CubicWeb. It is possible to write specific views and to define a specific CSS, but we only wanted to show how CubicWeb can handle such data. The default web template of CubicWeb is sufficient for what we want to do, as it dynamically creates web pages showing attributes and relations, as well as some specific forms and javascript applets adapted directly to the data (e.g. map-based tools). Last but not least, the query and the view can be defined within the url, which opens a world of new possibilities to the user:

    http://baseurl:port/?rql=The query that I want&vid=Identifier-of-the-view


    We will not go into too much detail about Facets, but let's just say that this feature may be used to define filtering axes on the data, and thus to post-filter a result set. In this example, we have defined four different facets: on the population, on the elevation, on the feature_code and on the main_administrative_region. We will see illustrations of these facets below.

    We give here an example of the definition of a Facet:

    class LocationPopulationFacet(facet.RangeFacet):
        __regid__ = 'population-facet'
        __select__ = is_instance('Location')
        order = 2
        rtype = 'population'

    where __select__ defines which class(es) of entities are targeted by this facet, order defines the order of display of the different facets, and rtype defines the target attribute/relation that will be used for filtering.

    Geonames in CubicWeb

    The main page of the Geonames application is illustrated in the screenshot below. It provides general information on the database, in particular the number of entities in the different classes:

    • 7,984,330 locations.
    • 59,201 administrative regions (e.g. regions, counties, departments...)
    • 7,766 languages.
    • 656 features (e.g. types of location).
    • 410 time zones.
    • 252 countries.
    • 7 continents.

    Simple query

    We will first illustrate the possibilities of CubicWeb with the simple query that we have detailed before (that could be directly pasted in the url...):

    Any X LIMIT 10 WHERE X is Location, X population > 1000000,
        X country C, C name "France"

    We obtain the following page:

    This is the standard view of CubicWeb for displaying results. We can see (right box) that we obtain 10 locations that are indeed located in France, with a population of more than 1,000,000 inhabitants. The left box shows the search panel that could be used to launch queries, and the facet filters that may be used for filtering results, e.g. we may ask to keep only results with a population greater than 4,767,709 inhabitants within the previous results:

    and we now obtain only 4 results. We can also notice that the facets are linked: when the result set is restricted using the population facet, the other facets narrow their available values accordingly.
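    The "linked facets" behaviour can be pictured with a few lines of plain Python: filter the rows on one attribute, then recompute the values that the other facets can still offer. This is only a sketch of the idea with made-up rows and hypothetical helper names (range_filter, facet_values); it does not reflect how CubicWeb implements facets internally.

```python
def facet_values(rows, attribute):
    """Distinct values of `attribute` still present in `rows`."""
    return sorted({row[attribute] for row in rows})


def range_filter(rows, attribute, low, high):
    """Keep the rows whose `attribute` lies within [low, high]."""
    return [row for row in rows if low <= row[attribute] <= high]


# Made-up result set
rows = [
    {'name': 'A', 'population': 1500000, 'feature_code': 'PPL'},
    {'name': 'B', 'population': 5000000, 'feature_code': 'ADM1'},
    {'name': 'C', 'population': 9000000, 'feature_code': 'ADM1'},
]

before = facet_values(rows, 'feature_code')
filtered = range_filter(rows, 'population', 4767709, 10**8)
after = facet_values(filtered, 'feature_code')
print(before, after)
```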

    Simple query (but with more information !)

    Let's say that we now want more information about the results that we have obtained previously (for example the exact population, the elevation and the name). This is really simple! We just have to ask for what we want within the RQL query (of course, the names N, P, E of the variables could be almost anything...):

    Any N, P, E LIMIT 10 WHERE X is Location,
        X population P, X population > 1000000,
        X elevation E, X name N, X country C, C name "France"

    The empty column for the elevation simply means that we don't have any information about elevation.

    Anyway, we can see that fetching particular information could not be simpler! Indeed, with more complex queries, we can access a wealth of information from the Geonames database:

    Any N,E,LA,LO ORDERBY E DESC LIMIT 10  WHERE X is Location,
          X latitude LA, X longitude LO,
          X elevation E, NOT X elevation NULL, X name N,
          X country C, C name "France"

    which means:

    Give me the 10 highest locations (the 10 first when sorting by decreasing elevation) with their name, elevation, latitude and longitude that are in a country named "France"
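    For readers more comfortable with Python than with query languages, the combination of NOT ... NULL, ORDERBY ... DESC and LIMIT corresponds to the following steps on an in-memory list. The function name highest and the sample rows are invented for the illustration.

```python
def highest(locations, limit=10):
    """Names and elevations of the `limit` highest locations."""
    # NOT X elevation NULL: drop rows without a known elevation
    with_elevation = [loc for loc in locations if loc['elevation'] is not None]
    # ORDERBY E DESC: highest first
    with_elevation.sort(key=lambda loc: loc['elevation'], reverse=True)
    # LIMIT: keep only the first `limit` rows
    return [(loc['name'], loc['elevation']) for loc in with_elevation[:limit]]


# Made-up rows
locations = [
    {'name': 'low', 'elevation': 120},
    {'name': 'unknown', 'elevation': None},
    {'name': 'high', 'elevation': 4000},
    {'name': 'middle', 'elevation': 1500},
]
top = highest(locations, limit=2)
print(top)
```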

    We can now use another view on the same request, e.g. on a map (view

    Any X ORDERBY E DESC LIMIT 10  WHERE X is Location,
           X latitude LA, X longitude LO, X elevation E,
           NOT X elevation NULL, X country C, C name "France"

    And now, we can add the fact that we want more results (20), and that the location should have a non-null population:

    Any N, E, P, LA, LO ORDERBY E DESC LIMIT 20  WHERE X is Location,
           X latitude LA, X longitude LO,
           X elevation E, NOT X elevation NULL, X population P,
           X population > 0, X name N, X country C, C name "France"

    ... and on a map ...


    In this blog post, we have seen how CubicWeb can be used to store and query complex data, while providing (among other things) Web-based views for data visualization. It allows the user to query data directly within the URL and may be used to interact with and explore the data in depth. In a future post, we will give more complex queries to show the full possibilities of the system.