Blog entries

Comparing CubicWeb with Drupal plus CCK extension

2009/10/29 by Nicolas Chauvat
http://www.cubicweb.org/image/502151?vid=download

Drupal is a CMS written in PHP that is getting more and more visibility in the Semantic Web crowd. Several researchers from DERI have been using it as a test bed for their research projects and have developed extensions to showcase their ideas. For example, it is used to build the Semantic Web Dog Food site, which archives the semantic web conferences and publishes them as Linked Open Data. The URL for this year's ISWC is http://data.semanticweb.org/conference/iswc/2009

This gave me the incentive to read more about Drupal than I had before. I have not had time to give it a try, but I skimmed the documentation and will try to compare it with CubicWeb from a software architecture point of view.

Drupal defines a Node as an information item. The CCK (aka Content Construction Kit) can be used to define new types of Nodes through a web interface. Nodes and the bits and pieces used to display them as HTML are not packaged together in components. The Features extension aims to get these bits packaged.

If you are a Drupal user/developer and think I am not being fair to Drupal, please comment below.

On the other hand, CubicWeb implemented the concept of reusable components very early on. What is called a Node in Drupal is an Entity in CubicWeb. By design, CubicWeb does not have a web interface to define entities: the data model is part of the code. To efficiently maintain applications in production, changes to the data model must be tracked along with changes to the code, and data model changes imply migration procedures. In CubicWeb, all of this is versioned and made part of the components. Where Drupal needs to grow extensions like CCK and Features, CubicWeb offers more advanced possibilities by design, for example the ability to develop full-featured applications by assembling components.
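To make the idea of "versioned data model changes" more concrete, here is a small, self-contained Python sketch. It does not use CubicWeb's actual migration commands: the dictionary used as a schema, the MIGRATIONS list and the create_type, create_attribute and migrate helpers are all hypothetical, chosen only to show why tying each schema change to a component version lets an instance in production be upgraded step by step.

```python
# Hypothetical sketch, NOT CubicWeb's migration API: it only illustrates
# keeping schema changes as versioned code next to the component.

def create_type(schema, etype):
    """Add an entity type (a name mapped to its attribute list)."""
    schema.setdefault(etype, [])

def create_attribute(schema, etype, attr):
    """Add an attribute to an existing entity type."""
    schema[etype].append(attr)

# Each migration step is tagged with the component version introducing it.
MIGRATIONS = [
    ((0, 1, 0), lambda s: create_type(s, 'Ticket')),
    ((0, 2, 0), lambda s: create_attribute(s, 'Ticket', 'title')),
    ((0, 3, 0), lambda s: create_attribute(s, 'Ticket', 'priority')),
]

def migrate(schema, installed, target):
    """Apply, in version order, every step newer than `installed` and not
    newer than `target`; return the version of the last step applied
    (or `installed` if nothing had to be done)."""
    for version, step in sorted(MIGRATIONS, key=lambda m: m[0]):
        if installed < version <= target:
            step(schema)
            installed = version
    return installed
```

Running migrate() twice with the same target is a no-op, which is the property one wants when upgrading a live instance.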

This was a very short comparison. I'm looking forward to getting a chance to discuss it with knowledgeable Drupal hackers.


Customizing search box with magicsearch

2009/12/13 by Adrien Di Mascio

During the last CubicWeb sprint, I was asked if it was possible to customize the search box CubicWeb comes with. By default, you can use it to type either RQL queries, plain text queries or standard shortcuts such as <EntityType> or <EntityType> <attrname> <value>.

Ultimately, all queries are translated to RQL since it's the only language understood on the server (data) side. To transform the user query into RQL, CubicWeb uses the so-called magicsearch component, which in turn delegates to a number of query preprocessors that are responsible for interpreting the user query and generating the corresponding RQL.

The code of the main processor loop is easy to understand:

for proc in self.processors:
    try:
        return proc.process_query(uquery, req)
    except (RQLSyntaxError, BadRQLQuery):
        pass

The idea is simple: for each query processor, try to translate the query. If it fails (that is, raises RQLSyntaxError or BadRQLQuery), try the next processor; if it succeeds, we're done and the RQL query will be executed.
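The same chain-of-responsibility loop can be tried outside CubicWeb with the self-contained Python sketch below. BadQuery and the two processor classes are hypothetical stand-ins (BadQuery plays the role of RQLSyntaxError and BadRQLQuery), and the assumption that lower priority values are tried first is ours, so check how magicsearch sorts its processors before relying on it.

```python
class BadQuery(Exception):
    """Stand-in for rql's RQLSyntaxError / BadRQLQuery."""

class Processor(object):
    priority = 0

    def process_query(self, uquery):
        raise NotImplementedError

class ShortcutProcessor(Processor):
    priority = 0  # tried first

    def process_query(self, uquery):
        if ':' not in uquery:
            raise BadQuery(uquery)  # not our syntax: give up
        project, text = uquery.split(':', 1)
        return ('ticket search', project, text)

class FullTextProcessor(Processor):
    priority = 10  # catch-all, tried last

    def process_query(self, uquery):
        return ('full text search', uquery)

def run_chain(processors, uquery):
    # Assumption: lower priority values are tried first.
    for proc in sorted(processors, key=lambda p: p.priority):
        try:
            return proc.process_query(uquery)
        except BadQuery:
            pass  # this processor cannot handle the query: try the next one
    raise BadQuery(uquery)
```

The first processor that does not raise wins, so a specific shortcut must come before a catch-all such as full-text search.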

Now that the general mechanism is understood, here's an example of code that could be used in a forge-based cube to add a new search shortcut to find tickets. We'd like to use the project_name:text syntax to search for tickets of project_name containing text (e.g. pylint:warning).

Here's the corresponding preprocessor code:

from rql import BadRQLQuery
from cubicweb.web.views.magicsearch import BaseQueryProcessor

class MyCustomQueryProcessor(BaseQueryProcessor):
    priority = 0 # controls order in which processors are tried

    def preprocess_query(self, uquery, req):
        """
        :param uquery: the query as sent by the browser
        :param req: the standard, omnipresent, cubicweb's req object
        """
        # split on the first ':' only, so the text part may contain colons
        project_name, sep, text = uquery.partition(':')
        if not sep:
            # the shortcut doesn't apply: let the next processor try
            raise BadRQLQuery(uquery)
        return (u'Any T WHERE T is Ticket, T concerns P, P name %(p)s, '
                u'T has_text %(t)s',
                {'p': project_name.strip(), 't': text.strip()})

The code is rather self-explanatory, but here are a few additional comments:

  • the class is registered with the standard vregistry mechanism and should be defined alongside the views
  • the priority attribute is used to sort the processors and thus defines the order in which they are tried in the main processor loop
  • preprocess_query should raise BadRQLQuery (or RQLSyntaxError) when the query can't be processed, since these are the exceptions the main loop catches before moving on to the next processor

To summarize, if you want to customize the search box, you have to:

  1. define a new query preprocessor component
  2. define its priority relative to the other standard processors
  3. implement the preprocess_query method

and CubicWeb will do the rest!


Reusing OpenData from Data.gouv.fr with CubicWeb in 2 hours

2011/12/07 by Vincent Michel

Data.gouv.fr is great news for the OpenData movement!

Two days ago, the French government released thousands of data sets on http://data.gouv.fr/ under an open licensing scheme that allows people to access and play with them. Thanks to the CubicWeb semantic web framework, it took us only a couple of hours to put some of that open data to good use. Here is how we mapped the French railway system.

http://www.cubicweb.org/file/2110281?vid=download

Train stations in Brittany

Source Datasets

We used two of the datasets available on data.gouv.fr:

  • Train stations: description of the 6442 train stations in France, including their name, type and geographic coordinates. Here is a sample of the file:

    441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358
    441000;Montreuil-sur-Ille;Desserte Voyageur-Infrastructure;48,3072;-1,6741
    
  • LevelCrossings: description of the 18159 level crossings on French railways, including their type and location. Here is a sample of the file:

    558000;PN privé pour voitures avec barrières sans passage piétons accolé;48,05865;1,60697
    395000;PN privé pour voitures avec barrières avec passage piétons accolé public;;48,82544;1,65795
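Both files are semicolon-separated and use a comma as the decimal separator, so the coordinates need converting before they can be stored as floats. The Python sketch below parses the sample rows; the column layout is inferred from the two samples only (an assumption), the first column is kept as an opaque code since its meaning is not described here, and parse_row and parse_decimal are hypothetical helper names. The 'gare' and 'passage' values mirror the --source-type options used in the import commands further down.

```python
def parse_decimal(value):
    """Convert a French-style decimal such as '48,23955' to a float."""
    return float(value.replace(',', '.'))

def parse_row(line, source_type):
    """Parse one CSV row; source_type is 'gare' or 'passage'."""
    fields = line.strip().split(';')
    # In both samples the coordinates are the last two fields,
    # which also copes with the empty field seen in one crossing row.
    latitude = parse_decimal(fields[-2])
    longitude = parse_decimal(fields[-1])
    if source_type == 'gare':
        code, name, feature_type = fields[0], fields[1], fields[2]
    elif source_type == 'passage':
        # Level crossings have no name column.
        code, name, feature_type = fields[0], None, fields[1]
    else:
        raise ValueError('unknown source type: %r' % source_type)
    return {'code': code, 'name': name, 'feature_type': feature_type,
            'latitude': latitude, 'longitude': longitude}
```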
    

Data Model

Given the above datasets, we wrote the following data model to store the data in CubicWeb:

from yams.buildobjs import EntityType, String, Float, SubjectRelation

class Location(EntityType):
    name = String(indexed=True)
    latitude = Float(indexed=True)
    longitude = Float(indexed=True)
    feature_type = SubjectRelation('FeatureType', cardinality='?*')
    data_source = SubjectRelation('DataGovSource', cardinality='1*', inlined=True)

class FeatureType(EntityType):
    name = String(indexed=True)

class DataGovSource(EntityType):
    name = String(indexed=True)
    description = String()
    uri = String(indexed=True)
    icon = String()

The Location object is used for both train stations and level crossings. It has a name (text information) and a latitude and a longitude (numeric information); it can be linked to at most one FeatureType object (cardinality '?*') and must be linked to exactly one DataGovSource (cardinality '1*'). The FeatureType object is used to store the type of train station or level crossing and is defined by a name (text information). The DataGovSource object is defined by a name, a description, a uri used to link back to the source data on data.gouv.fr, and an icon.
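The cardinality strings in the schema are compact. In yams, the first character constrains how many objects each subject may be linked to, and the second how many subjects may be linked to each object: '1' means exactly one, '?' zero or one, '+' one or more and '*' zero or more. The Python sketch below decodes them. The MEANING table and the describe() helper are illustrative only and are not part of yams.

```python
# Meaning of each cardinality character (yams convention).
MEANING = {
    '1': 'exactly one',
    '?': 'zero or one',
    '+': 'one or more',
    '*': 'zero or more',
}

def describe(cardinality):
    """Return (subject side, object side) readings of a cardinality string.

    The subject side says how many objects each subject may have; the
    object side says how many subjects may point at each object.
    """
    if len(cardinality) != 2:
        raise ValueError('expected two characters, got %r' % cardinality)
    subject_side, object_side = cardinality
    return MEANING[subject_side], MEANING[object_side]
```

So feature_type with '?*' reads as "each Location has zero or one FeatureType, and a FeatureType may be shared by any number of Locations".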

http://www.cubicweb.org/file/2110311?vid=download

Schema of the data model

Data Import

We had to write a few lines of code to benefit from CubicWeb's massive data import feature, after which we could load the content of each CSV file with a single command:

$ cubicweb-ctl import-datagov-location datagov_geo gare.csv-fr.CSV  --source-type=gare
$ cubicweb-ctl import-datagov-location datagov_geo passage_a_niveau.csv-fr.CSV  --source-type=passage

In less than a minute, the import was completed and we had:

  • 2 DataGovSource objects, corresponding to the two data sets,
  • 24 FeatureType objects, corresponding to the different types of locations that exist (e.g. Non exploitée, Desserte Voyageur, PN public isolé pour piétons avec portillons or PN public pour voitures avec barrières gardé avec passage piétons accolé manoeuvré à distance),
  • 24601 Locations, corresponding to the different train stations and level crossings (6442 + 18159 = 24601).
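The counts above follow from a simple rule: each distinct feature type and each data set is created once and shared, while every CSV row becomes its own Location. The Python sketch below shows that deduplication in memory. It is a hypothetical stand-in for the import step, not CubicWeb's massive store API; it assumes rows are dictionaries with source, name, feature_type, latitude and longitude keys, and the integer ids are synthetic.

```python
def build_entities(rows):
    """Return (feature_types, sources, locations) built from parsed rows.

    Each distinct feature type name and source name gets one synthetic
    integer id; every row becomes one location referencing those ids.
    """
    feature_types = {}  # feature type name -> id
    sources = {}        # source name -> id
    locations = []
    for row in rows:
        # setdefault only assigns a new id the first time a name is seen
        ftype_id = feature_types.setdefault(row['feature_type'],
                                            len(feature_types) + 1)
        source_id = sources.setdefault(row['source'], len(sources) + 1)
        locations.append({
            'name': row['name'],
            'latitude': row['latitude'],
            'longitude': row['longitude'],
            'feature_type': ftype_id,
            'data_source': source_id,
        })
    return feature_types, sources, locations
```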

Data visualization

CubicWeb makes it possible to build complex applications by assembling existing components (called cubes). Here we used a cube that wraps the Mapstraction and OpenLayers libraries to display information on maps using data from OpenStreetMap.

In order for the Location type defined in the data model to be displayable on a map, it is sufficient to write the following adapter:

from cubicweb.selectors import is_instance
from cubicweb.view import EntityAdapter

class IGeocodableAdapter(EntityAdapter):
    __regid__ = 'IGeocodable'
    __select__ = is_instance('Location')

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude
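The adapter is enough because the map view only needs latitude and longitude and never looks at the entity class itself. The plain-Python illustration below shows the same pattern outside CubicWeb. LocationRecord, GeocodableAdapter and bounding_box are hypothetical names, and bounding_box merely stands in for the kind of computation a map view might do (for instance to center the map); it is not the mapstraction cube's code.

```python
class LocationRecord(object):
    """Hypothetical stand-in for a Location entity."""
    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

class GeocodableAdapter(object):
    """Exposes only what a map needs, whatever the wrapped entity is."""
    def __init__(self, entity):
        self.entity = entity

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude

def bounding_box(adapters):
    """Return (min_lat, min_lon, max_lat, max_lon) over adapted objects."""
    lats = [a.latitude for a in adapters]
    lons = [a.longitude for a in adapters]
    return min(lats), min(lons), max(lats), max(lons)
```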

That was it for the development part! The next step was to use the application to browse the structure of the French train network on the map.

Train stations in use:

http://www.cubicweb.org/file/2110279?vid=download

Train stations not in use:

http://www.cubicweb.org/file/2110280?vid=download

Zooming on some parts of the map, for example Brittany, we get to see more details and clicking on the train icons gives more information on the corresponding Location.

Train stations in use:

http://www.cubicweb.org/file/2110281?vid=download

Train stations not in use:

http://www.cubicweb.org/file/2110282?vid=download

Since CubicWeb separates querying the data and displaying the result of a query, we can switch the view to display the same data in tables or to export it back to a CSV file.

http://www.cubicweb.org/file/2110313?vid=download

Querying Data

CubicWeb implements RQL, a query language very similar to SPARQL, which makes the data available without the need to learn a specific API.

  • Example 1: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%miny"

    This request returns all the Locations whose name ends with "miny". There is only one, the Firminy train station.

    http://www.cubicweb.org/file/2110286?vid=download

  • Example 2: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%ny"

    This request returns all the Locations whose name ends with "ny": 112 train stations.

    http://www.cubicweb.org/file/2110287?vid=download

  • Example 3: http://some.url.demo/?rql=Any X WHERE X latitude < 47.8, X latitude > 47.6, X longitude > -1.9, X longitude < -1.8

    This request returns all the Locations with a latitude between 47.6 and 47.8 and a longitude between -1.9 and -1.8.

    We obtain 11 Locations (9 level crossings and 2 train stations). We can map them using the mapstraction.map view provided by the map cube mentioned above.

    http://www.cubicweb.org/file/2110288?vid=download
  • Example 4: http://domainname:8080/?rql=Any X WHERE X latitude < 47.8, X latitude > 47.6, X longitude > -1.9, X longitude < -1.8, X feature_type F, F name "Desserte Voyageur"

    This limits the previous result set to the train stations that are used for passenger service:

    http://www.cubicweb.org/file/2110289?vid=download
  • Example 5: http://domainname:8080/?rql=Any X WHERE X feature_type F, F name "PN public pour voitures sans barrières sans SAL"&vid=mapstraction.map

    Finally, one can map all the level crossings for vehicles without barriers (there are 3704):

    http://www.cubicweb.org/file/2110290?vid=download
    http://www.cubicweb.org/file/2110291?vid=download
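The bounding-box queries of Examples 3 and 4 follow a simple pattern, so they can be generated instead of typed by hand. The helper below, bbox_rql, is a hypothetical Python sketch: it adds an explicit "X is Location" restriction that the examples leave implicit, and it quotes the feature type name naively (rejecting names that contain double quotes) instead of using RQL's argument substitution.

```python
def bbox_rql(lat_min, lat_max, lon_min, lon_max, feature_type=None):
    """Build an RQL query for Locations strictly inside a bounding box,
    optionally restricted to one FeatureType name."""
    restrictions = [
        'X is Location',
        'X latitude > %s' % float(lat_min),
        'X latitude < %s' % float(lat_max),
        'X longitude > %s' % float(lon_min),
        'X longitude < %s' % float(lon_max),
    ]
    if feature_type is not None:
        if '"' in feature_type:
            raise ValueError('feature type name must not contain double quotes')
        restrictions += ['X feature_type F', 'F name "%s"' % feature_type]
    return 'Any X WHERE ' + ', '.join(restrictions)
```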

As you can see in the last URL, the map view was chosen directly with the vid parameter, which means that the URL is shareable and can easily be included in a blog with an iframe, for example.
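Such shareable URLs can also be built programmatically. This Python sketch uses urllib.parse.urlencode from the standard library so that spaces, quotes and the % of LIKE patterns are escaped properly; query_url is a hypothetical helper and http://some.url.demo/ is the same placeholder host as in the examples.

```python
from urllib.parse import urlencode

def query_url(base_url, rql, vid=None):
    """Build a URL that runs `rql` and optionally selects a view via vid."""
    params = {'rql': rql}
    if vid is not None:
        params['vid'] = vid
    # urlencode escapes spaces as '+', '"' as %22 and '%' as %25
    return base_url.rstrip('/') + '/?' + urlencode(params)
```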

Data sharing

The result of a query can also be "displayed" in RDF, thus allowing users to download a semantic version of the information, without having to do the preprocessing themselves:

<rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
  <rdf:type rdf:resource="http://ns.cubicweb.org/cubicweb/0.0/Location"/>
  <cw:cw_source rdf:resource="http://some.url.demo/"/>
  <cw:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">-1.89599</cw:longitude>
  <cw:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">47.67778</cw:latitude>
  <cw:feature_type rdf:resource="http://some.url.demo/7222"/>
  <cw:data_source rdf:resource="http://some.url.demo/7206"/>
</rdf:Description>
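Because this output is plain RDF/XML, it can be consumed with standard tools. The Python sketch below uses only the standard library's xml.etree.ElementTree on a copy of the sample above wrapped in an rdf:RDF root. The namespace URI bound to the cw: prefix is an assumption inferred from the rdf:type value, since the excerpt does not show the namespace declarations, and read_coordinates is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
CW_NS = 'http://ns.cubicweb.org/cubicweb/0.0/'  # assumed binding of cw:

def read_coordinates(rdf_xml):
    """Return (about, latitude, longitude) for each geolocated description."""
    root = ET.fromstring(rdf_xml)
    results = []
    for desc in root.iter('{%s}Description' % RDF_NS):
        lat = desc.findtext('{%s}latitude' % CW_NS)
        lon = desc.findtext('{%s}longitude' % CW_NS)
        if lat is None or lon is None:
            continue  # not a geolocated resource
        results.append((desc.get('{%s}about' % RDF_NS), float(lat), float(lon)))
    return results

SAMPLE = '''<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cw="http://ns.cubicweb.org/cubicweb/0.0/">
  <rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
    <cw:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">-1.89599</cw:longitude>
    <cw:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">47.67778</cw:latitude>
  </rdf:Description>
</rdf:RDF>'''
```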

Conclusion

For someone who knows the CubicWeb framework, a couple of hours is enough to create a CubicWeb application that stores, displays, queries and shares data downloaded from http://www.data.gouv.fr/

The full source code for the above will be released before the end of the week.

If you want to see more of CubicWeb in action, browse http://data.bnf.fr or learn how to develop your own application at http://docs.cubicweb.org/