Data.gouv.fr is great news for the OpenData movement! Two days ago, the French government released thousands of datasets on http://data.gouv.fr/ under an open licensing scheme that allows people to access and play with them. Thanks to the CubicWeb semantic web framework, it took us only a couple of hours to put some of that open data to good use. Here is how we mapped the French railway system.

Train stations in French Brittany

Source Datasets

We used two of the datasets available on data.gouv.fr:
Data Model

Given the above datasets, we wrote the following data model to store the data in CubicWeb:

    from yams.buildobjs import EntityType, Float, String, SubjectRelation


    class Location(EntityType):
        """A train station or a level crossing."""
        name = String(indexed=True)
        latitude = Float(indexed=True)
        longitude = Float(indexed=True)
        # '?*': a location has at most one feature type
        feature_type = SubjectRelation('FeatureType', cardinality='?*')
        # '1*': a location comes from exactly one data source
        data_source = SubjectRelation('DataGovSource', cardinality='1*', inlined=True)


    class FeatureType(EntityType):
        name = String(indexed=True)


    class DataGovSource(EntityType):
        name = String(indexed=True)
        description = String()
        uri = String(indexed=True)  # link back to the dataset on data.gouv.fr
        icon = String()

The Location object is used for both train stations and level crossings. It has a name (text), a latitude and a longitude (numeric). With the cardinalities declared above, it can be linked to at most one FeatureType object (which many locations may share) and to exactly one DataGovSource. The FeatureType object stores the type of train station or level crossing and is defined by a name (text). The DataGovSource object is defined by a name, a description and a uri used to link back to the source data on data.gouv.fr.

Schema of the data model

Data Import

We had to write a few lines of code to take advantage of CubicWeb's massive data import feature. After that, we could load the content of each CSV file with a single command:

    $ cubicweb-ctl import-datagov-location datagov_geo gare.csv-fr.CSV --source-type=gare
    $ cubicweb-ctl import-datagov-location datagov_geo passage_a_niveau.csv-fr.CSV --source-type=passage

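The import code itself is not shown here. As a rough idea of the preprocessing such a command needs, the sketch below turns CSV rows into dictionaries shaped like the Location type. It uses only the Python standard library, not the CubicWeb import API. The column names (`nom`, `latitude`, `longitude`), the semicolon delimiter, the decimal comma and the sample rows are all assumptions made for illustration, not the actual layout of the data.gouv.fr files.

```python
import csv
import io


def parse_locations(csv_text, source_type, delimiter=";"):
    """Turn CSV rows into dicts shaped like the Location entity type.

    The column names "nom", "latitude" and "longitude" are hypothetical.
    """
    locations = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        try:
            # Accept a decimal comma as well as a decimal point.
            latitude = float(row["latitude"].replace(",", "."))
            longitude = float(row["longitude"].replace(",", "."))
        except (KeyError, ValueError, AttributeError):
            # Skip rows without usable coordinates.
            continue
        locations.append({
            "name": (row.get("nom") or "").strip(),
            "latitude": latitude,
            "longitude": longitude,
            "source_type": source_type,
        })
    return locations


# Invented sample rows, for illustration only.
SAMPLE = "nom;latitude;longitude\nGare A;48,1035;-1,6723\nSans coordonnees;;\n"
STATIONS = parse_locations(SAMPLE, "gare")
```

Rows without coordinates are dropped rather than stored, since a Location without a latitude and a longitude could not be placed on a map anyway.
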
In less than a minute, the import was completed and we had:
Data visualization

CubicWeb makes it possible to build complex applications by assembling existing components (called cubes). Here we used a cube that wraps the Mapstraction and OpenLayers libraries to display information on maps using data from OpenStreetMap. For the Location type defined in the data model to be displayable on a map, it is sufficient to write the following adapter:

    from cubicweb.predicates import is_instance  # cubicweb.selectors in older releases
    from cubicweb.view import EntityAdapter


    class IGeocodableAdapter(EntityAdapter):
        __regid__ = 'IGeocodable'
        # Only apply this adapter to Location entities.
        __select__ = is_instance('Location')

        @property
        def latitude(self):
            return self.entity.latitude

        @property
        def longitude(self):
            return self.entity.longitude

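To see why such a small adapter is enough, it helps to picture the map view as code that only ever talks to the adapter and never to the entity type itself. The sketch below illustrates that idea in plain Python. None of the classes or functions in it are CubicWeb APIs: `Location`, `GeocodableAdapter` and `markers` are hypothetical stand-ins.

```python
class Location:
    """Stand-in for the Location entity (not a CubicWeb class)."""

    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude


class GeocodableAdapter:
    """Exposes the coordinates of the wrapped entity under fixed names."""

    def __init__(self, entity):
        self.entity = entity

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude


def markers(entities, adapt=GeocodableAdapter):
    """Hypothetical map helper: it reads coordinates through the adapter only."""
    result = []
    for entity in entities:
        geo = adapt(entity)
        if geo.latitude is None or geo.longitude is None:
            continue  # nothing to place on the map
        result.append((geo.latitude, geo.longitude))
    return result


MARKERS = markers([
    Location("A", 47.67778, -1.89599),
    Location("B", None, None),
])
```

Because the map code depends only on `latitude` and `longitude`, any other entity type could be put on the map by giving it an adapter of its own.
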
That was it for the development part! The next step was to use the application to browse the structure of the French train network on the map.

Train stations in use:
Train stations not in use:

Zooming in on some parts of the map, for example Brittany, reveals more detail, and clicking on a train icon gives more information on the corresponding Location.

Train stations in use:
Train stations not in use:

Since CubicWeb separates querying the data from displaying the result of a query, we can switch the view to display the same data in tables or to export it back to a CSV file.

Querying Data

CubicWeb implements a query language (RQL) very similar to SPARQL, which makes the data available without the need to learn a specific API.
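As an illustration, a query and a view identifier can be combined into a single shareable URL. The sketch below builds one with the standard library. The host `http://some.url.demo/`, the feature type name `"gare"` and the view identifier `"map-view"` are placeholders chosen for this example, not values taken from the application.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def view_url(base_url, rql, vid):
    """Build a URL that carries both the query (rql) and the view (vid)."""
    query = urlencode({"rql": rql, "vid": vid})
    return "%s/view?%s" % (base_url.rstrip("/"), query)


# Hypothetical query: all locations whose feature type is named "gare".
RQL = 'Any X WHERE X is Location, X feature_type F, F name "gare"'
URL = view_url("http://some.url.demo/", RQL, "map-view")

# Decoding the query string gives back the two parameters unchanged.
PARAMS = parse_qs(urlsplit(URL).query)
```

Changing only the `vid` parameter would request the same result set rendered by a different view, which is the separation between querying and displaying described above.
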
As you could see in the last URL, the map view was chosen directly with the parameter vid, meaning that the URL is shareable and can easily be included in a blog, with an iframe for example.

Data sharing

The result of a query can also be "displayed" in RDF, thus allowing users to download a semantic version of the information without having to do the preprocessing themselves:

    <rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
      <rdf:type rdf:resource="http://ns.cubicweb.org/cubicweb/0.0/Location"/>
      <cw:cw_source rdf:resource="http://some.url.demo/"/>
      <cw:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">-1.89599</cw:longitude>
      <cw:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">47.67778</cw:latitude>
      <cw:feature_type rdf:resource="http://some.url.demo/7222"/>
      <cw:data_source rdf:resource="http://some.url.demo/7206"/>
    </rdf:Description>

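A consumer of this export could read the coordinates back with nothing more than the Python standard library, as sketched below. The excerpt above does not show its namespace declarations, so the example wraps it in an `rdf:RDF` element and assumes that the `cw` prefix maps to `http://ns.cubicweb.org/cubicweb/0.0/`, by analogy with the `rdf:type` value.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CW_NS = "http://ns.cubicweb.org/cubicweb/0.0/"  # assumed binding of the cw prefix

DOCUMENT = """<rdf:RDF xmlns:rdf="%s" xmlns:cw="%s">
<rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
  <cw:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">-1.89599</cw:longitude>
  <cw:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">47.67778</cw:latitude>
</rdf:Description>
</rdf:RDF>""" % (RDF_NS, CW_NS)


def coordinates(rdf_xml):
    """Map each described resource to its (latitude, longitude) pair."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for description in root.findall("{%s}Description" % RDF_NS):
        about = description.get("{%s}about" % RDF_NS)
        latitude = description.findtext("{%s}latitude" % CW_NS)
        longitude = description.findtext("{%s}longitude" % CW_NS)
        if latitude is not None and longitude is not None:
            result[about] = (float(latitude), float(longitude))
    return result


COORDS = coordinates(DOCUMENT)
```

A dedicated RDF library would be the more robust choice for real data; plain XML parsing is used here only to keep the sketch free of dependencies.
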
Conclusion

For someone who knows the CubicWeb framework, a couple of hours are enough to create a CubicWeb application that stores, displays, queries and shares data downloaded from http://www.data.gouv.fr/. The full source code for the above will be released before the end of the week.

If you want to see more of CubicWeb in action, browse http://data.bnf.fr or learn how to develop your own application at http://docs.cubicweb.org/



Comments
Great post Vincent, thank you for sharing! I will definitely give a try to this.
No doubt the french government would benefit from CubicWeb technology instead of distributing our data as inert CSV files...