We have been playing along with political data for a while, using CubicWeb to store and query various sets of open data (e.g. NosDeputes, data.gouv.fr), and testing different visualization tools. In particular, we have extended our prototype of News Analysis (see the presentation we made last year at Euroscipy), in order to use these political datasets as reference for the named entities extraction part. Last week's conference "The Law Factory" at Sciences Po was a really nice opportunity to meet people with similar interests in opendata for political sciences, and to find out which questions we should be asking our data ! Check out the talk of our presentation and a few screencasts (no sound) :
- Query - Example 1
- Query - Example 2
- Query - Example 3
- Visualization - Example 1
- Visualization - Example 2
- Visualization - Example 3
Comments are welcome !
Among the different things that we have seen, we want to emphasize on:
- Law is Code (http://gitorious.org/law-is-code/) - This project by the team of Regards Citoyens, aims at analysing the laws and amendments, by extracting information from the French National Assembly website, and by pushing the contributions of the members of parlement to a given law in a git repository. If we can find the time, we'll turn that into a mercurial repository and integrate it into our above application using cubicweb-vcsfile.
- Both national websites (Assemblée Nationale, Sénat), do not allow (yet...) to get data any other way than parsing the sites. However, it seems that the people involved are aware of the issues of opendata, and this may changed in the next months. In particular, the Senat use two databases (Basile and Ameli), and opening them to the public could be really interesting
- Different projects about African parlements can be found on the following website : http://www.parliaments.info
- Check out, ITCparliement which gives tools to analyse and share data from many different parliments.
Saturday, at La Cantine Numérique, the discussions focused on the possibilities to share tools, and the possible collaborations. I think that this is the crucial point: How people can share tools and use them in a efficient way, without being an IT expert ?
In this way, we have are thinking about some evolutions of CubicWeb that can fullfill (part) of these requirements:
- easier installation, especially on Windows, and easier Postgresql configuration. This could perhaps be made by allowing some graphical interface for creating/managing the instances and the databases.
- a graphical tool for schema construction. Even if the construction of a data model in CubicWeb is quite simple, and rely on the straightforward Python syntax, it could be interesting to expose a graphical tool for adding/removing/modifying entities from the schema, as well as some attributes or relations.
- easier ways to import data. This point is not trivial, and we don't want to develop a specific language for defining import rules, that could be used for 80% of the cases, but will be painful to extend to the 20% exotic cases. We would rather develop some helpers to ease the building of some import scripts in Python, and to upload some CubicWeb instances already filled with open databases.
As a follow up of the conference, we are openning a demo site using CubicWeb to expose data of the past legislative and presidential elections (2002, 2007, 2012)
This demo site allows you to deeply explore the data, with different visualisations, and complex queries. Again, comments are welcome, especially if you want to retrieve some information but you don't know how to! This demo site will probably evolve in the next weeks, and we will use it to test different cubes that we have been building.
PS: We are sorry we cannot open the propotype of news aggregator for now, as there are still licensing issues concerning the reusability of the different news sources that we get articles from.