
CubicWeb Blog

News about the framework and its uses.

How to create your own forms and controllers?

2012/09/05 by Stéphane Bugat

Aim

Sometimes a view needs its own specific form and an associated controller. This blog entry shows how to do that in CubicWeb, using a concrete case.

The case

Let's suppose you're working on a social network project where you have to develop friend-of-a-friend (foaf) relationships between persons. For that purpose, we use the cubicweb-person cube and create relations between persons in our schema, such as X in_contact_with Y:

from yams.buildobjs import RelationDefinition

class in_contact_with(RelationDefinition):
    subject = 'Person'
    object = 'Person'
    cardinality = '**'
    symmetric = True

We will also assume that a given Person corresponds to a unique CWUser through the relation is_user.

Although it is not an obvious requirement, we would like any connected person to be able to disconnect from another person at any time. To that end, we will create a table view that displays the list of connected users, with a custom column giving the ability to "disconnect" from each person.

Before actually disconnecting from a person, we would also like to show a confirmation form.

How to proceed

The following steps were defined to address the above issue:

  1. Define a "contact view" that will display the list of known contacts of the connected user;
  2. In this contact view, allow the user to click on a specific contact in order to remove it;
  3. Create a deletion confirmation view, which will contain:
    • A form holding the buttons for confirming or cancelling the deletion;
    • A controller responsible for the actual deletion or the cancellation.

The contact view

Rendering a table view of connected persons

To display the list of persons connected to the current person, and also to add custom columns that do not correspond to attributes of a given entity, the best choice is to use EntityTableView (see here for more information):

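# import paths as of CubicWeb 3.15 ('_' is assumed to be defined at module
# level, as is usual in CubicWeb view modules)
from logilab.mtconverter import xml_escape
from cubicweb.predicates import is_instance
from cubicweb.web.views.tableview import (
    EntityTableView, EntityTableColRenderer,
    MainEntityColRenderer, RelatedEntityColRenderer)

class ContactView(EntityTableView):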
    __regid__ = 'contacts_tableview'
    __select__ = is_instance('Person')
    columns = ['person', 'firstname', 'surname', 'email', 'phone', 'remove']
    layout_args = {'display_filter': 'top', 'add_view_actions': None}

    def cell_remove(w, entity):
        """link to the suppression of the relation between both contacts"""
        icon_url = entity._cw.data_url('img/user_delete.png')
        action_url = entity._cw.build_url(eid=entity.eid,
                vid='suppress_contact_view',
                __redirectpath=entity._cw.relative_path(),
                __redirectvid=entity._cw.form.get('__redirectvid', ''))
        w(u'<a href="%(actionurl)s" title="%(title)s">'
                u'<img alt="%(title)s" src="%(url)s" /></a>'
                % {'actionurl': xml_escape(action_url),
                   'title': _('remove from contacts'),
                   'url':icon_url})

    column_renderers = {
            'person': MainEntityColRenderer(),
            'email': RelatedEntityColRenderer(
                getrelated=lambda x:x.primary_email and x.primary_email[0] \
                        or None),
            'phone': RelatedEntityColRenderer(
                getrelated=lambda x:x.phone and x.phone[0] or None),
            'remove': EntityTableColRenderer(
                renderfunc=cell_remove,
                header=''),}

A few explanations about the above view:

  • By default, the columns attribute contains a list of displayable attributes of the entity. If an element of the list does not correspond to an attribute, which is the case for 'remove' here, it must have a rendering function defined in the column_renderers dictionary.
  • When a column refers to a related entity rather than an attribute, we can simply use the RelatedEntityColRenderer renderer, as is done for the email and phone columns.
  • For the 'remove' column, we render a clickable image in the cell_remove function. Here we have chosen an icon from famfamsilk and put it in our data/ directory, but feel free to choose a predefined icon from the cubicweb shared data directory.

The URL associated with each image has to link to a specific action allowing the user to remove the selected person from their contacts. It is built using the _cw.build_url() convenience function, reached here through entity._cw. The target view, 'suppress_contact_view', will be defined later on. The eid argument is the id of the contact person the user wants to remove.
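To make the shape of that link concrete, here is a minimal stand-alone sketch of how such a URL can be assembled. It does not use CubicWeb itself: BASE_URL and build_action_url are hypothetical names, and the 'view' path segment is an assumption about how the framework lays out the URL when no explicit path is given.

```python
from urllib.parse import urlencode

# Hypothetical base URL of the instance (an assumption for this sketch);
# a real instance takes it from its configuration.
BASE_URL = "http://localhost:8080/"


def build_action_url(eid, redirect_path, redirect_vid=""):
    """Assemble the link target used by the 'remove' column."""
    params = [
        ("eid", eid),                        # contact to remove
        ("vid", "suppress_contact_view"),    # confirmation view
        ("__redirectpath", redirect_path),   # where to come back to
        ("__redirectvid", redirect_vid),
    ]
    # urlencode() escapes the values, so the result is safe to use as a URL
    return BASE_URL + "view?" + urlencode(params)


print(build_action_url(1234, "contacts"))
```

Note that the view still passes the result through xml_escape before writing it into the href attribute, since the '&' separators have to be escaped in HTML.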

Calling the contact view

The above view has to be called with an rset corresponding to the list of known contacts of the connected user. In our case, we have defined a StartupView for contact management, and added the following piece of code to its call method:

person = self._cw.user.related('is_user', 'object').get_entity(0,0)
rset = self._cw.execute(
        'Any X WHERE X is Person, X in_contact_with Y, '
        'Y eid %(eid)s', {'eid': person.eid})
self.w(u'<h3>' + _('Number of contacts in my network:'))
self.w(unicode(len(rset)) + u'</h3>')
if len(rset) != 0:
    self.wview('contacts_tableview', rset)

The Person corresponding to the connected user is retrieved thanks to the use of the related method and the is_user relation. The contact table view is displayed inside the parent StartupView.

Creation of the deletion confirmation view

Defining the confirmation view for contact deletion

The corresponding view is a simple View subclass that displays a confirmation message and the related buttons. It could be defined as follows:

from cubicweb.view import View

class SuppressContactView(View):
    __regid__ = 'suppress_contact_view'

    def cell_call(self, row, col):
        entity = self.cw_rset.get_entity(row, col)
        msg = self._cw._('Are you sure you want to remove %(name)s from your contacts?')
        self.w(u'<p>' + msg % {'name': entity.dc_long_title()} + u'</p>')
        form = self._cw.vreg['forms'].select('suppress_contact_form',
                self._cw, rset=self.cw_rset)
        form.add_hidden(u'eidto', entity.eid)
        form.add_hidden(u'eidfrom', self._cw.user.related('is_user',
            'object').get_entity(0,0).eid)
        form.render(w=self.w)

Inside the cell_call() method of this view, we render a form whose purpose is to display both buttons (confirm deletion or cancel deletion). This form will be described later on.

The Person contact to remove is easily retrieved from cw_rset. The Person corresponding to the connected user is again retrieved through the is_user relation. To make both of them available in the form, we add them when the form is instantiated, using the convenience function add_hidden(key, val).

Defining the deletion form

As mentioned previously, the deletion form is only there to hold the two buttons for confirming or cancelling the deletion. Both buttons are declared in the form_buttons attribute of the form, which is a subclass of forms.FieldsForm:

from cubicweb.web import formwidgets as fw, stdmsgs
from cubicweb.web.views import forms

class SuppressContactForm(forms.FieldsForm):
    __regid__ = 'suppress_contact_form'
    domid = 'delete_contact_form'
    form_renderer_id = 'base'

    @property
    def action(self):
        return self._cw.build_url('suppress_contact_controller')

    form_buttons = [
            fw.Button(stdmsgs.BUTTON_DELETE, cwaction='delete'),
            fw.Button(stdmsgs.BUTTON_CANCEL, cwaction='cancel')]

Specifying a domid ensures that your form has a specific DOM identifier, so that the controller referenced in the action property is called without any ambiguity. The form_renderer_id is specified here to avoid displaying additional information that does not make sense in this context.

Defining the controller

The custom controller is a subclass of the Controller class from cubicweb.web.controller. The controller declaration should have the same domid as the calling form, as mentioned previously. The related actions are implemented in the publish() method of the controller:

from cubicweb.web import Redirect
from cubicweb.web.controller import Controller

class SuppressContactController(Controller):
    __regid__ = 'suppress_contact_controller'
    domid = 'delete_contact_form'

    def publish(self, rset=None):
        if '__action_cancel' in self._cw.form.keys():
            msg = self._cw._('Deletion canceled')
            raise Redirect(self._cw.build_url(
                vid='contact_management_view',
                __message=msg))
        elif '__action_delete' in self._cw.form.keys():
            # 'eidfrom' is the connected user's Person,
            # 'eidto' is the contact to remove
            xid = int(self._cw.form['eidfrom'])
            yid = int(self._cw.form['eidto'])
            dead_contact = self._cw.entity_from_eid(yid)
            self._cw.execute(
                    'DELETE X in_contact_with Y'
                    '  WHERE X eid %(xid)s, Y eid %(yid)s',
                    {'xid': xid, 'yid': yid})
            msg = self._cw._('%s removed from your contacts') %\
                dead_contact.dc_long_title()
            raise Redirect(self._cw.build_url(
                vid='contact_management_view',
                __message=msg))

The user action is identified by testing whether '__action_<action>', where <action> is the cwaction given in the button declaration, is present in the form keys. On cancellation, we simply redirect to the contact management view with a message stating that the deletion has been cancelled. On confirmation, the Person ids of both the connected user and the contact to remove are retrieved from the hidden form fields.

The deletion is performed using an RQL query on the in_contact_with relation. We then redirect to the contact management view, this time with a message confirming that the contact link has been deleted.
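The '__action_<action>' convention can be tried out without a running instance. The sketch below is plain Python operating on a dictionary standing in for self._cw.form; submitted_action and handle are hypothetical helper names, and the returned strings replace the real redirects and RQL call. Unlike the controller above, it raises an error when no known action is found instead of silently returning nothing.

```python
ACTION_PREFIX = "__action_"


def submitted_action(form):
    """Return the name of the button that was pressed, or None."""
    for key in form:
        if key.startswith(ACTION_PREFIX):
            return key[len(ACTION_PREFIX):]
    return None


def handle(form):
    """Dispatch on the pressed button (stand-in for Controller.publish)."""
    action = submitted_action(form)
    if action == "cancel":
        return "Deletion canceled"
    if action == "delete":
        # hidden fields arrive as strings; int() rejects anything non-numeric
        xid, yid = int(form["eidfrom"]), int(form["eidto"])
        return "would delete relation between %d and %d" % (xid, yid)
    raise ValueError("no known action in submitted form")


print(handle({"__action_delete": "delete", "eidfrom": "12", "eidto": "34"}))
```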


Logilab at the LawFactory

2012/07/16 by Vincent Michel

We have been playing with political data for a while, using CubicWeb to store and query various sets of open data (e.g. NosDeputes, data.gouv.fr), and testing different visualization tools. In particular, we have extended our prototype of News Analysis (see the presentation we made last year at Euroscipy), in order to use these political datasets as a reference for the named entities extraction part. Last week's conference "The Law Factory" at Sciences Po was a really nice opportunity to meet people with similar interests in opendata for political sciences, and to find out which questions we should be asking our data! Check out our presentation and a few screencasts (no sound):

Comments are welcome!

Interesting things seen at #OLPC

Among the different things that we have seen, we want to emphasize:

  • Law is Code (http://gitorious.org/law-is-code/) - This project by the Regards Citoyens team aims at analysing laws and amendments by extracting information from the French National Assembly website and pushing the contributions of the members of parliament to a given law into a git repository. If we can find the time, we'll turn that into a mercurial repository and integrate it into our above application using cubicweb-vcsfile.
http://www.cubicweb.org/file/2423768?vid=download
  • Neither national website (Assemblée Nationale, Sénat) allows (yet...) getting data any other way than by parsing the sites. However, it seems that the people involved are aware of the issues of opendata, and this may change in the coming months. In particular, the Sénat uses two databases (Basile and Ameli), and opening them to the public could be really interesting.
  • Different projects about African parliaments can be found on the following website: http://www.parliaments.info
  • Check out ITCparliament, which gives tools to analyse and share data from many different parliaments.

Saturday, at La Cantine Numérique, the discussions focused on the possibilities for sharing tools and on possible collaborations. I think that this is the crucial point: how can people share tools and use them in an efficient way without being IT experts?

How does this inspire us for CubicWeb?

With this in mind, we are thinking about some evolutions of CubicWeb that could fulfill (part of) these requirements:

  • easier installation, especially on Windows, and easier Postgresql configuration. This could perhaps be made by allowing some graphical interface for creating/managing the instances and the databases.
  • a graphical tool for schema construction. Even if the construction of a data model in CubicWeb is quite simple, and relies on straightforward Python syntax, it could be interesting to expose a graphical tool for adding/removing/modifying entities from the schema, as well as some attributes or relations.
  • easier ways to import data. This point is not trivial, and we don't want to develop a specific language for defining import rules, that could be used for 80% of the cases, but will be painful to extend to the 20% exotic cases. We would rather develop some helpers to ease the building of some import scripts in Python, and to upload some CubicWeb instances already filled with open databases.

Demo of CubicWeb as a follow up

As a follow-up to the conference, we are opening a demo site using CubicWeb to expose data from the past legislative and presidential elections (2002, 2007, 2012).

https://www.cubicweb.org/file/2425136?&vid=download

The data used is published under Licence Ouverte / Open Licence by http://data.gouv.fr.

This demo site allows you to deeply explore the data, with different visualisations, and complex queries. Again, comments are welcome, especially if you want to retrieve some information but you don't know how to! This demo site will probably evolve in the next weeks, and we will use it to test different cubes that we have been building.

PS: We are sorry we cannot open the prototype of the news aggregator for now, as there are still licensing issues concerning the reusability of the different news sources that we get articles from.


What's new in CubicWeb 3.15

2012/05/14 by Sylvain Thenault

CubicWeb 3.15 introduces a bunch of new functionalities. In short (more details below):

  • ability to use ZMQ instead of Pyro to connect to repositories
  • ZMQ inter-instances messages bus
  • new LDAP source using the datafeed approach, much more flexible than the legacy 'ldapuser' source
  • full undo support

Plus some refactorings regarding Ajax function calls, WSGI, the registry, etc. Read more for the detail.

New functionalities

  • Add ZMQ server, based on the cutting edge ZMQ socket library. This allows access to distant instances, in a similar way to Pyro.
  • Publish/subscribe mechanism using ZMQ for communication among cubicweb instances. The new zmq-address-sub and zmq-address-pub configuration variables define where this communication occurs. As of this release this mechanism is used for entity cache invalidation.
  • Improved WSGI support. While there are still some caveats, most of the code which was twisted only is now generic and allows related functionalities to work with a WSGI front-end.
  • Full undo/transaction support: undo of modifications has finally been implemented, and the configuration simplified (basically you activate it or not on an instance basis).
  • Controlling HTTP status code returns is now much easier:
    • WebRequest now has a status_out attribute to control the response status;
    • most web-side exceptions take an optional status argument.

API changes

  • The base registry implementation has been moved to a new logilab.common.registry module (see #1916014). This includes code from:

    • cubicweb.vreg (everything that was in there)
    • cw.appobject (base selectors and all).

    In the process, some renaming was done:

    • the top level registry is now RegistryStore (was VRegistry), but that should not impact CubicWeb client code;
    • former selector functions are now known as "predicates", though you still use predicates to build an object's selector;
    • for consistency, the objectify_selector decorator has hence been renamed to objectify_predicate;
    • on the CubicWeb side, the selectors module has been renamed to predicates.

    Debugging refactoring dropped the need for the lltrace decorator. There should be full backward compatibility, with proper deprecation warnings. Note that the yes predicate and the objectify_predicate decorator, as well as the traced_selection function, should now be imported from the logilab.common.registry module.

  • All login forms are now submitted to <app_root>/login. Redirection to requested page is now handled by the login controller (it was previously handled by the session manager).

  • Publisher.publish has been renamed to Publisher.handle_request. This method now contains a generic version of the logic previously handled by Twisted. Controller.publish is not affected.
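To illustrate the predicate vocabulary used in the renaming above, here is a toy model of predicates that return a score and can be combined into a selector. This is a simplified sketch, not the logilab.common.registry implementation: the Predicate class and the objectify_predicate function below are stand-ins written for this example (the second one only borrows the name of the real decorator), and the scoring rule for the combination (sum of the scores, or 0 as soon as one predicate fails) is stated here as an assumption.

```python
class Predicate:
    """Toy predicate: wraps a function returning an integer score.

    A score of 0 means "does not apply"; higher scores mean a better match.
    """

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __and__(self, other):
        def both(*args, **kwargs):
            left = self(*args, **kwargs)
            if not left:
                return 0
            right = other(*args, **kwargs)
            return left + right if right else 0
        return Predicate(both)


def objectify_predicate(func):
    """Stand-in decorator turning a plain function into a Predicate."""
    return Predicate(func)


@objectify_predicate
def is_person(entity):
    return 1 if entity.get("type") == "Person" else 0


@objectify_predicate
def has_email(entity):
    return 1 if entity.get("email") else 0


# predicates are combined to build an object's selector
selector = is_person & has_email
print(selector({"type": "Person", "email": "alice@example.org"}))
```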

Unintrusive API changes

  • New 'ldapfeed' source type, designed to replace 'ldapuser' source with data-feed (i.e. copy based) source ideas.
  • New 'zmqrql' source type, similar to 'pyrorql' but using ømq instead of Pyro.
  • A new registry called 'services' has appeared, where you can register server-side cubicweb.server.Service child classes. Their call method can be invoked from a web-side AppObject instance using the new self._cw.call_service method or a server-side one using self.session.call_service. This is a new way to call server-side methods, much cleaner than monkey patching the Repository class, which becomes a deprecated way to perform similar tasks.
  • a new ajaxfunction registry now hosts all remote functions (i.e. functions callable through the asyncRemoteExec JS api). A convenience ajaxfunc decorator will let you expose your python functions easily without all the appobject standard boilerplate. Backwards compatibility is preserved.
  • the 'json' controller is now deprecated in favor of the 'ajax' one.
  • WebRequest.build_url can now take a __secure__ argument. When True, cubicweb tries to generate an https url.
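The idea behind the 'services' registry mentioned above can be sketched in a few lines of plain Python. Everything here is a toy stand-in written for this post: SERVICES, register, the Service base class and the count_contacts identifier are hypothetical, and the real cubicweb.server.Service class and call_service method carry session handling that is left out.

```python
SERVICES = {}  # toy stand-in for the 'services' registry


def register(cls):
    """Record a service class under its __regid__."""
    SERVICES[cls.__regid__] = cls
    return cls


class Service:
    """Toy base class: subclasses implement call()."""
    __regid__ = None

    def call(self, **kwargs):
        raise NotImplementedError


@register
class CountContacts(Service):
    __regid__ = "count_contacts"  # hypothetical identifier

    def call(self, contacts):
        return len(contacts)


def call_service(regid, **kwargs):
    """Look the service up by identifier and invoke its call() method."""
    return SERVICES[regid]().call(**kwargs)


print(call_service("count_contacts", contacts=["alice", "bob"]))
```

The point of the design is that callers only know the identifier, so the server-side code can be replaced without touching the Repository class.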

User interface changes

A new 'undohistory' view exposes the undoable transactions and gives access to undo some of them.


Thoughts on CubicWeb 4.0

2012/05/14 by Sylvain Thenault

This is a fairly technical post talking about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to go from here to there. Draft, really. But that will eventually turn into a nice roadmap hopefully.

The great simplification

Some parts of cubicweb are sometimes too hairy for different reasons (some good, most bad). This contributes to the difficulty of getting started quickly. The goal of CubicWeb 4.0 should be to make things simpler:

  • Fix some bad old design.
  • Stop reinventing the wheel and use widely used libraries in the Python Web World. This extends to benefitting from state of the art libraries to build nice and flexible UI such as Bootstrap, on top of the JQuery foundations (which could become as prominent as the Python standard library in CubicWeb, the development team should get ready for it).
  • If there is a best way to do something, just do it and refrain from providing configurability and options.

On the road to Bootstrap

First, a few simple things could be done to simplify the UI code:

  • drop xhtml support: always return text/html content type, stop bothering with this stillborn stuff and use html5
  • move away everything that should not be in the framework: calendar?, embedding, igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?, vcard, wdoc?, xbel, xmlrss?

Then we should probably move the default UI into some cubes (i.e. the content of cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this should also have the benefit of making clearer that this is the default way to build an (automatic) UI in CubicWeb, but one may use other, more usual, strategies (such as using a template language).

At a first glance, we should start with the following core cubes:

  • corelayout, the default interface layout and generic components. Modules to backport there: application (not an appobject yet), basetemplates, error, boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.
  • coreviews, the default generic views and forms. Modules to backport there: actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller, editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview, reledit, tabs.
  • corebackoffice, the concrete views for the default back-office that let you handle users, sources, debugging, etc. through the web. Modules to backport here: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress, management, schema, startup, workflow.
  • coreservices, the various services, not directly related to the display of something. Modules to backport here: ajaxcontroller, apacherewrite, authentication, basecontrollers, csvexport, idownloadable, magicsearch, sessions, sparql, staticcontrollers, urlpublishing, urlrewrite.

This is a first draft that will need some adjustments. Some of the listed modules should be split (e.g. actions, boxes) and their content moved to different core cubes. Also some modules in cubicweb.web packages may be moved to the relevant cube.

Each cube should provide an interface so that one could replace it with another one. For instance, move from the default coreviews and corelayout cube to bootstrap based ones. This should allow a nice migration path from the current UI to a Bootstrap based UI. Bootstrap should probably be introduced bottom-up: start using it for tables, lists, etc. then go up until the layout defined in the main template. The Orbui experience should greatly help us by pointing at hot spots that will have to be tackled, as well as by providing a nice code base from which we should start.

Regarding current implementation, we should take care that Contextual components are a powerful way to build "pluggable" UI, but we should probably add an intermediate layer that would make more obvious / explicit:

  • what the available components are
  • what the available slots are
  • which component should go in which slot when possible

Also at some point, we should take care to separate view's logic from HTML generation: our experience with client works shows that a common need is to use the logic but produce a different HTML. Though we should wait for more use of Bootstrap and related HTML simplification to see if the CSS power doesn't somewhat fulfill that need.

On the road to proper tasks management

The current looping task / repo thread mechanism is used for various sorts of things and has several problems:

  • tasks don't behave similarly in a multi-instances configuration (some should be executed in a single instance, some in a subset); the tasks system has been originally written in a single instance context; as of today this is (sometimes) handled using configuration options (that will have to be properly set in each instance configuration file);
  • tasks is a repository only api but we also need web-side tasks;
  • there is probably some abuse of the system that may lead to unnecessary resources usage.

Analyzing a sample http://www.logilab.org/ instance, below are the running looping tasks by category. Tasks that have to run on each web instance:

  • clean_sessions, automatically closes unused repository sessions. Notice cw.etwist.server also records a twisted task to clean web sessions. Some changes are imminent on this, they will be addressed in the upcoming refactoring session (that will become more and more necessary to move on several points listed here).
  • regular_preview_dir_cleanup (preview cube), cleans up files in the preview filesystem directory. Could be executed by a (some of the) web instance(s) provided that the preview directory is shared.

Tasks that should run on a single instance:

  • update_feeds, update copy based sources (e.g. datafeed, ldapfeed). Controlled by 'synchronize' source configuration (persistent source attribute that may be overridden by instance using CWSourceHostConfig entities)
  • expire_dataimports, delete CWDataImport entities older than an amount of time specified in the 'logs-lifetime' configuration option. Not controlled yet.
  • cleanup_auth_cookies (rememberme cube), delete CWAuthCookie entities whose life-time is exhausted. Not controlled yet.
  • cleaning_revocation_key (forgotpwd cube), delete Fpasswd entities with past revocation_date. Not controlled yet.
  • cleanup_plans (narval cube), delete Plan entities instance older than an amount of time specified in the configuration. If 'plan-cleanup-delay' is set to an empty value, the task isn't started.
  • refresh_local_repo_caches (vcsfile cube), pulls or clones the vcs repositories cache if the Repository entity asks to import_revision_content (hence web instances should have an up to date cache to display file contents) or if the 'repository-import' configuration option is set to 'yes'; imports vcs repository content as entities if the 'repository-import' configuration option is set and the repository comes from the system source.

Some deeper thinking is needed here so we can improve things. That includes thinking about:

  • the inter-instances messages bus based on zmq and introduced in 3.15,
  • the Celery project (http://celeryproject.org/), an asynchronous task queue, widely used and written in Python,

Remember the more cw independent the tasks are, the better it is. Though we still want an 'all-integrated' approach, e.g. not relying on external configuration of Unix specific tools such as CRON. Also we should see if a hard-dependency on Celery or a similar tool could be avoided, and if not if it should be considered as a problem (for devops).

On the road to an easier configuration

First, we should drop the different behaviour according to presence of a '.hg' in cubicweb's directory. It currently changes the location where cubicweb external resources (js, css, images, gettext catalogs) are searched for. Speaking of implementation:

  • shared_dir returns the cubicweb.web package path instead of the path to the shared cube,
  • i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to the shared/i18n cube,
  • migration_scripts_dir returns the cubicweb/misc/migration directory path instead of share/cubicweb/migration.

Moving web related objects as proposed in the Bootstrap section would resolve the problem for the content of web/data and most of i18n (though some messages will remain and additional efforts will be needed here). By going further this way, we may also clean up some schema code by moving cubicweb/schemas and cubicweb/misc/migration to a cube (though only a small benefit is to be expected here).

We should also have fewer environment variables... Let's see what we have today:

  • CW_INSTANCES_DIR, where to look for instances configuration
  • CW_INSTANCES_DATA_DIR, where to look for instances persistent data files
  • CW_RUNTIME_DIR, where to look for instances run-time data files
  • CW_MODE, set to 'system' or 'user' will predefine above environment variables differently
  • CW_CUBES_PATH, additional directories where to look for cubes
  • CW_CUBES_DIR, location of the system 'cubes' directory
  • CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

  • CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to ~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
  • CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file options, with smart values generated at instance creation time;
  • the above change should make CW_MODE useless;
  • CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;
  • regarding CW_INSTALL_PREFIX, I'm lacking experience with non-hg-or-debian installations and don't know if this can be avoided or not.
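The lookup proposed for CW_INSTANCES_PATH could be sketched as follows. This is only an illustration of the proposal, not existing CubicWeb code: default_instances_path is a hypothetical function, and treating the variable as a list of directories separated by os.pathsep is my assumption, suggested by the _PATH suffix.

```python
import os


def default_instances_path(home=None, environ=None):
    """Return the list of directories to search for instances."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("CW_INSTANCES_PATH")
    if explicit:
        # assumption: several directories may be given, as in PATH
        return explicit.split(os.pathsep)
    home = os.path.expanduser("~") if home is None else home
    user_dir = os.path.join(home, "etc", "cubicweb.d")
    if os.path.isdir(user_dir):
        # per-user directory takes precedence when it exists
        return [user_dir]
    # system-wide default (on Unix platforms)
    return ["/etc/cubicweb.d"]


print(default_instances_path())
```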

Last but not least (for the moment), the 'web' / 'repo' / 'all-in-one' configurations, and the fact that the associated configuration file changes stinks. Ideas to stop doing this:

  • one configuration file per instance, with all options provided by installed parts of the framework used by the application.
  • activate 'services' (or not): web server, repository, zmq server, pyro server. Default services to be started are stored in the configuration file.

There is probably more that can be done here (fewer configuration options?), but that would already be a great step forward.

On the road to...

The following projects should be investigated to see if we could benefit from them:

Discussion

Remember the following goals: migration of legacy code should go smoothly. In a perfect world every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).

Please provide feedback:

  • do you think choices proposed above are good/bad choices? Why?
  • do you know some additional libraries that should be investigated?
  • do you have other changes in mind that could/should be done in cw 4.0?

Follow up of IRI conference about Museums and the Web #museoweb

2012/04/12 by Arthur Lutz

I attended the conference organised by IRI as part of a series of conferences about "Muséologie, muséographie et nouvelles formes d’adresse au public" (hashtag #museoweb). This particular session was about "Le Web devient audiovisuel" (the web is also audio and video content). Here are a few notes and links we gathered. The event was organised by Alexandre Monnin @aamonnz.

http://polemictweet.com/2011-2012-museo-audiovisuel/images/slide4_museo_fr.png

Yves Raimond from the BBC

Yves Raimond @moustaki made a presentation about his work at the BBC around semantic web technologies and speech recognition over large quantities of digitized archives. Parts of the BBC web sites use semantic web data as the database and do mashups with external sources of data (musicbrainz, dbpedia, wikipedia). For example, Tom Waits has an HTML web page at http://www.bbc.co.uk/music/artists/c3aeb863-7b26-4388-94e8-5a240f2be21b ; add .rdf at the end of the URL to get the RDF version: http://www.bbc.co.uk/music/artists/c3aeb863-7b26-4388-94e8-5a240f2be21b.rdf

He also made an introduction about the ABC-IP The Automatic Broadcast Content Interlinking Project and the Kiwi-API project that uses CMU Sphinx on Amazon Web Services to process large quantities of archives. A screenshot of Kiwi-API is shown on the BBC R&D blog. The code should be open sourced soon and should appear on the BBC R&D github page.

Following his presentation, the question was asked whether using Wikipedia content on an institutional web site would be possible in France. I pointed to the use of Wikipedia on http://data.bnf.fr , for example at the bottom of the Victor Hugo page.

Raphaël Troncy about Media Fragments

Raphaël Troncy @rtroncy made a presentation about "Media Fragments", which will enable sharing parts of a video on the web. Two major features: the sharing of specific extracts and the optimization of bandwidth use when streaming the extract (useful for mobile devices, for example). It is a W3C working draft: http://www.w3.org/TR/media-frags-reqs/. Here are a few links of demos and players:

Part of the presentation was about the ACAV project done jointly with Dailymotion: http://www.capdigital.com/projet-acav/

The slides of his presentation are available here: http://www.slideshare.net/troncy/addressing-and-annotating-multimedia-fragments
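To give an idea of what sharing a specific extract looks like, the Media Fragments work addresses a time range with a "t" fragment appended to the media URL, for example #t=10,20 for the extract between seconds 10 and 20. The helper below is a small sketch of that idea; temporal_fragment is a name made up for this post, the example.org URL is a placeholder, and only plain seconds are handled.

```python
def _seconds(value):
    """Format a number of seconds without useless trailing zeros."""
    return ("%.3f" % value).rstrip("0").rstrip(".")


def temporal_fragment(url, start=None, end=None):
    """Return url with a temporal media fragment (#t=start,end)."""
    if start is None and end is None:
        raise ValueError("at least one of start or end is required")
    if start is not None and end is not None and end <= start:
        raise ValueError("end must be greater than start")
    value = "" if start is None else _seconds(start)
    if end is not None:
        value += "," + _seconds(end)
    # drop any fragment already present on the URL
    base = url.split("#", 1)[0]
    return base + "#t=" + value


print(temporal_fragment("http://example.org/video.webm", 10, 20))
```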

IRI presentation

Vincent Puig @vincentpuig and Raphaël Velt @raphv made a presentation of various projects led by IRI:

http://www.iri.centrepompidou.fr/wp-content/themes/IRI-Theme/images/logo-iri-petit_fr_fr.png

Final words

The technologies seen during this conference are often related to semantic web technologies or at least web standards. Some of the visualizations are quite impressive and could mean new uses of the Web and an inspiration for CubicWeb projects.

A few of the people present at the conference will be attending or presenting talks at SemWeb.Pro, which will take place in Paris on the 2nd and 3rd of May 2012.


Undoing changes in CubicWeb

2012/02/29 by Anthony Truchet

Many desktop applications offer the possibility for the user to undo the recent changes : a similar undo feature has now been integrated into the CubicWeb framework.

Because a semantic web application and a common desktop application are not the same thing at all, especially as far as undoing is concerned, we will first explain what the undo feature currently is.

What's undoing in a CubicWeb application

A CubicWeb application acts upon an Entity-Relationship model, described by a schema. This ensures some data integrity properties. It also implies that changes are made in groups called transactions : to ensure data integrity, a transaction is either completely applied or not applied at all. What may appear as a simple atomic action to a user can actually consist of several actions for the framework. The end-user has no need to know the details of all actions in those transactions. Only the so-called public actions will appear in the description of an undoable transaction.

Let's take a simple example: posting a "comment" for a blog entry will create the entity itself and the link to the blog entry.
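To make the idea concrete, here is a minimal sketch of a transaction that groups several actions and is undone as a whole. It is not CubicWeb's implementation: the `Transaction` class, the `store` dictionary and the action descriptions are hypothetical, chosen only to illustrate that posting a comment produces two actions, of which only one is public. The sketch only handles creations, so undoing simply deletes what was created.

```python
# Illustrative sketch only: Transaction, store and the action names are
# hypothetical and do not reflect CubicWeb's internal API.

class Transaction:
    """Group of creation actions that is applied, and undone, as a whole."""

    def __init__(self, store):
        self.store = store      # dict: key -> value, stands in for the database
        self.actions = []       # list of (description, key, public) tuples

    def add(self, description, key, value, public=True):
        """Apply one creation action and remember how to revert it."""
        self.store[key] = value
        self.actions.append((description, key, public))

    def public_description(self):
        """Only public actions are shown to the end-user."""
        return [desc for desc, _key, public in self.actions if public]

    def undo(self):
        """Revert every action, most recent first; all or nothing."""
        snapshot = dict(self.store)
        try:
            for _desc, key, _public in reversed(self.actions):
                del self.store[key]
        except KeyError:
            # Undo failed half-way: restore the previous state so that no
            # partial undoing is left behind.
            self.store.clear()
            self.store.update(snapshot)
            raise
        self.actions = []


store = {}
tx = Transaction(store)
tx.add("create comment", ("Comment", 1), "Nice post!")
tx.add("link comment to blog entry", ("comments", 1, 42), True, public=False)
shown = tx.public_description()   # what the end-user would see
tx.undo()                         # removes both the entity and the link
```

After `tx.undo()`, the store is empty again, even though the user was only ever told about the "create comment" action.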

The undo feature for CubicWeb end-users

For now there are two ways to access the undo feature once it has been activated in the instance configuration file with the option undo-support=yes. Immediately after an action has been performed, the undo link appears in the "creation" message.

Screenshot of the undo link in the message

Otherwise, the undo-history view can be reached at any time from the start-up page.

Screenshot of the undo link in the message

This view shows the transactions, and each provides its own undo link. Only the transactions the user has permissions to see and undo will be shown.

Screenshot of the undo link in the message

If the user attempts to undo a transaction which can't be undone or whose undoing fails, then a message will explain the situation and no partial undoing will be left behind.

What's next

The undo feature is functional but the interface and configuration options are quite limited. One major planned improvement would be to enable the user to filter which transactions or actions he sees in the undo-history view. Another critical improvement would be to selectively enable the undo feature on part of the entity-relationship schema, to avoid storing too much data and to reduce the underlying overhead.

Feedback on this undo feature for specific CubicWeb applications is welcome. More detailed information regarding the undo feature will be published in the CubicWeb book when the patches make it through the review process.


CubicWeb Sprint report for the "ZMQ" team

2012/02/27 by Julien Cristau

There has been a growing interest in ZMQ in the past months, due to its ability to efficiently deal with message passing, while being light and robust. We have worked on introducing ZMQ in the CubicWeb framework for various uses :

  • As a replacement/alternative to the Pyro source, that is used to connect to distant instances. ZMQ may be used as a lighter and more efficient alternative to Pyro. The main idea here is to use the send_pyobj/recv_pyobj API of PyZMQ (python wrapper of ZMQ) to execute methods on the distant Repository in a totally transparent way for CubicWeb.
http://www.cubicweb.org/file/2219158?vid=download
  • As a JSONServer. ZMQ can be used to serve data to any client sending requests through ZMQ. The request is just an RQL string, and the response is the result set formatted in JSON.
  • As the building block for a simple notification (publish/subscribe) system between CubicWeb instances. A component can register its interest in a particular topic, and receive a callback whenever a corresponding message is received. At this point, this mechanism is used in CubicWeb to notify other instances that they should invalidate their caches when an entity is deleted.
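
The publish/subscribe idea of the last item can be sketched in a few lines. This is a plain in-process stand-in written for illustration: it does not use ZMQ at all, and `NotificationBus`, its methods and the "delete" topic name are hypothetical, not CubicWeb names.

```python
# Illustrative sketch only: an in-process stand-in for the publish/subscribe
# mechanism. It does not use ZMQ; NotificationBus and the topic name are
# hypothetical.
from collections import defaultdict


class NotificationBus:
    def __init__(self):
        self._callbacks = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register interest in a topic."""
        self._callbacks[topic].append(callback)

    def publish(self, topic, message):
        """Call every callback registered for this topic."""
        for callback in self._callbacks[topic]:
            callback(message)


# A second "instance" keeps a cache of entities, keyed by eid.
cache = {12: "entity 12", 13: "entity 13"}

bus = NotificationBus()
# Drop the cached entity when another instance announces its deletion.
bus.subscribe("delete", lambda eid: cache.pop(eid, None))

bus.publish("delete", 12)
```

In the real setup the message would travel between processes over a ZMQ socket, but the contract is the same: a component registers a callback for a topic and invalidates its cache when a matching message arrives.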

CubicWeb Sprint report for the "WSGI" team

2012/02/20 by Pierre-Yves David

CubicWeb has had WSGI support for several years, but this support was incomplete.

The WSGI team was in charge of turning WSGI support into a full-featured backend that could replace Twisted in real production scenarios.

Because we only had first-class support for Twisted, some of the CubicWeb logic related to HTTP handling was implemented on the Twisted side with Twisted concepts. Our first task was to move this logic into CubicWeb itself. The handling of HTTP status in our responses was improved in the process.

Our second task was to focus on the "non-HTTP" part of CubicWeb (because the repository also manages background tasks). The development mode for WSGI is now able to handle and run such tasks. For this purpose we have begun a process that aims to remove server-related code from the repository object.

We also tested several WSGI middlewares. One of the most promising is Firepython, which integrates Python logging and debugging features with Firebug. The werkzeug debugger seems neat too.

http://www.cubicweb.org/file/2194267?vid=download
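
For readers unfamiliar with the term, a WSGI middleware is simply a callable that wraps a WSGI application. The sketch below shows the pattern with a tiny logging middleware built on the standard library only. Neither function comes from CubicWeb, Firepython or werkzeug; the names are made up for the example.

```python
# Illustrative sketch only: a tiny WSGI application wrapped in a logging
# middleware. Neither function comes from CubicWeb, Firepython or werkzeug.
import logging
from wsgiref.util import setup_testing_defaults

logger = logging.getLogger("wsgi-demo")


def application(environ, start_response):
    """Minimal WSGI application returning a plain-text body."""
    body = b"Hello from WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def logging_middleware(app):
    """Wrap a WSGI app and log the status of each response."""
    def wrapper(environ, start_response):
        def logging_start_response(status, headers, exc_info=None):
            logger.info("%s %s -> %s", environ["REQUEST_METHOD"],
                        environ["PATH_INFO"], status)
            return start_response(status, headers, exc_info)
        return app(environ, logging_start_response)
    return wrapper


# Call the wrapped application directly with a fake request.
environ = {}
setup_testing_defaults(environ)
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status


response = b"".join(logging_middleware(application)(environ, fake_start_response))
```

Because the wrapper has the same signature as the application it wraps, middlewares can be stacked freely, which is what makes debugging tools easy to plug in during development.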

All these improvements open the road to a simple and efficient multi-process architecture in CubicWeb.


CubicWeb Sprint report for the "Benchmarks" team

2012/02/17 by Arthur Lutz

One team during the CubicWeb sprint looked at issues around monitoring benchmark values for CubicWeb development. This is a huge task, so we tried to stay focused on a few aspects:

  • production response times (using tools such as smokeping and munin)
  • response times of test executions in continuous integration tests
  • response times of test instances running in continuous integration

We looked at using cpu.clock() instead of cpu.time() in the xunit files that report test results so as to be a bit more independent of the load of the machine (but subprocesses won't be accounted for).
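
The difference between the two kinds of measurement can be illustrated with the standard library. This sketch assumes the idea is to compare wall-clock time with the CPU time of the process; it uses `time.perf_counter()` and `time.process_time()` from current Python 3 (the older `time.clock()` no longer exists there). The `measure` helper is made up for the example.

```python
# Sketch: wall-clock time versus CPU time of the current process.
# Assumption: the measurement relies on Python's time module; time.clock()
# no longer exists in Python 3, where time.process_time() plays a similar role.
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed


# Sleeping consumes wall-clock time but almost no CPU time, much like a test
# that waits while the machine is busy with other jobs.
wall, cpu = measure(lambda: time.sleep(0.2))
```

CPU time is less sensitive to the load of the machine, but it only covers the current process, which is why work done in subprocesses is not accounted for.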

Graphing test times in hudson/jenkins already exists (/job/PROJECT/BUILDID/testReport/history/?) and can also be done by TestClass and by individual test. What is missing so far is a specific dashboard where one could select the significant graphs to look at.

By the end of the first day we had a "lorem ipsum" test instance that is created on the fly on each hudson/jenkins build and a jmeter bench running on it, with its results processed by the performance plugin.

http://www.cubicweb.org/file/2184036?vid=download

By the end of the second day we had some visualisation of existing data collected by apycot using the jqplot javascript visualisation library (cubicweb-jqplot):

http://www.cubicweb.org/file/2184035?vid=download

By the end of the sprint, we got patches submitted for the following cubes :

  • apycot
  • cubicweb-jqplot
  • the original jqplot library (update : patch accepted a few days later)

In the last hour of the sprint, since we had a "lorem ipsum" test application running each time the tests went through the continuous integration, we hacked up a proof of concept to get automatic screenshots of this temporary test application. So far, we get screenshots for firefox only, but it opens up possibilities for other browsers. Inspiration could be drawn from https://browsershots.org/


"Data Fast-food": quick interactive exploratory processing and visualization of complex datasets with CubicWeb

2012/01/19 by Vincent Michel

With the emergence of the semantic web in the past few years, and the increasing number of high quality open data sets (cf the lod diagram), there is a growing interest in frameworks that make it possible to store, query, process, mine and visualize large data sets.

We have seen in previous blog posts how CubicWeb may be used as an efficient knowledge management system for various types of data, and how it may be used to perform complex queries. In this post, we will see, using Geonames data, how CubicWeb may perform simple or complex data mining and machine learning procedures on data, using the datamining cube. This cube adds powerful tools to CubicWeb that make it easy to interactively process and visualize datasets.

At this point, it is not meant to be used on massive datasets, for it is not fully optimized yet. If you try to perform a TF-IDF (term frequency–inverse document frequency) with a hierarchical clustering on the full dbpedia abstracts dataset, be prepared to wait. But it is a promising way to enrich the user experience while playing with different datasets, for quick interactive exploratory datamining processing (what I've called the "Data fast-food"). This cube is based on the scikit-learn toolbox, which has recently gained huge popularity in the machine learning and Python communities. With this cube, CubicWeb becomes considerably more attractive for data management.

The Datamining cube

For a given query, similarly to SQL, CubicWeb returns a result set. This result set may be presented by a view to display a table, a map, a graph, etc (see documentation and previous blog posts).

The datamining cube introduces the possibility to process the result set before presenting it, for example to apply machine learning algorithms to cluster the data.

The datamining cube is based on two concepts:

  • the concept of processor: basically, a processor transforms a result set into a numpy array, given some criteria defining the mathematical processing, and the columns/rows of the result set to be taken into account. The numpy array is a versatile structure that is widely used for numerical computation. This array can thus be efficiently used with any kind of datamining algorithm. Note that, in our context of knowledge management, it is more convenient to return a numpy array with additional meta-information, such as indices or labels, the result being stored in what we call a cw-array. Meta-information may be useful for display, but is not compulsory.
  • the concept of array-view: "views" are basic components of CubicWeb, and separating the querying of the data from its display is key in this framework. So, on a given result set, many different views can be applied. In the datamining cube, we simply overload the basic view of CubicWeb, so that it works with cw-arrays instead of result sets. These array-views are associated with some machine learning or datamining processes. For example, one can apply the k-means (clustering process) view on a given cw-array.

A very important feature is that the processor and the array-view are called directly through the URL using the two related parameters arid (for ARray ID) and vid (for View ID, standard in CubicWeb).

http://www.cubicweb.org/file/2154793?vid=download
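
Since everything is driven by URL parameters, building such a URL from a script is straightforward. The sketch below uses only the standard library; the base URL and the RQL query are placeholders, and `datamining_url` is a helper written for this example, not part of the cube.

```python
# Sketch: building a URL that selects a processor (arid) and a view (vid).
# The base URL and the RQL query are placeholders.
from urllib.parse import urlencode, urlsplit, parse_qs


def datamining_url(base_url, rql, arid, vid):
    """Return base_url with rql, arid and vid as URL-encoded parameters."""
    return base_url + "?" + urlencode({"rql": rql, "arid": arid, "vid": vid})


url = datamining_url(
    "http://mydomain/",
    "Any LO, LA WHERE X latitude LA, X longitude LO",
    arid="attr-asfloat",
    vid="protovis-scatterplot",
)
# Decode the query string again to check what the server would receive.
params = parse_qs(urlsplit(url).query)
```

Switching to another visualization of the same data then only means changing the `vid` argument, and switching to another processing only means changing `arid`.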

Processors

We give some examples of basic processors that may be found in the datamining cube:

  • AttributesAsFloatArrayProcessor (arid='attr-asfloat'): This processor turns all Int, BigInt and Float attributes in the result set to floats, and returns the corresponding array. The number of rows is equal to the number of rows in the result set, and the number of columns is equal to the number of convertible attributes in the result set.
  • EntityAsFloatArrayProcessor (arid='entity-asfloat'): This processor performs similarly to the AttributesAsFloatArrayProcessor, but keeps the reference to the entities used to create the numpy-array. Thus, this information could be used for display (map, label, ...).
  • AttributesAsTokenArrayProcessor (arid='attr-astoken'): This processor turns all String attributes in the result set into a numpy array, based on a word n-gram analysis. This may be used to tokenize a set of strings.
  • PivotTableCountArrayProcessor (arid='pivot-table-count'): This processor is used to create a pivot table, with a count function. Other functions, such as sum or product also exist. This may be used to create some spreadsheet-like views.
  • UndirectedRelationArrayProcessor (arid='undirected-rel'): This processor creates a binary numpy array of dimension (nb_entities, nb_entities), that represents the relations (or correlations) between entities. This may be used for graph-based visualization.
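
To give a feeling for what such processors compute, here is a rough sketch of the 'attr-asfloat' and 'undirected-rel' behaviours described above. It is only an approximation for illustration: plain Python lists stand in for numpy arrays, the function names are invented, and the real processors of the cube may differ in many details.

```python
# Illustrative sketch only: plain Python lists stand in for numpy arrays, and
# these functions are not the datamining cube's actual processors.

def attributes_as_float(rows):
    """Keep the numeric columns of a result set and convert them to floats."""
    if not rows:
        return []
    numeric_cols = [i for i, value in enumerate(rows[0])
                    if isinstance(value, (int, float))
                    and not isinstance(value, bool)]
    return [[float(row[i]) for i in numeric_cols] for row in rows]


def undirected_relation(pairs):
    """Build a symmetric binary matrix from (X, Y) relation pairs."""
    entities = sorted({e for pair in pairs for e in pair})
    index = {entity: i for i, entity in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for x, y in pairs:
        matrix[index[x]][index[y]] = 1
        matrix[index[y]][index[x]] = 1   # same link seen from the other side
    return entities, matrix


# A result set mixing numeric and string attributes.
result_set = [(2.35, 48.85, "Paris"), (5.37, 43.30, "Marseille")]
floats = attributes_as_float(result_set)

# The entity labels play the role of the cw-array meta-information.
labels, matrix = undirected_relation([("France", "Spain"), ("France", "Italy")])
```

The first call yields one row per result-set row and one column per convertible attribute; the second yields a square matrix with one row and one column per entity.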

We are also planning to extend the concept of processor to sparse matrices (scipy.sparse), in order to deal with very high dimensional data.

Array Views

Most of the array views found in the datamining cube are used for simple visualization. We used HTML-based templates and the Protovis Javascript Library.

We will not detail all the views, but rather show some examples. Read the reference documentation for a complete and detailed description.

Examples on numerical data

Histogram

The request:

Any LO, LA WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
X country C, NOT X elevation NULL, C name "France"

that may be translated as:

All pairs (latitude, longitude) of the locations in France, with an elevation not null

and, using vid=protovis-hist and arid=attr-asfloat

http://www.cubicweb.org/file/2154795?vid=download

Scatter plot

Using the notion of view, we can display the same result set differently, for example using a scatter plot (vid=protovis-scatterplot).

http://www.cubicweb.org/file/2156233?vid=download

Another example with the request:

Any P, E WHERE X is Location, X elevation E, X elevation >1, X population P,
X population >10, X country CO, CO name "France"

that may be translated as:

All pairs (population, elevation) of locations in France,
with a population higher than 10 (inhabitants), and an elevation higher than 1 (meter)

and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat)

http://www.cubicweb.org/file/2154802?vid=download

If a third column is given in the result set (and thus in the numpy array), it will be encoded in the size/color of each dot of the scatter plot. For example with the request:

Any LO, LA, E WHERE X latitude LA, NOT X latitude NULL, X longitude LO,  NOT X longitude NULL,
X country C, NOT X elevation NULL, X elevation E, C name "France"

that may be translated as:

All tuples (latitude, longitude, elevation) of the locations in France, with an elevation not null

and, using the same vid (vid=protovis-scatterplot) and the same arid (arid=attr-asfloat), we can visualize the elevation on a map, encoded in size/color

http://www.cubicweb.org/file/2154805?vid=download

Another example with the request:

Any LO, LA LIMIT 50000 WHERE X is Location, X population  >1000, X latitude LA, X longitude LO,
X country CO, CO name "France"

that may be translated as:

All pairs (latitude, longitude) of 50000 locations in France, with a population higher than 1000 (inhabitants)
http://www.cubicweb.org/file/2156095?vid=download

Other views also exist, such as the AreaChart view, the LineArray view, ...

Examples on relational data

Relational Matrix (undirected graph)

The request:

Any X,Y WHERE X continent CO, CO name "North America", X neighbour_of Y

that may be translated as:

All neighbour countries in North America

and using the vid='protovis-binarymap' and arid='undirected-rel'

http://www.cubicweb.org/file/2154796?vid=download

Relational Matrix (directed graph)

If we do not want a symmetric matrix, i.e. if we want to keep the direction of a link (X,Y is not the same relation as Y,X), we can use the directed-rel array processor. For example, with the following request:

Any X,Y LIMIT 20 WHERE X continent Y

that may be translated as:

20 countries and their continent

and using the vid='protovis-binarymap' and arid='directed-rel'

http://www.cubicweb.org/file/2154797?vid=download

Force directed graph

For a dynamic representation of relations, we can use a force directed graph. The request:

Any X,Y WHERE X neighbour_of Y

that may be translated as:

All neighbour countries in the World.

and using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph, with small independent components (e.g. UK and Ireland)

http://www.cubicweb.org/file/2154800?vid=download

Again, a third column in the result set could be used to encode some labeling information, for example the continent.

The request:

Any X,Y,CO WHERE X neighbour_of Y, X continent CO

that may be translated as:

All neighbour countries in the World, and their corresponding continent.

and again, using the vid='protovis-forcedirected' and arid='undirected-rel', we can see the full graph with the continents encoded in color (Americas in green, Africa in dark blue, ...)

http://www.cubicweb.org/file/2154801?vid=download

Dendrogram

For hierarchical information, one can use the Dendrogram view. For example, with the request:

Any X,Y WHERE X continent Y

that may be translated as:

All pairs (country, continent) in the World

and using vid='protovis-dendrogram' and arid='directed-rel', we have the following dendrogram (we only show a part due to lack of space)

http://www.cubicweb.org/file/2154806?vid=download

Unsupervised Learning

We have also developed some machine learning views for unsupervised learning. This is more a proof of concept than a fully optimized development, but we can already do some cool stuff. Each machine learning process is referenced by an mlid. For example, with the request:

Any LO, LA WHERE X is Location, X elevation E, X elevation >1, X latitude LA, X longitude LO,
X country CO, CO name "France"

that may be translated as:

All pairs (latitude, longitude) of the locations in France, with an elevation higher than 1

and using vid='protovis-scatterplot', arid='attr-asfloat' and mlid='kmeans', we can construct a scatter plot of all pairs of latitude and longitude in France, and create 10 clusters using the kmeans clustering. The labeling information is thus encoded in color/size:

http://www.cubicweb.org/file/2154804?vid=download
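
For readers who have never looked inside k-means, the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a small pure-Python version written for illustration; it is not the scikit-learn based code used by the cube. It requires Python 3.8 or later for `math.dist`, uses only two clusters, and makes up a handful of coordinates around Paris and Marseille.

```python
# Illustrative sketch only: a small pure-Python k-means, standing in for the
# scikit-learn based processing of the cube.
import math


def kmeans(points, k, iterations=20):
    """Cluster 2D points; return one cluster label per point."""
    # Assumption for reproducibility: start from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


# Made-up (longitude, latitude) pairs around Paris and around Marseille.
locations = [(2.35, 48.85), (2.29, 48.80), (2.45, 48.90),
             (5.37, 43.30), (5.45, 43.53), (5.40, 43.28)]
labels = kmeans(locations, k=2)
```

The resulting labels are what the scatter plot view encodes in color/size: points sharing a label belong to the same cluster.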

Download

Finally, we have also implemented a download view, based on a pickle of the numpy array. It is thus possible to access any data remotely from a Python shell and process it as you want. Changing the request can be done very easily by changing the rql parameter in the URL. For example:

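import pickle, urllib.parse, urllib.request
# URL-encode the parameters (the RQL query contains spaces); only unpickle data from a server you trust
params = urllib.parse.urlencode({'rql': 'my request', 'vid': 'array-numpy', 'arid': 'attr-asfloat'})
data = pickle.loads(urllib.request.urlopen('http://mydomain?' + params).read())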