blog entries created by Arthur Lutz
  • Monitor all the things! ... and early too!

    2016/09/16 by Arthur Lutz

    Following the "release often, release early" mantra, I thought it might be a good idea to apply it to monitoring on one of our client projects. So right from the demo stage, where we deliver a new version every few weeks (and sometimes every few days), we set up some monitoring.

    Monitoring performance

    The project is an application built with the CubicWeb platform, with some ElasticSearch for indexing and searching. As with any complex stack, there are a great number of places where one could monitor performance metrics.

    Here are a few things we have decided to monitor, and with what tools.

    Monitoring CubicWeb

    To monitor our running Python code, we have decided to use statsd, since it is already built into CubicWeb's core. Out of the box, you can configure a statsd server address in your all-in-one.conf configuration. That will send out some timing statistics about some core functions.

    The statsd server (there are numerous implementations; we use a simple one, python-pystatsd) gets the raw metrics and outputs them to carbon, which stores the time series data in whisper files (which can be swapped out for a different technology if need be).

    If we are curious about a particular function or view that might be taking too long to generate or slow down the user experience, we can just add the @statsd_timeit decorator there. Done. It's monitored.

    statsd monitoring is fire-and-forget over UDP: the application sends a datagram and never waits for a reply, so it should have next to no impact on the performance of what you are monitoring.
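    To make the fire-and-forget idea concrete, here is a minimal sketch of a timing decorator that emits statsd-style timer packets over UDP. It is not CubicWeb's @statsd_timeit: the name udp_timeit, the bucket prefix and the default host are assumptions made for illustration (8125 is the port statsd servers conventionally listen on).

```python
import functools
import socket
import time


def udp_timeit(bucket, host="127.0.0.1", port=8125):
    """Hypothetical stand-in for a statsd timing decorator.

    Sends one "<bucket>.<function>:<milliseconds>|ms" datagram per call.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                payload = f"{bucket}.{func.__name__}:{elapsed_ms:.3f}|ms"
                try:
                    # UDP: no connection, no acknowledgement, no waiting
                    sock.sendto(payload.encode("ascii"), (host, port))
                except OSError:
                    # Monitoring must never break the monitored code
                    pass

        return wrapper

    return decorator
```

    If no statsd server is listening, the datagram is simply dropped, which is the behaviour you want from monitoring code sitting in a request path.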

    Monitoring Apache

    Simply enough we re-use the statsd approach by plugging in an apache module to time the HTTP responses sent back by apache. With nginx and varnish, this is also really easy.

    One of the nice things about this part is that we can then get graphs of errors since we will differentiate OK 200 type codes from 500 type codes (HTTP codes).

    Monitoring ElasticSearch

    ElasticSearch exposes some metrics through the GET /_stats endpoint; similar statistics are available for individual nodes, for individual indices and even at cluster level. Some popular tools can be installed through the ElasticSearch plugin system or with Kibana (which has a plugin system too).

    We decided on a different approach that fitted well with our other tools (and demonstrates their flexibility!) : pull stats out of ElasticSearch with SaltStack, push them to Carbon, pull them out with Graphite and display them in Grafana (next to our other metrics).

    On the SaltStack side, we wrote a two-line execution module:

    import requests

    def stats():
        # Return the ElasticSearch /_stats payload as a dictionary
        return requests.get('http://localhost:9200/_stats').json()

    This gets shipped using the custom execution modules mechanism (_modules and saltutil.sync_modules), and is executed every minute (or less) by the salt scheduler. The resulting dictionary is fed to the carbon returner, which is configured to talk to a carbon server somewhere nearby.

    # salt demohost elasticsearch.stats
      { "indextime_inmillis" : 30,
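    For readers who have not looked at what carbon expects, the sketch below shows roughly how a nested statistics dictionary can be flattened into carbon's plaintext format ("path value timestamp", one metric per line). It is an illustration only, not the carbon returner's actual code; the function names and the "elasticsearch" prefix are assumptions.

```python
import time


def flatten(stats, prefix="elasticsearch"):
    """Yield (dotted.path, number) pairs from a nested dictionary."""
    for key, value in stats.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Only numeric leaves can be graphed; skip strings, lists, booleans
            yield path, value


def to_carbon_lines(stats, timestamp=None):
    """Format numeric leaves as carbon plaintext lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return [f"{path} {value} {ts}" for path, value in flatten(stats)]
```

    Each resulting line can be written, followed by a newline, to a carbon plaintext listener.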

    Monitoring web metrics

    To evaluate parts of the performance of a web page we can look at some metrics such as the number of assets the browser will need to download, the size of the assets (js, css, images, etc) and even things such as the number of subdomains used to deliver assets. You can take a look at such metrics in most developer tools available in the browser, but we want to graph this over time. A nice tool for this is (written in javascript with phantomjs). Out of the box, it has a graphite outputter so we just have to add --graphiteHost FQDN. even recommends using grafana to visualize the results and publishes some example dashboards that can be adapted to your needs.

    The command is configured and run by salt using pillars and its scheduler.

    We will have to take a look at using their jenkins plugin with our jenkins continuous integration instance.

    Monitoring crashes / errors / bugs

    Applications will have bugs (in particular when released often to get a client to validate some design choices early). Level 0 is having your client calling you up saying the application has crashed. The next level is watching some log somewhere to see those errors pop up. The next level is centralised logs on which you can monitor the numerous pieces of your application (rsyslog over UDP helps here, graylog might be a good solution for visualisation).

    It starts getting useful and usable when your bugs get reported with some rich context. That's where Sentry comes in. It's free software developed on GitHub (although the website does not really show that) and it is written in Python, so it was a good match for our culture. And it is pretty awesome too.

    We plug Sentry into our WSGI pipeline (thanks to cubicweb-pyramid) by installing and configuring the sentry cube : cubicweb-sentry. This will catch bugs with rich context and provide us with vital information about what the user was doing when the crash occurred.

    This also helps sharing bug information within a team.

    The sentry cube reports on errors being raised when using the web application, but can also catch some errors when running some maintenance or import commands (ccplugins in CubicWeb). In this particular case, a lot of importing is being done and Sentry can detect and help us triage the import errors with context on which files are failing.

    Monitoring usage / client side

    This part is a bit neglected for the moment. Client side we can use Javascript to monitor usage. Some basic metrics can come from piwik, which is usually used for audience statistics. To get more precise statistics we've been told Boomerang has an interesting approach, enabling a closer look at how fast a page was displayed client side, how much time was spent on DNS, etc.

    On the client side, we're also looking at two features of the Sentry project : the raven-js client which reports Javascript errors directly from the browser to the Sentry server, and the user feedback form which captures some context when something goes wrong or a user/client wants to report that something should be changed on a given page.

    Load testing - coverage

    To wrap up, we also often generate traffic to catch some bugs and performance metrics automatically :

    • wget --mirror $URL
    • linkchecker $URL
    • for search_term in $(cat corpus); do wget $URL/$search_term ; done
    • wapiti $URL --scope page
    • nikto $URL

    Then watch the graphs and the errors in Sentry... Fix them. Restart.
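    The search-term loop in the list above can also be written in Python when the corpus contains spaces or other characters that need escaping in a URL. This is a minimal sketch under assumptions: the function names are made up for the example and search terms are assumed to be appended as a path segment, as in the wget loop.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import urlopen


def search_urls(base_url, terms):
    """Build one URL per non-empty search term, escaping each term."""
    base = base_url.rstrip("/")
    return [
        f"{base}/{quote(term.strip(), safe='')}"
        for term in terms
        if term.strip()
    ]


def fetch_status(url, timeout=10):
    """Return the HTTP status code, or None if the server was unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 4xx and 5xx responses still carry a status code worth recording
        return err.code
    except URLError:
        return None
```

    A typical use would be to open the corpus file, pass it to search_urls, and call fetch_status on each result while watching the dashboards.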

    Graphing it in Grafana

    We've spent little time on the dashboard yet since we're concentrating on collecting the metrics for now. But here is a glimpse of the "work in progress" dashboard which combines various data sources and various metrics on the same screen and the same time scale.

    Further plans

    • internal health checks: we're taking a look at python-hospital, at the talk "healthz: Stop reverse engineering applications and start monitoring from the inside" (Monitorama), and at pyramid_health. The idea is to distinguish between "the app is running" and "the app is serving its purpose".
    • graph the number of Sentry errors and the number of types of errors: the sentry API should be able to give us this information. Feed it to Salt and Carbon.
    • set up some alerting : next versions of Grafana will be doing that, or with elastalert
    • set up "release version X" events in Graphite that are displayed in Grafana, maybe with some manual command or a postcreate command when using docker-compose up ?
    • make it easier for devs to have this kind of setup. Using this suite of tools in development might sometimes be overkill, but can be useful.
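    As a starting point for the "release version X" events mentioned in the list above, here is a hedged sketch that prepares a POST to Graphite's events endpoint. The host name is a placeholder, the function name is made up, and the field names (what, tags, when, data) follow Graphite's events API as I understand it; older graphite-web versions may expect tags as a space-separated string, so check against your installation.

```python
import json
import time
from urllib.request import Request


def release_event(version, graphite_url="http://graphite.example.org", when=None):
    """Build (but do not send) a POST request announcing a release."""
    payload = {
        "what": f"release version {version}",
        "tags": ["release"],
        "when": int(time.time() if when is None else when),
        "data": f"version {version} deployed",
    }
    return Request(
        f"{graphite_url.rstrip('/')}/events/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

    Passing the returned object to urllib.request.urlopen sends the event; Grafana can then display events tagged "release" as annotations on the graphs.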

  • Brainomics - A management system for exploring and merging heterogeneous brain mapping data

    2013/09/12 by Arthur Lutz

    At OHBM 2013, the 19th Annual Meeting of the Organization for Human Brain Mapping, Logilab presented a poster which explains the work done using CubicWeb on brain imaging and genetics data in collaboration with INRIA, INSERM and the CEA during the Brainomics project co-financed by the Agence nationale de la Recherche.

    You can download this poster and try the demo online.

  • We're going to PGDay France, the Postgresql Community conference

    2013/06/11 by Arthur Lutz

    A few members of the CubicWeb team are going to attend the French PostgreSQL community conference in Nantes (France) on the 13th of June.

    We're excited to learn more about the following topics that are relevant to CubicWeb's development and features :

    Obviously we'll pay attention to all the talks during the day. If you're attending, we hope to see you there.

  • OpenData meets the Semantic Web at WOD2013

    2013/06/10 by Arthur Lutz

    With a few people from Logilab we went to the 2nd International Workshop on Open Data (WOD), on the 3rd of June.

    Although the main focus was an academic take on OpenData, a lot of talks were related to the Semantic Web technologies and especially LinkedData.

    The full program (and papers) is on the following website. Here is a quick review of the things we thought worth sharing.

    • privacy oriented ontologies :
    • interesting automations done to suggest alignments when initial data is uploaded to an opendata website
    • some opendata platforms have built-in APIs to get files, one example is Socrata :
    • some work is being done to scale processing of linked data in the cloud (did you know you could access readily available datasets in the Amazon cloud ? DBPedia for example )
    • the data stored in wikipedia can be a good source of vocabulary on certain machine learning tasks (and in the future, wikidata project)
    • there is an RDF extension to Google Refine (or OpenRefine), but we haven't managed to get it working out of the box,
    • WebSmatch uses morphological operators (erosion / dilation) to identify grids and zones in Excel Spreadsheets and then aligns column data on known reference values (e.g. country lists).

    We naturally enjoyed the presentation made by Romain Wenz about with the unavoidable mention of Victor Hugo (and CubicWeb).

    Thanks to the organizers of the conference and to the National French Library for hosting the event.

  • Links roundup from

    2012/12/05 by Arthur Lutz

    A few people from Logilab attended the dotjs conference in Paris last week. The conference wasn't exactly what we expected, we were hoping for more technical talks. Nevertheless, some of the things we saw were quite interesting. Some of them could be relevant to CubicWeb.

    Here is a raw roundup of links collected last friday :

    • Chrome developer tools
    • yeoman
    • grunt.js
    • backbone.js
    • Dart
    • TypeScriptLang
    • Express.js
    • Mocha
    • Testacular
    • SASS
    • Angular.js
    • Enyo.js
    • Socket.io
    • when.js
    • Coffeescript
    • Source Maps explained

  • Follow up of IRI conference about Museums and the Web #museoweb

    2012/04/12 by Arthur Lutz

    I attended the conference organised by IRI in a series of conferences about "Muséologie, muséographie et nouvelles formes d’adresse au public" (hashtag #museoweb). This particular occurrence was about "Le Web devient audiovisuel" (the web is also audio and video content). Here are a few notes and links we gathered. The event was organised by Alexandre Monnin @aamonnz.

    Yves Raimond from the BBC

    Yves Raimond @moustaki made a presentation about his work at the BBC around semantic web technologies and speech recognition over large quantities of digitized archives. Parts of the BBC web sites use semantic web data as the database and do mashups with external sources of data (musicbrainz, dbpedia, wikipedia). For example Tom Waits has an html web page : add .rdf at the end of the URL

    He also made an introduction about the ABC-IP The Automatic Broadcast Content Interlinking Project and the Kiwi-API project that uses CMU Sphinx on Amazon Web Services to process large quantities of archives. A screenshot of Kiwi-API is shown on the BBC R&D blog. The code should be open sourced soon and should appear on the BBC R&D github page.

    Following his presentation, the question was asked if using Wikipedia content on an institutional web site would be possible in France, I pointed to the use of Wikipedia on , for example at the bottom of the Victor Hugo page.

    Raphaël Troncy about Media Fragments

    Raphaël Troncy @rtroncy made a presentation about "Media Fragments" which will enable sharing parts of a video on the web. Two major features : the sharing of specific extracts and the optimization of bandwidth use when streaming the extract (useful for mobile devices for example). It is a W3C working draft : Here are a few links of demos and players :

    Part of the presentation was about the ACAV project done jointly with Dailymotion :

    The slides of his presentation are available here :

    IRI presentation

    Vincent Puig @vincentpuig and Raphaël Velt @raphv made a presentation of various projects led by IRI :

    Final words

    The technologies seen during this conference are often related to semantic web technologies or at least web standards. Some of the visualizations are quite impressive and could mean new uses of the Web and an inspiration for CubicWeb projects.

    A few of the people present at the conference will be attending or presenting talks at SemWeb.Pro which will take place in Paris on the 2nd and 3rd of May 2012.

  • CubicWeb Sprint report for the "Benchmarks" team

    2012/02/17 by Arthur Lutz

    One team during the CubicWeb sprint looked at issues around monitoring benchmark values for CubicWeb development. This is a huge task, so we tried to stay focused on a few aspects:

    • production response times (using tools such as smokeping and munin)
    • response times of test executions in continuous integration tests
    • response times of test instances running in continuous integration

    We looked at using cpu.clock() instead of cpu.time() in the xunit files that report test results so as to be a bit more independent of the load of the machine (but time spent in subprocesses won't be accounted for).
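    The difference between the two kinds of clock is easy to see with the standard library. The sketch below is not the sprint's patch: it uses time.process_time() (CPU time of the current process) and time.perf_counter() (wall-clock time) from modern Python, and the helper name measure is made up for the example.

```python
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    # CPU time ignores sleeping, waiting on I/O and work done in subprocesses
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu
```

    Measuring time.sleep(0.2) this way reports about 0.2 seconds of wall-clock time but almost no CPU time, which is why a CPU clock is less sensitive to a loaded build machine and also why it misses work delegated to other processes.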

    Graphing test times in hudson/jenkins already exists (/job/PROJECT/BUILDID/testReport/history/?) and can also be graphed by TestClass and by individual test. What is missing so far is a specific dashboard where one could select the significant graphs to look at.

    By the end of the first day we had a "lorem ipsum" test instance that is created on the fly on each hudson/jenkins build and a jmeter bench running on it, with its results processed by the performance plugin.

    By the end of the second day we had some visualisation of existing data collected by apycot using jqplot javascript visualisation (cubicweb-jqplot):

    By the end of the sprint, we got patches submitted for the following cubes :

    • apycot
    • cubicweb-jqplot
    • the original jqplot library (update : patch accepted a few days later)

    In the last hour of the sprint, since we had a "lorem ipsum" test application running each time the tests went through the continuous integration, we hacked up a proof of concept to get automatic screenshots of this temporary test application. So far, we get screenshots for firefox only, but it opens up possibilities for other browsers. Inspiration could be drawn from

  • Roundup of "Powered by Cubicweb" websites

    2011/11/15 by Arthur Lutz

    Here is an (incomplete) list of public websites powered by Cubicweb. A lot of CubicWeb technology is used for private web applications in large companies that we cannot list here.

    Demos are listed here :

    You can also find a list of the companies providing services for Cubicweb (with a few extra examples) :

  • HTML5 features presented at Paris Web 2010 by Paul Rouget

    2010/10/19 by Arthur Lutz

    While at Paris Web 2010 we were all impressed by the presentation and demos by Paul Rouget on HTML5 (tech evangelist must be a hard job!). Here is my take and a few URLs on the things that were presented.
    • Websockets with persistent connections between the server and the browser. That way you can avoid pulling information every 5 seconds, the server can tell the web page a new info is available. The immediate uses we have for this are :
      • realtime feed display
      • jabber web chat rooms
      • in cubicweb's forge : new comment indication on a ticket
      • in cubicweb in general : notification that the edited element has been opened by another user (instead of a lock mechanism)
      • real time collaborative editing (etherpad style functionality)
    • File upload demo :
    • File EXIF extraction, client side resize or geolocalisation . That could be very cool for things such as resizing an image before it is sent to the server (you know, for your mother who doesn't know how to resize that 2 Mbytes photo before sending it to the site). Reference :
    • Using File IO, you can do some heavy Drag'n'drop from your computer to your browser directly in the browser (yes, you can get rid of that nasty java applet). Apparently Google implemented in Chromium a non-standard drag'n'drop the other way around : from the web app to your desktop, which could be cool as well.
    • XHR - XMLHttpRequest. Usually this type of requests is not possible cross-domain. Now they will be (with an authorization mechanism). That way, you will be able to post and control websites from the page in your browser.
    • Audio Data API : you can now access & modify audio files directly in your browser (before uploading them server side). This makes me think of the first time I realized people were implementing traditionally "heavy" applications (photo editing, music editing, even movie editing) in web applications. I was (and still am) very surprised and skeptical, but this kind of evolution makes me believe that there can be a day when you don't even need to send massive files to the server to edit them.

    Admittedly, you probably need to see the thrilling presentation and demos to be tempted to go and dip into these technologies. Reading the documentation will probably not encourage you to go and code some cool new features.

    One of the things that the audience commented about at the end of the presentation is that there was still a huge lack of "authoring tools" for HTML5. For some coders that never leave vim or emacs, this is heresy, but we have to admit that the adoption of flash and silverlight (apparently) is very much driven by simple click'n'program tools.

    During the presentation, I used a Chrome 6 that I had lying around on my Ubuntu, but by the end of the presentation I had installed Firefox4 using the mozilla PPA:

    sudo add-apt-repository ppa:ubuntu-mozilla-daily/ppa
    sudo apt-get update
    sudo apt-get -uVf install firefox-4.0

    The PPA version keeps config files separate so you can easily switch between your "standard" Firefox3 profile and the cutting edge Firefox4 (obviously the big downside is not having all your cool extensions).

    The only thing missing from the presentation was the code... a request I hope Paul will grant to the community (a bunch of tweets about that followed the presentation).

  • CubicWeb presentation at the JDLL (Lyon)

    2010/10/07 by Arthur Lutz

    For the "Journées Du Logiciel Libre (JDLL)" in Lyon which will take place on the 14th, 15th and 16th of October 2010, we will be presenting the semantic side of CubicWeb on Friday 15th. There will be a talk and a tutorial. Details can be found here and there.

    If you're around, come and see us!
