
CubicWeb Blog

News about the framework and its uses.

CubicWeb documentation mini-sprint report

2010/02/10 by Sylvain Thenault

We held a one day sprint last week in our Paris office, trying to improve CubicWeb's documentation.

There is a huge amount of work to do on this, much more than we can do in a one-day sprint, even with many people. But you have to begin with something :)

So, after a quick meeting to define priorities:

  • Stéphanie, Charles and later Sandrine (from her US home office) began adding documentation and screenshots to cubes. They started with the following cubes: addressbook, person, basket, tag, folder, forgotpwd, forge, tracker, vcsfile, keyword, blog and comment.
  • Julien explored Sphinx's abilities to build the index and extract docstrings. He applied this to improve the documentation of selectors.
  • Adrien (ach) and Celso, our friend from Mexico, tackled the task of improving the tutorial from a beginner's point of view.
  • Arthur added some pieces of documentation found on our intranet, mailing list...
  • Pyves worked on a cubicweb-ctl command to generate schema images (png) for cubes, to include them in the cube's documentation.
  • Adrien (adim) and I helped the various teams.

Hum, I don't think I forgot anyone...

Although there is still a lot to do (we need more doc sprints, stay tuned), this is a really nice start! This site should soon be updated to include more valuable cube descriptions and online documentation extracted from the contributed docs.


CubicWeb documentation sprint in Feb. 2010

2010/01/22 by Nicolas Chauvat
http://farm4.static.flickr.com/3042/2871708248_950831962c_s.jpg

On February 2nd, 2010, Logilab will host a one-day sprint at its head office, dedicated to improving the CubicWeb documentation.

Get in touch with Logilab if you want to participate in person or via the net: contact at logilab dot fr.

Photo by Adam Hyde from the FLOSS blog


MS SQL Server backup gotcha

2010/01/19

While working on the port of CubicWeb to the Windows platform, including support for MS SQL Server as the database backend, I got bitten by a weird behavior of that database engine. When working with CubicWeb, most administration commands are wrapped by the cubicweb-ctl utility, and database backups are performed by running cubicweb-ctl db-dump <instancename>. If the instance uses PostgreSQL as the backend, this calls the pg_dump utility.

When porting to SQL Server, I could not find such a utility, but I found that Transact-SQL has a BACKUP DATABASE command, so I was able to call it using Python's pyodbc module. I tested it interactively and was satisfied with the result:

>>> from logilab.common.db import get_connection
>>> cnx = get_connection(driver='sqlserver2005', database='mydb', host='localhost', extra_args='autocommit;trusted_connection')
>>> cursor = cnx.cursor()
>>> cursor.execute('BACKUP DATABASE ? TO DISK = ?', ('mydb', 'C:\\Data\\mydb.dump'))
>>> cnx.close()

However, testing that very same code through cubicweb-ctl produced no file in C:\Data\. To make a (quite) long story short, the BACKUP DATABASE command is asynchronous (or maybe the ODBC driver is), and the call to cursor.execute(...) returns immediately, before the backup actually starts. When running interactively, the backup had finished by the time I typed cnx.close(). When running in a function, however, the connection was closed before the backup started, which effectively killed the backup operation.

I worked around this by monitoring the size of the backup file in a loop and waiting until that size becomes stable before closing the connection:

import os
import time
from logilab.common.db import get_connection

filename = 'c:\\data\\toto.dump'
dbname = 'mydb'
cnx = get_connection(driver='sqlserver2005',
                     host='localhost',
                     database=dbname,
                     extra_args='autocommit;trusted_connection')
cursor = cnx.cursor()
cursor.execute("BACKUP DATABASE ? TO DISK = ?", (dbname, filename))
prev_size = -1
err_count = 0
same_size_count = 0
# poll once per second until the size has not grown for 10 checks,
# or until the file could not be read 10 times
while err_count < 10 and same_size_count < 10:
    time.sleep(1)
    try:
        size = os.path.getsize(filename)
        print('file size', size)
    except OSError as exc:
        # the backup file may not have been created yet
        err_count += 1
        print(exc)
        continue  # no new size to compare, poll again
    if size > prev_size:
        same_size_count = 0
        prev_size = size
    else:
        same_size_count += 1
cnx.close()

I hope sharing this will save some people time...

Note: get_connection() comes from logilab.common.db, a wrapper module that tries to simplify writing code for different database backends by handling their various idiosyncrasies once and for all. If you want pure pyodbc code, you can replace it with:

from pyodbc import connect
cnx = connect(driver='SQL Server Native Client 10.0',
              host='localhost',
              database=dbname,
              trusted_connection='yes',
              autocommit=True)

The autocommit=True part is especially important, because BACKUP DATABASE will fail if run from within a transaction.
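The polling loop above can also be wrapped in a reusable helper. What follows is a minimal sketch. `wait_for_stable_size` and its parameters are hypothetical names of ours and are not part of logilab.common or CubicWeb. Like the original loop, it only detects that the file has stopped growing. That is a heuristic, not a guarantee that SQL Server has finished the backup.

```python
import os
import time


def wait_for_stable_size(path, stable_checks=10, max_errors=10,
                         interval=1.0, getsize=os.path.getsize,
                         sleep=time.sleep):
    """Poll the size of `path` until it stops growing (hypothetical helper).

    Returns the last observed size, or None if the file could never be read.
    `getsize` and `sleep` are injectable so the loop can be exercised
    without a real backup running.
    """
    prev_size = -1
    errors = 0
    same_size = 0
    while errors < max_errors and same_size < stable_checks:
        sleep(interval)
        try:
            size = getsize(path)
        except OSError:
            # the backup may not have created the file yet
            errors += 1
            continue
        if size > prev_size:
            # still growing: remember the size and reset the stability counter
            prev_size = size
            same_size = 0
        else:
            same_size += 1
    return prev_size if prev_size >= 0 else None
```

With this helper, the whole while loop in the script above becomes a single `wait_for_stable_size(filename)` call just before `cnx.close()`. The `getsize` and `sleep` parameters exist only so the function can be tested with fake values.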


Distributed scalable architecture using CubicWeb

2010/01/14 by Arthur Lutz

Here is a small example of one of the things you can do with CubicWeb's scalable architecture when serving a large number of users.

http://www.cubicweb.org/image/619085?vid=download

Obviously you can easily add machines hosting CubicWeb to the middle tier to scale up. Adding multiple PostgreSQL servers is possible but trickier. In a later post I will also show a way of splitting CubicWeb across multiple servers (separating the web engine from the data repository part). Debian is only one of the possible host systems. You can use something else; it's just easier with Debian...

If you want a more detailed explanation of how we set up such an environment, please comment and we'll try to find the time to document it.

As a systems administrator, I can then enjoy using the following tools:

  • clusterssh - to access all machines at once and perform common tasks by typing them only once (a must!)
  • htop - to monitor resources in a nicer way than the simple top
  • iotop - to monitor input/output load
  • varnishhist - to check that varnish is properly caching some content
  • apachetop - to watch in real time what is being accessed on the apache server
  • jnettop - to watch network flows
  • apt-get (on Debian) to install all this in a few simple commands...

CubicWeb 3.6 sprint report

2009/12/14 by Sylvain Thenault

Last week we held a CubicWeb sprint in our new Paris office!

We were a good number of people: 7 from Logilab's crew, including Sandrine, our US representative, Celso and Carlos from Mexico, plus some other guests and colleagues working on (CubicWeb-based, of course) customer projects.

The objective of the sprint was to get the 3.6 version of CubicWeb out. This is a big refactoring release that Adrien and I started a few months ago. Unfortunately, we had been preempted by other projects, and the CubicWeb development branch had merely been painfully tracking changes made in the stable branch.

Also, we decided to start using mq as a basis for code review. The sprint was a nice opportunity to test whether it was actually usable for both developers and code reviewers. But more on this later :)

The tasks needed to get this release out were:

  1. resurrect the default branch after 3 months of nasty bugs introduced by simply merging from the stable branch without any time to test
  2. update main cubes to the new test / uicfg / hooks / members api
  3. finish the editcontroller (which handles the posting of most web forms) refactoring
  4. finish the relation permissions change, including migration
  5. update the documentation
  6. test real applications

Of course this was ambitious :) Among those, points 1, 2 and 4 took us much more time than I expected. The editcontroller work (3.) has not been finished yet, and we didn't find any time for the documentation (5.).

Besides this, everyone (well, me at least ;) enjoyed their time working hard together in our new meeting room! The 3.6 version still needs a little work before being released, but the development branch is definitely back, with a great bunch of cubes ready. Among them: comment, tag, blog, keyword, tracker, forge, card, nosylist, etc...

So many thanks to everyone, and particularly to our Mexican friends Carlos and Celso... Tequila! ;)

By the way, the good news is that we plan to do more sprints like this now that we have some room for it!


Customizing search box with magicsearch

2009/12/13 by Adrien Di Mascio

During the last CubicWeb sprint, I was asked whether it was possible to customize the search box that CubicWeb comes with. By default, you can use it to type RQL queries, plain text queries or standard shortcuts such as <EntityType> or <EntityType> <attrname> <value>.

Ultimately, all queries are translated to RQL since it's the only language understood on the server (data) side. To transform the user query into RQL, CubicWeb uses the so-called magicsearch component. This component in turn delegates to a number of query preprocessors that are responsible for interpreting the user query and generating the corresponding RQL.

The code of the main processor loop is easy to understand:

for proc in self.processors:
    try:
        return proc.process_query(uquery, req)
    except (RQLSyntaxError, BadRQLQuery):
        pass

The idea is simple: for each query processor, try to translate the query. If it fails, try the next processor. If it succeeds, we're done and the RQL query is executed.
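This loop is a plain chain of responsibility. The standalone sketch below mimics it outside CubicWeb so the fallback behaviour can be tried in isolation. `BadQuery`, `translate` and the two toy processors are hypothetical stand-ins. The real processors are registered classes with a `process_query` method, and the real loop catches RQLSyntaxError and BadRQLQuery.

```python
class BadQuery(Exception):
    """Stand-in for RQLSyntaxError / BadRQLQuery in this sketch."""


def rql_passthrough(uquery):
    """Accept the query unchanged if it already looks like RQL."""
    if not uquery.startswith('Any '):
        raise BadQuery(uquery)
    return uquery, {}


def full_text(uquery):
    """Catch-all: turn anything into a full-text search."""
    return 'Any X WHERE X has_text %(text)s', {'text': uquery}


def translate(uquery, processors):
    """Return the translation from the first processor that accepts uquery."""
    for proc in processors:
        try:
            return proc(uquery)
        except BadQuery:
            continue  # not understood, let the next processor try
    return None  # like the loop above, give up silently when all fail
```

For instance, `translate('pylint', [rql_passthrough, full_text])` falls through the first processor and ends up as a full-text query. The order of the list therefore decides which interpretation wins.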

Now that the general mechanism is understood, here's an example of code that could be used in a forge-based cube to add a new search shortcut to find tickets. We'd like to use the project_name:text syntax to search for tickets of project_name containing text (e.g. pylint:warning).

Here's the corresponding preprocessor code:

from cubicweb.web.views.magicsearch import BaseQueryProcessor

class MyCustomQueryProcessor(BaseQueryProcessor):
    priority = 0 # controls order in which processors are tried

    def preprocess_query(self, uquery, req):
        """
        :param uquery: the query as sent by the browser
        :param req: the standard, omnipresent, cubicweb's req object
        """
        try:
            project_name, text = uquery.split(':')
        except ValueError:
            return None # the shortcut doesn't apply
        return (u'Any T WHERE T is Ticket, T concerns P, P name %(p)s, '
                u'T has_text %(t)s', {'p': project_name, 't': text})

The code is rather self-explanatory, but here are a few additional comments:

  • the class is registered with the standard vregistry mechanism and should be defined alongside the views
  • the priority attribute is used to sort and define the order in which processors will be tried in the main processor loop
  • the preprocess_query method returns None or raises an exception if the query can't be processed
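The sketch below illustrates how a priority attribute can turn a set of registered processor classes into an ordered loop. It assumes that lower values are tried first. Check the magicsearch module of your CubicWeb version before relying on that direction. The class names and priority values are hypothetical.

```python
class QueryProcessor:
    """Hypothetical base class; only the priority attribute matters here."""
    priority = 0


class RQLPassthrough(QueryProcessor):
    priority = 0


class TicketShortcut(QueryProcessor):
    priority = 2


class FullText(QueryProcessor):
    priority = 10  # generic fallback, tried last


def ordered(processor_classes):
    """Sort processors by ascending priority (lower values tried first).

    sorted() is stable, so processors sharing a priority keep the order
    in which they were registered.
    """
    return sorted(processor_classes, key=lambda cls: cls.priority)
```

Registration order then stops mattering. `ordered([FullText, TicketShortcut, RQLPassthrough])` yields the passthrough first and the full-text fallback last.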

To summarize, if you want to customize the search box, you have to:

  1. define a new query preprocessor component
  2. define its priority with respect to other standard processors
  3. implement the preprocess_query method

and CubicWeb will do the rest!
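The string handling inside preprocess_query can be tested on its own, without a running instance. The sketch below pulls it into a plain function with the hypothetical name `ticket_shortcut`. It deliberately uses `str.partition` instead of `split(':')`. As a result, text that itself contains colons (e.g. `pylint:E1101:foo`) is still accepted instead of being rejected, and empty project names or texts are refused. The RQL string is the one from the example above.

```python
def ticket_shortcut(uquery):
    """Translate 'project:text' into an RQL query and its arguments.

    Returns None when the shortcut does not apply.
    """
    # partition splits on the first colon only: 'a:b:c' -> ('a', ':', 'b:c')
    project_name, sep, text = uquery.partition(':')
    if not sep or not project_name or not text:
        return None  # not of the form project:text
    return (u'Any T WHERE T is Ticket, T concerns P, P name %(p)s, '
            u'T has_text %(t)s', {'p': project_name, 't': text})
```

Inside the processor class, preprocess_query would then simply return `ticket_shortcut(uquery)`.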


Using gettext on Windows

2009/12/01
http://www.gnu.org/graphics/gnu-head-sm.jpg

CubicWeb relies on GNU gettext for its translation management. However, the binary installers easily found for gettext (such as the one in Python(x,y)) are for older versions, and compiling it is not that easy (especially in the Python world, where people do not necessarily have a C compiler at hand).

We did the job, and a binary installer for GNU gettext 0.17 is available on our FTP server.


Browsing the Semantic Web

2009/10/31 by Nicolas Chauvat
http://www.cubicweb.org/image/502157?vid=download

Now that the Web of Data has become a reality, innovative applications are springing up everywhere. Here is a selection of web apps that help you browse the semantic web.

  • Parallax is a faceted browser that is demonstrated by displaying the content of Freebase.
  • Neofonie demonstrates its faceted browser by displaying the content of DBpedia at dbpedia.neofonie.de
  • VisiNav is a search engine that allows you to refine searches in a way reminiscent of facets.
  • Falcons is a search engine that indexes RDF data.
  • Sindice is a search engine that indexes RDF data as well as data extracted from Microformats. It offers a public Sindice API that can be used to retrieve the search results as RDF, JSON or Atom.
  • SameAs is a service that returns all the equivalent URIs for a search term or a given URI.
  • When you enter search terms, Sig.ma collates the data from the resources included in the results of a search on Sindice.
  • When you publish your product data according to the GoodRelations ontology, information such as the price shows up in Yahoo's search results.
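At their core, the faceted browsers listed above rely on two operations: counting the values available for each facet, and narrowing the result set when the user picks one. The sketch below is a minimal in-memory illustration of both. It uses made-up records stored as plain dicts rather than RDF data. `facet_counts` and `refine` are hypothetical names, and services like Parallax work over RDF data at a much larger scale.

```python
from collections import Counter


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def refine(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(name) == value
                   for name, value in selected.items())]


# toy records, invented for the example
records = [
    {'type': 'Film', 'country': 'France'},
    {'type': 'Film', 'country': 'USA'},
    {'type': 'Book', 'country': 'France'},
]
```

Selecting `country='France'` with `refine` and then recomputing `facet_counts` on the result is what updates the remaining facet values after each click.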

More and more services will appear in the coming months that make use of these new resources. Just for tagging, you may look at CommonTag, Zemanta and OpenCalais and imagine new ways to automate and facilitate the process of publishing information on the web.


Comparing CubicWeb with Drupal plus CCK extension

2009/10/29 by Nicolas Chauvat
http://www.cubicweb.org/image/502151?vid=download

Drupal is a CMS written in PHP that is getting more and more visibility in the Semantic Web crowd. Several researchers from DERI have been using it as a test bed for their research projects and have developed extensions to showcase their ideas. For example, it is used to build the Semantic Web Dog Food site, which archives the Semantic Web conferences and publishes them as Linked Open Data. The URL for this year's ISWC is http://data.semanticweb.org/conference/iswc/2009

This gave me the incentive to read more about Drupal than I had before. I have not had time to give it a try, but I skimmed the documentation and will try to compare it with CubicWeb from a software architecture point of view.

Drupal defines a Node as an information item. The CCK (aka Content Construction Kit) can be used to define new types of Nodes through a web interface. Nodes and the bits and pieces used to display them as HTML are not packaged together in components. The Features extension plans to get these bits packaged.

If you are a Drupal user/developer and think I am not being fair to Drupal, please comment below.

On the other hand, CubicWeb implemented the concept of reusable components very early. What is called a Node in Drupal is an Entity in CubicWeb. By design, CubicWeb does not have a web interface to define entities: the data model is part of the code. To efficiently maintain applications in production, changes to the data model must be tracked along with changes to the code, and data model changes imply migration procedures. In CubicWeb, all of this is versioned and made part of the components. Where Drupal needs to grow extensions like CCK and Features, CubicWeb offers more advanced possibilities by design, for example the ability to develop feature-rich applications by assembling components.

This was a very short comparison. I'm looking forward to getting a chance to discuss it with knowledgeable Drupal hackers.


Release early, release often

2009/10/05 by Arthur Lutz

Looking at the releases of the CubicWeb projects for the month of September alone, I think we can conclude that we are applying the Agile Software Development principle quite closely.

http://farm4.static.flickr.com/3025/2732378117_cdd948fd1d_m.jpg
  • 11 releases of the CubicWeb framework (now in stable and unstable flavors): 3.5.2, 3.5.1, 3.5.0, 3.4.11, 3.4.9, 3.4.8, 3.4.7, 3.4.6, 3.4.5, 3.4.4, 3.4.3
  • 3 releases of cubicweb-vcsfile
  • 4 releases of cubicweb-forge
  • 2 releases of cubicweb-drh
  • 2 releases of cubicweb-workorder
  • 1 release of cubicweb-conference, cubicweb-tracker, cubicweb-registration, cubicweb-timesheet, cubicweb-workcase, cubicweb-task, cubicweb-expense, cubicweb-calendar, cubicweb-invoice, cubicweb-nosylist, etc.

Hope you can keep up, or stick to the stable versions...

photo by kennymatic under creative commons