CubicWeb Blog

News about the framework and its uses.

Distributed scalable architecture using CubicWeb

2010/01/14 by Arthur Lutz

Here is a small example of one of the things you can do with cubicweb's scalable architecture when serving a large number of users.

http://www.cubicweb.org/image/619085?vid=download

Obviously you can easily add machines hosting CubicWeb to the middle bit to scale up. Adding multiple postgres servers is possible but trickier. In a later blog post I will also show a way of splitting CubicWeb across multiple servers (separating the web engine from the data repository part). Debian is one of the possible host systems; you can use something else, it's just easier with debian...

If you want a more detailed explanation of how we set up such an environment, please comment and we'll try to find the time to document it.

As a systems administrator, I can then enjoy the use of the following tools :

  • clusterssh - to access all machines at once and perform common tasks by typing them only once (a must!)
  • htop - to monitor resources in a nicer way than the simple top
  • iotop - to monitor input/output load
  • varnishhist - to check that varnish is properly caching some content
  • apachetop - to watch in real time what is being accessed on the apache server
  • jnettop - to watch network flows
  • apt-get (on debian) - to install all this in a few simple commands...

CubicWeb 3.6 sprint report

2009/12/14 by Sylvain Thenault

Last week we held a cubicweb sprint in our new Paris office !

We were a nice number of people: 7 from Logilab's crew, including Sandrine, our US representative, Celso and Carlos from Mexico, plus some other guests and colleagues working on (cubicweb based of course) customer projects.

The objective of the sprint was to kick out the 3.6 version of cubicweb, a big refactoring release started by Adrien and me a few months ago. Unfortunately we had been preempted by some other projects and the cubicweb development branch was simply painfully following changes done in the stable branch.

Also, we decided to start using mq as a basis for code review. The sprint was a nice opportunity to test and see if it was actually usable for both developer and code reviewer. But more on this later :)

The tasks to achieve to get this release out were:

  1. resurrect the default branch after 3 months of nasty bugs introduced by simply merging from the stable branch without any time to test
  2. update main cubes to the new test / uicfg / hooks / members api
  3. finish the refactoring of the editcontroller (which handles the POST of most web forms)
  4. finish the relation permissions change, including migration
  5. update the documentation
  6. test real applications

Of course this was ambitious :) Among those, points 1, 2 and 4 took us much more time than I expected. The editcontroller work (3) has not been finished yet, and we didn't find any time for the documentation (5).

Besides this, everyone (well, me at least ;) enjoyed their time while working hard all together in our new meeting room! The 3.6 version still needs a little work before being released, but the development branch is definitely back, with a great bunch of cubes ready. Among them : comment, tag, blog, keyword, tracker, forge, card, nosylist, etc...

So many thanks to everyone, and particularly to our Mexican friends Carlos and Celso... Tequila! ;)

By the way the good news is that we plan to do more sprints like this now that we've some room for it!


Customizing search box with magicsearch

2009/12/13 by Adrien Di Mascio

During the last cubicweb sprint, I was asked if it was possible to customize the search box CubicWeb comes with. By default, you can use it to type either RQL queries, plain text queries or standard shortcuts such as <EntityType> or <EntityType> <attrname> <value>.

Ultimately, all queries are translated to RQL since it's the only language understood on the server (data) side. To transform the user query into RQL, CubicWeb uses the so-called magicsearch component, which in turn delegates to a number of query preprocessors that are responsible for interpreting the user query and generating the corresponding RQL.

The code of the main processor loop is easy to understand:

for proc in self.processors:
    try:
        return proc.process_query(uquery, req)
    except (RQLSyntaxError, BadRQLQuery):
        pass

The idea is simple: for each query processor, try to translate the query. If it fails, try the next processor; if it succeeds, we're done and the RQL query will be executed.

Now that the general mechanism is understood, here's an example of code that could be used in a forge-based cube to add a new search shortcut to find tickets. We'd like to use the project_name:text syntax to search for tickets of project_name containing text (e.g. pylint:warning).

Here's the corresponding preprocessor code:

from cubicweb.web.views.magicsearch import BaseQueryProcessor

class MyCustomQueryProcessor(BaseQueryProcessor):
    priority = 0 # controls order in which processors are tried

    def preprocess_query(self, uquery, req):
        """
        :param uquery: the query as sent by the browser
        :param req: the standard, omnipresent, cubicweb's req object
        """
        try:
            project_name, text = uquery.split(':')
        except ValueError:
            return None # the shortcut doesn't apply
        return (u'Any T WHERE T is Ticket, T concerns P, P name %(p)s, '
                u'T has_text %(t)s', {'p': project_name, 't': text})

The code is rather self-explanatory, but here are a few additional comments:

  • the class is registered with the standard vregistry mechanism and should be defined along the views
  • the priority attribute is used to sort and define the order in which processors will be tried in the main processor loop
  • the preprocess_query method returns None or raises an exception if the query can't be processed

To summarize, if you want to customize the search box, you have to:

  1. define a new query preprocessor component
  2. define its priority with respect to the other standard processors
  3. implement the preprocess_query method

and CubicWeb will do the rest !
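
The whole mechanism can be tried out without a CubicWeb instance. The following is a minimal, standalone sketch, not CubicWeb's actual code: BadQuery, PlainTextProcessor, translate and the priority values are made-up names for illustration, processors are assumed to be tried in ascending priority order, and a processor signals that it does not apply by raising an exception rather than returning None.

```python
class BadQuery(Exception):
    """Stand-in for the RQLSyntaxError / BadRQLQuery exceptions."""


class TicketShortcutProcessor:
    """Handles the project_name:text shortcut described above."""
    priority = 0  # assumption: lower values are tried first

    def preprocess_query(self, uquery):
        try:
            project_name, text = uquery.split(":")
        except ValueError:
            # zero colons or more than one: the shortcut doesn't apply
            raise BadQuery(uquery)
        return ("Any T WHERE T is Ticket, T concerns P, P name %(p)s, "
                "T has_text %(t)s", {"p": project_name, "t": text})


class PlainTextProcessor:
    """Hypothetical fallback that accepts any query as a full-text search."""
    priority = 10

    def preprocess_query(self, uquery):
        return ("Any X WHERE X has_text %(text)s", {"text": uquery})


def translate(uquery, processors):
    """Return (rql, args) from the first processor that accepts the query."""
    for proc in sorted(processors, key=lambda p: p.priority):
        try:
            return proc.preprocess_query(uquery)
        except BadQuery:
            continue  # try the next processor
    raise BadQuery(uquery)


PROCESSORS = [PlainTextProcessor(), TicketShortcutProcessor()]
print(translate("pylint:warning", PROCESSORS))
print(translate("warning", PROCESSORS))
```

With these two processors, pylint:warning is handled by the ticket shortcut, while a query without a colon falls through to the plain text search.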


Using gettext on windows

2009/12/01
http://www.gnu.org/graphics/gnu-head-sm.jpg

CubicWeb relies on GNU gettext for its translation management. However, the binary installers easily found for gettext (such as the one in python(x,y)) are for older versions, and compiling it is not that easy (especially in the Python world where people do not necessarily have a C compiler at hand).

We did the job and a binary installer for GNU gettext 0.17 is available on our ftp server.


Browsing the Semantic Web

2009/10/31 by Nicolas Chauvat
http://www.cubicweb.org/image/502157?vid=download

Now that the Web of Data has become a reality, innovative applications are springing up everywhere. Here is a selection of web apps that help you browse the semantic web.

  • Parallax is a faceted browser that is demonstrated by displaying the content of Freebase.
  • Neofonie demonstrates its faceted browser by displaying the content of DBpedia at dbpedia.neofonie.de
  • VisiNav is a search engine that lets you refine searches in a way that is reminiscent of facets.
  • Falcons is a search engine that indexes RDF data.
  • Sindice is a search engine that indexes RDF data as well as data extracted from Microformats. It offers a public Sindice API that can be used to retrieve the search results as RDF, json or Atom.
  • SameAs is a service that returns all the equivalent URIs for a search term or a given URI.
  • When you enter search terms, Sig.ma collates the data from the resources included in the results of a search on Sindice.
  • When you publish your product data according to the GoodRelations ontology, information such as the price shows up in Yahoo's search results.

More and more services will appear in the coming months that make use of these new resources. Just for tagging, you may look at CommonTag, Zemanta and OpenCalais and imagine new ways to automate and facilitate the process of publishing information on the web.


Comparing CubicWeb with Drupal plus CCK extension

2009/10/29 by Nicolas Chauvat
http://www.cubicweb.org/image/502151?vid=download

Drupal is a CMS written in PHP that is getting more and more visibility in the Semantic Web crowd. Several researchers from DERI have been using it as a test bed for their research projects and developed extensions to showcase their ideas. It is for example used to build the Semantic Web Dog Food site that archives the semantic web conferences and publishes them as Linked Open Data. The URL for this year's ISWC is http://data.semanticweb.org/conference/iswc/2009

This led me to read more about Drupal than I previously had the incentive to. I have not had time to give it a try, but I skimmed the documentation and will try to compare it with CubicWeb from a software architecture point of view.

Drupal defines a Node as an information item. The CCK (aka Content Construction Kit) can be used to define new types of Nodes through a web interface. Nodes and the bits and pieces used to display them as HTML are not packed together in components. The Features extension is planning on getting these bits packaged.

If you are a Drupal user/developer and think I am not being fair to Drupal, please comment below.

On the other hand, CubicWeb implemented the concept of reusable components very early on. What is called a Node in Drupal is an Entity in CubicWeb. By design, CubicWeb does not have a web interface to define entities. The data model is part of the code. To efficiently maintain applications in production, changes to the data model must be tracked together with changes to the code. Data model changes imply migration procedures. In CubicWeb, all of this is versioned and made part of the components. Where Drupal needs to grow extensions like CCK and Features, CubicWeb has more advanced possibilities by design, for example the ability to develop full-featured applications by assembling components.

This was a very short comparison. I'm looking forward to getting a chance to discuss it with knowledgeable Drupal hackers.


Release early, release often

2009/10/05 by Arthur Lutz

Looking at the releases of the CubicWeb projects for the month of September alone, I think we can conclude that we are applying the Agile Software Development principle quite closely.

http://farm4.static.flickr.com/3025/2732378117_cdd948fd1d_m.jpg
  • 11 releases of the cubicweb framework (now in stable and unstable flavors) : 3.5.2, 3.5.1, 3.5.0, 3.4.11, 3.4.9, 3.4.8, 3.4.7, 3.4.6, 3.4.5, 3.4.4, 3.4.3
  • 3 releases of cubicweb-vcsfile
  • 4 releases of cubicweb-forge
  • 2 releases of cubicweb-drh
  • 2 releases of cubicweb-workorder
  • 1 release of cubicweb-conference, cubicweb-tracker, cubicweb-registration, cubicweb-timesheet, cubicweb-workcase, cubicweb-task, cubicweb-expense, cubicweb-calendar, cubicweb-invoice, cubicweb-nosylist, etc.

Hope you can keep up or use the stable versions...

photo by kennymatic under creative commons


Running CubicWeb on Windows

2009/09/08

This was not supported (and still isn't officially). But rumors have been circulating about a port of CubicWeb to Windows.

http://pc-astuces.seebz.net/images/logo-windows-small.jpg

I can confirm that there is some truth in this. A few changesets have been circulating, to be merged in the official repository any time now, enabling one to run CubicWeb on Windows. Support for running CubicWeb as a Windows service is not available yet, but should become available in the next few weeks.

Update: check out the source of the 3.5 branch, and you will be able to create and start instances on Windows. Use cubicweb-ctl start -D for a non-daemon start.


Sparkles everywhere, CubicWeb gets fizzy

2009/07/28 by Adrien Di Mascio
http://www.logilab.org/image/9845?vid=download

Last week, we finally took a few days to dive into SPARQL in order to transform any CubicWeb application into a potential SPARQL endpoint.

The first step was to get a parser. Fortunately the W3C provides a grammar definition and around 200 test cases. There were a few interesting options out there: we tried to reuse rdflib, rasqal, the sparql.g version designed for antlr3 and SimpleParse, but after two days of work we had nothing that worked well enough. We decided it was not worth it and switched to yapps, since we knew yapps and rql already had a dependency on it.

Maybe we'll consider changing the parser at some point later but the priority was to get something working as soon as we could and we finally came up with a version of fyzz passing 90% of the W3C test suite (of course, there might be some false positives).

Fyzz parses the SPARQL query and generates something we decided to call an AST although it's still a bit rough for now. Fyzz understands simple triples, distincts, limits, offsets and other basic functionalities.

Please note that fyzz is totally independent of cubicweb and it can be reused by any project.

Here's an example of how to use fyzz:

>>> from fyzz.yappsparser import parse
>>> ast = parse("""PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
...    ?project a doap:Project;
...         doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """)
>>> print ast.selected
[SparqlVar('project'), SparqlVar('name')]
>>> print ast.prefixes
{'doap': 'http://usefulinc.com/ns/doap#'}
>>> print ast.orderby
[(SparqlVar('name'), 'asc')]
>>> print ast.limit, ast.offset
5 10
>>> print ast.where
[(SparqlVar('project'), ('', 'a'), ('http://usefulinc.com/ns/doap#', 'Project')),
 (SparqlVar('project'), ('http://usefulinc.com/ns/doap#', 'name'), SparqlVar('name'))]

This AST is then processed and transformed into a RQL query which can finally be processed by CubicWeb directly.

Here's what can be done in a cubicweb-ctl shell session (of course, this can also be done in the web application) of our forge cube:

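>>> from cubicweb.spa2rql import Sparql2rqlTranslator
>>> # assumption: `schema` is the instance schema made available by the shell
>>> translator = Sparql2rqlTranslator(schema)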
>>> query = """PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
...    ?project a doap:Project;
...         doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """
>>> qinfo = translator.translate(query)
>>> rql, args = qinfo.finalize()
>>> print rql, args
Any PROJECT, NAME ORDERBY NAME ASC LIMIT 5 OFFSET 10 WHERE PROJECT name NAME, PROJECT is Project {}

From the above example, we can notice two things. First, for cubicweb to understand the doap namespace, we have to declare the correspondence between the standard doap vocabulary and our internal schema. This is done with yams.xy:

>>> from yams import xy
>>> xy.register_prefix('http://usefulinc.com/ns/doap#', 'doap')
>>> xy.add_equivalence('Project', 'doap:Project')
>>> xy.add_equivalence('Project name', 'doap:Project doap:name')

Secondly, for now, we notice that the case is not preserved during the transformation : ?project becomes PROJECT in the rql query. This is probably something that we'll need to tackle quickly.
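
To make the translation step and the case issue more concrete, here is a toy sketch in plain Python. It is not how cubicweb.spa2rql is implemented: triples_to_rql and the two lookup tables are hypothetical, variables are plain strings rather than fyzz objects, and the order of the WHERE clauses may differ from what the real translator produces.

```python
DOAP = "http://usefulinc.com/ns/doap#"

# Hypothetical lookup tables standing in for the yams.xy equivalences.
ENTITY_TYPES = {(DOAP, "Project"): "Project"}
RELATION_TYPES = {(DOAP, "name"): "name"}


def triples_to_rql(selected, where, orderby=(), limit=None, offset=None):
    """Build an RQL string from SPARQL-like pieces (variables are plain strings)."""
    restrictions = []
    for subject, predicate, obj in where:
        if predicate == ("", "a"):
            # "?x a doap:Project" becomes "X is Project"
            restrictions.append("%s is %s" % (subject.upper(), ENTITY_TYPES[obj]))
        else:
            # "?x doap:name ?y" becomes "X name Y"
            restrictions.append(
                "%s %s %s" % (subject.upper(), RELATION_TYPES[predicate], obj.upper())
            )
    # Variable names are upper-cased, which is where the original case is lost.
    parts = ["Any " + ", ".join(var.upper() for var in selected)]
    if orderby:
        parts.append("ORDERBY " + ", ".join(
            "%s %s" % (var.upper(), direction.upper()) for var, direction in orderby))
    if limit is not None:
        parts.append("LIMIT %d" % limit)
    if offset is not None:
        parts.append("OFFSET %d" % offset)
    parts.append("WHERE " + ", ".join(restrictions))
    return " ".join(parts)


rql = triples_to_rql(
    selected=["project", "name"],
    where=[
        ("project", ("", "a"), (DOAP, "Project")),
        ("project", (DOAP, "name"), "name"),
    ],
    orderby=[("name", "asc")],
    limit=5,
    offset=10,
)
print(rql)
```

Since every variable name goes through upper(), two SPARQL variables that differ only by case would collide in the generated query, which is one reason the case issue needs to be tackled.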

We've also added a few views in CubicWeb to wrap all of this. They will be available in the upcoming version 3.4.0 and are already available through our public mercurial repository.

The door is now open, the path is still long, stay tuned !

image under creative commons by beger (original)


CubicWeb at BayPiggies/OSCON in July

2009/07/14 by Nicolas Chauvat
http://www.logilab.org/image/9631?vid=download

I am pleased to announce that CubicWeb will be presented during a BayPIGgies meeting that will exceptionally take place in the OSCON conference building as the closing event of the Birds of a Feather sessions on July 23rd at 8pm.

Join us to get to know more about CubicWeb.

Read the report.