subscribe to this blog

CubicWeb Blog

News about the framework and its uses.

Exploring the datafeed API in CubicWeb

2014/09/26 by Denis Laxalde

The datafeed API is one of the nice features of the CubicWeb framework. It makes it possible to easily build such things as a news aggregator (or even a semantic news feed reader), a LDAP importer or an application importing data from another web platform. The underlying API is quite flexible and powerful. Yet, the documentation being quite thin, it may be hard to find one's way through. In this article, we'll describe the basics of the datafeed API and provide guiding examples.

The datafeed API is essentially built around two things: a CWSource entity and a parser, which is a kind of AppObject.

The CWSource entity defines a list of URL from which to fetch data to be imported in the current CubicWeb instance, it is linked to a parser through its __regid__. So something like the following should be enough to create a usable datafeed source [1].

create_entity('CWSource', name=u'some name', type='datafeed', parser=u'myparser')

The parser is usually a subclass of DataFeedParser (from cubicweb.server.sources.datafeed). It should at least implement the two methods process and before_entity_copy. To make it easier, there are specialized parsers such as DataFeedXMLParser that already define process so that subclasses only have to implement the process_item method.

Overview of the datafeed API

Before going into further details about the actual implementation of a DataFeedParser, it's worth having in mind a few details about the datafeed parsing and import process. This involves various players from the CubicWeb server, namely: a DataFeedSource (from cubicweb.server.sources.datafeed), the Repository and the DataFeedParser.

  • Everything starts from the Repository which loops over its sources and pulls data from each of these (this is done using a looping task which is setup upon repository startup). In the case of datafeed sources, Repository sources are instances of the aforementioned DataFeedSource class [2].
  • The DataFeedSource selects the appropriate parser from the registry and loops on each uri defined in the respective CWSource entity by calling the parser's process method with that uri as argument (methods pull_data and process_urls of DataFeedSource).
  • If the result of the parsing step is successful, the DataFeedSource will call the parser's handle_deletion method, with the URI of the previously imported entities.
  • Then, the import log is formatted and the transaction committed. The DataFeedSource and DataFeedParser are connected to an import_log which feeds the CubicWeb instance with a CWDataImport per data pull. This usually contains the number of created and updated entities along with any error/warning message logged by the parser. All this is visible in a table from the CWSource primary view.

So now, you might wonder what actually happens during the parser's process method call. This method takes an URL from which to fetch data and processes further each piece of data (using a process_item method for instance). For each data-item:

  1. the repository is queried to retrieve or create an entity in the system source: this is done using the extid2entity method;
  2. this extid2entity method essentially needs two pieces of information:
    • a so-called extid, which uniquely identifies an item in the distant source
    • any other information needed to create or update the corresponding entity in the system source (this will be later refered to as the sourceparams)
  3. then, given the (new or existing) entity returned by extid2entity, the parser can perform further postprocessing (for instance, updating any relation on this entity).

In step 1 above, the parser method extid2entity in turns calls the repository method extid2eid given the current source and the extid value. If an entry in the entities table matches with the specified extid, the corresponding eid (identifier in the system source) is returned. Otherwise, a new eid is created. It's worth noting that the created entity (in case the entity is to be created) is not complete with respect to the data model at this point. In order the entity to be completed, the source method before_entity_insertion is called. This is where the aforementioned sourceparams are used. More specifically, on the parser side the before_entity_copy method is called: it usually just updates (using entity.cw_set() for instance) the fetched entity with any relevant information.

Case study: a news feeds parser

Now we'll go through a concrete example to illustrate all those fairly abstract concepts and implement a datafeed parser which can be used to import news feeds. Our parser will create entities of type FeedArticle, which minimal data model would be:

class FeedArticle(EntityType):
    title = String(fulltextindexed=True)
    uri = String(unique=True)
    author = String(fulltextindexed=True)
    content = RichString(fulltextindexed=True, default_format='text/html')

Here we'll reuse the DataFeedXMLParser, not because we have XML data to parse, but because its interface fits well with our purpose, namely: it ships an item-based processing (a process_item method) and it relies on a parse method to fetch raw data. The underlying parsing of the news feed resources will be handled by feedparser.

class FeedParser(DataFeedXMLParser):
    __regid__ = 'newsaggregator.feed-parser'

The parse method is called by process, it should return a list tuples with items information.

def parse(self, url):
    """Delegate to feedparser to retrieve feed items"""
    data = feedparser.parse(url)
    return zip(data.entries)

Then the process_item method takes an individual item (i.e. an entry of the result obtained from feedparser in our case). It essentially defines an extid, here the uri of the feed entry (good candidate for unicity) and calls extid2entity with that extid, the entity type to be created / retrieved and any additional data useful for entity completion passed as keyword arguments. (The process_feed method call just transforms the results obtained from feedparser into a dict suitable for entity creation following the data model described above.)

def process_item(self, entry):
    data = self.process_feed(entry)
    extid = data['uri']
    entity = self.extid2entity(extid, 'FeedArticle', feeddata=data)

The before_entity_copy method is called before the entity is actually created (or updated) in order to give the parser a chance to complete it with any other attribute that could be set from source data (namely feedparser data in our case).

def before_entity_copy(self, entity, sourceparams):
    feeddata = sourceparams['feeddata']
    entity.cw_edited.update(feeddata)

And this is all what's essentially needed for a simple parser. Further details could be found in the news aggregator cube. More sophisticated parsers may use other concepts not described here, such as source mappings.

Testing datafeed parsers

Testing a datafeed parser often involves pulling data from the corresponding datafeed source. Here is a minimal test snippet that illustrates how to retrieve the datafeed source from a CWSource entity and to pull data from it.

with self.admin_access.repo_cnx() as cnx:
    # Assuming one knows the URI of a CWSource.
    rset = cnx.execute('CWSource X WHERE X uri %s' % uri)
    # Retrieve the datafeed source instance.
    dfsource = self.repo.sources_by_eid[rset[0][0]]
    # Make sure it's parser matches the expected.
    self.assertEqual(dfsource.parser_id, '<my-parser-id>')
    # Pull data using an internal connection.
    with self.repo.internal_cnx() as icnx:
        stats = dfsource.pull_data(icnx, force=True, raise_on_error=True)
        icnx.commit()

The resulting stats is a dictionnary containing eids of created and updated entities during the pull. In addition all entities created should have the cw_source relation set to the corresponding CWSource entity.

Notes

[1]

It is possible to add some configuration to the CWSource entity in the form a string of configuration items (one per line). Noteworthy items are:

  • the synchronization-interval;
  • use-cwuri-as-url=no, which avoids using external URL inside the CubicWeb instance (leading to any link on an imported entity to point to the external source URI);
  • delete-entities=[yes,no] which controls if entities not found anymore in the distant source should be deleted from the CubicWeb instance.
[2]The mapping between CWSource entities' type (e.g. "datafeed") and DataFeedSource object is quite unusual as it does not rely on the vreg but uses a specific sources registry (defined in cubicweb.server.SOURCE_TYPES).

CubicWeb roadmap meeting on September 4th, 2014

2014/09/01 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. The previous roadmap meeting was in July 2014.

Here is the report about the September 4th, 2014 meeting. Christophe de Vienne (Unlish) and Dimitri Papadopoulos (CEA) joined us to express their concerns and discuss the future of CubicWeb.

Versions

Version 3.17

This version is stable but old and maintainance will continue only as long as some customers will be willing to pay for it (current is 3.17.16 with 3.17.17 in development).

Version 3.18

This version is stable and maintained (current is 3.18.5 with 3.18.6 in development).

Version 3.19

This version is stable and maintained (current is 3.19.3 with 3.19.4 in development).

Version 3.20

This version is under development. It will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should still include the work done for CWEP-002 (computed attributes and relations).

For details read list of tickets for CubicWeb 3.20.0.

Version 3.21

Removal of the dbapi, merging of Connection and ClientConnection, CWEP-003 (adding a FROM clause to RQL).

Version 4.0

When the work done for Pyramid will have been tested, it will become the default runner and a lot of things will be dropped: twisted, dead code, ui and core code that would be better cast into cubes, etc.

This version could happen early in 2015.

Cubes

New cubes and libraries

CWEPs

Here is the status of open CubicWeb Evolution Proposals:

CWEP-0002 full-featured implementation, to be merged in 3.20

CWEP-0003 patches sent to the review. . Champion will be adim.

Work in progress

PyConFR

Christophe will try to present at PyConFR the work he did on getting CubicWeb to work with Pyramid.

Pip-friendly source layout

Logilab and Christophe will try to make CubicWeb more pip/virtualenv-friendly. This may involve changing the source layout to include a sub-directory, but the impact on existing devs is expected to be too much and could be delayed to CubicWeb 4.0.

Pyramid

Christophe has made good progress on getting CubicWeb to work with Pyramid and he intends to put it into production real soon now. There is a Pyramid extension named pyramid_cubicweb and a CubicWeb cube named cubicweb-pyramid. Both work with CubicWeb 3.19. Christophe demonstrated using the debug toolbar, authenticating users with Authomatic and starting multiple workers with uWSGI.

Early adopters are now invited to jump in and help harden the code!

Agenda

Logilab's next roadmap meeting will be held at the beginning of november 2014 and Christophe and Dimitri were invited.


Handling dependencies between form fields in CubicWeb

2014/07/11 by Denis Laxalde

This post considers the issue of building an edition form of a CubicWeb entity with dependencies on its fields. It's a quite common issue that needs to be handled client-side, based on user interaction.

Consider the following example schema:

from yams.buildobjs import EntityType, RelationDefinition, String, SubjectRelation
from cubicweb.schema import RQLConstraint

_ = unicode

class Country(EntityType):
    name = String(required=True)

class City(EntityType):
    name = String(required=True)

class in_country(RelationDefinition):
    subject = 'City'
    object = 'Country'
    cardinality = '1*'

class Citizen(EntityType):
    name = String(required=True)
    country = SubjectRelation('Country', cardinality='1*',
                              description=_('country the citizen lives in'))
    city = SubjectRelation('City', cardinality='1*',
                           constraints=[
                               RQLConstraint('S country C, O in_country C')],
                           description=_('city the citizen lives in'))

The main entity of interest is Citizen which has two relation definitions towards Country and City. Then, a City is bound to a Country through the in_country relation definition.

In the automatic edition form of Citizen entities, we would like to restrict the choices of cities depending on the selected Country, to be determined from the value of the country field. (In other words, we'd like the constraint on city relation defined above to be fulfilled during form rendering, not just validation.) Typically, in the image below, cities not in Italy should be available in the city select widget:

Example of Citizen entity edition form.

The issue will be solved by little customization of the automatic entity form, some uicfg rules and a bit of Javascript. In the following, the country field will be referred to as the master field whereas the city field as the dependent field.

So here the code of the views.py module:

from cubicweb.predicates import is_instance
from cubicweb.web.views import autoform, uicfg
from cubicweb.uilib import js

_ = unicode


class CitizenAutoForm(autoform.AutomaticEntityForm):
    """Citizen autoform handling dependencies between Country/City form fields
    """
    __select__ = is_instance('Citizen')

    needs_js = autoform.AutomaticEntityForm.needs_js + ('cubes.demo.js', )

    def render(self, *args, **kwargs):
        master_domid = self.field_by_name('country', 'subject').dom_id(self)
        dependent_domid = self.field_by_name('city', 'subject').dom_id(self)
        self._cw.add_onload(js.cw.cubes.demo.initDependentFormField(
            master_domid, dependent_domid))
        super(CitizenAutoForm, self).render(*args, **kwargs)


def city_choice(form, field):
    """Vocabulary function grouping city choices by country."""
    req = form._cw
    vocab = [(req._('<unspecified>'), '')]
    for eid, name in req.execute('Any X,N WHERE X is Country, X name N'):
        rset = req.execute('Any N,E ORDERBY N WHERE'
                           ' X name N, X eid E, X in_country C, C eid %(c)s',
                           {'c': eid})
        if rset:
            # 'optgroup' tag.
            oattrs = {'id': 'country_%s' % eid}
            vocab.append((name, None, oattrs))
            for label, value in rset.rows:
                # 'option' tag.
                vocab.append((label, str(value)))
    return vocab


uicfg.autoform_field_kwargs.tag_subject_of(('Citizen', 'city', '*'),
                                           {'choices': city_choice, 'sort': False})

The first thing (reading from the bottom of the file) is that we've added a choices function on city relation of the Citizen automatic entity form via uicfg. This function city_choice essentially generates the HTML content of the field value by grouping available cities by respective country through the addition of some optgroup tags.

Then, we've overridden the automatic entity form for Citizen entity type by essentially calling a piece of Javascript code fed with the DOM ids of the master and dependent fields. Fields are retrieved by their name (field_by_name method) and respective id using the dom_id method.

Now the Javascript part of the picture:

cw.cubes.demo = {
    // Initialize the dependent form field select and bind update event on
    // change on the master select.
    initDependentFormField: function(masterSelectId,
                                     dependentSelectId) {
        var masterSelect = cw.jqNode(masterSelectId);
        cw.cubes.demo.updateDependentFormField(masterSelect, dependentSelectId);
        masterSelect.change(function(){
            cw.cubes.demo.updateDependentFormField(this, dependentSelectId);
        });
    },

    // Update the dependent form field select.
    updateDependentFormField: function(masterSelect,
                                       dependentSelectId) {
        // Clear previously selected value.
        var dependentSelect = cw.jqNode(dependentSelectId);
        $(dependentSelect).val('');
        // Hide all optgroups.
        $(dependentSelect).find('optgroup').hide();
        // But the one corresponding to the master select.
        $('#country_' + $(masterSelect).val()).show();
    }
}

It consists of two functions. The initDependentFormField is called during form rendering and it essentially bind the second function updateDependentFormField to the change event of the master select field. The latter "update" function retrieves the dependent select field, hides all optgroup nodes (i.e. the whole content of the select widget) and then only shows dependent options that match with selected master option, identified by a custom country_<eid> set by the vocabulary function above.


CubicWeb roadmap meeting on July 3rd, 2014

2014/06/26 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. The previous roadmap meeting was in May 2014.

Here is the report about the July 3rd, 2014 meeting. Christophe de Vienne (Unlish) and Dimitri Papadopoulos (CEA) joined us to express their concerns and discuss the future of CubicWeb.

Versions

Version 3.17

This version is stable but old and maintainance will continue only as long as some customers will be willing to pay for it (current is 3.17.15 with 3.17.16 in development).

Version 3.18

This version is stable and maintained (current is 3.18.5 with 3.18.6 in development).

Version 3.19

This version was published at the end of April and has now been tested on our internal servers. It includes support for Cross Origin Resource Sharing (CORS) and a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read the release notes or the list of tickets for CubicWeb 3.19.0. Current is 3.19.2

Version 3.20

This version is under development. It will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should still include the work done for CWEP-002 (computed attributes and relations.

For details read list of tickets for CubicWeb 3.20.0.

Version 3.21 (or maybe 4.0?)

Removal of the dbapi, merging of Connection and ClientConnection, CWEP-003 (adding a FROM clause to RQL).

Cubes

Cubes published over the past two months

New cubes

  • cubicweb-frbr: Cube providing a schema based on FRBR entities
  • cubicweb-clinipath
  • cubicweb-fastimport

CWEPs

Here is the status of open CubicWeb Evolution Proposals:

CWEP-0002 only missing a bit of migration support, to be finished soon for inclusion in 3.20.

CWEP-0003 has been reviewed and is waiting for a bit of reshaping that should occurs soon. It's targeted for 3.21.

New CWEPs are expected to be written for clarifying the API of the _cw object, supporting persistent sessions and improving the performance of massive imports.

Work in progress

Design

The new logo is now published in the 3.19 line. David showed us his experimentation that modernize a forge's ui with a bit of CSS. There is still a bit of pressure on the bootstrap side though, as it still rely on heavy monkey-patching in the cubicweb-bootstrap cube.

Data import

Also, Dimitry expressed is concerns with the lack of proper data import API. We should soon have some feedback from Aurelien's cubicweb-fastimport experimentation, which may be an answer to Dimitry's need. In the end, we somewhat agreed that there were different needs (eg massive-no-consistency import vs not-so-big-but-still-safe), that cubicweb.dataimport was an attempt to answer them all and then cubicweb-dataio and cubicweb-fastimport were more specific responses. In the end we may reasonably hope that an API will emerge.

Removals

On his way to persistent sessions, Aurélien made a huge progress toward silence of warnings in the 3.19 tests. dbapi has been removed, ClientConnection / Connection merged. We decided to take some time to think about the recurring task management as it is related to other tricky topics (application / instance configuration) and it's not directly related to persistent session.

Rebasing on Pyramid

Last but not least, Christophe demonstrated that CubicWeb could basically live with Pyramid. This experimentation will be pursued as it sounds very promising to get the good parts from the two framework.

Agenda

Logilab's next roadmap meeting will be held at the beginning of september 2014 and Christophe and Dimitri were invited.


Logilab's roadmap for CubicWeb on May 15th, 2014

2014/05/21 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the May 15th, 2014 meeting. The previous report posted to the blog was the march 2014 roadmap.

Versions

Version 3.17

This version is stable but old and maintainance will continue only as long as some customers will be willing to pay for it (current is 3.17.15).

Version 3.18

This version is stable and maintained (current is 3.18.4).

Version 3.19

This version was published at the end of April. It includes support for Cross Origin Resource Sharing (CORS) and a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read the release notes or the list of tickets for CubicWeb 3.19.0.

Version 3.20

This version is under development. It will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and relations) and the merging of Connection and ClientConnection if it happens to be simple enough to get done quickly (in case the removal of dbapi would really help, this merging will wait for 3.21).

For details read list of tickets for CubicWeb 3.20.0.

Version 3.21 (or maybe 4.0?)

Removal of the dbapi and merging of CWEP-003 (adding a FROM clause to RQL).

Cubes

Here is a list of cubes that had versions published over the past two months: accidents, awstats, book, bootstrap, brainomics, cmt, collaboration, condor, container, dataio, expense, faq, file, forge, forum, genomics, geocoding, inlineedit, inventory, keyword, link, mailinglist, mediaplayer, medicalexp, nazcaui, ner, neuroimaging, newsaggregator, processing, questionnaire, rqlcontroller, semnews, signedrequest, squareui, task, testcard, timesheet, tracker, treeview, vcsfile, workorder.

Here are a the new cubes we are pleased to announce:

rqlcontroller receives via a POST a list of RQL queries and executes them. This is a way to build web services.

wsme is helping build a web service API on top of a CubicWeb database.

signedrequest is a simple token based authentication system. This is a way for scripts or callback urls to access an instance without login/pwd information.

relationwidget is a widget usable in forms to edit relationships between objects. It depends on CubicWeb 3.19.

searchui is an experiment on adding blocks to the list of facets that allow building complex RQL queries step by step by clicking with the mouse instead of directly writing the RQL with the keyboard.

ckan is using the REST API of a CKAN data portal to mirror its content.

CWEPs

Here is the status of open CubicWeb Evolution Proposals:

CWEP-0002 is now in good shape and the goal is to have it merged into 3.20. It lacks some documentation and a migration script.

CWEP-0003 has made good progress during the latest sprint, but will need a thorough review before being merged. It will probably not be ready for 3.20 and have to wait for 3.21.

New CWEPs are expected to be written for clarifying the API of the _cw object, supporting persistent sessions and improving the performance of massive imports.

Visual identity

CubicWeb has a new logo that will appear before the end of may on its revamped homepage at http://www.cubicweb.org

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of july 2014.


What's new in CubicWeb 3.19

2014/05/05 by Aurelien Campeas

New functionalities

  • implement Cross Origin Resource Sharing (CORS) (see #2491768)
  • system_source.create_eid can return a range of IDs, to reduce overhead of batch entity creation

Behaviour Changes

  • The anonymous property of Session and Connection is now computed from the related user login. If it matches the anonymous-user in the config the connection is anonymous. Beware that the anonymous-user config is web specific. Therefore, no session may be anonymous in a repository only setup.

New Repository Access API

Connection replaces Session

A new explicit Connection object replaces Session as the main repository entry point. A Connection holds all the necessary methods to be used server-side (execute, commit, rollback, call_service, entity_from_eid, etc...). One obtains a new Connection object using session.new_cnx(). Connection objects need to have an explicit begin and end. Use them as a context manager to never miss an end:

with session.new_cnx() as cnx:
    cnx.execute('INSERT Elephant E, E name "Babar"')
    cnx.commit()
    cnx.execute('INSERT Elephant E, E name "Celeste"')
    cnx.commit()
# Once you get out of the "with" clause, the connection is closed.

Using the same Connection object in multiple threads will give you access to the same Transaction. However, Connection objects are not thread safe (hence at your own risks).

repository.internal_session is deprecated in favor of repository.internal_cnx. Note that internal connections are now safe by default, i.e. the integrity hooks are enabled.

Backward compatibility is preserved on Session.

dbapi vs repoapi

A new API has been introduced to replace the dbapi. It is called repoapi.

There are three relevant functions for now:

  • repoapi.get_repository returns a Repository object either from an URI when used as repoapi.get_repository(uri) or from a config when used as repoapi.get_repository(config=config).
  • repoapi.connect(repo, login, **credentials) returns a ClientConnection associated with the user identified by the credentials. The ClientConnection is associated with its own Session that is closed when the ClientConnection is closed. A ClientConnection is a Connection-like object to be used client side.
  • repoapi.anonymous_cnx(repo) returns a ClientConnection associated with the anonymous user if described in the config.

repoapi.ClientConnection replaces dbapi.Connection and company

On the client/web side, the Request is now using a repoapi.ClientConnection instead of a dbapi.Connection. The ClientConnection has multiple backward compatible methods to make it look like a dbapi.Cursor and dbapi.Connection.

Sessions used on the Web side are now the same as the ones used Server side. Some backward compatibility methods have been installed on the server side Session to ease the transition.

The authentication stack has been altered to use the repoapi instead of the dbapi. Cubes adding new elements to this stack are likely to break.

New API in tests

All current methods and attributes used to access the repo on CubicWebTC are deprecated. You may now use a RepoAccess object. A RepoAccess object is linked to a new Session for a specified user. It is able to create Connection, ClientConnection and web side requests linked to this session:

access = self.new_access('babar') # create a new RepoAccess for user babar
with access.repo_cnx() as cnx:
    # some work with server side cnx
    cnx.execute(...)
    cnx.commit()
    cnx.execute(...)
    cnx.commit()

with access.client_cnx() as cnx:
    # some work with client side cnx
    cnx.execute(...)
    cnx.commit()

with access.web_request(elephant='babar') as req:
    # some work with web request
    elephant_name = req.form['elephant']
    req.execute(...)
    req.cnx.commit()

By default testcase.admin_access contains a RepoAccess object for the default admin session.

API changes

  • RepositorySessionManager.postlogin is now called with two arguments, request and session. And this now happens before the session is linked to the request.
  • SessionManager and AuthenticationManager now take a repo object at initialization time instead of a vreg.
  • The async argument of _cw.call_service has been dropped. All calls are now synchronous. The zmq notification bus looks like a good replacement for most async use cases.
  • repo.stats() is now deprecated. The same information is available through a service (_cw.call_service('repo_stats')).
  • repo.gc_stats() is now deprecated. The same information is available through a service (_cw.call_service('repo_gc_stats')).
  • repo.register_user() is now deprecated. The functionality is now available through a service (_cw.call_service('register_user')).
  • request.set_session no longer takes an optional user argument.
  • CubicwebTC does not have repo and cnx as class attributes anymore. They are standard instance attributes. set_cnx and _init_repo class methods become instance methods.
  • set_cnxset and free_cnxset are deprecated. The database connection acquisition and release cycle is now more transparent.
  • The implementation of cascading deletion when deleting composite entities has changed. There comes a semantic change: merely deleting a composite relation does not entail any more the deletion of the component side of the relation.
  • _cw.user_callback and _cw.user_rql_callback are deprecated. Users are encouraged to write an actual controller (e.g. using ajaxfunc) instead of storing a closure in the session data.
  • A new entity.cw_linkable_rql method provides the rql to fetch all entities that are already or may be related to the current entity using the given relation.

Deprecated Code Drops

  • The session.hijack_user mechanism has been dropped.
  • EtypeRestrictionComponent has been removed, its functionality has been replaced by facets a while ago.
  • the old multi-source support has been removed. Only copy-based sources remain, such as datafeed or ldapfeed.

Logilab's roadmap for CubicWeb on March 7th, 2014

2014/03/10 by Nicolas Chauvat

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. Here is the report about the Mar 7th, 2014 meeting. The previous report posted to the blog was the january 2014 roadmap.

Version 3.17

This version is stable but old and maintainance will stop in a few weeks (current is 3.17.13 and 3.17.14 is upcoming).

Version 3.18

This version is stable and maintained (current is 3.18.3 and 3.18.4 is upcoming).

Version 3.19

This version is about to be published. It includes a heavy refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

For details read list of tickets for CubicWeb 3.19.0.

Version 3.20

This version will try to reduce as much as possible the stock of patches in the state "reviewed", "awaiting review" and "in progress". If you have had something in the works that has not been accepted yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and relations) and CWEP-003 (adding a FROM clause to RQL).

For details read list of tickets for CubicWeb 3.20.0.

Cubes

Here is a list of cubes that had versions published over the past two months: addressbook, awstats, blog, bootstrap, brainomics, comment, container, dataio, genomics, invoice, mediaplayer, medicalexp, neuroimaginge, person, preview, questionnaire, securityprofile, simplefacet, squareui, tag, tracker, varnish, vcwiki, vtimeline.

Here are a the new cubes we are pleased to announce:

collaboration is a building block that reuses container and helps to define collaborative workflows where entities are cloned, modified and shared.

Our priorities for the next two months are collaboration and container, then narval/apycot, then mercurial-server, then rqlcontroller and signedrequest, then imagesearch.

Mid-term goals

The work done for CWEP-0002 (computed attributes and relations) is expected to land in CubicWeb 3.20.

The work done for CWEP-0003 (explicit data source federation using FROM in RQL) is expected to land in CubicWeb 3.20.

Tools to diagnose performance issues would be very useful. Maybe in 3.21 ?

Caching session data would help and some work was done on this topic during the sprint in february. Maybe in 3.22 ?

WSGI has made progress lately, but still needs work. Maybe in 3.23 ?

RESTfulness is a goal. Maybe in 3.24 ?

Maybe 3.25 will be in fact 4.0 ?

Events

A spring sprint will take place in Logilab's offices in Paris from April 28th to 30th. We invite all the interested parties to join us there!

Last but not least

As already said on the mailing list, other developers and contributors are more than welcome to share their own goals in order to define a roadmap that best fits everyone's needs.

Logilab's next roadmap meeting will be held at the beginning of may 2014.


CubicWeb sprint / winter 2014

2014/02/12 by Nicolas Chauvat

This sprint took place at Logilab's offices in Paris on Feb 13/14. People from CEA, Unlish, Crealibre and Logilab teamed up to push CubicWeb forward.

We did not forget the priorities from the roadmap:

  • CubicWeb 3.17.13 and 3.18.3 were released, and CubicWeb 3.19 made progress
  • the branch about ComputedAttributes and ComputedRelations (CWEP-002) is ready to be merged,
  • the branch about the FROM clause (CWEP-003) made progress (the CWEP was reviewed and part of the resulting spec was implemented),
  • in order to reduce work in progress, the number of patches in state reviewed or pending-review was brought down to 243 (from 302, that is 60 or 20%, which is not bad).

CubicWeb using Postgresql at its best

2014/02/08 by Nicolas Chauvat

We had a chat today with a core contributor to Postgresql from whom we may buy consulting services in the future. We discussed how CubicWeb could get the best out of Postgresql:

  • making use of the LISTEN/NOTIFY mechanism built into PG could be useful (to warn the cache about modified items for example) and PgQ is its good friend;
  • views (materialized or not) are another way to implement computed attributes and relations (see CWEP number 002) and it could be that the Entities table is in fact a view of other tables;
  • implementing RQL as an in-database language could open the door to new things (there is PL/pgSQL, PL/Python, what if we had PL/RQL?);
  • Foreign Data Wrappers written with Multicorn would be another way to write data feeds (see LDAP integration for an example);
  • managing dates can be tricky when users reside in different timezones and UTC is important to keep in mind (unicode/str is a good analogy);
  • for transitive closures that are often needed when implementing access control policies with __permissions, Postgresql can go a long way with queries like "WITH ... (SELECT UNION ALL SELECT RETURNING *) UPDATE USING ...";
  • the fastest way to load tabular data that does not need too much pre-processing is to create a temporary table in memory, then COPY-FROM the data into that table, then index it, then write the transform and load step in SQL (maybe with PL/Python);
  • when executing more than 10 updates in a row, it is better to write into a temporary table in memory, then update the actual tables with UPDATE USING (let's check if the psycopg driver does that when executemany is called);
  • reaching 10e8 rows in a table is at the time of this writing the stage when you should start monitoring your db seriously and start considering replication, partition and sharding.
  • full-text search is much better in Postgresql than the general public thinks it is and recent developments made it orders of magnitude faster than tools like Lucene or Solr and ElasticSearch;
  • when dealing with complex queries (searching graphs maybe), an option to consider is to implement a specific data type, use it into a materialized view and use GIN or GIST indexes over it;
  • for large scientific data sets, it could be interesting to link the numpy library into Postgresql and turn numpy arrays into a new data type;
  • Oh, and one last thing: the object-oriented tables of Postgresql are not such a great idea, unless you have a use case that fits them perfectly and does not hit their limitations (CubicWeb's is_instance_of does not seem to be one of these).

Hopin' I got you thinkin' :)

http://developer.postgresql.org/~josh/graphics/logos/elephant.png

Cubicweb sprints winter/spring 2014

2014/01/24 by David Douard

The Logilab team is pleased to announce two Cubicweb sprints to be held in its Paris offices in the upcoming months:

February 13/14th at Logilab in Paris

The agenda would be the FROM clause for which a CWEP is still awaited, and the RQL rewriter according to the CWEP02.

April 28/30th at Logilab in Paris

Agenda to be defined.

Join the party

All users and contributors of CubicWeb are invited to join the party. Just send an email to contact at Logilab.fr if you plan to come.

http://farm1.static.flickr.com/183/419945378_4ead41a76d_m.jpg