CubicWeb Blog: News about the framework and its uses.
To be decided. Some possible topics are :
- optimization (still)
- porting cubicweb to python3
- porting cubicweb to pypy
- persistent sessions
- finish twisted / wsgi refactoring
- inter-instance communication bus
- use subprocesses to handle datafeeds
- developing more debug-tools (debug console, view profiling, etc.)
- pluggable / unpluggable external sources (as needed for the cubipedia and semantic family)
- client-side only applications (javascript + http)
- mercurial storage backend: see this thread of the mailing list
- mercurial-server integration: see this email to the mailing list
Other ideas are welcome; please bring them up on cubicweb@lists.cubicweb.org.
This sprint will take place in February 2012, from Tuesday the 7th to Friday the 10th. You are more than welcome to come along, help out and contribute. An introduction is planned for newcomers.
Network resources will be available for those bringing laptops.
Address : 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab" (googlemap)
Metro : Glacière
Contact : http://www.logilab.fr/contact
Dates : 07/02/2012 to 10/02/2012
CubicWeb is a semantic web framework written in Python that has been successfully used in large-scale projects, such as data.bnf.fr (French National Library's opendata) or Collections des musées de Haute-Normandie (museums of Haute-Normandie).
CubicWeb provides a high-level query language, called RQL, operating over a relational database (PostgreSQL in our case), and makes it possible to quickly instantiate an entity-relationship data model. By separating the querying and the display of data into two distinct steps, it provides powerful means for data retrieval and processing.
In this blog post, we will demonstrate some of these capabilities on the Geonames data.
Geonames is an open-source compilation of geographical data from various sources:
"...The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge..." (http://www.geonames.org)
The data is available as a dump containing different CSV files:
- allCountries: main file containing information about 8,000,000 places in the world. We won't detail the various attributes of each location, but we will focus on some important properties, such as population and elevation. Moreover, admin_code_1 and admin_code_2 will be used to link the different locations to the corresponding AdministrativeRegion, and feature_code will be used to link the data to the corresponding type.
- admin1CodesASCII.txt and admin2Codes.txt detail the different administrative regions, that are parts of the world such as region (Ile-de-France), department (Department of Yvelines), US counties...
- featureCodes.txt details the different types of location that may be found in the data, such as forest(s), first-order administrative division, aqueduct, research institute, ...
- timeZones.txt, countryInfo.txt, iso-languagecodes.txt are additional files providing information about timezones, countries and languages. They will be included in our CubicWeb database but won't be explained in more detail here.
The Geonames website also provides some ways to browse the data: by Countries, by Largest Cities, by Highest mountains, by postal codes, etc. We will see that CubicWeb could be used to automatically create such ways of browsing data while allowing far deeper queries.
There are two main challenges when dealing with such data:
- the number of entries: with 8,000,000 placenames, we have to use efficient tools for storing and querying them.
- the structure of the data: the different types of entries are separated in different files, but should be merged for efficient queries (i.e. we have to rebuild the different links between entities, e.g. Location to Country or Location to AdministrativeRegion).
With CubicWeb, the data model of the application is written in Python. It defines different entity classes with their attributes, as well as the relationships between the different entity classes. Here is a sample of the schema.py that we have used for Geonames data:
class Location(EntityType):
    name = String(maxsize=1024, indexed=True)
    uri = String(unique=True, indexed=True)
    geonameid = Int(indexed=True)
    latitude = Float(indexed=True)
    longitude = Float(indexed=True)
    feature_code = SubjectRelation('FeatureCode', cardinality='?*', inlined=True)
    country = SubjectRelation('Country', cardinality='?*', inlined=True)
    main_administrative_region = SubjectRelation('AdministrativeRegion',
                                                 cardinality='?*', inlined=True)
    timezone = SubjectRelation('TimeZone', cardinality='?*', inlined=True)
    ...
This indicates that the main Location class has a name attribute (string), a uri (string), a geonameid (integer), a latitude and a longitude (both floats), and some relations to other entity classes such as FeatureCode (the relation is named feature_code), Country (the relation is named country), or AdministrativeRegion (the relation is named main_administrative_region).
The cardinality of each relation is defined much as in an RDBMS, where * means any number, ? means zero or one and 1 means one and only one. It is written as two characters: the first applies to the subject of the relation and the second to its object, so '?*' on country means that a Location has at most one Country, while a Country may be linked to any number of Locations.
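As a small illustration, a two-character cardinality string can be spelled out in plain Python 3. This is only a sketch, independent of CubicWeb: the describe_cardinality helper and its dictionary keys are hypothetical names, and it assumes the convention that the first character constrains the subject side of the relation and the second the object side.

```python
# Meaning of each cardinality character ('+' is also accepted by the
# schema library, in addition to the three characters listed above).
MEANINGS = {
    "1": "exactly one",
    "?": "zero or one",
    "+": "one or more",
    "*": "any number",
}


def describe_cardinality(cardinality):
    """Spell out a two-character cardinality string (hypothetical helper)."""
    if len(cardinality) != 2 or any(c not in MEANINGS for c in cardinality):
        raise ValueError("unexpected cardinality: %r" % (cardinality,))
    subject_side, object_side = cardinality
    return {
        # how many objects a given subject may be related to
        "objects_per_subject": MEANINGS[subject_side],
        # how many subjects a given object may be related to
        "subjects_per_object": MEANINGS[object_side],
    }


# Location country Country, cardinality '?*'
print(describe_cardinality("?*"))
```

Read this way, '?*' on the country relation says that a Location points to at most one Country, while a Country can be the target of any number of Locations.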
We give below a visualisation of the schema (obtained using the /schema relative url)
The data contained in the CSV files could be pushed and stored without any processing, but it is interesting to reconstruct the relations that may exist between different entities and entity classes, so that queries will be easier and faster.
Executing the import procedure took us 80 minutes on regular hardware, which seems very reasonable given the amount of data (~7,000,000 entities, 920MB for the allCountries.txt file), and the fact that we are also constructing many indexes (on attributes or on relations) to improve the queries. This import procedure uses some low-level SQL commands to load the data into the underlying relational database.
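To give an idea of what reconstructing the relations involves, here is a minimal sketch in plain Python 3, not the import procedure we actually used. The sample rows are made up and reduced to a few columns, and the assumption that region codes have the form COUNTRY.ADMIN1 is ours; the function names are hypothetical.

```python
import csv
import io

# Made-up rows in the spirit of admin1CodesASCII.txt: code, name.
ADMIN1_SAMPLE = "FR.11\tIle-de-France\nFR.53\tBretagne\n"
# Made-up rows in the spirit of allCountries.txt, reduced to four columns:
# geonameid, name, country code, admin_code_1.
LOCATIONS_SAMPLE = "1\tParis\tFR\t11\n2\tRennes\tFR\t53\n3\tNowhere\tFR\t99\n"


def read_rows(text):
    """Parse tab-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def index_regions(rows):
    """Map 'COUNTRY.ADMIN1' codes to region names."""
    return {code: name for code, name in rows}


def link_locations(location_rows, regions):
    """Yield (location name, region name or None) pairs."""
    for _geonameid, name, country, admin1 in location_rows:
        yield name, regions.get("%s.%s" % (country, admin1))


regions = index_regions(read_rows(ADMIN1_SAMPLE))
links = list(link_locations(read_rows(LOCATIONS_SAMPLE), regions))
print(links)
```

Building the index once and resolving each location with a dictionary lookup keeps the linking step linear in the number of rows, which matters with millions of placenames.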
As stated before, queries are performed in CubicWeb using RQL (Relational Query Language), which is similar to SPARQL, but with a syntax that is closer to SQL. This language may be used to query the concepts directly while abstracting away the physical structure of the underlying database. For example, one can use the following query:
Any X LIMIT 10 WHERE X is Location, X population > 1000000,
X country C, C name "France"
that means:
Give me 10 locations that have a population greater than 1000000, and that are in a country named "France"
The corresponding SQL query is:
SELECT _X.cw_eid FROM cw_Country AS _C, cw_Location AS _X
WHERE _X.cw_population>1000000
      AND _X.cw_country=_C.cw_eid AND _C.cw_name='France'
LIMIT 10
We can see that RQL is higher-level than SQL and abstracts the details of the tables and the joins.
A query returns a result set (a list of results), that can be displayed using views. A main feature of CubicWeb is to separate the two steps of querying the data and displaying the results. One can query some data and visualize the results in the standard web framework, download them in different formats (JSON, RDF, CSV,...), or display them in some specific view developed in Python.
In particular, we will use the mapstraction.map view, which is based on the Mapstraction and OpenLayers libraries, to display information on maps using data from OpenStreetMap. This mapstraction.map view uses a CubicWeb feature called adapters. An adapter adapts a class of entity to some interface, so views can rely on interfaces instead of types and are able to display entities with different attributes and relations. In our case, the IGeocodableAdapter returns a latitude and a longitude for a given class of entity (here, the mapping is trivial, but there are more complex cases... :) ):
class IGeocodableAdapter(EntityAdapter):
    __regid__ = 'IGeocodable'
    __select__ = is_instance('Location')

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude
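The idea behind adapters can be tried outside CubicWeb with a few lines of plain Python 3. The sketch below is not CubicWeb's machinery: the Shop class, the ADAPTERS registry and the adapt and map_markers functions are hypothetical stand-ins, used only to show how a view can rely on a latitude/longitude interface while entities store their coordinates differently.

```python
class Location:
    """Stand-in for the Location entity: plain latitude/longitude attributes."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class Shop:
    """Hypothetical entity storing its coordinates as a (lat, lon) tuple."""

    def __init__(self, position):
        self.position = position


class LocationGeocodable:
    """Trivial mapping, as in the adapter above."""

    def __init__(self, entity):
        self.entity = entity

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude


class ShopGeocodable:
    """Less trivial mapping: unpack the position tuple."""

    def __init__(self, entity):
        self.entity = entity

    @property
    def latitude(self):
        return self.entity.position[0]

    @property
    def longitude(self):
        return self.entity.position[1]


# Toy registry playing the role of adapter selection.
ADAPTERS = {Location: LocationGeocodable, Shop: ShopGeocodable}


def adapt(entity):
    return ADAPTERS[type(entity)](entity)


def map_markers(entities):
    """The 'view': it only knows about the latitude/longitude interface."""
    adapted = [adapt(entity) for entity in entities]
    return [(a.latitude, a.longitude) for a in adapted]


markers = map_markers([Location(48.85, 2.35), Shop((47.2, -1.55))])
print(markers)
```

The view code never inspects the entity type, so supporting a new kind of entity only requires registering one more adapter.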
We will give some results of queries and views later. It is important to notice that the following screenshots are taken without any modification of the standard web interface of CubicWeb. It is possible to write specific views and to define a specific CSS, but we only wanted to show how CubicWeb could handle such data. The default web template of CubicWeb is sufficient for what we want to do, as it dynamically creates web pages showing attributes and relations, as well as some specific forms and JavaScript applets adapted directly to the data (e.g. map-based tools).
Last but not least, the query and the view can be defined within the URL, which opens a world of new possibilities to the user:
http://baseurl:port/?rql=The query that I want&vid=Identifier-of-the-view
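Since an RQL query contains spaces, quotes and comparison operators, it has to be percent-encoded when such a URL is built programmatically. The following Python 3 sketch shows one way to do it with the standard library; the query_url helper and the http://localhost:8080 base URL are assumptions made for the example, only the rql and vid parameter names come from the URL pattern above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_url(base_url, rql, vid=None):
    """Build a URL carrying an RQL query and, optionally, a view identifier."""
    params = {"rql": rql}
    if vid is not None:
        params["vid"] = vid
    return base_url.rstrip("/") + "/?" + urlencode(params)


rql = ('Any X LIMIT 10 WHERE X is Location, X population > 1000000, '
       'X country C, C name "France"')
# Hypothetical instance address.
url = query_url("http://localhost:8080", rql, vid="mapstraction.map")
print(url)

# Decoding the query string gives back the original query and view.
decoded = parse_qs(urlsplit(url).query)
print(decoded["rql"][0] == rql, decoded["vid"][0])
```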
We will not go into too much detail about facets, but let's just say that this feature may be used to define filtering axes on the data, and thus to post-filter a result set. In this example, we have defined four different facets: on the population, on the elevation, on the feature_code and on the main_administrative_region. We will see illustrations of these facets below.
We give here an example of the definition of a Facet:
class LocationPopulationFacet(facet.RangeFacet):
    __regid__ = 'population-facet'
    __select__ = is_instance('Location')
    order = 2
    rtype = 'population'
where __select__ defines which class(es) of entities are targeted by this facet, order defines the order of display of the different facets, and rtype defines the target attribute/relation that will be used for filtering.
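Conceptually, a range facet post-filters a result set on one numeric attribute. The plain Python 3 sketch below illustrates that idea only; it is not how CubicWeb implements facets, the range_filter function is a hypothetical name, and the rows are made-up values.

```python
def range_filter(rows, column, lower, upper):
    """Keep rows whose value in `column` lies within [lower, upper].

    Rows with a missing value (None) are dropped.
    """
    return [
        row for row in rows
        if row[column] is not None and lower <= row[column] <= upper
    ]


# Made-up (name, population) rows standing in for a result set.
rows = [("A", 1200000), ("B", 5000000), ("C", None), ("D", 800000)]
filtered = range_filter(rows, 1, 1000000, 6000000)
print(filtered)
```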
The main page of the Geonames application is illustrated in the screenshot below. It provides general information on the database, in particular the number of entities in the different classes:
- 7,984,330 locations.
- 59,201 administrative regions (e.g. regions, counties, departments...)
- 7,766 languages.
- 656 features (e.g. types of location).
- 410 time zones.
- 252 countries.
- 7 continents.
We will first illustrate the possibilities of CubicWeb with the simple query that we detailed before (which could be pasted directly into the URL...):
Any X LIMIT 10 WHERE X is Location, X population > 1000000,
X country C, C name "France"
We obtain the following page:
This is the standard view of CubicWeb for displaying results. We can see (right box) that we obtain 10 locations that are indeed located in France, with a population of more than 1,000,000 inhabitants. The left box shows the search panel that could be used to launch queries, and the facet filters that may be used for filtering results, e.g. we may ask to keep only results with a population greater than 4,767,709 inhabitants within the previous results:
and we now obtain only 4 results. We can also notice that the facets are linked: when the result set is restricted using the population facet, the choices offered by the other facets are restricted as well.
Let's say that we now want more information about the results that we have obtained previously (for example the exact population, the elevation and the name). This is really simple! We just have to ask for what we want within the RQL query (of course, the names N, P, E of the variables could be almost anything...):
Any N, P, E LIMIT 10 WHERE X is Location,
X population P, X population > 1000000,
X elevation E, X name N, X country C, C name "France"
The empty column for the elevation simply means that we don't have any information about elevation.
Anyway, we can see that fetching particular information could not be simpler! Indeed, with more complex queries, we can access a wealth of information from the Geonames database:
Any N,E,LA,LO ORDERBY E DESC LIMIT 10 WHERE X is Location,
X latitude LA, X longitude LO,
X elevation E, NOT X elevation NULL, X name N,
X country C, C name "France"
which means:
Give me the 10 highest locations (the 10 first when sorting by decreasing elevation) with their name, elevation, latitude and longitude that are in a country named "France"
We can now use another view on the same request, e.g. on a map (view mapstraction.map):
Any X ORDERBY E DESC LIMIT 10 WHERE X is Location,
X latitude LA, X longitude LO, X elevation E,
NOT X elevation NULL, X country C, C name "France"
And now, we can add the fact that we want more results (20), and that the location should have a non-null population:
Any N, E, P, LA, LO ORDERBY E DESC LIMIT 20 WHERE X is Location,
X latitude LA, X longitude LO,
X elevation E, NOT X elevation NULL, X population P,
X population > 0, X name N, X country C, C name "France"
... and on a map ...
In this post, we have seen how CubicWeb can be used to store and query complex data, while providing (among other things) Web-based views for data visualization. It allows the user to query data directly within the URL and may be used to interact with and explore the data in depth. In a future post, we will give more complex queries to show the full possibilities of the system.
In most CubicWeb projects I've worked on, there always comes a time when
I need to import legacy data into the new application. CubicWeb provides Store
and Controller objects in the dataimport module. I won't talk here about the
recommended general procedure described in the module's docstring (I find it a bit
convoluted for simple cases) but I will focus on Store objects. Store
objects in this module are more or less a thin layer around session objects:
they provide high-level helpers such as create_entity() and relate(),
and keep track of what was inserted, which errors occurred, etc.
In a recent project, I had to create a fairly large number (a few million)
of simple entities (strings, integers, floats and dates) and relations. The default
object store (i.e. cubicweb.dataimport.RQLObjectStore) is painfully slow,
the reason being all the integrity / security / metadata hooks that are constantly
selected and executed. For large imports, dataimport also provides the
cubicweb.dataimport.NoHookRQLObjectStore. This store bypasses all hooks
and uses the underlying system source primitives directly, making it around
twice as fast as the standard store. The problem is that we're still
executing each SQL query sequentially, and we're talking here about millions of
INSERT / UPDATE queries.
My idea was to create my own ObjectStore class inheriting from NoHookRQLObjectStore
that would try to use executemany or even copy_from when possible. It
is actually not hard to make groups of similar SQL queries since create_entity()
generates the same query for a given set of parameters. For instance:
create_entity('Person', firstname='John', surname='Doe')
create_entity('Person', firstname='Tim', surname='BL')
will generate the following sql queries:
INSERT INTO cw_Person ( cw_cwuri, cw_eid, cw_modification_date,
cw_creation_date, cw_firstname, cw_surname )
VALUES ( %(cw_cwuri)s, %(cw_eid)s, %(cw_modification_date)s,
%(cw_creation_date)s, %(cw_firstname)s, %(cw_surname)s )
INSERT INTO cw_Person ( cw_cwuri, cw_eid, cw_modification_date,
cw_creation_date, cw_firstname, cw_surname )
VALUES ( %(cw_cwuri)s, %(cw_eid)s, %(cw_modification_date)s,
%(cw_creation_date)s, %(cw_firstname)s, %(cw_surname)s )
The only thing that will differ is the actual data inserted. Well ... ahem ... CubicWeb actually
also generates a "few" extra sql queries to insert metadata for each entity:
INSERT INTO is_instance_of_relation(eid_from,eid_to) VALUES (%s,%s)
INSERT INTO is_relation(eid_from,eid_to) VALUES (%s,%s)
INSERT INTO cw_source_relation(eid_from,eid_to) VALUES (%s,%s)
INSERT INTO owned_by_relation ( eid_to, eid_from ) VALUES ( %(eid_to)s, %(eid_from)s )
INSERT INTO created_by_relation ( eid_to, eid_from ) VALUES ( %(eid_to)s, %(eid_from)s )
Those extra queries are exactly the same for each entity inserted, whatever
the entity type is, hence crying out for executemany or copy_from. Grouping SQL queries together
is not that hard but has a drawback: as you don't have an intermediate state
(the data is actually inserted only at the very end of the process),
you lose the ability to query your database to fetch the entities you've just created
during the import.
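The grouping idea can be sketched in a few lines of Python 3. This is not the CopyFromRQLObjectStore discussed in this post: to stay self-contained it uses the standard sqlite3 module instead of PostgreSQL, executemany instead of COPY FROM, and a hypothetical BatchingStore class with a made-up person table.

```python
import sqlite3
from collections import defaultdict


class BatchingStore:
    """Toy store: buffers rows per SQL statement, flushes with executemany."""

    def __init__(self, connection):
        self.connection = connection
        self._pending = defaultdict(list)  # SQL text -> list of parameter dicts

    def create_entity(self, table, **attrs):
        # The same set of attribute names always yields the same SQL text,
        # so rows can be grouped by statement. Table and column names are
        # interpolated directly and must therefore come from trusted code.
        columns = sorted(attrs)
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (
            table, ", ".join(columns), ", ".join(":" + c for c in columns))
        self._pending[sql].append(attrs)

    def commit(self):
        # One executemany call per distinct statement.
        for sql, rows in self._pending.items():
            self.connection.executemany(sql, rows)
        self._pending.clear()
        self.connection.commit()


cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE person (firstname TEXT, surname TEXT)")
store = BatchingStore(cnx)
store.create_entity("person", firstname="John", surname="Doe")
store.create_entity("person", firstname="Tim", surname="BL")

# Nothing is visible in the database until commit() flushes the buffers.
before = cnx.execute("SELECT COUNT(*) FROM person").fetchone()[0]
store.commit()
after = cnx.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(before, after)
```

The two counts also illustrate the drawback mentioned above: between create_entity() and commit(), the freshly created rows cannot be fetched from the database.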
Now, a few benchmarks ...
To create those benchmarks, I decided to use the workorder cube, which is a simple yet complete enough cube: it provides only two entity types (WorkOrder and Order), a relation between them (Order split_into WorkOrder) and uses different kinds of attributes (String, Date, Float).
Once the cube was instantiated, I ran the following script to populate the database with my 3 different stores:
import sys
from datetime import date
from random import choice
from itertools import count

from logilab.common.decorators import timed

from cubicweb import cwconfig
from cubicweb.dbapi import in_memory_repo_cnx

def workorders_data(n, seq=count()):
    for i in xrange(n):
        yield {'title': u'wo-title%s' % seq.next(), 'description': u'foo',
               'begin_date': date.today(), 'end_date': date.today()}

def orders_data(n, seq=count()):
    for i in xrange(n):
        yield {'title': u'o-title%s' % seq.next(), 'date': date.today(), 'budget': 0.8}

def split_into(orders, workorders):
    for workorder in workorders:
        yield choice(orders), workorder

def initial_state(session, etype):
    return session.execute('Any S WHERE S is State, WF initial_state S, '
                           'WF workflow_of ET, ET name %(etn)s', {'etn': etype})[0][0]

@timed
def populate(store, nb_workorders, nb_orders, set_state=False):
    orders = [store.create_entity('Order', **attrs)
              for attrs in orders_data(nb_orders)]
    workorders = [store.create_entity('WorkOrder', **attrs)
                  for attrs in workorders_data(nb_workorders)]
    ## in_state is set by a hook, so NoHookObjectStore will need
    ## to set the relation manually
    if set_state:
        order_state = initial_state(store.session, 'Order')
        workorder_state = initial_state(store.session, 'WorkOrder')
        for order in orders:
            store.relate(order.eid, 'in_state', order_state)
        for workorder in workorders:
            store.relate(workorder.eid, 'in_state', workorder_state)
    for order, workorder in split_into(orders, workorders):
        store.relate(order.eid, 'split_into', workorder.eid)
    store.commit()

if __name__ == '__main__':
    config = cwconfig.instance_configuration(sys.argv[1])
    nb_orders = int(sys.argv[2])
    nb_workorders = int(sys.argv[3])
    repo, cnx = in_memory_repo_cnx(config, login='admin', password='admin')
    session = repo._get_session(cnx.sessionid)
    from cubicweb.dataimport import RQLObjectStore, NoHookRQLObjectStore
    from cubes.mycube.dataimport.store import CopyFromRQLObjectStore
    print 'testing RQLObjectStore'
    store = RQLObjectStore(session)
    populate(store, nb_workorders, nb_orders)
    print 'testing NoHookRQLObjectStore'
    store = NoHookRQLObjectStore(session)
    populate(store, nb_workorders, nb_orders, set_state=True)
    print 'testing CopyFromRQLObjectStore'
    store = CopyFromRQLObjectStore(session)
    # this last call is missing from the listing; it is inferred from the
    # timings reported for CopyFromRQLObjectStore
    populate(store, nb_workorders, nb_orders, set_state=True)
I ran the script and asked it to create 100 Order entities and 1000 WorkOrder entities,
and to link each created WorkOrder to a parent Order:
adim@esope:~/tmp/bench_cwdi$ python bench_cwdi.py bench_cwdi 100 1000
testing RQLObjectStore
populate clock: 24.590000000 / time: 46.169721127
testing NoHookRQLObjectStore
populate clock: 8.100000000 / time: 25.712352991
testing CopyFromRQLObjectStore
populate clock: 0.830000000 / time: 1.180006981
My interpretation of the above times is:
- The clock time indicates the time spent on the CubicWeb server side (i.e. hooks
and data pre/postprocessing around SQL queries). The time value should be
the sum of the clock time and the time spent in PostgreSQL.
- RQLObjectStore is slow ;-). Nothing new here, but the clock/time ratio
means that we're spending a lot of time on the Python side (i.e. hooks,
as I said earlier) and a fair amount of time in PostgreSQL.
- NoHookRQLObjectStore really cuts down the time spent on the Python side,
while the time in PostgreSQL remains about the same as for RQLObjectStore. This
is not surprising, since the queries performed are the same in both cases.
- CopyFromRQLObjectStore seems blazingly fast in comparison (inserting
a few thousand elements in PostgreSQL with a COPY FROM statement
is not a problem). And ... yes, I checked the data was actually inserted,
and I even ran a cubicweb-ctl db-check on the instance afterwards.
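The figures behind this interpretation can be recomputed with a few lines of Python 3. The sketch below only does arithmetic on the timings printed above; reading "time minus clock" as the time spent outside Python (mostly in PostgreSQL) is the interpretation given in this post, not a measured breakdown, and the key names are made up for the example.

```python
# (clock, time) pairs copied from the benchmark output, in seconds.
timings = {
    "RQLObjectStore": (24.59, 46.169721127),
    "NoHookRQLObjectStore": (8.10, 25.712352991),
    "CopyFromRQLObjectStore": (0.83, 1.180006981),
}

baseline = timings["RQLObjectStore"][1]
summary = {}
for name, (clock, elapsed) in timings.items():
    summary[name] = {
        # time not accounted for by the Python side
        "outside_python": round(elapsed - clock, 2),
        # speedup in wall-clock time relative to RQLObjectStore
        "speedup": round(baseline / elapsed, 1),
    }
    print(name, summary[name])
```

On these numbers, NoHookRQLObjectStore is a bit less than twice as fast as RQLObjectStore in wall-clock time, and CopyFromRQLObjectStore is close to forty times as fast.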
This probably opens new perspectives for massive data imports since the client API
remains the same as before for the programmer. It's still a bit experimental and can only be
used for "dummy", brute-force import scenarios where you can preprocess your data
in Python before updating the database, but it's probably worth having such
a store in the dataimport module.
Data.gouv.fr is great news for the OpenData movement!
Two days ago, the French government released thousands of data sets on http://data.gouv.fr/ under an open licensing scheme that allows people to access and play with them. Thanks to the CubicWeb semantic web framework, it took us only a couple of hours to put some of that open data to good use. Here is how we mapped the French railway system.
We used two of the datasets available on data.gouv.fr:
Train stations : description of the 6442 train stations in France, including their name, type and geographic coordinates.
Here is a sample of the file
441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358
441000;Montreuil-sur-Ille;Desserte Voyageur-Infrastructure;48,3072;-1,6741
Level crossings : description of the 18159 level crossings on French railways, including their type and location.
Here is a sample of the file
558000;PN privé pour voitures avec barrières sans passage piétons accolé;48,05865;1,60697
395000;PN privé pour voitures avec barrières avec passage piétons accolé public;;48,82544;1,65795
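Both files use semicolons as separators and a comma as decimal mark, so the coordinates need a small conversion before they can be stored as floats. Here is a minimal Python 3 sketch for the train-station lines; the parse_station function is hypothetical, and calling the first column "code" is an assumption since the samples do not say what it represents.

```python
def parse_station(line):
    """Split one train-station line into a dict with float coordinates."""
    code, name, kind, latitude, longitude = line.split(";")
    return {
        "code": code,  # meaning of this column is assumed
        "name": name,
        "type": kind,
        # the files use a decimal comma: "48,23955" -> 48.23955
        "latitude": float(latitude.replace(",", ".")),
        "longitude": float(longitude.replace(",", ".")),
    }


sample = "441000;St-Germain-sur-Ille;Desserte Voyageur;48,23955;-1,65358"
station = parse_station(sample)
print(station)
```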
Given the above datasets, we wrote the following data model to store the data in CubicWeb:
class Location(EntityType):
    name = String(indexed=True)
    latitude = Float(indexed=True)
    longitude = Float(indexed=True)
    feature_type = SubjectRelation('FeatureType', cardinality='?*')
    data_source = SubjectRelation('DataGovSource', cardinality='1*', inlined=True)

class FeatureType(EntityType):
    name = String(indexed=True)

class DataGovSource(EntityType):
    name = String(indexed=True)
    description = String()
    uri = String(indexed=True)
    icon = String()
The Location object is used for both train stations and level crossings. It has a name (text information), a latitude and a longitude (numeric information), and it can be linked to at most one FeatureType object and to exactly one DataGovSource (this is what the '?*' and '1*' cardinalities express). The FeatureType object is used to store the type of train station or level crossing and is defined by a name (text information). The DataGovSource object is defined by a name, a description and a uri used to link back to the source data on data.gouv.fr.
We had to write a few lines of code to benefit from the massive data import feature of CubicWeb before we could load the content of the CSV files with a single command:
$ cubicweb-ctl import-datagov-location datagov_geo gare.csv-fr.CSV --source-type=gare
$ cubicweb-ctl import-datagov-location datagov_geo passage_a_niveau.csv-fr.CSV --source-type=passage
In less than a minute, the import was completed and we had:
- 2 DataGovSource objects, corresponding to the two data sets,
- 24 FeatureType objects, corresponding to the different types of locations that exist (e.g. Non exploitée, Desserte Voyageur, PN public isolé pour piétons avec portillons or PN public pour voitures avec barrières gardé avec passage piétons accolé manoeuvré à distance),
- 24601 Locations, corresponding to the different train stations and level crossings.
CubicWeb makes it possible to build complex applications by assembling existing components (called cubes). Here we used a cube that
wraps the Mapstraction and the OpenLayers libraries to display information on maps using data from OpenStreetMap.
In order for the Location type defined in the data model to be displayable on a map, it is sufficient to write the following adapter:
class IGeocodableAdapter(EntityAdapter):
    __regid__ = 'IGeocodable'
    __select__ = is_instance('Location')

    @property
    def latitude(self):
        return self.entity.latitude

    @property
    def longitude(self):
        return self.entity.longitude
That was it for the development part! The next step was to use the application to browse the structure of the French train network on the map.
Train stations in use:
Train stations not in use:
Zooming on some parts of the map, for example Brittany, we get to see more details and clicking on the train icons gives more information on the corresponding Location.
Train stations in use:
Train stations not in use:
Since CubicWeb separates querying the data and displaying the result of a query, we can switch the view to display the same data in tables or to export it back to a CSV file.
CubicWeb implements a query language very similar to SPARQL, which makes the data available without the need to learn a specific API.
Example 1: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%miny"
This query returns all the Locations with a name that ends with "miny". It returns only one element, the Firminy train station.
Example 2: http://some.url.demo/?rql=Any X WHERE X is Location, X name LIKE "%ny"
This query returns all the Locations with a name that ends with "ny"; there are 112 such train stations.
Example 3: http://some.url.demo/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8
This query returns all the Locations that have a latitude between 47.6 and 47.8, and a longitude between -1.9 and -1.8.
We obtain 11 Locations (9 level crossings and 2 train stations). We can map them using the mapstraction.map view described previously.
Example 4: http://domainname:8080/?rql=Any X WHERE X latitude < 47.8, X latitude>47.6, X longitude >-1.9, X longitude<-1.8, X feature_type F, F name "Desserte Voyageur"
This limits the previous result set to train stations that are used for passenger service:
Example 5: http://domainname:8080/?rql=Any X WHERE X feature_type F, F name "PN public pour voitures sans barrières sans SAL"&vid=mapstraction.map
Finally, one can map all the level crossings for vehicles without barriers (there are 3704):
As you can see in the last URL, the map view was chosen directly with the vid parameter, meaning that the URL is shareable and can easily be included in a blog, with an iframe for example.
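Bounding-box queries like the one in Example 3 are easy to generate. The Python 3 sketch below builds the RQL string and a shareable map URL from four bounds; the bbox_rql helper is hypothetical, and the host is the placeholder used in the examples above.

```python
from urllib.parse import urlencode


def bbox_rql(lat_min, lat_max, lon_min, lon_max):
    """Return an RQL query selecting entities inside a bounding box."""
    # float() rejects anything that is not a number before it reaches the query
    bounds = tuple(float(value) for value in (lat_min, lat_max, lon_min, lon_max))
    return ("Any X WHERE X latitude > %s, X latitude < %s, "
            "X longitude > %s, X longitude < %s" % bounds)


rql = bbox_rql(47.6, 47.8, -1.9, -1.8)
print(rql)

# Percent-encode the query and select the map view with the vid parameter.
url = "http://some.url.demo/?" + urlencode({"rql": rql, "vid": "mapstraction.map"})
print(url)
```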
The result of a query can also be "displayed" in RDF, thus allowing users to download a semantic version of the information,
without having to do the preprocessing themselves:
<rdf:Description rdf:about="cwuri24684b3a955d4bb8830b50b4e7521450">
  <rdf:type rdf:resource="http://ns.cubicweb.org/cubicweb/0.0/Location"/>
  <cw:cw_source rdf:resource="http://some.url.demo/"/>
  <cw:longitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">-1.89599</cw:longitude>
  <cw:latitude rdf:datatype="http://www.w3.org/2001/XMLSchema#float">47.67778</cw:latitude>
  <cw:feature_type rdf:resource="http://some.url.demo/7222"/>
  <cw:data_source rdf:resource="http://some.url.demo/7206"/>
</rdf:Description>
For someone who knows the CubicWeb framework, a couple hours are enough to create a CubicWeb application that stores,
displays, queries and shares data downloaded from http://www.data.gouv.fr/
The full source code for the above will be released before the end of the week.
If you want to see more of CubicWeb in action, browse http://data.bnf.fr or learn how to develop your own application at http://docs.cubicweb.org/
I want to implement an entity with 2 boolean attributes, and a
requirement is that these two attributes never have the same boolean
value (think of some kind of radio buttons).
Let's start with a simple schema example:
# in schema.py
class MyEntity(EntityType):
    use_option1 = Boolean(required=True, default=True)
    use_option2 = Boolean(required=True, default=False)
So new entities will conform to the spec.
To keep it that way when entities are modified, you need two things:
- a constraint in the entity schema which will complain if both attributes
have the same value
- a hook which will toggle the other attribute when one attribute is
changed.
RQL constraints are generally meant to be used on relations, but you
can use them on attributes too. Simply use 'S' to denote the entity,
and write the constraint normally. You need to have the same constraint on both attributes, because the constraint evaluation is triggered by the modification of the attribute.
# in schema.py
class MyEntity(EntityType):
    use_option1 = Boolean(required=True, default=True,
                          constraints = [
                              RQLConstraint('S use_option1 O1, S use_option2 != O1')
                          ])
    use_option2 = Boolean(required=True, default=False,
                          constraints = [
                              RQLConstraint('S use_option1 O1, S use_option2 != O1')
                          ])
With this update, it is no longer possible to have both options set to
True or False (you will get a ValidationError). The nice thing to
have is to get the other option to be updated when one of the two
attributes is changed, which means that you don't have to take care of
this when editing the entity in the web interface (which you cannot do
anyway if you are using reledit for instance).
A nice way of writing the hook is to use Python's sets to avoid
tedious logic code:
class RadioButtonUpdateHook(Hook):
    '''ensure use_option1 = not use_option2 (and conversely)'''
    __regid__ = 'mycube.radiobuttonhook'
    events = ('before_update_entity', 'before_add_entity')
    __select__ = Hook.__select__ & is_instance('MyEntity')
    # we prebuild the set of boolean attribute names
    _flag_attributes = set(('use_option1', 'use_option2'))

    def __call__(self):
        entity = self.entity
        edited = set(entity.cw_edited)
        attributes = self._flag_attributes
        if attributes.issubset(edited):
            # both were changed, let the integrity hooks do their job
            return
        if not attributes & edited:
            # none of our attributes were changed, do nothing
            return
        # find which attribute was modified
        modified_set = attributes & edited
        # find the name of the other attribute
        to_change = (attributes - modified_set).pop()
        modified_name = modified_set.pop()
        # set the value of that attribute
        entity.cw_edited[to_change] = not entity.cw_edited[modified_name]
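The set logic used by the hook can be tried outside CubicWeb. In the Python 3 sketch below, a plain dict stands in for entity.cw_edited and the sync_flags function is a hypothetical name; it mirrors the three cases of the hook (both flags edited, none edited, exactly one edited).

```python
FLAGS = {"use_option1", "use_option2"}


def sync_flags(edited):
    """Mimic the hook on a plain dict of edited attributes (updated in place)."""
    changed = FLAGS & set(edited)
    if len(changed) != 1:
        # both flags edited (left to the constraint) or none edited
        return edited
    (modified,) = changed
    (other,) = FLAGS - changed
    # the other flag takes the opposite value
    edited[other] = not edited[modified]
    return edited


print(sync_flags({"use_option1": False}))
print(sync_flags({"title": "unrelated change"}))
print(sync_flags({"use_option1": True, "use_option2": True}))
```

In the last call both flags were edited to the same value; the sketch leaves them untouched, just as the hook returns early and lets the RQL constraint reject the change.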
That's it!
CubicWeb 3.13 has been in development for a while and includes some cool
stuff:
- generate and handle Apache's modconcat compatible URLs, to minimize the number
of HTTP requests necessary to retrieve JS and CSS files, along with a new
cubicweb-ctl command to generate a static 'data' directory that can be served
by a front-end instead of CubicWeb
- major facet enhancements:
- nicer layout and visual feedback when filtering is in-progress
- new RQLPathFacet to easily express new filters that are more than one hop
away from the filtered entities
- a more flexible API, usable in cases where it wasn't previously possible
- some form handling refactorings and cleanups, notably introduction of a new
method to process posted content, and updated documentation
- support for new base types: BigInt, TZDateTime and TZTime (the latter two actually since 3.12)
- write queries optimization, and several RQL fixes on complex queries
(e.g. using HAVING, sub-queries...), as well as new support for CAST() function
and REGEXP operator
- datafeed source and default CubicWeb xml parsers:
- refactored into smaller and overridable chunks
- easier to configure
- make it work
As usual, 3.13 also includes a bunch of other minor enhancements,
refactorings and bug fixes. Please download and install CubicWeb 3.13 and report
any problem on the tracker and/or the mailing-list!
Enjoy!
Logilab is hosting a CubicWeb sprint - 3 days in our Paris offices.
The general focus will be on speed:
- on cubicweb-server side : improve performance of massive insertions / deletions
- on cubicweb-client side : cache implementation, HTTP server, massive parallel usage, etc.
This sprint will take place in April 2011, from Tuesday the 26th to Thursday the 28th. You are more than welcome to come along, help out and contribute, but unlike previous sprints, at least basic knowledge of CubicWeb will be required of participants since no introduction is planned.
Network resources will be available for those bringing laptops.
Address : 104 Boulevard Auguste-Blanqui, Paris. Ring "Logilab" (googlemap)
Metro : Glacière
Contact : http://www.logilab.fr/contact
Dates : 26/04/2011 to 28/04/2011
Unlike recent major versions of CubicWeb, the 3.11 doesn't come with many API
changes or refactorings and introduces a fairly small set of new features. But
those are important features!
'pyrorql' sources mapping is now stored in the database instead of a python
file in the instance's home. This eases the deployment and maintenance of
distributed applications.
A new 'datafeed' source was introduced, inspired by the soon to be
deprecated datafeed cube. It needs polishing but sets the foundation for
advanced semantic web applications that import content from other sites
using simple HTTP requests.
A 'datafeed' source is associated to a parser that analyses the imported
data and then creates/updates entities accordingly. There is currently a
single parser in the core that imports CubicWeb-generated xml and needs to
be configured with a mapping information that defines how relations are to
be followed. It provides a viable alternative to 'pyrorql' sources. Other
parsers to import RDF, RSS, etc should come soon.
A new facet to filter entities based on the source they came from is now
available.
The management interface for users, groups, sources and site preferences
was simplified so it should be more intuitive to newbies (and others). Most
items have been dropped from the user drop-down menu and the simpler views
were made available through the '/manage' url.
The default 'index' / 'manage' view has been simplified to deprecate features
that rely on external folder and card cubes. That's almost the only
deprecation warning you'll get in upgrading to 3.11. Just this one won't
hurt!
The old_calendar module has been dropped in favor of
jQuery's fullcalendar powered views. That's great news for applications
using calendar features. Since it was added to the existing calendar
module, you shouldn't have to change anything to get it working, unless you
were using old_calendar in which case you may have to update a few things.
This work was initiated by our Mexican friends from Crealibre.
As usual, the 3.11 also includes a bunch of other minor enhancements,
refactorings and bug fixes. Please download and install CubicWeb 3.11 and
report any problem to the mailing-list!
Enjoy!
Having deployed and maintained several public medium sized web sites running
CubicWeb when I worked at SecondWeb, I was asked
by my friends from Logilab to write a blog post
describing how we managed our deployment while working with the customer and the
hosting company.
Customers that want to run such a medium traffic web site either tell you
which hosting company they partner with, or ask you to find one, so you have no
other choice but to deal with an external hosting structure to manage the servers.
I prefer this by the way because:
- High Availability (HA) hosting really requires skills and hardware that are
neither common nor cheap;
- HA hosting requires 24/7/365 availability that SecondWeb could not (and did
not even want to) offer.
It is clearly difficult for all parties (try to put yourself in the shoes of the
customer...) to manage a website with 3 partners involved, each with their own
goals. From the development leader point of view, you will notice that the
technical people of the hosting company continuously change and you keep seeing
the same operational errors even if you provide and keep improving high quality
documentation. The software upgrade documentation has to be particularly clear
as it greatly influences the overall web site availability. You also have to
keep a history of the interventions on the servers yourself and maintain an
up-to-date copy of the configuration files.
The overall architecture proposed here partly benefits from this experience with
managed hosting companies, in that we tried to keep it simple.
The architecture proposed here has been successfully tested with sites
delivering web pages to up to 2 million unique visitors per month. It should
scale further up depending on your site database access needs: if you need very
fresh data and have a lot of write operations to the database, you will need
to distribute database access amongst several servers, which is beyond the scope
of this post.
This is the main limitation of the proposed architecture and the reason why it
is not well-suited for higher traffic.
To achieve very high availability for your web site, you must have no single
point of failure in the whole architecture, which can be far from reasonable
from the costs point of view. However, hosting companies can share costs
between their customers and have them benefit from a double network
infrastructure all along the way from the Internet to your web servers,
themselves hosted on two distant locations. You may then choose an even number
of web servers, half of them hosted on each network infrastructure.
The important thing is that you must preserve user sessions. As of CubicWeb
3.10, DB persistent sessions have not been implemented yet (they will be soon, there
is a ticket planned for this
functionality), thus you must preserve session cookies by always directing a
given user to the same web server, which is usually achieved by configuring the
load balancer(s) in IP hash mode (it is faster than balancing on the session
cookie, which implies reaching the http stack rather than staying at the TCP/IP level).
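As a rough illustration of why IP hash mode preserves sessions, the sketch below maps a client address to a backend in a deterministic way. The backend names and the pick_backend helper are hypothetical, and real load balancers use their own hashing schemes; only the principle is shown.

```python
import hashlib

# Hypothetical backend list: one entry per web server behind the balancer.
BACKENDS = ["web1.example.org", "web2.example.org"]


def pick_backend(client_ip, backends=BACKENDS):
    """Return the backend for a client IP; the same IP always gets the same one."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # The first and third requests come from the same address,
    # so they are routed to the same server and keep their session.
    for ip in ("192.0.2.10", "192.0.2.11", "192.0.2.10"):
        print(ip, "->", pick_backend(ip))
```

Since only the source address is used, the decision can be taken without parsing the HTTP request, which is the speed argument made above.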
Now if you have multi-processor web servers (which is very likely these days)
you will need to use one CubicWeb application instance per processor or the
Python GIL will limit the CPU of your application to a fraction of the available
power. This is pretty easy, you just have to duplicate configuration directories
from /etc/cubicweb.d, changing instance names and ports. You can use a simple
sed-based script to generate these copies automatically and keep them in sync.
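The same duplication can be done in Python instead of sed. This is a minimal sketch under assumptions: the instance configuration is taken to live in a file named all-in-one.conf containing a "port=..." line, and the copies are named name_1, name_2, and so on. Both the naming scheme and the clone_instance helper are made up for the illustration; adapt them to your actual configuration.

```python
import re
import shutil
import tempfile
from pathlib import Path


def clone_instance(etc_dir, name, copies, base_port=8081, conf_name="all-in-one.conf"):
    """Create `copies` copies of the instance directory `name`, one port each.

    Assumption: the port is set by a `port=...` line in `conf_name`.
    """
    etc_dir = Path(etc_dir)
    created = []
    for i in range(copies):
        target = etc_dir / ("%s_%d" % (name, i + 1))
        if target.exists():
            shutil.rmtree(target)  # regenerate from scratch to stay in sync
        shutil.copytree(etc_dir / name, target)
        conf = target / conf_name
        text = conf.read_text()
        # rewrite only the port line, leave everything else untouched
        text = re.sub(r"(?m)^port\s*=.*$", "port=%d" % (base_port + i), text)
        conf.write_text(text)
        created.append(target)
    return created


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than /etc/cubicweb.d
    root = Path(tempfile.mkdtemp())
    (root / "mysite").mkdir()
    (root / "mysite" / "all-in-one.conf").write_text("[WEB]\nport=8080\n")
    for path in clone_instance(root, "mysite", 2):
        print(path.name, (path / "all-in-one.conf").read_text().split()[-1])
```

Running the script again after editing the reference instance regenerates the copies, which is what keeps them in sync.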
Now that we have one instance per processor, the problem of preserving sessions
is back. It can be elegantly solved using Squid,
which can of course deliver cached objects (in particular images, more on this
later), but also listen on several ports and distribute incoming requests evenly
among the CubicWeb instances based on their port of origin. Note that the load
balancer must be set up to balance between ports of the web servers, one port
for each processor. The Squid configuration file to achieve this looks like:
http_port 81 defaultsite=www.example.org vhost
acl portA myport 81
http_port 82 defaultsite=www.example.org vhost
acl portB myport 82
acl site1 dstdomain www.example.org
cache_peer 127.0.0.1 parent 8081 0 no-query originserver default name=server_1
cache_peer_access server_1 allow portA site1
cache_peer_access server_1 deny all
cache_peer 127.0.0.1 parent 8082 0 no-query originserver default name=server_2
cache_peer_access server_2 allow portB site1
cache_peer_access server_2 deny all
This is a way to set up Squid to listen on ports 81 and 82 and distribute requests
for www.example.org to ports 8081 and 8082 respectively. This way, requests
should be evenly balanced between the processors of a bi-processor web server.
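With more than two processors, writing these stanzas by hand gets tedious. The sketch below generates the same kind of lines for any number of instances. The squid_port_mapping function is a hypothetical helper, and it names the ACLs port1, port2... instead of portA, portB; the directives themselves are the ones used in the configuration above.

```python
def squid_port_mapping(site, n, listen_base=81, peer_base=8081):
    """Return Squid configuration lines mapping n listening ports to n local peers."""
    lines = []
    # one listening port and one ACL per CubicWeb instance
    for i in range(n):
        lines.append("http_port %d defaultsite=%s vhost" % (listen_base + i, site))
        lines.append("acl port%d myport %d" % (i + 1, listen_base + i))
    lines.append("acl site1 dstdomain %s" % site)
    # each peer only accepts requests coming from "its" listening port
    for i in range(n):
        peer = "server_%d" % (i + 1)
        lines.append(
            "cache_peer 127.0.0.1 parent %d 0 no-query originserver default name=%s"
            % (peer_base + i, peer)
        )
        lines.append("cache_peer_access %s allow port%d site1" % (peer, i + 1))
        lines.append("cache_peer_access %s deny all" % peer)
    return lines


if __name__ == "__main__":
    print("\n".join(squid_port_mapping("www.example.org", 4)))
```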
You can now set up Squid more classically to achieve what it was originally designed
for: caching. See Squid docs for this, particularly the
refresh_pattern
directive. Note you do not need to force any HTTP cache standard feature in
Squid, as CubicWeb enables you to fine tune caching using simple
HTTPCacheManager classes found in cubicweb/web/httpcache.py (at the end of this
file, you will also find default cache manager configuration for the entity and
startup views).
This is controversial but it did not hurt for me: I like to put an Apache frontend
between Squid and the Twisted-based CubicWeb application, because the hosting
companies are usually pretty good at setting it up, like to use server status for
monitoring, mod_deflate for textual content compression, mod_rewrite and other
modules to customize, monitor or fine tune the web servers.
It can however be argued that Apache is a huge piece of software for such a
restrictive usage, and its memory footprint would be better used for caching.
This is an interesting part that simplifies the overall setup: if you want to
save data on disk, it is likely that you also want to keep it in sync between
the web servers, or use a highly secure network storage solution.
As we already have a data store accessible from the web servers, namely the
database itself, I often choose to use it even for images. This looks like the
nightmare of every sysadmin, but if you make sure the images are not fetched
every second from the database, by using fine tuned cache settings, it will not
hurt. And this way you still benefit from the flexibility of a database and the
easier maintenance of a single data store. We can use CubicWeb cache settings
to allow Squid to cache images for 1 hour for example. If you have a very dynamic
web site however, you will then need to force a URL change when an image is
edited. This can easily be achieved in CubicWeb using a custom edit controller
that creates a new image when the data attribute of an Image instance was
edited, as illustrated here:
from cubicweb import typed_eid
from cubicweb.selectors import yes
from cubicweb.web.views.editcontroller import EditController

class CustomEditController(EditController):
    __select__ = EditController.__select__ & yes()

    def handle_updated_image(self, old_eid):
        'modify submitted form to change old_eid into a new entity eid in all key/ values'
        old_eid = unicode(old_eid)
        form = self._cw.form
        new_eid = self._cw.varmaker.next()
        # handle image eid
        del form['__type:%s' % old_eid]
        form['__type:%s' % new_eid] = u'Image'
        # handle eid list
        index = form['eid'].index(old_eid)
        form['eid'] = form['eid'][:index] + [new_eid] + form['eid'][index+1:]
        # handle attribute and relations
        # (iterate over a copy of the items since the form is modified in the loop)
        for (k, v) in form.items():
            if v == old_eid:
                form[k] = new_eid
            if k.endswith(u':%s' % old_eid):
                form[k[:-len(old_eid)] + new_eid] = v
                del form[k]

    def _default_publish(self):
        # implement image creation when data image was updated, so that we can use
        # a far expiry date cache on download view
        images = []
        # handle_updated_image modifies the form, so iterate over a copy of its items
        for (k, v) in self._cw.form.items():
            if v != 'Image' or not k.startswith('__type') or k == self._cw.form['__maineid']:
                continue
            try:
                eid = typed_eid(k[7:])
            except ValueError:
                continue
            if self._cw.form.get('data-subject:%s' % eid, None):
                self.handle_updated_image(eid)
                images.append(eid)
        super(CustomEditController, self)._default_publish()
        for eid in images:
            self._cw.execute('DELETE Image I WHERE I eid %(eid)s', {'eid': eid})
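To make the key rewriting easier to follow, here is a standalone sketch of the same idea on a plain dictionary, outside of CubicWeb. The rename_eid function and the form content are made up for the illustration; unlike the controller above, the sketch builds a new dictionary instead of modifying the form in place, which avoids changing a dictionary while iterating over it.

```python
def rename_eid(form, old_eid, new_eid):
    """Return a copy of `form` where `old_eid` is replaced by `new_eid`
    in values, in list values and in ':<eid>' key suffixes."""
    suffix = ":%s" % old_eid
    result = {}
    for key, value in form.items():
        if value == old_eid:
            value = new_eid
        elif isinstance(value, list):
            value = [new_eid if item == old_eid else item for item in value]
        if key.endswith(suffix):
            key = key[:-len(old_eid)] + new_eid
        result[key] = value
    return result


# Hypothetical posted form: entity 12 is the edited image
form = {
    "eid": ["12", "34"],
    "__type:12": "Image",
    "data-subject:12": "new.jpg",
    "title-subject:34": "holidays",
}

if __name__ == "__main__":
    print(rename_eid(form, "12", "A"))
```

Once every reference to the old eid points to a fresh variable such as "A", the form describes the creation of a new entity, hence a new download URL.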
To add the 1 hour expiry date for image download view, you can use:
from cubicweb.selectors import yes
from cubicweb.web import httpcache
from cubicweb.web.views.idownloadable import DownloadView

class CustomDownloadView(DownloadView):
    __select__ = DownloadView.__select__ & yes()
    http_cache_manager = httpcache.MaxAgeHTTPCacheManager
    cache_max_age = 3600
Hosting companies now often have a pretty good knowledge of PostgreSQL, the
favorite DB back end for CubicWeb. They usually propose to replicate the database
for data safety at a low cost, using PostgreSQL's log shipping feature. Note that
new PostgreSQL 9 versions should make it easier to set up replication modes that
could be useful to improve performance and scalability, but there is still a
lack of production level experience for the moment. Please share if you have some,
because it is the main issue to deal with to scale up further.
It is worth mentioning that you need a pre-production server hosted by the same
company on the same hardware (or virtual machine), because:
- software upgrade will run smoother if the technical staff of the hosting company
has already performed the same upgrade operation once: check the same person
does both within a short timeframe if possible;
- you will feel better if your migration scripts have successfully run on a
fresh copy of the production data: ask for a db copy before a pre-production
upgrade; this is much easier to do if you do not have to copy the database
dumps remotely.
- the pre-production server can host its own database server and the replication
of the production one.
When you experience a web site downtime, it is much too late to take a look at
the available monitoring. It is important to prepare the tools you need to
diagnose a problem, get used to reading the graphs and have the orders of
magnitude of the values and their variations in mind.
Even the simplest graphs, like CPU usage, need to be correctly interpreted. In
a recent setup, I did not realize that only one CPU was used on a bi-pro server,
delivering half the power it should... When you cannot access the machine and
use top, you only see the information of the monitoring graphs, so you must
know how to read them !
Apart from the classical CPU, CPU load, (detailed) memory usage, and network
traffic, ask for PostgreSQL, Squid, and Apache specific graphs (plug-ins for them
are easy to find and install for classic monitoring solutions).
For CubicWeb web sites, it is also worth setting up the following views and using
them for automatic alerts:
- a software / db version consistency monitoring
- a db pool size monitoring
- a simple db connection check view
- a view writing the server host name, which is not useful for automatic alerts but
shows which server your IP is directed to: this is needed when you cannot
reproduce the behaviour the customer is complaining about...
Here are some classes I use for these tasks. Feel free to reuse and adapt them
to your needs:
from socket import gethostname

from cubicweb.selectors import yes
from cubicweb.view import View

class _MonitoringView(View):
    __abstract__ = True
    __select__ = yes()
    content_type = 'text/plain'
    templatable = False

class PoolMonitoringView(_MonitoringView):
    __regid__ = 'monitor_pool'

    def call(self):
        repo = self._cw.cnx._repo
        max_pool = self._cw.vreg.config['connections-pool-size']
        # percentage of the connections pool currently in use
        percent = ((max_pool - repo._available_pools.qsize()) * 100.0) / max_pool
        self.w(u'%s%%' % percent)

class DBMonitoringView(_MonitoringView):
    __regid__ = 'monitor_db'

    def call(self):
        try:
            count = self._cw.execute('Any COUNT(X) WHERE X is CWUser')[0][0]
            self.w(u'ServiceOK : %s users in DB' % count)
        except Exception:
            self.w(u'ServiceKO')

class VersionMonitoringView(_MonitoringView):
    __regid__ = 'monitor_version'

    def versions_text(self, versions):
        return u' | '.join(cube + u': ' + u'.'.join(unicode(x) for x in version)
                           for (cube, version) in versions)

    def call(self):
        config = self._cw.vreg.config
        vc_config = config.vc_config()
        db_config = [('cubicweb', vc_config.get('cubicweb', '?'))]
        fs_config = [('cubicweb', config.cubicweb_version())]
        for cube in sorted(config.cubes()):
            db_config.append((cube, vc_config.get(cube, '?')))
            try:
                fs_version = config.cube_version(cube)
            except Exception:
                fs_version = '?'
            fs_config.append((cube, fs_version))
        db_config = self.versions_text(db_config)
        fs_config = self.versions_text(fs_config)
        if db_config == fs_config:
            self.w(u'ServiceOK : FS config %s == DB config %s' % (fs_config, db_config))
        else:
            self.w(u'ServiceKO : FS config %s != DB config %s' % (fs_config, db_config))

class HostnameMonitoringView(_MonitoringView):
    __regid__ = 'monitor_hostname'

    def call(self):
        self.w(unicode(gethostname()))
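On the monitoring side, these views can be polled by a small script. The following sketch is written for a recent Python 3 and rests on assumptions: the views are supposed to be reachable at base_url/view?vid=regid, and check_instance, is_ok and http_fetch are hypothetical helpers, not part of CubicWeb or of any monitoring tool.

```python
from urllib.request import urlopen

# Views whose output starts with ServiceOK / ServiceKO (see the classes above).
MONITOR_VIEWS = ("monitor_db", "monitor_version")


def is_ok(body):
    """A check passes when the view output starts with 'ServiceOK'."""
    return body.strip().startswith("ServiceOK")


def http_fetch(url, timeout=10):
    """Fetch a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")


def check_instance(base_url, fetch=http_fetch, views=MONITOR_VIEWS):
    """Return {regid: bool} for each monitoring view of one instance."""
    results = {}
    for regid in views:
        # assumed URL scheme for selecting a view by its regid
        url = "%s/view?vid=%s" % (base_url.rstrip("/"), regid)
        try:
            results[regid] = is_ok(fetch(url))
        except OSError:  # connection refused, timeout, HTTP error...
            results[regid] = False
    return results
```

The fetch parameter is injectable so that the check logic can be tried without a running instance, for example check_instance("http://localhost:8081", fetch=lambda url: "ServiceOK").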
This is a sketch of the proposed architecture. Please comment on it and share
your experience on the topic, I would be happy to learn your tips and tricks.
I would conclude with an important remark regarding performance: a good scalable
architecture is of great help to run a busy web site smoothly, however the
performance boost you get by optimizing your software performance is usually
worth it and must be seriously considered before any hardware upgrade, even if it
seems costly at first glance.
We'll now see how to benefit from features introduced in the 3.9 and 3.10 releases of CubicWeb.
OK... Now our site has its most desired features. But... I would like to make it look
somewhat like my website. It is not www.cubicweb.org after all. Let's tackle this
first!
The first thing we can do is to change the logo. There are various ways to achieve
this. The easiest way is to put a logo.png file into the cube's data
directory. As data files are looked at according to cubes order (CubicWeb
resources coming last), that file will be selected instead of CubicWeb's one.
As the locations of static resources are cached, you'll have to restart
your instance for this to be taken into account.
Though there are some cases where you don't want to use a logo.png file.
For instance if it's a JPEG file. You can still change the logo by defining in
the cube's uiprops.py file:
The uiprops machinery has been introduced in CubicWeb 3.9. It is used to define
some static file resources, such as the logo, default Javascript / CSS files, as
well as CSS properties (we'll see that later).
This file is imported specifically by CubicWeb, with a predefined name space,
containing for instance the data function, telling the file is somewhere
in a cube or CubicWeb's data directory.
One side effect of this is that it can't be imported as a regular python
module.
The nice thing is that in debug mode, changes to a uiprops.py file are detected
and then automatically reloaded.
Now, as it's a photos web-site, I would like to have a photo of mine as background...
After some trials I won't detail here, I've found a working recipe explained here.
All I've to do is to override some stuff of the default CubicWeb user interface to
apply it as explained.
The first thing to do is to get the <img/> tag as first element after the
<body> tag. If you know a way to avoid this by simply specifying the image
in the CSS, tell me! The easiest way to do so is to override the
HTMLPageHeader view, since that's the one that is directly called once
the <body> has been written. How did I find this? By looking in the
cubicweb.web.views.basetemplates module, since I know that global page
layouts sit there. I could also have grepped for the "body" tag in
cubicweb.web.views... Finding this was the hardest part. Now all I need is
to customize it to write that img tag, as below:
from cubicweb.web.views import basetemplates

class HTMLPageHeader(basetemplates.HTMLPageHeader):
    # override this since it's the easier way to have our bg image
    # as the first element following <body>
    def call(self, **kwargs):
        self.w(u'<img id="bg-image" src="%sbackground.jpg" alt="background image"/>'
               % self._cw.datadir_url)
        super(HTMLPageHeader, self).call(**kwargs)

def registration_callback(vreg):
    # register everything but our header, which replaces the default one
    vreg.register_all(globals().values(), __name__, (HTMLPageHeader,))
    vreg.register_and_replace(HTMLPageHeader, basetemplates.HTMLPageHeader)
As you may have guessed, my background image is in a background.jpg file
in the cube's data directory, but there are still some things to explain
to newcomers here:
- The call method is the main access point of the view. It's called by
the view's render method. It is not the only access point for a view, but
this will be detailed later.
- Calling self.w writes something to the output stream. Except for binary views
(which do not generate text), it must be passed a Unicode string.
- The proper way to get a file in the data directory is to use the datadir_url
attribute of the incoming request (e.g. self._cw).
I won't explain again the registration_callback stuff, you should understand it
now! If not, go back to previous posts in the series :)
Fine. Now all I've to do is to add a bit of CSS to get it to behave nicely (which
is not the case at all for now). I'll put all this in a cubes.sytweb.css
file, stored as usual in our data directory:
/* fixed full screen background image
 * as explained on http://webdesign.about.com/od/css3/f/blfaqbgsize.htm
 *
 * syt update: set z-index=0 on the img instead of z-index=1 on div#page & co to
 * avoid pb with the user actions menu
 */
img#bg-image {
    position: fixed;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    z-index: 0;
}

div#page, table#header, div#footer {
    background: transparent;
    position: relative;
}

/* add some space around the logo
 */
img#logo {
    padding: 5px 15px 0px 15px;
}

/* more dark font for metadata to have a chance to see them with the background
 * image
 */
div.metadata {
    color: black;
}
You can see here stuff explained in the cited page, with only a slight modification
explained in the comments, plus some additional rules to make things somewhat cleaner:
- a bit of padding around the logo
- darker metadata which appears by default below the content (the white frame in the page)
To get this CSS file used everywhere in the site, I have to modify the uiprops.py file
introduced above:
STYLESHEETS = sheet['STYLESHEETS'] + [data('cubes.sytweb.css')]
sheet is another predefined variable containing values defined by
already processed uiprops.py files, notably CubicWeb's own.
Here we simply want our CSS in addition to CubicWeb's base CSS files, so we
redefine the STYLESHEETS variable as the existing CSS (accessed through the sheet
variable) with ours added. I could also have done:
sheet['STYLESHEETS'].append(data('cubes.sytweb.css'))
But this is less interesting since we don't see the overriding mechanism...
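To give an idea of what happens behind the scenes, here is a simplified model of such a loader, written as plain Python 3. It is not CubicWeb's actual implementation: load_uiprops, the /data/ URL prefix and the base stylesheet name are assumptions made for the illustration. It only shows how a file executed with a predefined namespace can override values found in sheet.

```python
def load_uiprops(source, sheet):
    """Execute uiprops-like `source` and merge the names it defines into `sheet`."""
    # predefined namespace: the current property sheet and a `data` helper
    namespace = {"sheet": sheet, "data": lambda path: "/data/" + path}
    exec(source, namespace)
    for name, value in namespace.items():
        # keep what the file defined, skip the predefined names and __builtins__
        if name not in ("sheet", "data") and not name.startswith("__"):
            sheet[name] = value
    return sheet


# Hypothetical base sheet, as if CubicWeb's own uiprops.py was already processed
base = {"STYLESHEETS": ["/data/cubicweb.css"]}
cube_uiprops = "STYLESHEETS = sheet['STYLESHEETS'] + [data('cubes.sytweb.css')]"

if __name__ == "__main__":
    print(load_uiprops(cube_uiprops, dict(base))["STYLESHEETS"])
```

In this model, the assignment form rebinds STYLESHEETS to a new list and leaves the previous one untouched, while the append form would modify the existing list in place.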
At this point, the site should start looking good, the background image being
resized to fit the screen.
The final touch: let's customize CubicWeb's CSS to get less orange... By simply adding
contextualBoxTitleBg = incontextBoxTitleBg = '#AAAAAA'
and reloading the page we've just seen, we now have a nice greyed box instead of
the orange one:
This is because CubicWeb's CSS includes some variables which are
expanded by values defined in the uiprops file. In our case we controlled the
CSS background property of boxes through the variables
contextualBoxTitleBg and incontextBoxTitleBg.
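The expansion can be pictured as plain Python string formatting. In the sketch below, the CSS selectors, the default colour and the expand_css helper are hypothetical; only the two variable names come from the uiprops line above, and the %(name)s placeholder syntax is an assumption about how the CSS files reference them.

```python
# Hypothetical CSS fragment using uiprops variables as placeholders
CSS_TEMPLATE = """\
div.contextualBoxTitle { background: %(contextualBoxTitleBg)s; }
div.incontextBoxTitle { background: %(incontextBoxTitleBg)s; }
"""


def expand_css(template, props):
    """Substitute %(name)s placeholders with values from the property sheet."""
    return template % props


# made-up default values, then the override from our uiprops.py
defaults = {"contextualBoxTitleBg": "#ff9900", "incontextBoxTitleBg": "#ff9900"}
overrides = {"contextualBoxTitleBg": "#AAAAAA", "incontextBoxTitleBg": "#AAAAAA"}

if __name__ == "__main__":
    print(expand_css(CSS_TEMPLATE, dict(defaults, **overrides)))
```

With this kind of formatting, a literal percent sign in the template has to be written as a doubled one.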
Boxes present to the user some ways to use the application. Let's first do a few
user interface tweaks in our views.py file:
from cubicweb.selectors import none_rset
from cubicweb.web.views import bookmark
from cubes.zone import views as zone
from cubes.tag import views as tag
# change bookmarks box selector so it's only displayed on startup views
bookmark.BookmarksBox.__select__ = bookmark.BookmarksBox.__select__ & none_rset()
# move zone box to the left instead of in the context frame and tweak its order
zone.ZoneBox.context = 'left'
zone.ZoneBox.order = 100
# move tags box to the left instead of in the context frame and tweak its order
tag.TagsBox.context = 'left'
tag.TagsBox.order = 102
# hide similarity box, not interested
tag.SimilarityBox.visible = False
The idea is to move all boxes in the left column, so we get more space for the
photos. Now, serious things: I want a box similar to the tags box but to handle
the Person displayed_on File relation. We can do this simply by adding an
AjaxEditRelationCtxComponent subclass to our views, as below:
from logilab.common.decorators import monkeypatch
from cubicweb import ValidationError
from cubicweb.web import uicfg, component
from cubicweb.web.views import basecontrollers

# hide displayed_on relation using uicfg since it will be displayed by the box below
uicfg.primaryview_section.tag_object_of(('*', 'displayed_on', '*'), 'hidden')

class PersonBox(component.AjaxEditRelationCtxComponent):
    __regid__ = 'sytweb.displayed-on-box'
    # box position
    order = 101
    context = 'left'
    # define relation to be handled
    rtype = 'displayed_on'
    role = 'object'
    target_etype = 'Person'
    # messages
    added_msg = _('person has been added')
    removed_msg = _('person has been removed')
    # bind to js_* methods of the json controller
    fname_vocabulary = 'unrelated_persons'
    fname_validate = 'link_to_person'
    fname_remove = 'unlink_person'

@monkeypatch(basecontrollers.JSonController)
@basecontrollers.jsonize
def js_unrelated_persons(self, eid):
    """return names of persons not yet displayed on an entity"""
    rql = "Any F + ' ' + S WHERE P surname S, P firstname F, X eid %(x)s, NOT P displayed_on X"
    return [name for (name,) in self._cw.execute(rql, {'x' : eid})]

@monkeypatch(basecontrollers.JSonController)
def js_link_to_person(self, eid, people):
    req = self._cw
    for name in people:
        name = name.strip().title()
        if not name:
            continue
        try:
            firstname, surname = name.split(None, 1)
        except ValueError:
            raise ValidationError(eid, {('displayed_on', 'object'): 'provide <first name> <surname>'})
        rset = req.execute('Person P WHERE '
                           'P firstname %(firstname)s, P surname %(surname)s',
                           locals())
        if rset:
            person = rset.get_entity(0, 0)
        else:
            person = req.create_entity('Person', firstname=firstname,
                                       surname=surname)
        req.execute('SET P displayed_on X WHERE '
                    'P eid %(p)s, X eid %(x)s, NOT P displayed_on X',
                    {'p': person.eid, 'x' : eid})

@monkeypatch(basecontrollers.JSonController)
def js_unlink_person(self, eid, personeid):
    self._cw.execute('DELETE P displayed_on X WHERE P eid %(p)s, X eid %(x)s',
                     {'p': personeid, 'x': eid})
You basically subclass to configure with some class attributes. The fname_*
attributes give the name of methods that should be defined on the json controller to
make the AJAX part of the widget work: one to get the vocabulary, one to add a
relation and another to delete a relation. These methods must start with a js_
prefix and are added to the controller using the @monkeypatch decorator. In my
case, the most complicated method is the one which adds a relation, since it
checks whether the person already exists, and otherwise automatically creates it,
assuming the user entered "firstname surname".
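The name handling of that method can be tried in isolation. The sketch below reproduces only the normalisation and splitting steps in a hypothetical parse_person_name function (plain Python 3, no CubicWeb involved): everything after the first word is taken as the surname.

```python
def parse_person_name(raw):
    """Split 'firstname surname' as the link method does; None for blank input."""
    name = raw.strip().title()
    if not name:
        return None
    parts = name.split(None, 1)  # split on the first run of whitespace only
    if len(parts) != 2:
        raise ValueError("provide <first name> <surname>")
    return tuple(parts)


if __name__ == "__main__":
    print(parse_person_name("  john doe "))          # ('John', 'Doe')
    print(parse_person_name("jean de la fontaine"))  # ('Jean', 'De La Fontaine')
```

Note that title() capitalises every word, so a particle such as "de" ends up as "De" in the stored surname.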
Let's see what it looks like on a file primary view:
Great, it's now as easy for me to link my pictures to people as to tag them.
Also, visitors get a consistent display of these two pieces of information.
The ui component system has been refactored in CubicWeb 3.10, which also
introduced the AjaxEditRelationCtxComponent class.
The last feature we'll add today is facet configuration. If you access the
'/file' url, you'll see a set of 'facets' appearing in the left column. Facets
provide an intuitive way to build a query incrementally, by proposing to the user
various ways to restrict the result set. For instance CubicWeb proposes a facet to
restrict based on who created an entity; the tag cube proposes a facet to
restrict based on tags; the zone cube a facet to restrict based on geographical
location, and so on. In the same spirit, I want to propose a facet to restrict based on
the people displayed on the picture. To do so, there are various classes in the
cubicweb.web.facet module which simply have to be configured using class
attributes as we've done for the box. In our case, we'll define a subclass of
RelationFacet.
Since that's ui stuff, we'll continue to add code below to our
views.py file. Though we begin to have a lot of various code there, so
it may be a good time to split our views module into submodules of a views
package. In our case of a simple application (glue) cube, we could start using
for instance the layout below:
views/__init__.py # uicfg configuration, facets
views/layout.py # header/footer/background stuff
views/components.py # boxes, adapters
views/pages.py # index view, 404 view
from cubicweb.web import facet

class DisplayedOnFacet(facet.RelationFacet):
    __regid__ = 'displayed_on-facet'
    # relation to be displayed
    rtype = 'displayed_on'
    role = 'object'
    # view to use to display persons
    label_vid = 'combobox'
Let's say we also want to filter according to the visibility attribute. This is
even simpler as we just have to derive from the AttributeFacet class:
class VisibilityFacet(facet.AttributeFacet):
    __regid__ = 'visibility-facet'
    rtype = 'visibility'
Now if I search for some pictures on my site, I get the following facets available:
By default a facet must be applicable to every entity in the result set and
provide at least two elements of vocabulary to be displayed (for instance you
won't see the created_by facet if the same user has created all
entities). This may explain why you don't see yours...
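This display rule can be summarised by a small predicate. It is a simplified model written for the illustration, not the framework's selection code: facet_is_displayed and its three arguments are hypothetical names.

```python
def facet_is_displayed(entity_types, supported_types, vocabulary):
    """Simplified display rule for a facet.

    entity_types: types of the entities found in the result set
    supported_types: types the facet's relation or attribute applies to
    vocabulary: values the facet could propose for this result set
    """
    applies_to_all = set(entity_types) <= set(supported_types)
    return applies_to_all and len(set(vocabulary)) >= 2


if __name__ == "__main__":
    # every file was created by the same user: nothing to choose from
    print(facet_is_displayed(["File"], ["File", "Person"], ["syt"]))
    # two different creators: the facet becomes useful
    print(facet_is_displayed(["File"], ["File", "Person"], ["syt", "admin"]))
```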
We started to see the power behind the infrastructure provided by the
framework, both on the pure ui (CSS, Javascript) side and on the Python side
(high level generic classes for components, including boxes and facets). We now
have, with a few lines of code, a full-featured web site with a personalized look.
Of course we'll probably want more as time goes by, but we can now
concentrate on making good pictures, publishing albums and sharing them with
friends...
|