Last week, we finally took a few days to dive into SPARQL in
order to transform any CubicWeb application into a potential
SPARQL endpoint.
The first step was to get a parser. Fortunately,
the W3C provides a grammar definition and around 200 test
cases. There were a few interesting options out there: we
tried to reuse rdflib, rasqal, the sparql.g version
designed for antlr3 and SimpleParse, but after two days of
work, we had nothing that worked well enough. We decided it was
not worth it and switched to yapps, since we already knew yapps and
rql already had a dependency on it.
Maybe we'll consider changing the parser at some point, but
the priority was to get something working as soon as we could, and
we finally came up with a version of fyzz, our parser, that passes 90% of the
W3C test suite (of course, there might be some false positives).
Fyzz parses the SPARQL query and generates something we decided to call an
AST, although it's still a bit rough for now. Fyzz understands simple triples,
DISTINCT, LIMIT, OFFSET and other basic features.
Please note that fyzz is totally independent of cubicweb and it can
be reused by any project.
Here's an example of how to use fyzz:
>>> from fyzz.yappsparser import parse
>>> ast = parse("""PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
... ?project a doap:Project;
... doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """)
>>> print ast.selected
[SparqlVar('project'), SparqlVar('name')]
>>> print ast.prefixes
{'doap': 'http://usefulinc.com/ns/doap#'}
>>> print ast.orderby
[(SparqlVar('name'), 'asc')]
>>> print ast.limit, ast.offset
5 10
>>> print ast.where
[(SparqlVar('project'), ('', 'a'), ('http://usefulinc.com/ns/doap#', 'Project')),
(SparqlVar('project'), ('http://usefulinc.com/ns/doap#', 'name'), SparqlVar('name'))]
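To make the shape of ast.where a bit more concrete, here's a small sketch that walks triples of the same form as the output above and prints them back with full IRIs. It doesn't import fyzz: the SparqlVar class below is a stand-in written for this sketch (its name attribute is an assumption), and describe is a hypothetical helper, not part of fyzz.

```python
# Stand-in for fyzz's SparqlVar, written for this sketch only.
# Assumption: a variable exposes its name as a `name` attribute.
class SparqlVar(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "SparqlVar(%r)" % self.name


# Same shape as the `ast.where` output shown above: each triple holds
# variables or (namespace, local name) pairs.
where = [
    (SparqlVar('project'), ('', 'a'),
     ('http://usefulinc.com/ns/doap#', 'Project')),
    (SparqlVar('project'), ('http://usefulinc.com/ns/doap#', 'name'),
     SparqlVar('name')),
]


def describe(term):
    """Render a term as ?var, the `a` keyword, or a full <IRI>."""
    if isinstance(term, SparqlVar):
        return '?' + term.name
    namespace, local = term
    if namespace == '' and local == 'a':
        return 'a'
    return '<%s%s>' % (namespace, local)


lines = [' '.join(describe(term) for term in triple) for triple in where]
for line in lines:
    print(line)
# ?project a <http://usefulinc.com/ns/doap#Project>
# ?project <http://usefulinc.com/ns/doap#name> ?name
```

The point is simply that prefixes are already resolved in the AST, so later steps never have to look at the PREFIX declarations again.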
This AST is then processed and transformed into a RQL query which
can finally be processed by CubicWeb directly.
Here's what can be done in a cubicweb-ctl shell session of our forge
cube (of course, this can also be done in the web application):
>>> from cubicweb.spa2rql import Sparql2rqlTranslator
>>> query = """PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
... ?project a doap:Project;
... doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """
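>>> # assumption: `schema` is the instance schema available in the shell
>>> translator = Sparql2rqlTranslator(schema)
>>> qinfo = translator.translate(query)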
>>> rql, args = qinfo.finalize()
>>> print rql, args
Any PROJECT, NAME ORDERBY NAME ASC LIMIT 5 OFFSET 10 WHERE PROJECT name NAME, PROJECT is Project {}
From the above example, we can notice two things. First, for
cubicweb to understand the doap namespace, we have to
declare the correspondence between the standard doap vocabulary
and our internal schema. This is done with yams.xy:
>>> from yams import xy
>>> xy.register_prefix('http://usefulinc.com/ns/doap#', 'doap')
>>> xy.add_equivalence('Project', 'doap:Project')
>>> xy.add_equivalence('Project name', 'doap:Project doap:name')
Second, for now, the case is not preserved during the
transformation: ?project becomes PROJECT in the rql query. This
is probably something that we'll need to tackle quickly.
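To give a feel for how the vocabulary mapping and the translation step fit together, here's a rough, self-contained sketch. It is not the code in cubicweb.spa2rql: the ETYPES and RELATIONS tables are hypothetical stand-ins for what yams.xy records, SparqlVar is a stand-in class, and to_rql only handles triples whose subject and object are variables. It also upper-cases variable names, since RQL variables are written in upper case, which is where the ?project to PROJECT change shows up.

```python
NS_DOAP = 'http://usefulinc.com/ns/doap#'

# Hypothetical equivalence tables, loosely mirroring the yams.xy calls above.
ETYPES = {(NS_DOAP, 'Project'): 'Project'}
RELATIONS = {(NS_DOAP, 'name'): 'name'}


class SparqlVar(object):
    """Stand-in for a parsed SPARQL variable (assumption: has a `name`)."""

    def __init__(self, name):
        self.name = name


def rql_var(var):
    # RQL variables are upper case, so ?project becomes PROJECT.
    return var.name.upper()


def to_rql(selected, where, orderby=(), limit=None, offset=None):
    """Build an RQL string from AST-like pieces (variables only)."""
    restrictions = []
    for subj, pred, obj in where:
        if pred == ('', 'a'):
            # `?x a Type` becomes an `is` restriction on the entity type.
            restrictions.append('%s is %s' % (rql_var(subj), ETYPES[obj]))
        else:
            restrictions.append(
                '%s %s %s' % (rql_var(subj), RELATIONS[pred], rql_var(obj)))
    parts = ['Any ' + ', '.join(rql_var(var) for var in selected)]
    if orderby:
        parts.append('ORDERBY ' + ', '.join(
            '%s %s' % (rql_var(var), direction.upper())
            for var, direction in orderby))
    if limit is not None:
        parts.append('LIMIT %d' % limit)
    if offset is not None:
        parts.append('OFFSET %d' % offset)
    parts.append('WHERE ' + ', '.join(restrictions))
    return ' '.join(parts)


project, name = SparqlVar('project'), SparqlVar('name')
rql = to_rql(
    selected=[project, name],
    where=[(project, ('', 'a'), (NS_DOAP, 'Project')),
           (project, (NS_DOAP, 'name'), name)],
    orderby=[(name, 'asc')],
    limit=5,
    offset=10,
)
print(rql)
```

The restrictions come out in the order of the triples, so the WHERE clause may be ordered differently from what CubicWeb actually generates; the meaning of the query is the same.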
We've also added a few views in CubicWeb to wrap this SPARQL support. They will
be available in the upcoming version 3.4.0 and are already
available through our public mercurial repository.
The door is now open, the path is still long, stay tuned!
image under creative commons by beger (original)