One of our next project for cubicweb and its ecosystem is to implement the langserver protocol for the RQL language that we are using to query the data stored in CubicWeb. The langserver protocol is an idea to solve one problem: to integrate operation for various languages, most IDE/tools needs to reimplement the wheel all the time, doing custom plugin etc... To solve this issue, this protocol has been invented with one idea: make one server for a language, then all IDE/tools that talks this protocol will be able to integrate it easily.
So the idea is simple: let's build our own server for RQL so we'll be able to integrate it everywhere and build tools for it.
So this post has several objectives:
- gather people that would be motivate to work on that subject, for now there is Laurent Wouters and me :)
- explain to you in more details (not all) how the language server protocol works
- show what is already existing for both langserver in python and rql
- show the first roadmap we've discussed with Laurent Wouters on how we think we can do that :)
- be a place to discuss this project, things aren't fixed yet :)
So, what is the language server protocol (LSP)?
It's a JSON-RPC based protocol where the IDE/tool talks to the server. JSON-RPC, said simply, is a bi-directional protocol in json.
In this procotol you have 2 kind of exchanges:
- requests: where the client (or server) ask the server (or the server ask the client) something and a reply is expected. For example: where is the definition of this function?
- notifications: the same but without an expected reply. For example: linting information or error detection
The LSP specifications has 3 bigs categories:
- everything about initialization/shutdown the server etc...
- everything regarding text and workspace synchronization between the server and the client
- the actual things that interest us: a list of languages features that the server supports (you aren't in the obligation to implement everything)
Here is the simplified list of possible languages features that the website present:
- Code completion
- Jump to def
- Workspace symbols
- Find references
The specification is much more detailed but way less comprehensive (look at the "language features" on the right menu for more details):
- completion/completion resolve
- hover (when you put your cursor on something)
- declaration (go to...)
- definition (go to...)
- typeDefinition (go to...)
- implementation (go to...)
- documentHighlight (highlight all references to a symbol)
- documentSymbol ("symbol" is a generic term for variable, definitions etc...)
- codeAction (this one is interesting)
- codeLens/codeLens resolve
- documentLink/documentLink resolve
- documentColor/colorPresentation (stuff about picking colors)
- formatting/rangeFormatting/onTypeFormatting (set tab vs space)
(Comments are from my current understanding of the spec, it might not be perfect)
The one that is really interesting here (but not our priority right now) is "codeAction", it's basically a generic entry point for every refactoring kind of operations as some examples from the spec shows:
Example extract actions:
- Extract method
- Extract function
- Extract variable
- Extract interface from class
Example inline actions:
- Inline function
- Inline variable
- Inline constant
Example rewrite actions:
- Add or remove parameter
- Encapsulate field
- Make method static
- Move method to base class
But I'm not expecting us to have direct need for it but that really seems one to keep in mind.
One question that I frequently got was: is syntax highlight included in the langserver protocol? Having double checked with Laurent Wouters, it's actually not the case (I thought documentSymbol could be used for that but actually no).
But we already have an implementation for that in pygments: https://hg.logilab.org/master/rql/file/d30c34a04ebf/rql/pygments_ext.py
What is currently existing for LSP in python and rql
The state is not great in the python ecosystem but not a disaster. Right now I haven't been able to find any generic python implementation of LSP that we could really reuse and integrate.
There is, right now and to my knowledge, only 2 maintained implementation of LSP in python. One for python and one for ... Fortran x)
Palantir's one makes extensive use of advanced magic code doesn't seems really necessary but it is probably of higher quality code since the Fortran one doesn't seems very idiomatic but looks much simpler.
So we'll ever need to extract the needed code from one of those of implement our own, not so great.
On the RQL side, everything that seems to be useful for our current situation is located in the RQL package that we maintain: https://hg.logilab.org/master/rql
After a discussion with Laurent Wouters, a first roadmap looks like this:
- extract the code from either palantir or fortran LSP implementation and come with a generic implementation (I'm probably going to do it but Laurent told me he his going to take a look too) When I'm talking about a generic implementation I'm talking about everything listed in the big category of the protocol that isn't related to language features which we don't really want to rewrite again.
Once that's done, start implementing the language features for RQL:
- the easiest is the syntax errors detection code, we just need to launch to parser on the code and handle the potential errors
- do that with pretty specific red underline
- play with RQL AST to extract the symbols and start doing things like codeLens and hover
- much more complex (and for later): autocompletion (we'll either need a demi compiler or to modify the current one for that)
To better understand the motivation behind this move, it is part of the more global move of drop the "Web" from CubicWeb and replace all the front end current implementation by reactjs+typescript views. In this context CubicWeb (or Cubic?) will only serves as a backend provide with which we will talk in... RQL! Therefor writing and using RQL will be much more important than right now.