cubicweb #2503918 Rework Repository Access API [validation pending]

so that it becomes easier to handle sessions across several repositories.

Short description of the proposed changes

Current status

The current architecture will receive a major rework. The current architecture basically looks like:

SERVER SIDE                               DBAPI SIDE

/---------------\ 1                     ? /---------------\
| Session       |-------------------------| Connection    |
\---------------/                         \---------------/
        | 1                                      | 1
        |  (dispatch on thread name)             |  (dispatch on thread name)
        | *                                      | *
/---------------\                         /---------------\
|TransactionData|                         |    Request    |
\---------------/                         |       +       |
        | ?                               |    Cursors    |
        |                                 \---------------/
        | 1
/---------------\
| Cnx Set       |
\---------------/

In practice two objects are actually used, server-side Session and client-side Request.

One major issue is that the current implementation of this chain depends on thread-based dispatching to end up using the right TransactionData object.
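To make the dispatching problem concrete, here is a minimal sketch of what "dispatch on thread name" amounts to. The class ThreadDispatchSession and its transaction_data method are hypothetical stand-ins written for this illustration; they are not CubicWeb code.

```python
import threading


class ThreadDispatchSession:
    """Hypothetical stand-in for a session that dispatches on thread name."""

    def __init__(self):
        self._tx_by_thread = {}  # thread name -> per-thread transaction data
        self._lock = threading.Lock()

    def transaction_data(self):
        # The caller never says which transaction it wants: the name of the
        # calling thread decides, and the data is created on the fly.
        name = threading.current_thread().name
        with self._lock:
            return self._tx_by_thread.setdefault(name, {})


def worker(session, value, results):
    session.transaction_data()["value"] = value
    name = threading.current_thread().name
    results[name] = session.transaction_data()["value"]


session = ThreadDispatchSession()
results = {}
threads = [
    threading.Thread(target=worker, args=(session, i, results),
                     name="worker-%d" % i)
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One shared session object, yet each thread silently got its own data.
print(sorted(results.items()))
```

Nothing in the calling code shows which TransactionData is in use, which is the hidden coupling the rework aims to remove.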

Another important issue is that the so-called DBAPI has hardly anything to do with Python's stdlib DBAPI.

Short-term solution

The new proposal is to deprecate cubicweb.dbapi in favor of the following scheme:

SERVER SIDE                               CLIENT SIDE

/---------------\
| Session       |
\---------------/
        | 1
        | *
/---------------\ 1                     ? /---------------\
|  Connection   |-------------------------|    Request    |
\---------------/                         \---------------/
        | ?
        | 1
/---------------\
| Cnx Set       |
\---------------/

In this solution:

Session: becomes a long-lived object holding credentials (user) and long-lived data (session_data).
Connection: is the main entry point on the server side. It allows all necessary server-side read and write operations. It performs at most one database transaction at any given time but can be used for multiple sequential transactions. A Connection has an explicit start and end. The name Transaction is not used for this object because multiple transactions can occur during its lifetime. Naming it Connection is preferred since it matches how such objects are named in a standard database context.
Request: now directly references a server-side Connection object.

Long-term solution

At some point in the future we will want to split the Request object in two: one object dedicated to Cubicweb repository operations, ClientConnection, and one related to HTTP request processing, WebRequest:

SERVER SIDE                               CLIENT SIDE

/---------------\
| Session       |
\---------------/
        | 1
        | *
/---------------\ 1                     ? /------------------\
|  Connection   |-------------------------| ClientConnection |
\---------------/                         \------------------/
        | ?                                       | 1
        |                                         |
        | 1                                       | ?
/---------------\                         /---------------\
| Cnx Set       |                         | WebRequest    |
\---------------/                         \---------------/

In the distant future, ClientConnection may be dropped in favor of direct usage of Connection.

The reason for this split is that a solution using a single Request object implies that this object is responsible for both database access and web request management, two distinct roles that are not meant to be merged in a single object. This is not a strong requirement, but it would improve the overall architecture and allow a cleaner API.

Underlying issue we are trying to solve

(We use the terminology from the proposed API)

Intertwined Session, Connection and Transaction

Mixing Session, Connection and Transaction responsibilities makes it hard to have persistent sessions shared between multiple processes.

Cubicweb is currently confined to a single process. We want to be able to run multiple processes for a single cubicweb instance. For this purpose, we need to share sessions between multiple processes.

For example, WSGI servers use multiple short-lived processes to serve web pages. We want to be able to share a session between all worker processes and have it persist across a worker restart.

Currently the Session object does a lot of work including handling in-progress transaction and database access.

We want to slim the Session class to its core role:

  • User (privilege) for permission checking,
  • last access time, for garbage collection,
  • session-wide data.

For this purpose we are extracting the database access logic into a Connection object, and we delegate transaction management to the process level.
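The slimmed-down role listed above can be sketched as a small class. This is only an illustration: the name SlimSession, the touch and expired methods and the timeout value are assumptions made for the example, not the actual CubicWeb API.

```python
import time


class SlimSession:
    """Hypothetical slim session: no transaction, no database access."""

    def __init__(self, user, timeout=1800.0):
        self.user = user                # used for permission checking
        self.session_data = {}          # session-wide data
        self.last_access = time.time()  # used for garbage collection
        self.timeout = timeout          # in seconds (assumed default)

    def touch(self):
        # Record an access so that the session is not collected too early.
        self.last_access = time.time()

    def expired(self, now=None):
        # True once the session has been idle for longer than the timeout.
        now = time.time() if now is None else now
        return now - self.last_access > self.timeout


session = SlimSession(user="babar", timeout=60.0)
session.session_data["Counselor"] = "Cornelius"
print(session.expired(now=session.last_access + 30))   # False
print(session.expired(now=session.last_access + 120))  # True
```

An object that holds only this kind of plain data is a plausible candidate for being stored outside a single process, which is the motivation stated above.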

Non-explicit life cycle

The life span of a Connection/Transaction is implicit, hard to control and bound to a specific context.

Creation of a new transaction is implicit. When you ask for a Transaction you either get an existing one or create it. By default, one asks for the default Transaction of the current thread (created on the fly if there is none). So when using the session to access the database, one does not know whether a new transaction is created (and thus started) or not.

Moreover, the Session code tries to delete the transaction as soon as possible in order to keep transactions as short as possible and free connection sets (free_cnx_set). As a result, the current implicit transaction may be purged after any read or commit/rollback operation. All associated data is then lost. Context managers that control hooks or security behavior use a trick to prevent the Transaction from being freed until they __exit__, because the behavior-altering information is held by the TransactionData object.

When you need to use the same transaction again, you either get the same object as before (because it has not died yet for some reason) or create a new one.

So you can try to "close" a transaction by calling:

session.set_tx(<txid>) # may recreate the transaction
session.rollback()     # will kill it if no context manager is going.

The user code makes heavy use of the current behavior and does not try to control transaction life span at all. This means that moving to a clean begin/end architecture requires a refactoring of a significant part of the code.
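The implicit life cycle described in this section can be modelled in a few lines. The sketch below is a toy: ImplicitTxSession, get_tx and the hooks_disabled key are hypothetical names chosen for the illustration and do not reproduce the real Session code.

```python
class ImplicitTxSession:
    """Hypothetical model of the implicit transaction life cycle."""

    def __init__(self):
        self._tx = None

    def get_tx(self):
        # Implicit creation: the caller cannot tell whether this call
        # returns the existing transaction or silently starts a new one.
        if self._tx is None:
            self._tx = {}
        return self._tx

    def rollback(self):
        # The transaction is purged as early as possible; its data is lost.
        self._tx = None


session = ImplicitTxSession()
session.get_tx()["hooks_disabled"] = True
session.rollback()

# A fresh transaction is created behind the scenes; the flag is gone.
print("hooks_disabled" in session.get_tx())  # False
```

Code written against this behavior never states where a transaction begins or ends, which is why moving to explicit begin/end requires touching so much of it.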

Thread isolation

The automatic thread isolation of Session is confusing at best.


Thread-based dispatching

(DBAPI) Connection uses magic thread dispatching too.


Inconsistent handling of cnxset (server Session vs. DBAPI)

DBAPI handles the cnxset automatically; the (server) session does not.

There are three core functions on the objects used to access the database (namely: server.Session, dbapi.Connection, dbapi.Request; soon: Connection):

  • execute,
  • rollback,
  • commit.

These 3 functions need access to a ConnectionSet to actually talk to the database. Any other method in the API of RequestSessionBase either ends up calling one of those methods or does not need a cnxset.

On the client side (dbapi.Connection, dbapi.Request), you can blindly call those methods: they just work. A cnxset is automatically acquired and freed if possible.

However, this is not the case for server.Session. You need to manually acquire a cnx_set and make sure to free it. As it is automatically freed on commit/rollback, most of the time you have to make sure a cnx_set is held (reacquire it).

Doing so manually is complex, cumbersome, and has no real justification.

Why this is the case is unclear for now. It is suspected that it has been seen as a way to enforce control on the transaction life span.
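A minimal sketch of what "automatically acquired and freed" can mean for execute, commit and rollback follows. ToyPool and AutoCnxSetConnection are hypothetical names, and the policy shown (acquire lazily on first use, free on commit/rollback) is an assumption made for the illustration rather than a description of the DBAPI implementation.

```python
class ToyPool:
    """Hypothetical pool of connection sets, modelled as plain strings."""

    def __init__(self, size):
        self._free = ["cnxset-%d" % i for i in range(size)]

    def acquire(self):
        return self._free.pop()

    def release(self, cnxset):
        self._free.append(cnxset)

    def available(self):
        return len(self._free)


class AutoCnxSetConnection:
    """Hypothetical connection that manages its connection set by itself."""

    def __init__(self, pool):
        self._pool = pool
        self._cnxset = None
        self.log = []  # (cnxset, query) pairs, standing in for real work

    def _ensure_cnxset(self):
        # Acquire lazily, only when the database is actually needed.
        if self._cnxset is None:
            self._cnxset = self._pool.acquire()
        return self._cnxset

    def _free_cnxset(self):
        if self._cnxset is not None:
            self._pool.release(self._cnxset)
            self._cnxset = None

    def execute(self, rql):
        self.log.append((self._ensure_cnxset(), rql))

    def commit(self):
        self._free_cnxset()

    def rollback(self):
        self._free_cnxset()


pool = ToyPool(size=1)
cnx = AutoCnxSetConnection(pool)
cnx.execute("Any X WHERE X is Elephant")
held_during_tx = pool.available()     # 0: held while a transaction runs
cnx.commit()
free_after_commit = pool.available()  # 1: freed without any manual call
cnx.execute("Any X WHERE X is Elephant")  # reacquired transparently
cnx.rollback()
print(held_during_tx, free_after_commit)
```

The caller only ever uses execute, commit and rollback; no manual acquire or reacquire step appears in user code.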

How the new API works

In the new API:

Sessions are only responsible for holding credentials and long-lived data:

user = repo.authenticate(login, password)
session = Session(repo, user) # prototype not contractual
session.session_data['Counselor'] = 'Cornelius'

The repo is responsible for authentication only. Managing already open sessions is the responsibility of web.application.SessionManager, used by the web application.

Sessions can be explicitly closed. They also time out automatically (as they do now).

Database connection

The session is used to create Connection objects, which are the objects actually used to connect to the database:

with Connection(session) as cnx:
    cnx.cnx_data['Host'] = 'Santa Claus'
    with bfss_import(cnx): # stored in cnx.cnx_data
        cnx.execute('Insert Elephant B, B name "Babar"')
        cnx.execute('Insert Elephant C, C name "Celestine", C spouse B WHERE B
    cnx.execute('SET B king EK WHERE EK name "Babar"')

A Connection never runs more than one database transaction at a given time. However, a single Connection can be used for multiple transactions sequentially. A Connection handles its cnx_set automatically, as the DBAPI used to.

A Connection holds transaction-level data and may also hold new Connection-level data. Ticket #2912807 is dedicated to this topic.

A Connection has a clear beginning and end. People are encouraged to use the context manager, but open/close (or begin/end?) functions will probably be available.

A Connection will probably NOT close automatically on garbage collection, to avoid garbage collection issues with objects having a __del__ method.

A Connection cannot be created from a closed session. But it is not clear whether we want to force the closing of an already open Connection when closing the session.
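The life cycle rules of this section can be summarized with a toy model. SketchSession and SketchConnection are hypothetical classes written for this example. In particular, rolling back unfinished work when leaving the context manager is an assumption of the sketch, not something the proposal specifies.

```python
class SketchSession:
    """Hypothetical session that only tracks whether it is closed."""

    def __init__(self, user):
        self.user = user
        self.closed = False

    def close(self):
        self.closed = True


class SketchConnection:
    """Hypothetical connection with an explicit beginning and end."""

    def __init__(self, session):
        if session.closed:
            raise RuntimeError("cannot create a connection from a closed session")
        self.session = session
        self.is_open = False
        self.transactions = 0  # number of finished transactions
        self._in_tx = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._in_tx:
            self.rollback()  # assumption: unfinished work is rolled back
        self.is_open = False
        return False  # never swallow exceptions

    def execute(self, rql):
        if not self.is_open:
            raise RuntimeError("connection is not open")
        self._in_tx = True  # at most one transaction at a time

    def _end_tx(self):
        if self._in_tx:
            self._in_tx = False
            self.transactions += 1

    def commit(self):
        self._end_tx()

    def rollback(self):
        self._end_tx()


session = SketchSession(user="babar")
with SketchConnection(session) as cnx:
    cnx.execute("Any X WHERE X is Elephant")
    cnx.commit()    # the first transaction ends here
    cnx.execute("Any X WHERE X is Elephant")
    cnx.rollback()  # the second transaction ends here
print(cnx.transactions, cnx.is_open)  # 2 False

session.close()
try:
    SketchConnection(session)
except RuntimeError as exc:
    print(exc)  # cannot create a connection from a closed session
```

The same connection serves two transactions in sequence, and its end is visible in the code as the end of the with block.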

Web request

Serving a web request requires a connection to the cubicweb repository. But the life span of the WebRequest object is greater than that of its connection to the Cubicweb database:

class WebApplication(object):
    def handle_request(self, web_request):
        session = get_session(web_request)  # read credentials from HTTP data
        with ClientConnection(session) as cnx:
            pass  # web stuff, using cnx for repository access
        # finalize HTTP reply (the connection is already closed here)

This means that each request has its own connection that lives only for the duration of the request.

At some point we will want to have the two concepts involved in the WebRequest API properly split:

  • appobject._cw is the ClientConnection responsible for accessing the database.
  • appobject.web is the WebRequest responsible for managing HTTP input/output.

Impact on current codebase

  • DBAPI is not used by the new schemes, meaning that we will be able to drop it in its current form as soon as every remaining use case has been refactored not to use it.
  • Remote access to the repo must be reworked.
  • There should be only minor impact in WebApplication.
done in: 3.19.0
load left: 0.000
closed by: <not specified>
patch:

  • [repository] add an ``internal_cnx`` method to replace ``internal_session`` [applied]
  • [server/session] add a login property [applied]
  • [testlib] move repo and related attribute back on Instance instead of Class [applied]
  • [cwuser] make CWUser callable, returning self for dbapi compatibility [applied]
  • [client-connect] drop rqlst on rset returned client side [applied]
  • [client-connection] add a repo property for dbapi compatibility [applied]
  • [client-connection] add a connection property for dbapi compatibility [applied]
  • [client-connection] add a sessionid property for dbapi compatibility [applied]
  • [client-connection] add a cursor() method for dbapi compatibility [applied]
  • [client-connection] add a request() method for dbapi compatibility [applied]
  • [repoapi] introduce a basic ClientConnection class [applied]
  • [server/session] allow access to session id using sessionid [applied]
  • [dbapi] move ProgrammingError into cubicweb module [applied]
  • [server/session] Implement anonymous_session [applied]
  • [repoapi] move get_repository function into a new repoapi module [applied]
  • [webrequest] simplify set_session code [applied]
  • [webrequest] use a DBAPISession without [applied]
  • [request] drop the user argument for set_session [applied]
  • [web-request] handle default language earlier [applied]
  • rename server.session.transaction into server.session.connection [applied]
  • [application] call req.set_session in application.main_handle_request [applied]
  • [session-handler] use session directly to update last usage [applied]
  • [application/connect] simplify connection logic [applied]
  • [transaction] move security control logic on Transaction [applied]
  • [transaction] move hook control logic on Transaction [applied]
  • [transaction] reinstall as tx.transaction_data [applied]
  • [transaction] give access to is_internal_session boolean [applied]