Ticket #126 (closed enhancement: fixed)
Writer contention leads to persistent database lock
| Reported by: | cboos@… | Owned by: | gh |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | implementation | Version: | pysqlite-2.0 |
| Severity: | critical | Keywords: | writer thread lock |
| Cc: |
Description
This ticket describes the problem and a possible solution for the database lock problem PySQLite has when multiple threads attempt to write to the database at the same time.
This writer contention problem has already been described before:
- by Karl Mierle in Argon's Locking page
- on the sqlite mailing list
- see also #124, which links to my original problem in Trac
If one wants to retain some degree of concurrency and not use EXCLUSIVE transactions, the solution would be that all writer threads (but one) perform a "ROLLBACK". By doing so, they would release their SHARED lock and give the remaining thread a chance to acquire the EXCLUSIVE lock.
First, I'll provide a sample test program (attachment:writer_contention.py) which tries to reproduce the problematic scenario:
- thread A and B both acquire the SHARED lock,
- then they both attempt to get a RESERVED lock
- one of them succeeds (A), the other waits (B)
- when A tries to promote its RESERVED lock to an EXCLUSIVE lock, it will first have to wait for B which still has its SHARED lock...
- both wait for each other: "deadlock" situation
The test program also illustrates the fact that with pysqlite-2.0.4, such a deadlock has the undesirable side-effect of leaving the database in a locked state. This prevents any further operation to succeed, even read attempts from other threads will fail.
Then, I propose a way to implement a workaround (attachment:safe_writer_contention.py). This involves writing an intermediate layer, wrapping the Connection and Cursor objects. In the case of a database lock, the exception is intercepted, and a "ROLLBACK" is performed. This can only work if all the existing cursors for that connection are first closed, hence the intermediate layer which keeps track of the active cursors using weakrefs. This could eventually be integrated in the binding itself, later on.
Though this Python-only solution works by itself, pysqlite can also be improved to better handle the situation:
- A first patch, attachment:finalized_statements.patch makes the database recover safely after a writer contention problem, by avoiding to have leaked statements
- A second patch, attachment:reduced_timeout.patch reduces the duration of the timeout used while waiting for a SHARED or RESERVED lock; the thread will only use the full duration of the timeout while waiting for an EXCLUSIVE lock; that way, the write contention can be concluded without having to wait for the full timeout period, only for a fraction of it; this also reduces the risk that the thread attempting to get the EXCLUSIVE lock will hit its timeout even if the other threads are giving up
The results of the test programs, with or without the patches, are summarized in the following table:
| patch | writer_contention.py | safe_writer_contention.py |
| 2.0.4 | FAILED (errors=2) in 6.855s | OK in 2.318s |
| 2.0.4 + finalized_statements | FAILED (failures=1) in 2.320s | OK 2.317s |
| 2.0.4 + reduced_timeout | FAILED (errors=2) in 3.571s | OK in 0.679s |
| 2.0.4 + combined | FAILED (failures=1) in 2.320s | OK in 0.680s |
One can see that:
- with the first patch, the errors disappear; those errors were database lock exception when the main thread attempted to read or write to the database after the 2 writer threads were gone
- with the second patch, one can see that the writer contention issue is solved much quicker (0.7 instead of 2.3 seconds, timeout was 2.0 seconds)
The attachment:combined.patch is the merge of the two other patches. All the patches apply on 2.0.4 or on source:trunk@913 after the cleanup in #125 has been applied.
