Ticket #187 (assigned defect)

Opened 2 years ago

Last modified 20 months ago

Statement cache not thread-safe?

Reported by: christian.boos Owned by: gh
Priority: high Milestone:
Component: implementation Version: pysqlite-SVN
Severity: critical Keywords: segfault statement cache
Cc:

Description

In Trac, I've enabled connection pooling a while back (see [trac 3830]). That was working great to reduce (even eliminate) the infamous database is locked errors we had (trac:ticket:3446).

However, I should have tested more on Linux :/
On Windows, it's been working fine under heavy tests, for months now.

But on Linux, you very quickly run into the following crash (all it takes are 2 concurrent requests to the timeline):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1086347616 (LWP 6335)]
0x0000002a98bdae3a in pysqlite_cursor_iternext (self=0x2a9b170628)
    at cursor.c:864
864         rc = _sqlite_step_with_busyhandler(self->statement->st, self->connection);
(gdb) bt
#0  0x0000002a98bdae3a in pysqlite_cursor_iternext (self=0x2a9b170628)
    at cursor.c:864
#1  0x0000002a98bdaef6 in pysqlite_cursor_fetchone (
    self=<value optimized out>, args=<value optimized out>) at cursor.c:882
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0xc31300) at ceval.c:3542
#3  0x0000002a956b55dc in gen_iternext (gen=0x2a9b248a70) at genobject.c:47
#4  0x0000002a95705630 in PyEval_EvalFrame (f=0xc7aa10) at ceval.c:2121
 .
 .
 .

and:

(gdb) p *self
$1 = {ob_refcnt = 2, ob_type = 0x9008b0, connection = 0x2a97d29e88,
  description = 0x2a9b184050, row_cast_map = 0x2a9b193758, arraysize = 1,
  lastrowid = 0x2a9587db10, rowcount = 0x509598, row_factory = 0x2a9587db10,
  statement = 0x0, next_row = 0x0}

(therefore self->statement->st crashes)

The python frames are:

(gdb) f 2
#2  0x0000002a9570a183 in PyEval_EvalFrame (f=0xc31300) at ceval.c:3542
3542                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pyframe
/packages/trac/sandbox/blame/trac/db/util.py (41): __iter__
(gdb) f 4
#4  0x0000002a95705630 in PyEval_EvalFrame (f=0xc7aa10) at ceval.c:2121
2121                            x = (*v->ob_type->tp_iternext)(v);
(gdb) pyframe
/packages/trac/sandbox/blame/trac/versioncontrol/cache.py (56): get_changesets

this corresponds to a simple iteration over a result set:

    def get_changesets(self, start, stop):
        cursor = self.db.cursor()
        cursor.execute("SELECT rev FROM revision "
                       "WHERE time >= %s AND time < %s "
                       "ORDER BY time", (to_timestamp(start), to_timestamp(stop)))
        for rev, in cursor:
            if self.authz.has_permission_for_changeset(rev):
                yield self.get_changeset(rev)

(line 56 is the for)

I used pysqlite trunk at current latest (r301).

Change History

Changed 2 years ago by christian.boos

I just rebuilt pysqlite2 at r292 for good measure. At first, things seemed to be more robust, but the crash still happens, only in a different place:

(gdb) f 0
#0  0x0000002a98c1ae39 in cursor_close (self=0x2a9a7c98d0,
    args=<value optimized out>) at cursor.c:966
966             Py_DECREF(self->statement);
(gdb) p *self
$1 = {ob_refcnt = 4, ob_type = 0x924f90, connection = 0x2a98f0bdb0,
  description = 0x2a9b11df10, row_cast_map = 0x2a9561ab90, arraysize = 1,
  lastrowid = 0x2a9587db10, rowcount = 0x509598, row_factory = 0x2a9587db10,
  statement = 0x0, next_row = 0x2a9bbb4610}
(gdb) f 1
#1  0x0000002a9570a183 in PyEval_EvalFrame (f=0xe2eca0) at ceval.c:3542
3542                                    C_TRACE(x, (*meth)(self,NULL));
(gdb) pystack
/packages/trac/sandbox/blame/trac/db/sqlite_backend.py (189): rollback
/packages/trac/sandbox/blame/trac/db/sqlite_backend.py (189): _rollback_on_error
/packages/trac/sandbox/blame/trac/db/sqlite_backend.py (58): execute
/packages/trac/sandbox/blame/trac/db/util.py (51): execute
/packages/trac/sandbox/blame/trac/versioncontrol/cache.py (157): __init__
/packages/trac/sandbox/blame/trac/versioncontrol/cache.py (48): get_changeset
...

the code in __init__ being:

        cursor.execute("SELECT time,author,message FROM revision "
                       "WHERE rev=%s", (rev,))

Changed 23 months ago by theultramage

I can confirm that this also happens on windows (took me hours to set up a debug build...). All that is needed is some cpu load and a few concurrent requests. During normal usage, this makes apache crash cca once per 4 page views.

rc = _sqlite_step_with_busyhandler(self->statement->st, self->connection); Same problem, statement is null.

if (!self->statement) return NULL; This hack seems to prevent it from crashing, 'gonna use it until this gets fixed.

Changed 23 months ago by gh

theultramage, the lacking check you mentioned can produce crashes in single-threaded environments as well. I've just submitted #189 for this problem.

I'll fix this one, then release pysqlite 2.3.3 on Sunday and merge the bugfixes so far into Python core for Python 2.5.1. I'd be glad if you could run your tests with the new version afterwards.

Changed 23 months ago by gh

  • status changed from new to assigned

Changed 22 months ago by christian.boos

With r379, things are a bit better, Trac doesn't crash right away as described above. But when trying to load a dozen pages simultaneously, I get another crash:

(gdb) bt
#0  0x0000002a9856bc84 in sqlite3BtreeCloseCursor (pCur=0x9c2f60)
    at ../sqlite-3.3.8/src/btree.c:2848
#1  0x0000002a9859dc04 in sqlite3VdbeFreeCursor (p=0x9bfed0, pCx=0xba3460)
    at ../sqlite-3.3.8/src/vdbeaux.c:862
#2  0x0000002a9859dd03 in closeAllCursors (p=0x9bfed0)
    at ../sqlite-3.3.8/src/vdbeaux.c:891
#3  0x0000002a9859e05c in sqlite3VdbeHalt (p=0x9bfed0)
    at ../sqlite-3.3.8/src/vdbeaux.c:1287
#4  0x0000002a9859e7b3 in sqlite3VdbeReset (p=0x9bfed0)
    at ../sqlite-3.3.8/src/vdbeaux.c:1451
#5  0x0000002a9858154d in sqlite3_reset (pStmt=0x9bfed0)
    at ../sqlite-3.3.8/src/main.c:1039
#6  0x0000002a98452296 in pysqlite_statement_reset (self=0x2a9a14fb58)
    at src/statement.c:278
#7  0x0000002a984510c5 in pysqlite_cursor_close (self=0x2a9a13dc00,
    args=<value optimized out>) at src/cursor.c:977
#8  0x0000002a9570acf2 in PyEval_EvalFrame (f=0x7b7920) at Python/ceval.c:3552
#9  0x0000002a9570a53b in PyEval_EvalFrame (f=0xa6be50) at Python/ceval.c:3651
#10 0x0000002a9570a53b in PyEval_EvalFrame (f=0xa9d650) at Python/ceval.c:3651
#11 0x0000002a9570a53b in PyEval_EvalFrame (f=0x2a9a36aef0)
    at Python/ceval.c:3651
#12 0x0000002a9570a53b in PyEval_EvalFrame (f=0x2a9a30f0e0)
    at Python/ceval.c:3651
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) f 6
#6  0x0000002a98452296 in pysqlite_statement_reset (self=0x2a9a14fb58)
    at src/statement.c:278
278             rc = sqlite3_reset(self->st);
(gdb) p *self
$12 = {ob_refcnt = 2, ob_type = 0x2a98556bc0, db = 0x9b0940, st = 0x9bfed0,
  sql = 0x2a98755ed0, in_use = 1, in_weakreflist = 0x2a9a1512b8}
(gdb) f 0
#0  0x0000002a9856bc84 in sqlite3BtreeCloseCursor (pCur=0x9c2f60)
    at ../sqlite-3.3.8/src/btree.c:2848
2848        pBt->pCursor = pCur->pNext;
(gdb) p *pCur
$13 = {pBtree = 0xdaa070, pNext = 0x7b85f0, pPrev = 0x0,
  xCompare = 0x2a9859edf0 <sqlite3VdbeRecordCompare>, pArg = 0x9c0ef0,
  pgnoRoot = 2, pPage = 0x7b8a68, idx = 1, info = {
    pCell = 0x7b8a3e "\024\005\001\027\002\032û147729³\002\02714772",
    nKey = 20, nData = 0, nHeader = 1, nLocal = 20, iOverflow = 0, nSize = 0},
  wrFlag = 1 '\001', eState = 0 '\0', pKey = 0x0, nKey = 0, skip = 0}
(gdb) p *pCur->pBtree
$14 = {pSqlite = 0x0, pBt = 0x121, inTrans = 16 '\020'}

Changed 22 months ago by christian.boos

The above is triggered when I'm about to "rollback" the connection, and when it happens, looks like it's always from within a different thread than the one which was used to create the connection.

Immediately before the actual Connection.rollback, I go through all the existing cursors for that connection, and close them: the crash itself happens during one of the Cursor.close call. That code was necessary for pysqlite versions older than 2.0.5, IIRC. It seems that now it badly interferes somehow, because if I comment that out, the crash goes away (I've got ProgrammingError: library routine called out of sequence errors instead, #T4331, but that's probably another story...).

For reference, the code I'm using: sqlite_backend.py, and the patch for it, which:

  1. re-enables connection pooling on Linux (as everything works fine when connection pooling is disabled, of course)
  2. comments out the closing of cursors
  • trac/db/sqlite_backend.py

     
    146146    """Connection wrapper for SQLite.""" 
    147147 
    148148    __slots__ = ['_active_cursors'] 
    149     poolable = have_pysqlite and os.name == 'nt' and sqlite_version >= 30301 
     149    poolable = have_pysqlite and sqlite_version >= 30301 
    150150 
    151151    def __init__(self, path, params={}): 
    152152        assert have_pysqlite > 0 
     
    184184            return cursor 
    185185 
    186186        def rollback(self): 
    187             for cursor in self._active_cursors.keys(): 
    188                 cursor.close() 
     187            # for cursor in self._active_cursors.keys(): 
     188            #    cursor.close() 
    189189            self.cnx.rollback() 
    190190 
    191191    else: 
Note: See TracTickets for help on using tickets.