(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

paolosanchi · 2022-02-17T17:20:35Z

The setup process has been moved into the setup() function, so it can be called also in openTransactionSafely()
The function openTransactionSafely has been moved from utils.js into index.js so it has access to the new setup() function and the data in the adapter (cachedDBs and idb variables).
The function openTransactionSafely has been changed to be async (with callback).
Instead of passing the current idb, it's passed the adapter contextualized openTransactionSafely() function.

… closed by the browser

AlbaHerrerias · 2022-02-23T11:15:42Z

Thank you for your contribution. Would you be able to add a test for this?

paolosanchi · 2022-02-23T13:11:31Z

Thank you for your contribution. Would you be able to add a test for this?

I'd like to, and I understand it is necessary, but I'm not sure how.
What I do in debug is to call to idb.close() from inside the openTransactionSafely().
To do it in an automatic test we should add a call to idb.close() (because it is what the browser does: calling pouchdb.close() is not what we want to simulate).
To have a total coverage, should we do replicate all the integration tests adding the idb.close() after the new PouchDb()?
Now, the idb variable is nested in the init() function, and I think is not accessible from the test.

I'm not very skilled on testing, and I don't know how to test this case since it's an inner behavior. Have you an advice or another view on the thing @AlbaHerrerias?

jcoglan · 2022-02-25T16:19:34Z

Hi @paolosanchi, thanks for identifying this issue and working on a fix for it.
I've been reviewing this code and I'd like to see us merge a fix for the bug,
but there are some design issues we'd need to iron out of this that could cause
some control flow issues.

My understanding of this PR is that it aims to handle the fact that
idb.transaction() can fail, if the DB connection has been closed. If that
happens, we want to re-open the connection and resume what we were trying to do.

It might help if I summarise how adapters get started up first -- apologies if
you're already familiar with this but it will benefit other people maintaining
this code.

In the PouchDB constructor, the adapter is selected and its initialiser
function is called. The PouchDB instance contains a task queue;
when the adapter is set up successfully, the queue is marked as ready.
All the public API methods are wrapped in adapterFun(), for example
ths post() method. This wrapper checks the status of the task queue,
and if it's not ready, it defers the execution of the method until the queue
is ready.

This task queue isn't strictly a queue in that it doesn't make functions
execute one at a time. It's just a check to make sure the adapter is set up
before any public API method executes. Once the queue is marked ready, API
methods are allowed to execute concurrently.

Inside the IdbPouch() function, some more queue-based control flow happens. It
uses enqueueTask() to make sure only one instance of init() executes at
a time. This enqueueTask() function creates a global queue, rather than
the per-instance task queue found in PouchDB(). This queue makes sure all
functions execute one at a time, and its purpose is to prevent concurrent calls
to indexedDB.open(), which is a race condition in some browsers.

So this design guarantees a couple of important things. First, the adapter setup
code is executed exactly once, because it's invoked inline by the PouchDB
constructor and is not accessible via other methods. Second, API methods cannot
execute until the adapter setup is complete. Third, the idb setup code
executes at most one instance at a time so we avoid concurrent calls to open
IndexedDB databases, which is unsafe.

The new openTransactionSafely() function breaks some of these guarantees.
In the case where the connection is still open, it works just as before. But if
a retryable exception happens in idb.transaction(), it invokes the setup()
function containing all the adapter setup code that follows opening the DB.
Because it doesn't use either the PouchDB per-instance task queue, or the
idb global task queue to control its execution, it doesn't block public API
methods from executing while setup happens, and it doesn't prevent concurrent
calls to indexedDB.open(). It also breaks some assumptions that the setup code
makes, which are detailed below.

Next there's the question of what the idb setup code, which you've extracted
into the setup() function, actually does. Some of these tasks make sense to do
every time the DB is re-opened, and some only make sense the first time this
happens.

It calls indexedDB.open() to get a new database handle; this is needed for
us to do anything with the database, so we need this on every connection
attempt where idb.transaction() fails.
It stores the open() request in a map against the database name. This
is used in the _destroy() method where it's assumed there is only one
such request per database. This should be done each time a new connection is
opened, but the assumption that only one request exists should be upheld. If
multiple instances of setup() execute concurrently, or if setup() is
executed while the connection is still valid, this assumption is broken.
It runs any pending migrations. In idb, the database version only
changes if PouchDB is upgraded, so this should only be done once per client
at most, not once per connection. It should be a no-op most of the time, but
it's worth skipping if possible.
It points the idb variable at the new connection. This may discard a
reference to a valid connection without doing proper cleanup, if setup() is
allowed to run concurrently.
It registers some event handlers on the connection. This should be done
every time a new connection is made.
It starts a new transaction covering stores it needs to access during
setup. IndexedDB does not reject conflicting transactions, it delays the
execution of requests so that transactions on the same stores do not overlap.
If setup() is executed concurrently, the following tasks will not interleave
but they will be executed multiple times, which might be a mistake.
The "meta store" document is set up, generating a new UUID if the DB
does not already have one. The IDB transaction should make sure this
get()/put() interaction does not interleave and is safe, but this may only
need doing on the first connection, not on reconnections.
The document count is updated. Again, transactions should mean this is
safe, but may be skipped on reconnection.
The browser/database is checked for blob support. Again this is probably
concurrent-safe but can be skipped on reconnection; the code assumes it is
only run once.

A lot of this code uses variables scoped to the IdbPouch() function that might
behave weirdly under concurrent access. The IDB transaction might make sure none
of this code interleaves, but I can't be certain of it. The idb task queue
should be used to make sure this code does not execute concurrently, and
portions that only apply to the first connection attempt should be skipped on
reconnection.

Finally there's a control flow issue with the fact that
openTransactionSafely() takes a callback. In current master, the try only
covers the idb.transaction() call, whereas is now covers the callback
invocation as well. This means if the callback itself throws a synchronous
exception, it may be executed a second time, which will probably produce
strange behaviour and make the control flow confusing. It taking a callback also
means all the code that uses it needs to be indented, so we have a lot of lines
changed making the code harder to review and making conflicts more likely.

It would improve the design if openTransactionSafely() only covered
idb.transaction() and called setup() if necessary, and returned a promise.
This would prevent the continuation being executed twice and would remove the
need to rewrite a lot of lines if we can use await to get the result.

So just to summarise the changes we'd need for this to be safe:

setup() should avoid running code that's only necessary on the first
connection when it's used to reconnect
setup() must not be allowed to be executed concurrently with itself; only
one active call at a time
all public API methods must defer if there is an active setup() call in
progress
openTransactionSafely() must not cause multiple executions of its
continuation, this is most easily done by returning a promise instead of
taking a callback.

paolosanchi · 2022-02-26T11:45:38Z

Thank you very much @jcoglan for the detailed explanation, I wasn't aware of the meaning of the taskqueue in PouchDB(). Now it makes sense to me, also I understand the need of making the "reconnection" concurrent safe.
I'd liked to use promises instead of the "callback design", but I thought it was better to mimic the code I wasn't very familiar with. I was aware that with the callback wrapping many lines of code are changed though (this is why I always use async/await when possible).
I can implement another bugfix based on your directive.
The only doubt I have is on this point of your summary:

all public API methods must defer if there is an active setup() call in
progress

If I defer the execution of the adapter's api methods, I should indented almost all the code of the adapter in something like enqueueTask(), or move it to private functions. The adapter's API interface uses the callback paradigm and it's doesn't return promises, so I can't use async/await to avoid it. All the code would be affected.

What if we move this handling to the openTransactionSafely() function?

The process would be this:

if the connection is lost most of the calls to the API methods would result in multiple InvalidStateError, that can happen concurrently.
in the catched errors we have to reconnect() (only the first error occurrence) and retry() (to openTransactionSafely()).
with the first one of the catched errors we create a lock with a queue, that ensures that the following catches doesn't cause the reconnect(), but only the retry().
all the calls to openTransactionSafely() are enqueued if there is the lock.
when the connection (or setup()) is complete (with or without errors), we retry to open the transaction and unlock the queue.
if the error is not solved, also all the enqued retry should fail.

Can you see any issue whit this?

jcoglan · 2022-03-09T14:11:37Z

I'd liked to use promises instead of the "callback design", but I thought it was better to mimic the code I wasn't very familiar with.

You're right to notice that the public API uses callbacks, and there's a lot of callback use internally owing to this being a long-lived codebase. It's still fine to use promises internally where it makes things easier, and the public API actually supports both callbacks and promises -- adapterFun() adds promise support to functions written using callbacks.

If I defer the execution of the adapter's api methods, I should indented almost all the code of the adapter in something like enqueueTask(), or move it to private functions. The adapter's API interface uses the callback paradigm and it's doesn't return promises, so I can't use async/await to avoid it. All the code would be affected.

All I meant here is that public methods -- those in adapter.js in pouchdb-core need to be deferred until the adapter is ready to handle them. The adapterFun() wrapper takes care of this by using the internal task queue to wait until the PouchDB instance is fully initialised.

It might be possible to change things so that the task queue can be put back in a "deferred" state if the adapter needs to reconnect, but I suspect there's a lot of code that assumes that once the task queue is ready, it never changes state again so this might be risky.

This has reminded me that the newer indexeddb adapter re-opens the database on every API method as part of normal operation so it should be safe for idb to do this also.

I think your proposal for putting all the safety logic in openTransactionSafely() makes sense -- the key thing to ensure here is that if a call triggers an exception, then all further calls get queued up while the DB is being re-openned, so these concurrent calls don't all try to re-open the connection themselves.

A promise is actually a great way of implementing this because you can make sure there's only one active promise representing the DB connection, and then make all calls wait on it. i.e. you'd define a function for opening the DB, that only creates a new connection if dbPromise is not set:

let dbPromise = null

function openDB() {
  if (dbPromise !== null) {
    return dbPromise
  }

  dbPromise = new Promise((resolve, reject) => {
    let openReq = indexedDB.open(dbName)

    // do all the connection setup logic...

    openReq.onsuccess = (event) => {
      resolve(event.target.result)
    }
    openReq.onerror = (event) => {
      reject(event.target.error)
    }
  })

  return dbPromise
}

And then you can use this in openTransactionSafely() to make all calls open the DB if needed, do a transaction normally, then retry if the transaction fails:

function openTransactionSafely(stores, mode) {
  return openDB().then((db) => {
    try {
      return db.transaction(stores, mode)
    } catch (err) {
      dbPromise = null
      return openTransactionSafely(stores, mode)
    }
  })
}

This is a simplification and you'd need to use the "cached DBs" store instead of the single dbPromise variable, and you'd need to limit retries and so on, but this is the general idea. Does that make sense?

paolosanchi · 2022-03-16T10:13:15Z

It might be possible to change things so that the task queue can be put back in a "deferred" state if the adapter needs to reconnect, but I suspect there's a lot of code that assumes that once the task queue is ready, it never changes state again so this might be risky.

I also think that this would not always work, because the api public function can do multiple calls to the adapter api and every adapter function opens a transaction, so the connection can drop in between of adapter multiple calls, and putting the queue back to "deferred" state would not simply work, because the check is done at PouchDb level.

ensure here is that if a call triggers an exception, then all further calls get queued up while the DB is being re-openned

It's actually what I tried to explain, so we agree ;)

I'm going to implement a new solution with all of these new information in mind, It remains the fact that I totally don't know how to make the integrations tests for this case.

(pouchdb#7954) - Connection is reopened, with retry, if automatically…

a58386a

… closed by the browser

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

paolosanchi commented Feb 17, 2022 •

edited

AlbaHerrerias commented Feb 23, 2022

paolosanchi commented Feb 23, 2022 •

edited

jcoglan commented Feb 25, 2022

paolosanchi commented Feb 26, 2022 •

edited

jcoglan commented Mar 9, 2022 •

edited

paolosanchi commented Mar 16, 2022

(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

Are you sure you want to change the base?

(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

Conversation

paolosanchi commented Feb 17, 2022 • edited

AlbaHerrerias commented Feb 23, 2022

paolosanchi commented Feb 23, 2022 • edited

jcoglan commented Feb 25, 2022

paolosanchi commented Feb 26, 2022 • edited

jcoglan commented Mar 9, 2022 • edited

paolosanchi commented Mar 16, 2022

paolosanchi commented Feb 17, 2022 •

edited

paolosanchi commented Feb 23, 2022 •

edited

paolosanchi commented Feb 26, 2022 •

edited

jcoglan commented Mar 9, 2022 •

edited