Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#7954) - Connection is reopened, with retry, if automatically closed (idb adapter) #8455

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

paolosanchi
Copy link
Contributor

@paolosanchi paolosanchi commented Feb 17, 2022

  • The setup process has been moved into the setup() function, so it can be called also in openTransactionSafely()
  • The function openTransactionSafely has been moved from utils.js into index.js so it has access to the new setup() function and the data in the adapter (cachedDBs and idb variables).
  • The function openTransactionSafely has been changed to be async (with callback).
  • Instead of passing the current idb, it's passed the adapter contextualized openTransactionSafely() function.

@AlbaHerrerias
Copy link
Contributor

Thank you for your contribution. Would you be able to add a test for this?

@paolosanchi
Copy link
Contributor Author

paolosanchi commented Feb 23, 2022

Thank you for your contribution. Would you be able to add a test for this?

I'd like to, and I understand it is necessary, but I'm not sure how.
What I do in debug is to call to idb.close() from inside the openTransactionSafely().
To do it in an automatic test we should add a call to idb.close() (because it is what the browser does: calling pouchdb.close() is not what we want to simulate).
To have a total coverage, should we do replicate all the integration tests adding the idb.close() after the new PouchDb()?
Now, the idb variable is nested in the init() function, and I think is not accessible from the test.

I'm not very skilled on testing, and I don't know how to test this case since it's an inner behavior. Have you an advice or another view on the thing @AlbaHerrerias?

@jcoglan
Copy link
Contributor

jcoglan commented Feb 25, 2022

Hi @paolosanchi, thanks for identifying this issue and working on a fix for it.
I've been reviewing this code and I'd like to see us merge a fix for the bug,
but there are some design issues we'd need to iron out of this that could cause
some control flow issues.

My understanding of this PR is that it aims to handle the fact that
idb.transaction() can fail, if the DB connection has been closed. If that
happens, we want to re-open the connection and resume what we were trying to do.

It might help if I summarise how adapters get started up first -- apologies if
you're already familiar with this but it will benefit other people maintaining
this code.

This task queue isn't strictly a queue in that it doesn't make functions
execute one at a time. It's just a check to make sure the adapter is set up
before any public API method executes. Once the queue is marked ready, API
methods are allowed to execute concurrently.

Inside the IdbPouch() function, some more queue-based control flow happens. It
uses enqueueTask() to make sure only one instance of init() executes at
a time. This enqueueTask() function creates a global queue, rather than
the per-instance task queue found in PouchDB(). This queue makes sure all
functions execute one at a time, and its purpose is to prevent concurrent calls
to indexedDB.open(), which is a race condition in some browsers.

So this design guarantees a couple of important things. First, the adapter setup
code is executed exactly once, because it's invoked inline by the PouchDB
constructor and is not accessible via other methods. Second, API methods cannot
execute until the adapter setup is complete. Third, the idb setup code
executes at most one instance at a time so we avoid concurrent calls to open
IndexedDB databases, which is unsafe.

The new openTransactionSafely() function breaks some of these guarantees.
In the case where the connection is still open, it works just as before. But if
a retryable exception happens in idb.transaction(), it invokes the setup()
function containing all the adapter setup code that follows opening the DB.
Because it doesn't use either the PouchDB per-instance task queue, or the
idb global task queue to control its execution, it doesn't block public API
methods from executing while setup happens, and it doesn't prevent concurrent
calls to indexedDB.open(). It also breaks some assumptions that the setup code
makes, which are detailed below.

Next there's the question of what the idb setup code, which you've extracted
into the setup() function, actually does. Some of these tasks make sense to do
every time the DB is re-opened, and some only make sense the first time this
happens.

  • It calls indexedDB.open() to get a new database handle; this is needed for
    us to do anything with the database, so we need this on every connection
    attempt where idb.transaction() fails.

  • It stores the open() request in a map against the database name. This
    is used in the _destroy() method where it's assumed there is only one
    such request per database. This should be done each time a new connection is
    opened, but the assumption that only one request exists should be upheld. If
    multiple instances of setup() execute concurrently, or if setup() is
    executed while the connection is still valid, this assumption is broken.

  • It runs any pending migrations. In idb, the database version only
    changes if PouchDB is upgraded, so this should only be done once per client
    at most, not once per connection. It should be a no-op most of the time, but
    it's worth skipping if possible.

  • It points the idb variable at the new connection. This may discard a
    reference to a valid connection without doing proper cleanup, if setup() is
    allowed to run concurrently.

  • It registers some event handlers on the connection. This should be done
    every time a new connection is made.

  • It starts a new transaction covering stores it needs to access during
    setup. IndexedDB does not reject conflicting transactions, it delays the
    execution of requests so that transactions on the same stores do not overlap.
    If setup() is executed concurrently, the following tasks will not interleave
    but they will be executed multiple times, which might be a mistake.

  • The "meta store" document is set up, generating a new UUID if the DB
    does not already have one. The IDB transaction should make sure this
    get()/put() interaction does not interleave and is safe, but this may only
    need doing on the first connection, not on reconnections.

  • The document count is updated. Again, transactions should mean this is
    safe, but may be skipped on reconnection.

  • The browser/database is checked for blob support. Again this is probably
    concurrent-safe but can be skipped on reconnection; the code assumes it is
    only run once.

A lot of this code uses variables scoped to the IdbPouch() function that might
behave weirdly under concurrent access. The IDB transaction might make sure none
of this code interleaves, but I can't be certain of it. The idb task queue
should be used to make sure this code does not execute concurrently, and
portions that only apply to the first connection attempt should be skipped on
reconnection.

Finally there's a control flow issue with the fact that
openTransactionSafely() takes a callback. In current master, the try only
covers the idb.transaction() call, whereas is now covers the callback
invocation as well
. This means if the callback itself throws a synchronous
exception, it may be executed a second time, which will probably produce
strange behaviour and make the control flow confusing. It taking a callback also
means all the code that uses it needs to be indented, so we have a lot of lines
changed making the code harder to review and making conflicts more likely.

It would improve the design if openTransactionSafely() only covered
idb.transaction() and called setup() if necessary, and returned a promise.
This would prevent the continuation being executed twice and would remove the
need to rewrite a lot of lines if we can use await to get the result.

So just to summarise the changes we'd need for this to be safe:

  • setup() should avoid running code that's only necessary on the first
    connection when it's used to reconnect

  • setup() must not be allowed to be executed concurrently with itself; only
    one active call at a time

  • all public API methods must defer if there is an active setup() call in
    progress

  • openTransactionSafely() must not cause multiple executions of its
    continuation, this is most easily done by returning a promise instead of
    taking a callback.

@paolosanchi
Copy link
Contributor Author

paolosanchi commented Feb 26, 2022

Thank you very much @jcoglan for the detailed explanation, I wasn't aware of the meaning of the taskqueue in PouchDB(). Now it makes sense to me, also I understand the need of making the "reconnection" concurrent safe.
I'd liked to use promises instead of the "callback design", but I thought it was better to mimic the code I wasn't very familiar with. I was aware that with the callback wrapping many lines of code are changed though (this is why I always use async/await when possible).
I can implement another bugfix based on your directive.
The only doubt I have is on this point of your summary:

  • all public API methods must defer if there is an active setup() call in
    progress

If I defer the execution of the adapter's api methods, I should indented almost all the code of the adapter in something like enqueueTask(), or move it to private functions. The adapter's API interface uses the callback paradigm and it's doesn't return promises, so I can't use async/await to avoid it. All the code would be affected.

What if we move this handling to the openTransactionSafely() function?

The process would be this:

  • if the connection is lost most of the calls to the API methods would result in multiple InvalidStateError, that can happen concurrently.
  • in the catched errors we have to reconnect() (only the first error occurrence) and retry() (to openTransactionSafely()).
  • with the first one of the catched errors we create a lock with a queue, that ensures that the following catches doesn't cause the reconnect(), but only the retry().
  • all the calls to openTransactionSafely() are enqueued if there is the lock.
  • when the connection (or setup()) is complete (with or without errors), we retry to open the transaction and unlock the queue.
  • if the error is not solved, also all the enqued retry should fail.

Can you see any issue whit this?

@jcoglan
Copy link
Contributor

jcoglan commented Mar 9, 2022

I'd liked to use promises instead of the "callback design", but I thought it was better to mimic the code I wasn't very familiar with.

You're right to notice that the public API uses callbacks, and there's a lot of callback use internally owing to this being a long-lived codebase. It's still fine to use promises internally where it makes things easier, and the public API actually supports both callbacks and promises -- adapterFun() adds promise support to functions written using callbacks.

If I defer the execution of the adapter's api methods, I should indented almost all the code of the adapter in something like enqueueTask(), or move it to private functions. The adapter's API interface uses the callback paradigm and it's doesn't return promises, so I can't use async/await to avoid it. All the code would be affected.

All I meant here is that public methods -- those in adapter.js in pouchdb-core need to be deferred until the adapter is ready to handle them. The adapterFun() wrapper takes care of this by using the internal task queue to wait until the PouchDB instance is fully initialised.

It might be possible to change things so that the task queue can be put back in a "deferred" state if the adapter needs to reconnect, but I suspect there's a lot of code that assumes that once the task queue is ready, it never changes state again so this might be risky.

This has reminded me that the newer indexeddb adapter re-opens the database on every API method as part of normal operation so it should be safe for idb to do this also.

I think your proposal for putting all the safety logic in openTransactionSafely() makes sense -- the key thing to ensure here is that if a call triggers an exception, then all further calls get queued up while the DB is being re-openned, so these concurrent calls don't all try to re-open the connection themselves.

A promise is actually a great way of implementing this because you can make sure there's only one active promise representing the DB connection, and then make all calls wait on it. i.e. you'd define a function for opening the DB, that only creates a new connection if dbPromise is not set:

let dbPromise = null

function openDB() {
  if (dbPromise !== null) {
    return dbPromise
  }

  dbPromise = new Promise((resolve, reject) => {
    let openReq = indexedDB.open(dbName)

    // do all the connection setup logic...

    openReq.onsuccess = (event) => {
      resolve(event.target.result)
    }
    openReq.onerror = (event) => {
      reject(event.target.error)
    }
  })

  return dbPromise
}

And then you can use this in openTransactionSafely() to make all calls open the DB if needed, do a transaction normally, then retry if the transaction fails:

function openTransactionSafely(stores, mode) {
  return openDB().then((db) => {
    try {
      return db.transaction(stores, mode)
    } catch (err) {
      dbPromise = null
      return openTransactionSafely(stores, mode)
    }
  })
}

This is a simplification and you'd need to use the "cached DBs" store instead of the single dbPromise variable, and you'd need to limit retries and so on, but this is the general idea. Does that make sense?

@paolosanchi
Copy link
Contributor Author

It might be possible to change things so that the task queue can be put back in a "deferred" state if the adapter needs to reconnect, but I suspect there's a lot of code that assumes that once the task queue is ready, it never changes state again so this might be risky.

I also think that this would not always work, because the api public function can do multiple calls to the adapter api and every adapter function opens a transaction, so the connection can drop in between of adapter multiple calls, and putting the queue back to "deferred" state would not simply work, because the check is done at PouchDb level.

ensure here is that if a call triggers an exception, then all further calls get queued up while the DB is being re-openned

It's actually what I tried to explain, so we agree ;)

I'm going to implement a new solution with all of these new information in mind, It remains the fact that I totally don't know how to make the integrations tests for this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants