Concurrent access to one db #21

tbigot · 2018-05-02T15:50:02Z

Hi,
I’m using taxadb in a pipeline. I launch a typical job that lasts ~ 5 seconds, with dozens of taxadb calls inside it. But when I launch dozens of these jobs (≥ 30 jobs), one job can last > 10 minutes, performing the same tasks. It could be a simple I/O issue, but since I use a high performance «pure storage» medium, I suspect a lock issue.

Is there a way peewee query the sqlite database in read-only mode, with the guarantee of no lock? Is this already the case? I could make direct sqlite queries in the DB to test this hypothesis, but it’s faster to ask you first. 😃

Thanks in advance for your help!

HadrienG · 2018-05-04T05:44:23Z

Hi!

Peewee is self-locking on read or write by default. The first read creates a shared lock and the first write creates a reserved lock. When taxadb queries its database it shouldn't perform any write operations so several processes should be able to access the database at the same time.

If there is a locking issue it's most likely not peewee's fault but mine, although I don't see anything in the code that could trigger the reserved lock.

I can do some testing and come back to you.

Best,
Hadrien

tbigot · 2018-05-04T09:46:27Z

Hi Hadrien, and thanks a lot for your reply.

For my part, I’ve tested querying directly the database file (in read only mode), and it doesn’t seem to change the performances. So I guess both peewee and taxadb are harmless.

I’ve discovered the storage I used and which was supposed to be very fast is not compliant with a lot of parallel queries on the same file, I’ve reported that to the ITs at my side. Changing the storage type helped a lot to have decent performances.

Therefore I thing nothing is wrong at your side. It may be interesting to check if there is a way to ensure db connections in read-only mode.

danieldjewell · 2020-08-12T18:54:44Z

If you're trying to do concurrent access, SQLite seems like it could be a limiting factor? (SQLite is great and all but it isn't a database server and it never will be...) I don't know for sure, but regardless of storage IO, AFAIK, SQLite can't do some of the more advanced caching/planning that something like MySQL or Postgres can. (Running a taxadb create against a PostgreSQL 12 server is giving ~25 chunks (@500/each) = 12,500 records/sec on the processing of the nucl_gb file -- versus ~5-8 with SQLite on the same hardware. I recognize that's an insert and not a select but...)

Maybe try Postgres? It doesn't have to be on some big DB server - it can run locally. The additional overhead of the server service (which isn't that much) should be minimal compared to the possible performance gain...

It's different than MySQL and is surprisingly straightforward. (MySQL has lots of tuning options and it can take a long time to optimize ... whereas with Postgres, yes there are options but it does seem to work better/faster out of the box.)

HadrienG added the performance label May 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrent access to one db #21

Concurrent access to one db #21

tbigot commented May 2, 2018

HadrienG commented May 4, 2018

tbigot commented May 4, 2018

danieldjewell commented Aug 12, 2020

Concurrent access to one db #21

Concurrent access to one db #21

Comments

tbigot commented May 2, 2018

HadrienG commented May 4, 2018

tbigot commented May 4, 2018

danieldjewell commented Aug 12, 2020