Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent access to one db #21

Open
tbigot opened this issue May 2, 2018 · 3 comments
Open

Concurrent access to one db #21

tbigot opened this issue May 2, 2018 · 3 comments

Comments

@tbigot
Copy link

tbigot commented May 2, 2018

Hi,
I’m using taxadb in a pipeline. I launch a typical job that lasts ~ 5 seconds, with dozens of taxadb calls inside it. But when I launch dozens of these jobs (≥ 30 jobs), one job can last > 10 minutes, performing the same tasks. It could be a simple I/O issue, but since I use a high performance «pure storage» medium, I suspect a lock issue.

Is there a way peewee query the sqlite database in read-only mode, with the guarantee of no lock? Is this already the case? I could make direct sqlite queries in the DB to test this hypothesis, but it’s faster to ask you first. 😃

Thanks in advance for your help!

@HadrienG
Copy link
Owner

HadrienG commented May 4, 2018

Hi!

Peewee is self-locking on read or write by default. The first read creates a shared lock and the first write creates a reserved lock. When taxadb queries its database it shouldn't perform any write operations so several processes should be able to access the database at the same time.

If there is a locking issue it's most likely not peewee's fault but mine, although I don't see anything in the code that could trigger the reserved lock.

I can do some testing and come back to you.

Best,
Hadrien

@tbigot
Copy link
Author

tbigot commented May 4, 2018

Hi Hadrien, and thanks a lot for your reply.

For my part, I’ve tested querying directly the database file (in read only mode), and it doesn’t seem to change the performances. So I guess both peewee and taxadb are harmless.

I’ve discovered the storage I used and which was supposed to be very fast is not compliant with a lot of parallel queries on the same file, I’ve reported that to the ITs at my side. Changing the storage type helped a lot to have decent performances.

Therefore I thing nothing is wrong at your side. It may be interesting to check if there is a way to ensure db connections in read-only mode.

@danieldjewell
Copy link

If you're trying to do concurrent access, SQLite seems like it could be a limiting factor? (SQLite is great and all but it isn't a database server and it never will be...) I don't know for sure, but regardless of storage IO, AFAIK, SQLite can't do some of the more advanced caching/planning that something like MySQL or Postgres can. (Running a taxadb create against a PostgreSQL 12 server is giving ~25 chunks (@500/each) = 12,500 records/sec on the processing of the nucl_gb file -- versus ~5-8 with SQLite on the same hardware. I recognize that's an insert and not a select but...)

Maybe try Postgres? It doesn't have to be on some big DB server - it can run locally. The additional overhead of the server service (which isn't that much) should be minimal compared to the possible performance gain...

It's different than MySQL and is surprisingly straightforward. (MySQL has lots of tuning options and it can take a long time to optimize ... whereas with Postgres, yes there are options but it does seem to work better/faster out of the box.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants