Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create abstract interface #61

Open
reece opened this issue May 29, 2019 · 9 comments
Open

Create abstract interface #61

reece opened this issue May 29, 2019 · 9 comments
Milestone

Comments

@reece
Copy link
Member

reece commented May 29, 2019

seqrepo currently has distinct backends for aliases and sequences. Both are essentially key-value stores.

In order to open up other possibilities (redis, elasticache, federated repo proxy, local cache, etc.), implement abstract classes and subclass for each backend type.

@reece reece added this to To do in biocommons roadmap Jun 29, 2019
@teemuvesala
Copy link
Contributor

Does this mean that we'll be able to move data off from SQLite3? If that's the case, then I can help you with this to speed up the development. That's part of our requirement and I'm just investigating the possibility for that.

@teemuvesala
Copy link
Contributor

For me the code with DB management seems to be really straightforward and uses mostly Python's DB API. My notes and first review points following non standard DB stuff:

  • Initialisation
  • Couple places where integrity errors are ignored by passing the sqlite3 specific exception. Why they are ignored? Can this cause missing data from database?

Of course if you want more than "select any SQL database" kind of abstraction, then the code requires a lot more work. But in our case it would be just enough to

  1. Get SQL dump from pulled SQLite3 database
  2. Export it to the PostgreSQL or some other database.
  3. Make library SQL database engine agnostic by accepting the DB object at initialisation parameters.

@teemuvesala
Copy link
Contributor

And after reading it even better it seems that SQLite3 module is used non-standard way. It provides short cut to avoid separate (visible) cursor object. That shortcut is used everywhere. The DB-API 2.0 requires separate use of cursor object. So e.g following is illegal:

this._db.execute("<sql query>").fetchone()

This should be:

cursor = this._db.cursor()
cursor.execute("<sql query>")
cursor.fetchone()

@reece
Copy link
Member Author

reece commented Jul 2, 2019

@teemuvesala...

Re: Yes, the intention for this issue is to refactor the current sqlite-specific backend into a service interface with multiple implementations (one of which is sqlite).

Can you please explain elaborate on your requirements? I would certainly appreciate code contributions if we're aligned on direction.

Are you imagining putting sequences in postgresql too? If so, do you have plans for optimizing sequence slicing (e.g., reading a few nucleotides from a chromosome)?

Two of your suggestions should be undertaken anyway: 1) db as init arg, and 2) deconstruct the execute().fetchone() construct. Let's save (1) for the refactor (since it'll come up then anyway). Want to submit a PR for (2)?

Can you elaborate on your comment about exceptions being ignored? Examples will help.

@teemuvesala
Copy link
Contributor

Our current requirement (which is almost instant) is to replace the SQLite with PostgreSQL, but long term plan is to get DynamoDB support.

So from roadmap perspective good idea would be to 1) Remove SQLite specific code (which is your step 2). I've started it and I'll do. 2) Add support for any Python module which is DB API 2.0 compatible. 3) Add non-SQL-way to do things.

File biocommons.seqrepo/src/biocommons/seqrepo/seqaliasdb/seqaliasdb.py has following snippet at function store_alias:

try:
    c = self._db.execute(SQL statement removed for simplicity)
    # success => new record
    return c.lastrowid
except sqlite3.IntegrityError:
    pass

The problem with this snippet is that exception is SQLite specific. But this follows the DB API 2.0 specification correctly. Now if we think that _db is now some other (e.g. imaginarydb) driver, the exception would be imaginarydb.IntegrityError. The only common parent class is Exception. So to support this non-SQLite3-way we'd have to check if the class name of exception has IntegrityError, if it has, then ignore the error, otherwise rise the caught exception.

@teemuvesala
Copy link
Contributor

Now at my repository is the code which can use also PostgreSQL for database. It needed quite many changes. E.g. parametrisation required changes. At SQLIte the style is SELECT * FROM table WHERE id=?, but with Psycopg2 it's SELECT * FROM table WHERE id=%s. It works at read only mode well enough for our use.

@reece
Copy link
Member Author

reece commented Aug 3, 2023

The intended class hierarchy is shown here:

Image

@reece reece changed the title Create abstract backend classes and adapt current backends Create abstract interface Aug 19, 2023
@github-actions
Copy link

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Sep 26, 2023
@github-actions
Copy link

github-actions bot commented Oct 3, 2023

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 3, 2023
biocommons roadmap automation moved this from Next Up to Done Oct 3, 2023
@reece reece removed the stale Issue is stale and subject to automatic closing label Nov 27, 2023
@reece reece reopened this Nov 27, 2023
biocommons roadmap automation moved this from Done to In progress Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
biocommons roadmap
  
In progress
Development

No branches or pull requests

2 participants