Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Unify SQLTable code for fallback and SQLAlchemy mode and move differences into database class #8562

Closed
wants to merge 2 commits into from

Conversation

artemyk
Copy link
Contributor

@artemyk artemyk commented Oct 15, 2014

In discussion of the SQL API in #7960 , it has been suggested that it may be possible to factor out differences in SQLAlchemy and fallback backend . This is an initial attempt to do so. Now SQLTable is uncoupled from the backend. PandasSQL is base class for SQLDatabase (SQLAlchemy backend) and SQLiteDatabase (fallback backend). This allows for the removal of some duplicate code.

@jorisvandenbossche
Copy link
Member

@artemyk Thanks for starting this! I don't have time at the moment, but will try to look at it later next week (in any case, this has to wait until 0.15 is released)

@jreback jreback added IO SQL to_sql, read_sql, read_sql_query API Design labels Oct 25, 2014
@jorisvandenbossche
Copy link
Member

@artemyk Coming back to this, sorry for the delay.

I am looking at it, but to start, could you make a short schematic overview of what now belongs to which class (roughly the concepts)? That maybe eases the discussion.

Furter, I think we should try to look at some of the usage cases mentioned in #7960, to see how they match with this PR.

@artemyk
Copy link
Contributor Author

artemyk commented Nov 17, 2014

@jorisvandenbossche OK, going back to this and remind myself / reworking a bit more.

The main motivation here is:

  • To pull out (from SQLTable and SQLiteTable) and unify logical code that does not depend on the kind of DB backend (in a new SQLTable) from that which does depend on DB backend.
  • Provide an API for writing new backends
  • Simplify and rename some internal things to make sql.py easier to follow

The constructor of the reworked SQLTable takes a pandas_sql_engine parameter which should be an instance of SQLBackend (right now there are two subclasses, SQLDatabase and SQLiteDatabase).

Any instance of SQLBackend should implement the following methods:

has_table
drop_table
run_transaction (returns contextmanager)
execute
read_query
to_sql

get_backend_table_object
create_statement
create_table
insert_data

The first 6 are pretty straightforward, while the last 4 are a bit novel. get_backend_table_object returns a 'backend table object', while create_statement, create_table and insert_table consume a 'backend table object'. In the SQLAlchemy case, this is simply a SQLAlchemy table object. In the legacy case, this is an instance of SQLiteBackendTable (which is basically a stub class that keeps in state the table's name, dataframe, specified keys and indices).

There's a lot of code changes here but functionally it actually ends up being not all that different from what was before. Maybe a bit cleaner? Not sure, what do you think?

Comment fix

Class hierarchy reorg

Further refactoring

Simplify SQLiteBackendTable

Fixes

Fixes
@artemyk artemyk changed the title Unify SQLTable code for fallback and SQLAlchemy mode and move differences into database class API: Unify SQLTable code for fallback and SQLAlchemy mode and move differences into database class Nov 17, 2014
@jreback
Copy link
Contributor

jreback commented May 9, 2015

status of this?

@artemyk
Copy link
Contributor Author

artemyk commented May 10, 2015

@jreback Not sure, there does not seem to be much momentum behind these changes to unify the SQL API.
@jorisvandenbossche Maybe you can take a look and move toward merging if you think they are worthwhile, otherwise let's close the PR without merging.

#9960 is relevant too.

@jorisvandenbossche
Copy link
Member

It is mainly a time constraint I think. We need some more eyes to this, but haven't come around this.

(but problem is the longer we wait, the more difficult it will become to change, as people start using the existing classes).

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone May 14, 2015
@artemyk
Copy link
Contributor Author

artemyk commented May 15, 2015

@jorisvandenbossche One thing to think about is that this kind of design --- where logic and DB-specific things are separated as much as possible in different classes --- is that it may make it easier to extend pandas to allow for niche DB options / cases.

One possibility I've been thinking about is the following: make a SQLConnection class (similar to SQLBackend class in this PR) that handles db-related things. Allow to_sql and other functions that take an engine parameter to accept instances of SQLConnection. Specifically, allow them to accept as connections:

  • sqlite/sqlalchemy to_sql engine (or connection) objects. In this case, a new SQLConnection instance would be created internally.
  • SQLConnection instances (i.e. the user first creates a SQLConnection instance, then passes it in)

DB-specific things could now be handled by extending/subclassing SQLConnection. For example, consider multi-row inserts, mentioned in #8953 . multirow=True could now be a parameter to the SQLConnection constructor, we don't have to add yet one more parameter to to_sql.

con = pd.io.sql.SQLConnection(engine, multirow=True)
df.to_sql('test_default', con, if_exists='replace')

More niche cases (which might not be merged into pandas) could subclass SQLConnection and pass that in.

@jreback
Copy link
Contributor

jreback commented Jul 28, 2015

status on this?

@artemyk
Copy link
Contributor Author

artemyk commented Jul 28, 2015

@jreback It doesn't seem like this PR is getting the code review / feedback needed. If you want to close it, that's fine w. me.

@jreback
Copy link
Contributor

jreback commented Jul 28, 2015

@jorisvandenbossche is more of an expert on this

@jreback
Copy link
Contributor

jreback commented Oct 25, 2015

@jorisvandenbossche status of this?

1 similar comment
@jreback
Copy link
Contributor

jreback commented Jan 6, 2016

@jorisvandenbossche status of this?

@jreback
Copy link
Contributor

jreback commented Jan 20, 2016

closing, pls reopen if you wish to update

@jreback jreback closed this Jan 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants