
Use multi-row inserts for massive speedups on to_sql over high latency connections #8953

Closed
maxgrenderjones opened this issue Dec 1, 2014 · 49 comments · Fixed by #21401
Labels
IO SQL to_sql, read_sql, read_sql_query Performance Memory or execution speed performance

Comments

@maxgrenderjones
Contributor

I have been trying to insert ~30k rows into a MySQL database using pandas-0.15.1, oursql-0.9.3.1 and sqlalchemy-0.9.4. Because the machine is across the Atlantic from me, calling data.to_sql was taking >1 hr to insert the data. On inspecting with wireshark, the issue is that it is sending an INSERT for every row, then waiting for the ACK before sending the next, and, long story short, the ping times are killing me.

However, following the instructions from SQLAlchemy, I changed

def _execute_insert(self, conn, keys, data_iter):
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement(), data)

to

def _execute_insert(self, conn, keys, data_iter):
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

and the entire operation completes in less than a minute. (To save you a click, the difference is between multiple calls to INSERT INTO foo (columns) VALUES (rowX) and one massive INSERT INTO foo (columns) VALUES (row1), (row2), (row3).) Given how often people are likely to use pandas to insert large volumes of data, this feels like a huge win that would be great to include more widely.
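
For illustration, here is a hedged sketch of the two statement shapes, compiled with SQLAlchemy against a throwaway table (the table and column names are invented, and the exact parameter naming varies by SQLAlchemy version):

from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql

foo = Table("foo", MetaData(), Column("a", Integer), Column("b", Integer))
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]

# Current behaviour: one single-row statement, executed once per row (executemany):
print(foo.insert().compile(dialect=postgresql.dialect()))
# INSERT INTO foo (a, b) VALUES (%(a)s, %(b)s)

# Proposed behaviour: a single statement carrying every row:
print(foo.insert().values(rows).compile(dialect=postgresql.dialect()))
# INSERT INTO foo (a, b) VALUES (%(a_m0)s, %(b_m0)s), (%(a_m1)s, %(b_m1)s), ...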

Some challenges:

  • Not every database supports multirow inserts (SQLite and SQLServer didn't in the past, though they do now). I don't know how to check for this via SQLAlchemy
  • The MySQL server I was using didn't allow me to insert the data all in one go, I had to set the chunksize (5k worked fine, but I guess the full 30k was too much). If we made this the default insert, most people would have to add a chunk size (which might be hard to calculate, as it might be determined by the maximum packet size of the server).

The easiest way to do this would be to add a multirow= boolean parameter (default False) to the to_sql function, and then leave the user responsible for setting the chunksize, but perhaps there's a better way?
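
To make the proposal concrete, a purely illustrative sketch of what that dispatch could look like inside _execute_insert (not actual pandas code; the multirow argument is hypothetical):

def _execute_insert(self, conn, keys, data_iter, multirow=False):
    data = [dict(zip(keys, row)) for row in data_iter]
    if multirow:
        # one INSERT ... VALUES (row1), (row2), ... statement per chunk
        conn.execute(self.insert_statement().values(data))
    else:
        # current behaviour: one single-row INSERT executed once per row
        conn.execute(self.insert_statement(), data)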

Thoughts?

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 2, 2014
@jorisvandenbossche
Member

This seems reasonable. Thanks for investigating this!

For the implementation, it will depend on how sqlalchemy deals with database flavors that do not support this (I can't test this at the moment, but it seems that sqlalchemy raises an error, e.g. http://stackoverflow.com/questions/23886764/multiple-insert-statements-in-mssql-with-sqlalchemy). Also, if it has the consequence that a lot of people will have to set chunksize, doing this by default is indeed not a good idea (unless we set chunksize to a value by default).
So adding a keyword maybe seems better.

@artemyk @mangecoeur @hayd @danielballan

@artemyk
Contributor

artemyk commented Dec 2, 2014

Apparently SQLAlchemy has a flag dialect.supports_multivalues_insert (see e.g. http://pydoc.net/Python/SQLAlchemy/0.8.3/sqlalchemy.sql.compiler/; it is possibly called supports_multirow_insert in other versions, see https://www.mail-archive.com/mediawiki-commits@lists.wikimedia.org/msg202880.html).

Since this has the potential to speed up inserts a lot, and we can check for support easily, I'm thinking maybe we could do it by default, and also set chunksize to a default value (e.g. 16kb chunks... not sure what's too big in most situations). If the multirow insert fails, we could throw an exception suggesting lowering the chunksize?
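
A quick, hedged way to probe that flag on a given engine (the attribute name differs across SQLAlchemy versions, hence the getattr fallbacks):

from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
dialect = engine.dialect
supports_multi = getattr(dialect, "supports_multivalues_insert",
                         getattr(dialect, "supports_multirow_insert", False))
print(supports_multi)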

@maxgrenderjones
Contributor Author

Now I just need to persuade the SQLAlchemy folks to set supports_multivalues_insert to true on SQL Server >2005 (I hacked it into the code and it works fine, but it's not on by default).

On a more on-topic note, I think the chunksize could be tricky. On my mysql setup (which I probably configured to allow large packets), I can set chunksize=5000, on my SQLServer setup, 500 was too large, but 100 worked fine. However, it's probably true that most of the benefits from this technique come from going from inserting 1 row at a time to 100, rather than 100 to 1000.

@danielballan
Contributor

What if chunksize=None meant "adaptively choose a chunksize"? Attempt something like 5000, 500, 50, 1. Users could turn this off by specifying a chunksize. If the overhead from these attempts is too large, I like @maxgrenderjones's suggestion: chunksize=10 is a better default than chunksize=1.
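
Purely as a sketch of the "adaptively choose a chunksize" idea (all names here are hypothetical, and a real implementation would need to roll back a partially inserted chunk before retrying with a smaller size):

def insert_with_fallback(insert_chunk, rows, sizes=(5000, 500, 50, 1)):
    """Try progressively smaller chunk sizes until one succeeds."""
    for size in sizes:
        try:
            for start in range(0, len(rows), size):
                insert_chunk(rows[start:start + size])
            return size  # the chunksize that worked
        except Exception:
            continue  # too big for this server; try a smaller chunk
    raise RuntimeError("even single-row inserts failed")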

@jorisvandenbossche
Member

On that last comment ("chunksize=10 is a better default than chunksize=1") -> that is not fully true, I think. The current situation is to do one execute call that runs many single-row insert statements (which is not a chunksize of 1), while chunksize=10 would mean doing a lot of execute calls, each with one multi-row insert.
And I don't know if this is necessarily faster; much depends on the situation. For example, with the current code and with a local sqlite database:

In [4]: engine = create_engine('sqlite:///:memory:') #, echo='debug')

In [5]: df = pd.DataFrame(np.random.randn(50000, 10))

In [6]: %timeit df.to_sql('test_default', engine, if_exists='replace')
1 loops, best of 3: 956 ms per loop

In [7]: %timeit df.to_sql('test_default', engine, if_exists='replace', chunksize=10)
1 loops, best of 3: 2.23 s per loop

But of course this does not use the multi-row feature.

@nhockham

We've figured out how to monkey-patch - might be useful to someone else. Put this code in place before calling to_sql.

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

SQLTable._execute_insert = _execute_insert

@jorisvandenbossche
Member

Maybe we can just start by adding this feature through a new multirow=True keyword (with a default of False for now), and then we can always see later whether we can enable it by default?

@maxgrenderjones @nhockham interested to do a PR to add this?

@mangecoeur
Contributor

@jorisvandenbossche I think it's risky to start adding keyword arguments to address specific performance profiles. If you can guarantee that it's faster in all cases (if necessary by having it determine the best method based on the inputs) then you don't need a flag at all.

Different DB setups may have different performance optimizations (different DB perf profiles, local vs network, big memory vs fast SSD, etc.); if you start adding keyword flags for each, it becomes a mess.

I would suggest creating subclasses of SQLDatabase and SQLTable to address performance-specific implementations; they would be used through the object-oriented API. Perhaps a "backend switching" method could be added, but frankly using the OO API is very simple, so this is probably overkill for what is already a specialized use-case.
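
A rough sketch of that subclassing route, reusing the multi-row insert from earlier in the thread (SQLTable and _execute_insert are pandas internals, so their signatures may change):

from pandas.io.sql import SQLTable

class MultiRowSQLTable(SQLTable):
    """SQLTable variant that emits one multi-row INSERT per chunk."""

    def _execute_insert(self, conn, keys, data_iter):
        data = [dict(zip(keys, row)) for row in data_iter]
        conn.execute(self.insert_statement().values(data))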

I created such a sub-class for loading large datasets to Postgres (it's actually much faster to save the data to CSV and then use the built-in, non-standard COPY FROM sql command than to use inserts, see https://gist.github.com/mangecoeur/1fbd63d4758c2ba0c470#file-pandas_postgres-py). To use it you just do PgSQLDatabase(engine, <args>).to_sql(frame, name, <kwargs>)
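
For reference, a simplified, hedged version of the CSV + COPY idea behind that gist, going through psycopg2's copy_expert on the raw DBAPI connection (it assumes a PostgreSQL engine and an existing table whose columns match the frame):

import io

def pg_copy_from_dataframe(df, table_name, engine):
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)
    raw = engine.raw_connection()  # psycopg2 connection underneath
    try:
        with raw.cursor() as cur:
            cur.copy_expert("COPY {} FROM STDIN WITH CSV".format(table_name), buf)
        raw.commit()
    finally:
        raw.close()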

@artemyk
Contributor

artemyk commented Feb 26, 2015

Just for reference, I tried running the code by @jorisvandenbossche (Dec 3rd post) using the multirow feature. It's quite a bit slower, so the speed trade-offs here are not trivial:

In [4]: engine = create_engine('sqlite:///:memory:') #, echo='debug')

In [5]: df = pd.DataFrame(np.random.randn(50000, 10))

In [6]: 

In [6]: %timeit df.to_sql('test_default', engine, if_exists='replace')
1 loops, best of 3: 1.05 s per loop

In [7]: 

In [7]: from pandas.io.sql import SQLTable

In [8]: 

In [8]: def _execute_insert(self, conn, keys, data_iter):
   ...:         data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
   ...:         conn.execute(self.insert_statement().values(data))
   ...:     

In [9]: SQLTable._execute_insert = _execute_insert

In [10]: 

In [10]: reload(pd)
Out[10]: <module 'pandas' from '/usr/local/lib/python2.7/site-packages/pandas/__init__.pyc'>

In [11]: 

In [11]: %timeit df.to_sql('test_default', engine, if_exists='replace', chunksize=10)
1 loops, best of 3: 9.9 s per loop

Also, I agree that adding keyword parameters is risky. However, the multirow feature seems pretty fundamental. Also, 'monkey-patching' is probably not more robust to API changes than keyword parameters.

@mangecoeur
Contributor

It's as I suspected. Monkey-patching isn't the solution I was suggesting - rather that we ship a number of performance-oriented subclasses that the informed user could use through the OO interface (to avoid loading the functional API with too many options).


@maxgrenderjones
Contributor Author

As per the initial ticket title, I don't think this approach is going to be preferable in all cases, so I wouldn't make it the default. However, without it, pandas to_sql is unusable for me, so it's important enough for me to keep requesting the change. (It's also become the first thing I change when I upgrade my pandas version.) As for sensible chunksize values, I don't think there is one true n, as the packet size will depend on how many columns there are (and what's in them) in hard-to-predict ways. Unfortunately SQLServer fails with an error message that looks totally unrelated (but isn't) if you set the chunksize too high (which is probably why multirow inserts aren't turned on in SQLAlchemy except with a patch), but it works fine with mysql. Users may need to experiment to determine what value of n is likely to result in an acceptably large packet size (for whatever their backing database is). Having pandas choose n is likely to land us way further down in the implementation details than we want to be (i.e. the opposite direction from the maximum-possible-abstraction SQLAlchemy approach).

In short, my recommendation would be to add it as a keyword, with some helpful commentary about how to use it. This wouldn't be the first time a keyword was used to select an implementation (see: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html), but that perhaps isn't the best example, as I haven't the first idea what raw= means, even having read the explanation!

@dragonator4

I have noticed that it also consumes a huge amount of memory: a 1.6+ GB DataFrame with some 700,000 rows and 301 columns requires almost 34 GB during the insert! That is wildly inefficient. Any ideas on why that might be the case? Here is a screen clip:

[screenshot of memory usage during the insert]

@andreacassioli

Hi guys,
any progress on this issue?

I am trying to insert around 200K rows using to_sql, but it takes forever and consumes a huge amount of memory! Using chunksize helps with the memory, but the speed is still very slow.

My impression, looking at the MSSQL database trace, is that the insertion is actually performed one row at a time.

The only viable approach now is to dump to a csv file on a shared folder and use BULK INSERT. But it is very annoying and inelegant!

@ostrokach

ostrokach commented Oct 20, 2016

@andreacassioli You can use odo to insert a DataFrame into an SQL database through an intermediary CSV file. See Loading CSVs into SQL Databases.

I don't think you can come even close to BULK INSERT performance using ODBC.
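
A hedged example of the odo route (the connection URI and table name are placeholders):

from odo import odo

# odo converts via an intermediary CSV and the database's bulk loader where it can
odo(df, 'postgresql://user:password@localhost/mydb::my_table')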

@andreacassioli

@ostrokach thank you, indeed I am using csv files now. If I could get close, I would trade a bit of time for simplicity!

@indera

indera commented Mar 3, 2017

I thought this might help somebody:
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

@jorisvandenbossche
Member

@indera pandas does not use the ORM, only sqlalchemy Core (which is what the doc entry there suggests to use for large inserts)

@russlamb

russlamb commented Mar 6, 2017

Is there any consensus on how to work around this in the meantime? I'm inserting several million rows into postgres and it takes forever. Is CSV / odo the way to go?

@jreback
Contributor

jreback commented Mar 6, 2017

@russlamb a practical way to solve this problem is simply to bulk upload. This is somewhat db-specific though, so odo has solutions for postgres (and maybe mysql) I think. For something like sqlserver you have to 'do this yourself' (IOW you have to write it).

@indera

indera commented Mar 6, 2017

For sqlserver I used the FreeTDS driver (http://www.freetds.org/software.html and https://github.com/mkleehammer/pyodbc) with SQLAlchemy entities, which resulted in very fast inserts (20K rows per data frame):

import logging
from urllib import parse  # assumed import: quotes the raw ODBC connection string

import sqlalchemy as db  # assumed import: the snippet refers to sqlalchemy as "db"
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
logger = logging.getLogger(__name__)

# Engine singleton; DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASS are assumed to
# come from configuration elsewhere.
DB_POOL = None


class DemographicEntity(Base):
    __tablename__ = 'DEMOGRAPHIC'

    patid = db.Column("PATID", db.Text, primary_key=True)
    # alternative definition kept from the original snippet:
    # patid = db.Column("PATID", db.Text, primary_key=True, autoincrement=False, nullable=True)
    birth_date = db.Column("BIRTH_DATE", db.Date)
    birth_time = db.Column("BIRTH_TIME", db.Text(5))
    sex = db.Column("SEX", db.Text(2))


def get_db_url(db_host, db_port, db_name, db_user, db_pass):
    params = parse.quote(
        "Driver={{FreeTDS}};Server={};Port={};"
        "Database={};UID={};PWD={};"
        .format(db_host, db_port, db_name, db_user, db_pass))
    return 'mssql+pyodbc:///?odbc_connect={}'.format(params)


def get_db_pool():
    """
    Create the database engine connection.
    @see http://docs.sqlalchemy.org/en/latest/core/engines.html

    :return: Engine object which can either be used directly
            to interact with the database, or can be passed to
            a Session object to work with the ORM.
    """
    global DB_POOL

    if DB_POOL is None:
        url = get_db_url(db_host=DB_HOST, db_port=DB_PORT, db_name=DB_NAME,
                         db_user=DB_USER, db_pass=DB_PASS)
        DB_POOL = db.create_engine(url,
                                   pool_size=10,
                                   max_overflow=5,
                                   pool_recycle=3600)

    try:
        DB_POOL.execute("USE {db}".format(db=DB_NAME))
    except db.exc.OperationalError:
        logger.error('Database {db} does not exist.'.format(db=DB_NAME))

    return DB_POOL


def save_frame(df):
    # executemany of the entity's Core insert(): one parameter set per row
    db_pool = get_db_pool()
    records = df.to_dict(orient='records')
    db_pool.execute(DemographicEntity.__table__.insert(), records)
 

@jorisvandenbossche
Member

Is CSV / odo the way to go?

This solution will almost always be faster I think, regardless of the multi-row / chunksize settings.

But, @russlamb, it is always interesting to hear whether such a multi-row keyword would be an improvement in your case. See eg #8953 (comment) on a way to easily test this out.

I think there is agreement that we want to have a way to specify this (without necessarily changing the default). So if somebody wants to make a PR for this, that is certainly welcome.
There was only some discussion on how to add this ability (a new keyword vs a subclass using the OO API).

@indera

indera commented Mar 6, 2017

@jorisvandenbossche The document I linked above mentions "Alternatively, the SQLAlchemy ORM offers the Bulk Operations suite of methods, which provide hooks into subsections of the unit of work process in order to emit Core-level INSERT and UPDATE constructs with a small degree of ORM-based automation."

What I am suggesting is to implement a sqlserver-specific version of to_sql which under the hood uses the SQLAlchemy ORM for speedups, as in the code I posted above.
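
For completeness, the ORM Bulk Operations route mentioned there looks roughly like this (assuming the engine/DemographicEntity from the earlier snippet and a DataFrame df whose columns match the mapped attribute names):

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=get_db_pool())
session = Session()
# emits a Core-level executemany INSERT, bypassing per-object ORM overhead
session.bulk_insert_mappings(DemographicEntity, df.to_dict(orient='records'))
session.commit()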

@mangecoeur
Contributor

mangecoeur commented Mar 7, 2017 via email

@mangecoeur
Contributor

mangecoeur commented Mar 7, 2017 via email

@dfernan

dfernan commented Jun 1, 2017

Is this getting fixed/taken care of? As of now inserting pandas dataframes into a SQL db is extremely slow unless it's a toy dataframe. Let's decide on a solution and push it forward?

@ostrokach

@dfernan As mentioned above, you may want to look at odo. Using an intermediary CSV file will always be orders of magnitude faster than going through sqlalchemy, no matter what kind of improvements happen here...

@jreback jreback added this to the 0.23.1 milestone Jun 4, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.1, 0.24.0 Jun 7, 2018
@gfyoung gfyoung added the Performance Memory or execution speed performance label Jun 10, 2018
@citynorman

citynorman commented Oct 14, 2018

I found d6tstack much simpler to use; it's a one-liner d6tstack.utils.pd_to_psql(df, cfg_uri_psql, 'benchmark', if_exists='replace') and it's much faster than df.to_sql(). It supports postgres and mysql. See https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Nov 6, 2018
@VincentLa14

VincentLa14 commented Nov 25, 2018

I've been using the Monkey Patch Solution:

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

SQLTable._execute_insert = _execute_insert

for some time now, but now I'm getting an error:

TypeError: insert_statement() missing 2 required positional arguments: 'data' and 'conn'

Is anyone else getting this? I'm on Python 3.6.5 (Anaconda) and pandas==0.23.0

@hnazkani

Is this getting fixed? Currently, df.to_sql is extremely slow and can't be used at all for many practical use cases. The odo project seems to have been abandoned already.
I have the following use cases in financial time series where df.to_sql is pretty much not usable:

  1. copying historical csv data to a postgres database - can't use df.to_sql and had to go with custom code around psycopg2's copy_from functionality
  2. streaming data (coming in batches of ~500-3000 rows per second) to be dumped to a postgres database - again, df.to_sql performance is pretty disappointing, as it takes too much time to insert these natural batches of data into postgres.

The only place where I find df.to_sql useful now is to create tables automatically!!! - which is not the use case it was designed for.
I am not sure if other people share the same concern, but this issue needs some attention for "dataframes-to-database" interfaces to work smoothly.
Look forward.

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 27, 2018
mroeschke pushed a commit that referenced this issue Dec 28, 2018
#21401)

* ENH: to_sql() add parameter "method" to control insertions method (#8953)

* ENH: to_sql() add parameter "method". Fix docstrings (#8953)

* ENH: to_sql() add parameter "method". Improve docs based on reviews (#8953)

* ENH: to_sql() add parameter "method". Fix unit-test (#8953)

* doc clean-up

* additional doc clean-up

* use dict(zip()) directly

* clean up merge

* default --> None

* Remove stray default

* Remove method kwarg

* change default to None

* test copy insert snippit

* print debug

* index=False

* Add reference to documentation
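
For anyone landing on this thread now: the change merged above shipped in pandas 0.24.0 as a method argument on to_sql, so the multi-row behaviour discussed here can be switched on per call (chunksize still has to respect the server's packet/parameter limits):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# method="multi" sends many rows per INSERT statement; keep chunksize modest so
# each statement stays under the backend's limits.
df.to_sql("demo", engine, if_exists="replace", index=False,
          method="multi", chunksize=200)
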
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
…ndas-dev#8… (pandas-dev#21401)

* ENH: to_sql() add parameter "method" to control insertions method (pandas-dev#8953)

* ENH: to_sql() add parameter "method". Fix docstrings (pandas-dev#8953)

* ENH: to_sql() add parameter "method". Improve docs based on reviews (pandas-dev#8953)

* ENH: to_sql() add parameter "method". Fix unit-test (pandas-dev#8953)

* doc clean-up

* additional doc clean-up

* use dict(zip()) directly

* clean up merge

* default --> None

* Remove stray default

* Remove method kwarg

* change default to None

* test copy insert snippit

* print debug

* index=False

* Add reference to documentation
@jconstanzo

Hey, I'm getting this error when I try to perform a multi-insert to a SQLite database:

This is my code:
df.to_sql("financial_data", con=conn, if_exists="append", index=False, method="multi")

and I get this error:

Traceback (most recent call last):

  File "<ipython-input-11-cf095145b980>", line 1, in <module>
    handler.insert_financial_data_from_df(data, "GOOG")

  File "C:\Users\user01\Documents\Code\FinancialHandler.py", line 110, in insert_financial_data_from_df
    df.to_sql("financial_data", con=conn, if_exists="append", index=False, method="multi")

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2531, in to_sql
    dtype=dtype, method=method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
    chunksize=chunksize, dtype=dtype, method=method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 1547, in to_sql
    table.insert(chunksize, method)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 686, in insert
    exec_insert(conn, keys, chunk_iter)

  File "C:\Users\user01\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 609, in _execute_insert_multi
    conn.execute(self.table.insert(data))

TypeError: insert() takes exactly 2 arguments (1 given)

Why is this happening? I'm using Python 3.7.3 (Anaconda), pandas 0.24.2 and sqlite3 2.6.0.

Thank you very much in advance!

@jorisvandenbossche
Member

@jconstanzo can you open this as a new issue?
And if possible, can you try to provide a reproducible example? (eg a small example dataframe that can show the problem)

@Bauxitedev

@jconstanzo Having the same issue here. Using method='multi' (in my case, in combination with chunksize) seems to trigger this error when you try to insert into a SQLite database.

Unfortunately I can't really provide an example dataframe because my dataset is huge, that's the reason I'm using method and chunksize in the first place.

@jconstanzo

I'm sorry for the delay. I just opened an issue for this problem: #29921

@ban04toufuonline

How to hack this? @maxgrenderjones

Now I just need to persuade the SQLAlchemy folks to set supports_multivalues_insert to true on SQL Server >2005 (I hacked it into the code and it works fine, but it's not on by default).

On a more on-topic note, I think the chunksize could be tricky. On my mysql setup (which I probably configured to allow large packets), I can set chunksize=5000, on my SQLServer setup, 500 was too large, but 100 worked fine. However, it's probably true that most of the benefits from this technique come from going from inserting 1 row at a time to 100, rather than 100 to 1000.
