
Insertion speed with PostgreSQL Driver #241

Open
gabm opened this issue Jul 24, 2020 · 2 comments

gabm commented Jul 24, 2020

Environment

  • nanodbc version: master
  • DBMS name/version:
  • ODBC connection string:
  • OS and Compiler: Windows, MSVC
  • CMake settings:

Actual behavior

I am trying to batch-insert data into the database, but I never achieve rates faster than ~2k rows per second. How can I improve insertion performance?

I broke it down to a test database that I created like this:

execute(conn, NANODBC_TEXT("create unlogged table rowset_iteration (i int, d float);"));

Then I created batch_size integers and floats in std::vectors and inserted them like this:

{
    statement stmt(conn);
    prepare(stmt, NANODBC_TEXT("insert into rowset_iteration (i, d) values (?, ?);"));

    stmt.bind(0, integers.data(), integers.size());
    stmt.bind(1, floats.data(), floats.size());

    transact(stmt, batch_size);
}

For batch_size I used values of 500 and more, but it doesn't make much of a difference.


I observed the network traffic with Wireshark and realised that every row is being sent synchronously, one at a time... that seems to be the problem. I thought a batch insert would send the data in one go rather than row by row? Any ideas?

Expected behavior

Minimal Working Example

@lexicalunit

From what I can see, you're doing everything correctly here. Using the transaction is required for batch operations, and raising the batch size should set the SQL_ATTR_PARAMSET_SIZE attribute on the statement handle via SQLSetStmtAttr(). Note that this requires ODBC 3.0 or newer, if that matters. It's not clear to me why you are seeing this problem.
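For anyone digging further, the mechanism described above corresponds roughly to the following raw ODBC calls (an illustrative, untested sketch, not the literal nanodbc implementation; error handling omitted):

```cpp
// Tell the driver how many parameter rows the bound arrays contain.
SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE,
               (SQLPOINTER)(uintptr_t)batch_size, 0);
// Bind the whole arrays, not single values.
SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_SLONG, SQL_INTEGER,
                 0, 0, integers.data(), 0, nullptr);   // column i
SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_DOUBLE, SQL_DOUBLE,
                 0, 0, floats.data(), 0, nullptr);     // column d
SQLExecute(hstmt);  // one execute for the whole parameter set
```

Whether this actually becomes a single network round trip is up to the driver; the ODBC spec only defines the binding interface, not the wire behaviour.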

gabm commented Jul 27, 2020

Thank you for confirming my approach! That's really helpful.

Unfortunately, I didn't get it to work. I use a PostgreSQL database with the latest PostgreSQL ODBC driver (psqlODBC) on Windows 10. I suppose they support the required ODBC version...

However, I was able to achieve what I wanted by creating a multi-value insert string in a for loop. Not the most beautiful solution, but it works... I ended up with something like this:

statement stmt(conn);
std::stringstream query;
query << "insert into rowset_iteration (i, d) values ";
for (int i = 0; i < batch_size; i++) {
    query << "(" << integers[i] << "," << floats[i] << ")";
    if (i != batch_size - 1)
        query << ",";
    else
        query << ";";
}
prepare(stmt, query.str());
transact(stmt, 1);

With batch_size = 1e6 I send about 65 MB in one go, which works out to up to 100k inserts/sec... good enough for me.

If someone has any idea why the original approach didn't work, I'd still be interested...
