Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SQLBindCol when possible #1322

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Conversation

ffelixg
Copy link

@ffelixg ffelixg commented Jan 28, 2024

When all columns can be bound and the bandwidth is good enough, this should provide a decent performance improvement, as SQLGetData seems to use a bunch of CPU power. I didn't do extensive benchmarks, but in a small scale cloud web app connected to a decently powerful Azure SQL DB (same region), I got up to a 50% lower execution time on a cursor.execute(...).fetchall() call. Probably 30% is more realistic.

Unfortunately the API allows changing native_uuid and output converters between fetch calls, so the code checks for changes and rebinds when necessary. I've added a test for it.

Some config variables that could be added are a cap on total memory that can be allocated for binding and the threshold at which we use SQLGetData for variable length data. But since only one row is bound this might be unnecessary.

@mkleehammer
Copy link
Owner

Wow. That would be a really big performance improvement. It's a lot of code to go through, but I'll try to get to is soon.

@ffelixg
Copy link
Author

ffelixg commented Feb 5, 2024

Cool, thanks! Unfortunately I couldn't get native uuids to work on 64bit windows (seems to work fine everywhere else), so I switched back to GetData for that. The point at which it fails is the Py_BuildValue call in GetUUID. I tried a few things, including using the exact same buffer (in heap) for both cases (getting data via SQLGetData vs SQLBindCol+Fetch) and it looks to me like we're getting the same bytes both ways. Maybe there is some memory alignment issue I'm not understanding. I guess I'm also not sure why PYSQLGUID is defined the way it is and not just 16 bytes, maybe that has something to do with it.

@ffelixg ffelixg marked this pull request as ready for review February 5, 2024 20:50
@ludekmatousek
Copy link

Hello,

I can add some note to performance ....

for last couple of months I'm trying to solve performance issue when downloading the data from IBM iSeries (AS400/ IBM i).
Transfer works fine with the files with simple structure. In case of 10+ columns performance drastically degraded.
During the really log way across the AS400 TCP/buffer settings, networking department analyzing PCAPs, examination of active devices on the network way, playing with the ODBC buffers/LOB and many others, we ended with IBM support ....

IBM support analyzed ODBC trace and missing SQLBindCol resulting in "In the end about 80% of the time is spent in the application, about 5% on the IBM i and the rest is the network".

Looking forward for new version :)

@ludekmatousek
Copy link

ludekmatousek commented Mar 1, 2024

Package built from ffelixg:bindcol and tested against the IBM iSeries (AS400/ IBM i) with positive results.
Left side measurements with standard pyodbc, right side measurements with the new wheel, IBM cwbtrace as a proof of usage SQLBindCol():
SQLBind_pyodbc

@ffelixg
Copy link
Author

ffelixg commented Mar 4, 2024

Thank you for the benchmark! I suppose a cursor attribute to track how many columns were bound would be useful.

I've rewritten GetUUID to build the tuple manually instead of using Py_BuildValue, since that was behaving in funny ways. Now binding works for that too.

I've also tried another approach to integrate SQLBindCol in the BindCol_rowwise branch. The idea is that since we need to pre calculate all of the c types/buffer sizes for SQLBindCol anyway, we can store it inside ColumnInfo and abstract out SQLGetData to GetData from the individual Get* Functions. As a consequence, most of the Get* functions collapse to just a few lines. I think that this way, SQLBindCol and SQLGetData play together in a more natural way, but it's definitely more change to existing code.

This way it was also fairly straight forward to implement fetching arrays in case all columns can be bound (using rowwise binding - felt more natural, since we're returning rows). This seems to mostly just affect narrow fetches though and the performance gain definitely less than binding in the first place.

Curious to hear any thoughts.

- Track encoding ctypes and trigger rebinding on changes
- Prevent losing rows when switching from fetchmany to fetchone and then triggering a rebind
- Add tests for the above
- Avoid unnecessary dict copy when checking if rebind is necessary
- Minor improvements to diagnostic variables and error handling
- Store metadata like c_types and text encodings in ColumnInfo instead of computing it for every GetData call
- compute said metadata in the first call to cur.fetch* and check if it changed before subsequent fetch* calls
- Abstract out SQLGetData and the decision to use it or a bound column as well as the null check to GetData from the individual Get* Functions
- Use row-wise column binding
- Add cursor attributes for diagnostics (bound_columns_count, bound_buffer_rows) and configuration (bind_cell_cap, bind_byte_cap)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants