Use SQLBindCol when possible #1322

ffelixg · 2024-01-28T23:17:11Z

When all columns can be bound and the bandwidth is good enough, this should provide a decent performance improvement, as SQLGetData seems to use a bunch of CPU power. I didn't do extensive benchmarks, but in a small scale cloud web app connected to a decently powerful Azure SQL DB (same region), I got up to a 50% lower execution time on a cursor.execute(...).fetchall() call. Probably 30% is more realistic.
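For anyone who wants to reproduce this kind of comparison, a minimal timing sketch could look like the following. It only relies on the DB-API `execute(...).fetchall()` pattern mentioned above; `time_fetchall` is a hypothetical helper name, and the standard-library `sqlite3` module stands in for a pyodbc connection purely so the snippet runs without an ODBC driver.

```python
import sqlite3
import statistics
import time


def time_fetchall(cursor, sql, repeats=5):
    """Return (median seconds, row count) for cursor.execute(sql).fetchall()."""
    samples = []
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = cursor.execute(sql).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), len(rows)


# Stand-in data source (assumption); with pyodbc this would be
# pyodbc.connect(<connection string>).cursor() instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (a INTEGER, b TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, str(i)) for i in range(1000)])

seconds, row_count = time_fetchall(cur, "SELECT a, b FROM t")
print(f"{row_count} rows, median {seconds:.6f} s")
```

Running the same helper against a wheel built with and without the change, on the same query, gives numbers that are at least comparable with each other.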

Unfortunately the API allows changing native_uuid and output converters between fetch calls, so the code checks for changes and rebinds when necessary. I've added a test for it.
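The change check can be pictured roughly like this. It is a pure-Python model, not the C++ code in the PR: `BindingState`, `ModelCursor` and `rebind_count` are hypothetical names, and the real implementation would unbind and call SQLBindCol again where the comment indicates.

```python
from dataclasses import dataclass, field


@dataclass
class BindingState:
    """Hypothetical model of the settings that influence how columns are bound."""
    native_uuid: bool = False
    converters: dict = field(default_factory=dict)

    def snapshot(self):
        # Converters are compared by SQL type code and function identity.
        items = tuple(sorted((k, id(v)) for k, v in self.converters.items()))
        return (self.native_uuid, items)


class ModelCursor:
    def __init__(self, state):
        self.state = state
        self._bound_with = None  # snapshot taken at the last (re)bind
        self.rebind_count = 0

    def fetch(self):
        current = self.state.snapshot()
        if current != self._bound_with:
            # The real code would unbind and rebind the columns here.
            self._bound_with = current
            self.rebind_count += 1


state = BindingState()
cursor = ModelCursor(state)
cursor.fetch()              # first fetch binds
cursor.fetch()              # nothing changed, no rebind
state.native_uuid = True
cursor.fetch()              # setting changed, rebind
print(cursor.rebind_count)  # 2
```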

Some config variables that could be added are a cap on total memory that can be allocated for binding and the threshold at which we use SQLGetData for variable length data. But since only one row is bound this might be unnecessary.
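To make the two knobs concrete, here is a small model of how such a decision could be made. Everything in it is an assumption for illustration: the names `Column`, `plan_binding`, `varlen_threshold` and `byte_cap`, as well as the default values, are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    byte_size: Optional[int]  # None means unbounded length, e.g. varchar(max)


def plan_binding(columns, varlen_threshold=4096, byte_cap=1 << 20):
    """Return (bind_all, row_bytes).

    bind_all is False as soon as one column is too large or unbounded
    (it would need SQLGetData), or when the row exceeds the memory cap.
    """
    row_bytes = 0
    for col in columns:
        if col.byte_size is None or col.byte_size > varlen_threshold:
            return False, 0
        row_bytes += col.byte_size
    return row_bytes <= byte_cap, row_bytes


print(plan_binding([Column("id", 4), Column("name", 200)]))   # (True, 204)
print(plan_binding([Column("id", 4), Column("blob", None)]))  # (False, 0)
```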


mkleehammer · 2024-02-05T15:39:44Z

Wow. That would be a really big performance improvement. It's a lot of code to go through, but I'll try to get to it soon.

ffelixg · 2024-02-05T20:50:09Z

Cool, thanks! Unfortunately I couldn't get native UUIDs to work on 64-bit Windows (it seems to work fine everywhere else), so I switched back to SQLGetData for that. The point at which it fails is the Py_BuildValue call in GetUUID. I tried a few things, including using the exact same buffer (on the heap) for both cases (getting data via SQLGetData vs. SQLBindCol + fetch), and it looks to me like we're getting the same bytes both ways. Maybe there is some memory alignment issue I'm not understanding. I'm also not sure why PYSQLGUID is defined the way it is and not just as 16 bytes; maybe that has something to do with it.
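For reference on the layout question: ODBC's own SQLGUID type is a four-field struct (one 32-bit field, two 16-bit fields and an 8-byte array), which still adds up to 16 bytes. The sketch below shows how 16 bytes in that layout map onto a Python `uuid.UUID`. It assumes a little-endian host and says nothing about what goes wrong in Py_BuildValue; `guid_bytes_to_uuid` is a hypothetical helper name.

```python
import struct
import uuid


def guid_bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """Interpret 16 bytes laid out like an ODBC SQLGUID on a little-endian host."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    # Data1 (32 bit), Data2 (16 bit), Data3 (16 bit), Data4 (8 raw bytes)
    data1, data2, data3, data4 = struct.unpack("<IHH8s", raw)
    fields = (
        data1,
        data2,
        data3,
        data4[0],
        data4[1],
        int.from_bytes(data4[2:], "big"),
    )
    return uuid.UUID(fields=fields)


expected = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(guid_bytes_to_uuid(expected.bytes_le) == expected)  # True
```

The same result comes from `uuid.UUID(bytes_le=raw)`, which treats the first three fields as little-endian.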

ludekmatousek · 2024-02-29T14:12:20Z

Hello,

I can add a note on performance.

For the last couple of months I have been trying to solve a performance issue when downloading data from IBM iSeries (AS400 / IBM i).
Transfers work fine for files with a simple structure, but with 10+ columns performance degrades drastically.
After a really long journey through AS400 TCP/buffer settings, the networking department analyzing PCAPs, examining the active devices along the network path, playing with the ODBC buffers/LOB settings and much else, we ended up with IBM support.

IBM support analyzed the ODBC trace and pointed to the missing SQLBindCol, concluding: "In the end about 80% of the time is spent in the application, about 5% on the IBM i and the rest is the network".

Looking forward to the new version :)

ludekmatousek · 2024-03-01T10:34:31Z

Package built from ffelixg:bindcol and tested against the IBM iSeries (AS400/ IBM i) with positive results.
Left side: measurements with standard pyodbc. Right side: measurements with the new wheel. The IBM cwbtrace serves as proof that SQLBindCol() is used:

ffelixg · 2024-03-04T19:41:32Z

Thank you for the benchmark! I suppose a cursor attribute to track how many columns were bound would be useful.

I've rewritten GetUUID to build the tuple manually instead of using Py_BuildValue, since that was behaving in funny ways. Now binding works for that too.

I've also tried another approach to integrating SQLBindCol in the BindCol_rowwise branch. The idea is that since we need to precalculate all of the C types and buffer sizes for SQLBindCol anyway, we can store them inside ColumnInfo and move the SQLGetData call out of the individual Get* functions into a central GetData. As a consequence, most of the Get* functions collapse to just a few lines. I think that this way SQLBindCol and SQLGetData play together more naturally, but it definitely changes more of the existing code.
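As a rough picture of that structure, here is a pure-Python model. It is not the C++ from the branch: the fields of `ColumnInfo`, the `get_data` signature and the `fetch_unbound` callback (standing in for a per-cell SQLGetData call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ColumnInfo:
    c_type: str                    # computed once per result set, e.g. "SQL_C_LONG"
    buffer_size: int
    is_bound: bool                 # True if the column has a bound buffer
    convert: Callable[[Any], Any]  # plays the role of a slimmed-down Get* function


def get_data(info, index, bound_row, fetch_unbound):
    """Central access point: pick the source, do the NULL check, then convert."""
    raw = bound_row[index] if info.is_bound else fetch_unbound(index)
    return None if raw is None else info.convert(raw)


columns = [
    ColumnInfo("SQL_C_LONG", 4, True, int),
    ColumnInfo("SQL_C_WCHAR", 0, False, lambda b: b.decode("utf-16-le")),
]
bound_row = [7, None]  # values already sitting in bound buffers
fetch_unbound = lambda i: "hello".encode("utf-16-le")

row = tuple(get_data(c, i, bound_row, fetch_unbound) for i, c in enumerate(columns))
print(row)  # (7, 'hello')
```

The point of the model is that each converter no longer cares where its bytes came from, which is why the individual functions can shrink.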

This way it was also fairly straightforward to implement fetching arrays of rows when all columns can be bound (using row-wise binding, which felt more natural since we're returning rows). This seems to mostly affect narrow fetches though, and the performance gain is definitely smaller than the gain from binding in the first place.
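For readers unfamiliar with row-wise binding: in ODBC, the application describes one row as a struct holding each column's value next to its length/indicator, sets SQL_ATTR_ROW_BIND_TYPE to the size of that struct, and the driver fills an array of such structs per fetch. The `ctypes` sketch below models only the memory layout and the decoding step. The struct fields, the 16-byte text buffer and `decode_rows` are assumptions for illustration, and the buffer is filled by hand where a driver would normally write.

```python
import ctypes

SQL_NULL_DATA = -1  # ODBC indicator value for NULL


class BoundRow(ctypes.Structure):
    """One row of the buffer: each value is followed by its length/indicator."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("id_ind", ctypes.c_ssize_t),
        ("name", ctypes.c_char * 16),
        ("name_ind", ctypes.c_ssize_t),
    ]


def decode_rows(buffer, rows_fetched):
    """Turn the first rows_fetched structs into Python tuples."""
    result = []
    for i in range(rows_fetched):
        row = buffer[i]
        id_value = None if row.id_ind == SQL_NULL_DATA else row.id
        if row.name_ind == SQL_NULL_DATA:
            name_value = None
        else:
            name_value = row.name[: row.name_ind].decode("utf-8")
        result.append((id_value, name_value))
    return result


buffer = (BoundRow * 3)()  # room for three rows per fetch
buffer[0].id, buffer[0].id_ind = 1, 4
buffer[0].name, buffer[0].name_ind = b"alice", 5
buffer[1].id, buffer[1].id_ind = 2, 4
buffer[1].name_ind = SQL_NULL_DATA

print(ctypes.sizeof(BoundRow))  # the stride between consecutive rows
print(decode_rows(buffer, 2))   # [(1, 'alice'), (2, None)]
```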

Curious to hear any thoughts.


- Track encoding ctypes and trigger rebinding on changes
- Prevent losing rows when switching from fetchmany to fetchone and then triggering a rebind
- Add tests for the above
- Avoid unnecessary dict copy when checking if rebind is necessary
- Minor improvements to diagnostic variables and error handling

- Store metadata like c_types and text encodings in ColumnInfo instead of computing it for every GetData call
- Compute said metadata in the first call to cur.fetch* and check if it changed before subsequent fetch* calls
- Abstract out SQLGetData and the decision to use it or a bound column as well as the null check to GetData from the individual Get* functions
- Use row-wise column binding
- Add cursor attributes for diagnostics (bound_columns_count, bound_buffer_rows) and configuration (bind_cell_cap, bind_byte_cap)

ffelixg added 11 commits January 13, 2024 17:40

Use SQLBindCol when possible

76fd658

Run CI

fea02d4

debug

988882a

reorder

4391d24

Add rarely needed SQL_UNBIND

6f3fc14

Check for double frees

97dad9a

Revert debugging

dba6a12

Perform necessary column rebind when native uuid or custom converters are changed

08dc244

Bigger char buffer size

185edcb

Add test to check for proper column rebinding

2d9fec2

Adjust variable names

c0eb022

ffelixg force-pushed the bindcol branch from d801f06 to c0eb022 Compare January 29, 2024 14:35

ffelixg marked this pull request as draft January 29, 2024 14:53

Don't bind native UUIDs

13cdede

ffelixg marked this pull request as ready for review February 5, 2024 20:50

Single row-wise buffer, handle SQLGetData centrally

aea40a7

ffelixg added 5 commits March 1, 2024 19:34

Add support for fetching multiple rows

a1540a0

Rewrite GetUUID & bind

0a5bb97

Only (re)bind before fetch, cleanup

7823550

Fix py3.9 compatibility

64fc661

Rewrite GetUUID and bind native uuids

057cafe

ffelixg added 4 commits March 5, 2024 00:15

Add configuration variables for the number of rows to allocate for fetching

2df52d4

Undo some unnecessary changes

ff0423f
