Execute prepared statements on client side by default #310

damian3031 · 2023-01-05T07:58:06Z

Draft PR to discuss approach for #306

mdesmet · 2023-01-05T08:52:34Z

Just to complete the information for @hashhar.

We discussed the following approaches:

1. Avoid the prepare and deallocate queries by adding the prepared statement directly to the ClientSession. The advantage is that we can still make use of the query parser on the server side. The disadvantage is that we still use the headers as transport.
2. Do the parsing on the client side. Then we have to take care of values and identifiers within the query that contain ?, e.g. cur.execute("SELECT '?''?', ?", 'test') or cur.execute('select 1 as "?""test", ?', 'test').
3. Transport the prepared statement parameters in the JSON body instead of the headers, as a backwards-compatible API change that can be used by all clients instead of just the Trino Python client.
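
To make the client-side parsing concern concrete, here is a minimal sketch of a scanner that only reports ? characters outside single-quoted strings and double-quoted identifiers. The function name find_placeholders is hypothetical and not part of the PR, and the sketch deliberately ignores SQL comments, which a real implementation would also have to handle.

```python
from typing import List


def find_placeholders(sql: str) -> List[int]:
    """Return indexes of '?' characters that are outside quoted regions.

    Assumption: quotes are escaped by doubling ('' or ""), and the query
    contains no SQL comments.
    """
    positions = []
    quote = None  # the quote character currently open, or None
    i = 0
    while i < len(sql):
        ch = sql[i]
        if quote is None:
            if ch in ("'", '"'):
                quote = ch  # entering a string literal or quoted identifier
            elif ch == "?":
                positions.append(i)
        elif ch == quote:
            if i + 1 < len(sql) and sql[i + 1] == quote:
                i += 1  # doubled quote is an escaped quote, stay inside
            else:
                quote = None  # closing quote
        i += 1
    return positions
```

For both example queries above, only the final ? is reported as a placeholder.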

mdesmet · 2023-01-05T08:56:29Z

trino/dbapi.py

+                question_mark_positions.reverse()
+                for index, value in enumerate(reversed(params)):
+                    operation = "".join([operation[:question_mark_positions[index]],
+                                         "'", value, "'",


The value should not always be enclosed in single quotes; it should be rendered as the appropriate string for its type.
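
A rough sketch of what type-aware rendering could look like, covering only a handful of Python types. The helper name to_sql_literal is hypothetical, and the literal forms (DATE '...', DECIMAL '...', doubled single quotes) are my assumption of suitable Trino SQL syntax rather than what the PR implements.

```python
import datetime
from decimal import Decimal


def to_sql_literal(value) -> str:
    """Render a Python value as a SQL literal (illustrative subset of types)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must come before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Decimal):
        # "f" formatting avoids scientific notation such as 1E+2
        return "DECIMAL '{}'".format(format(value, "f"))
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        return "DATE '{}'".format(value.isoformat())
    if isinstance(value, str):
        # escape embedded single quotes by doubling them
        return "'" + value.replace("'", "''") + "'"
    raise TypeError("unsupported parameter type: {}".format(type(value).__name__))
```

Unsupported types raise instead of falling back to quoting, so a missing mapping fails loudly rather than producing a wrong query.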

mdesmet · 2023-01-05T08:57:01Z

trino/dbapi.py

+                )
+
+                # substitute parameters in query in reversed order
+                question_mark_positions = [index for index, character in enumerate(operation) if character == '?']


An alternative to the reverse trick is to use string slicing. Maybe that would be slightly more readable?
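
As a sketch of the slicing idea: walk the placeholder positions left to right, collect the text between them, and join once at the end. The function name substitute_by_slicing is hypothetical, and it assumes the replacement literals have already been rendered as SQL text.

```python
def substitute_by_slicing(operation, positions, literals):
    """Replace the character at each position with the matching literal.

    positions must be in ascending order; no reversal is needed because
    the original string is only read, never modified in place.
    """
    if len(positions) != len(literals):
        raise ValueError("number of placeholders and parameters differ")
    pieces = []
    start = 0
    for pos, literal in zip(positions, literals):
        pieces.append(operation[start:pos])  # text before this placeholder
        pieces.append(literal)
        start = pos + 1  # skip the '?' itself
    pieces.append(operation[start:])  # trailing text after the last placeholder
    return "".join(pieces)
```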

hashhar · 2023-01-06T12:20:12Z

Doing it on the client side makes the most sense to me, depending on what problem we're trying to solve.

1 (directly manipulate ClientSession) seems like a hack.
2 (client side) solves the problem of multiple queries but introduces security concerns.
3 (new backward compatible API) solves the limitation that prepared statements cannot be very large but it doesn't help the fact that multiple queries need to be sent anyway.

I think we can do the following:

Make paramstyle settable and only allow it to be set to either format or qmark with qmark as default.
When we create Connection we "freeze" the paramstyle value i.e. a Connection's paramstyle is constant.
paramstyle = format would do client side bindings, paramstyle = qmark would do server side (what is already done today)
For paramstyle = format it's easier to do client side binding since you no longer need to worry about "finding" all markers and can just use str % values from Python to do the value setting.

For paramstyle = format we need to:

Escape the params only. The query text needs no escaping; if something there needs escaping (%), the user should do that. This is what all Python drivers do (I checked psycopg2, pymysql, Oracle).
Do query % escaped_values and execute the query.
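
A minimal sketch of the two steps for paramstyle = format, assuming a small hypothetical escape_param helper that handles only None, bool, int and str. Neither function name comes from the PR; a real driver would need a fuller type mapping.

```python
def escape_param(value):
    """Turn one parameter into SQL text (illustrative subset of types)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError("unsupported parameter type: {}".format(type(value).__name__))


def bind_format_params(query, params):
    """Escape every parameter, then let Python's % operator fill the %s markers."""
    return query % tuple(escape_param(p) for p in params)
```

Because the query text itself is not escaped, a literal percent sign in the query has to be written as %% by the user, as described above.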

Where should the handling of qmark and format style be done? To me the logical place is to have two Cursor classes that override the execute and executemany methods. Alternatively we can keep the existing cursor, introduce two methods process_qmark_params and process_format_params, and call them from execute and executemany.
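
A rough sketch of the second alternative, with a paramstyle that is fixed once the Connection is created. These Connection and Cursor classes are stand-ins written for illustration and are not the real trino.dbapi classes; the two process_* methods are stubs that only report which path would be taken.

```python
class Connection:
    _ALLOWED_PARAMSTYLES = ("qmark", "format")

    def __init__(self, paramstyle="qmark"):
        if paramstyle not in self._ALLOWED_PARAMSTYLES:
            raise ValueError("paramstyle must be 'qmark' or 'format'")
        self._paramstyle = paramstyle

    @property
    def paramstyle(self):
        # read-only property: the value is "frozen" after construction
        return self._paramstyle

    def cursor(self):
        return Cursor(self)


class Cursor:
    def __init__(self, connection):
        self._connection = connection

    def process_qmark_params(self, operation, params):
        # stub for the existing server-side prepared statement path
        return ("server-side", operation, list(params))

    def process_format_params(self, operation, params):
        # stub for client-side binding; escaping and % substitution go here
        return ("client-side", operation, list(params))

    def execute(self, operation, params=()):
        handlers = {
            "qmark": self.process_qmark_params,
            "format": self.process_format_params,
        }
        return handlers[self._connection.paramstyle](operation, params)
```

With two Cursor subclasses instead, Connection.cursor() would pick the class once and execute would need no dispatch table.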

The question we need to answer

What's the biggest problem we want to solve?

1. multiple queries sent when using params?
2. the size limitation of prepared statements?

If we want to solve 1 then the only solution is what we discussed above.
If we want to solve 2 then the API change that @mdesmet mentioned in his comment is a good solution.

To be honest I don't know which of these problems people actually care about.

The benefit of customizable paramstyle is that we can solve both problems at once but at the cost of some additional code and complexity (both for users and for us).

cla-bot bot added the cla-signed label Jan 5, 2023

damian3031 requested review from hashhar, mdesmet and hovaesco January 5, 2023 08:00

mdesmet reviewed Jan 5, 2023

damian3031 added 2 commits January 5, 2023 10:44

Execute prepared statements on client side by default

1d101a5

approach with ClientSession manipulation

dab8de2

damian3031 force-pushed the prepared-statements-client-side branch from 1daa3d4 to dab8de2 Compare January 5, 2023 09:44

damian3031 mentioned this pull request Jan 19, 2023

Disable prepared statements by default #306

Closed

1 task
