SqlMagic.autolimit doesn't add a limit clause to the query #1006

takikadiri · 2024-03-29T13:52:18Z

What happens?

We're using jupysql to query a trino cluster through jupyter notebook. I want to autolimit the query results that we query to the cluster, as it's intended to be used for an interactive analytics workload.

I expect the SqlMagic.autolimit to add a LIMIT clause to the query, but it doesn't, i can see the original query in the Trino Admin UI.

To Reproduce

Create a trino sqlalchemy connection. trino_connection = create_engine("trino://trino_cluster....")
set an autolimit. %config SqlMagic.autolimit=500
Use the connection: %sql trino_connection
Run a query: %sql SELECT * FROM BIG_TABLE
Get the result sets or See the query in the trino UI. You'll get the SELECT * FROM BIG_TABLE without the LIMIT Clause

OS:

Linux

JupySQL Version:

0.10.10

Full Name:

Takieddine KADIRI

Affiliation:

Société Générale Assurances

The text was updated successfully, but these errors were encountered:

edublancas · 2024-03-29T14:17:00Z

yeah, the effect of autolimit is not to add a LIMIT to the statement but to send the query and only fetch the top X results, are you seeing performance issues?

you can see the relevant code here.

the main reason not to add LIMIT, is that some DBs don't support it so we'd have to figure out a way to make it portable. I guess it wouldn't be too hard using sqlglot, but given that we support many databases, adding stuff like this ends up breaking features for some users.

perhaps a more reasonable solution would be to add LIMIT only to the dbs that we're sure they support it, but even detecting which db the user is connected to is challenging and we've encountered edge cases.

bottom line is that we've decided not to modify user-submitted queries to prevent these issues. if you have time, feel free to open a PR, perhaps there is another approach that I'm overlooking

takikadiri · 2024-03-30T14:04:50Z

Ah i see, The doc mislead me. Thank you for the quick feedback.

For the interactive analytics workload, we're also using Apache Superset, that manage to systematically LIMIT the query. That's why i was looking for this in jupysql

Maybe i could use jupysql for a slighly different use case, where the desired behavior is to collect the entire query results for further analysis with pandas or for doing joins with another data sources. I'll tests the user bahaviors and the cluster performance with jupysql and autopandas=True and let you know if the LIMIT clause is really needed here.

edublancas · 2024-03-30T14:06:00Z

sounds good, thanks for your feedback!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SqlMagic.autolimit doesn't add a limit clause to the query #1006

SqlMagic.autolimit doesn't add a limit clause to the query #1006

takikadiri commented Mar 29, 2024

edublancas commented Mar 29, 2024 •

edited

takikadiri commented Mar 30, 2024 •

edited

edublancas commented Mar 30, 2024

SqlMagic.autolimit doesn't add a limit clause to the query #1006

SqlMagic.autolimit doesn't add a limit clause to the query #1006

Comments

takikadiri commented Mar 29, 2024

What happens?

To Reproduce

OS:

JupySQL Version:

Full Name:

Affiliation:

edublancas commented Mar 29, 2024 • edited

takikadiri commented Mar 30, 2024 • edited

edublancas commented Mar 30, 2024

edublancas commented Mar 29, 2024 •

edited

takikadiri commented Mar 30, 2024 •

edited