What happens?
We have encountered an intermittent issue when using the DuckDB Python API where a query runs indefinitely. The query in question is a standard COPY statement that typically takes only a few minutes to execute under normal circumstances. However, in certain instances, the query hangs and does not complete.
The problematic query is as follows:
query = (
    f"""
    COPY (SELECT * FROM data) TO '%s' (
        FORMAT PARQUET, PARTITION_BY (year, month, day),
        FILENAME_PATTERN "data_{{i}}", OVERWRITE_OR_IGNORE 1,
        ROW_GROUP_SIZE_BYTES '128MB'
    )
    """
    % self.output_path
)
When the issue occurs, we have observed that the memory usage graph becomes a flat line, indicating that the query is stuck and not progressing.
It's worth noting that we use similar queries with the DuckDB Java API, and we have not encountered this issue with the Java API.
Upon reviewing the DuckDB documentation, we couldn't find any configuration options for setting a statement timeout or similar functionality to handle long-running queries.
As a workaround, we are currently using an external method to enforce a timeout on the query execution.
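For readers wondering what such an external timeout could look like, here is a minimal sketch and not necessarily the method we use. It assumes the COPY can run in a separate Python process that opens the database file itself. The names `run_script_with_timeout`, `COPY_SCRIPT`, and `copy_with_timeout` are hypothetical, as are the database path and output path passed in by the caller. It relies only on `subprocess.run` from the standard library, which kills the child process when the timeout expires.

```python
import subprocess
import sys

# Child-process script (hypothetical): opens the database file passed as the
# first argument and runs the COPY from this report, writing to the path
# passed as the second argument. Assumes a table named `data` exists there.
COPY_SCRIPT = '''
import sys
import duckdb

db_path, output_path = sys.argv[1], sys.argv[2]
con = duckdb.connect(db_path)
con.execute(f"""
    COPY (SELECT * FROM data) TO '{output_path}' (
        FORMAT PARQUET, PARTITION_BY (year, month, day),
        FILENAME_PATTERN "data_{{i}}", OVERWRITE_OR_IGNORE 1,
        ROW_GROUP_SIZE_BYTES '128MB'
    )
""")
'''


def run_script_with_timeout(script, args, timeout_s):
    """Run `script` in a fresh Python process.

    Returns True if the process finished within `timeout_s` seconds and
    False if it had to be killed. A non-zero exit raises CalledProcessError.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", script, *args],
            check=True,
            timeout=timeout_s,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a hung
        # query does not outlive the timeout.
        return False


def copy_with_timeout(db_path, output_path, timeout_s):
    """Hypothetical wrapper: run the COPY with a hard wall-clock limit."""
    return run_script_with_timeout(COPY_SCRIPT, [db_path, output_path], timeout_s)
```

A call such as `copy_with_timeout("warehouse.duckdb", "/tmp/out", 1800)` would return `False` if the COPY had not finished after 30 minutes. Killing the process mid-write may leave partially written Parquet files in the output directory, so the caller would likely need to clean up before retrying.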
Could you folks please provide guidance on how to handle query timeouts effectively within the DuckDB Python API? If there are any existing features or configuration options that we may have overlooked, please let us know.
To Reproduce
The issue cannot be reproduced with any particular example; it occurs only with the Python API. We filed this ticket to report the pattern of a flat memory-usage graph.
OS:
Debian (Python slim image running in an Argo container on EKS)
DuckDB Version:
0.9.2
DuckDB Client:
Python
Full Name:
Mustafa Hasan Khan
Affiliation:
Atlan
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have