Intermittent indefinitely hanging query [Python API] #11960

Open
2 tasks done
mustafahasankhan opened this issue May 7, 2024 · 0 comments

Comments

mustafahasankhan commented May 7, 2024

What happens?

We have encountered an intermittent issue when using the DuckDB Python API where a query runs indefinitely. The query in question is a standard COPY statement that typically takes only a few minutes to execute under normal circumstances. However, in certain instances, the query hangs and does not complete.

The problematic query is as follows:

        query = f"""
            COPY (SELECT * FROM data) TO '{self.output_path}' (
                FORMAT PARQUET, PARTITION_BY (year, month, day),
                FILENAME_PATTERN "data_{{i}}", OVERWRITE_OR_IGNORE 1,
                ROW_GROUP_SIZE_BYTES '128MB'
            )
        """

When the issue occurs, we have observed that the memory usage graph becomes a flat line, indicating that the query is stuck and not progressing.
[Image: memory usage graph]

It's worth noting that we use similar queries with the DuckDB Java API, and we have not encountered this issue with the Java API.

Upon reviewing the DuckDB documentation, we couldn't find any configuration options for setting a statement timeout or similar functionality to handle long-running queries.

As a workaround, we are currently using an external method to enforce a timeout on the query execution.
Could you folks please provide guidance on how to handle query timeouts effectively within the DuckDB Python API? If there are any existing features or configuration options that we may have overlooked, please let us know.
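
For readers looking for a starting point, here is a minimal sketch of one way to enforce such a timeout from Python. It is not the external method mentioned above, which the report does not describe. The helper `execute_with_timeout` and the exception `QueryTimeout` are hypothetical names, and the sketch assumes the connection object exposes an `interrupt()` method (as recent DuckDB Python connections do; please verify against the installed version).

```python
import threading


class QueryTimeout(Exception):
    """Raised when the watchdog fired before the statement finished."""


def execute_with_timeout(con, sql, timeout_s):
    """Run `sql` on `con`, requesting an interrupt after `timeout_s` seconds.

    `con` is any object with `execute(sql)` and `interrupt()` methods.
    Assumption: `interrupt()` makes the blocked `execute` call raise.
    """
    fired = threading.Event()

    def _on_timeout():
        # Record that the deadline passed, then ask the engine to stop.
        fired.set()
        con.interrupt()

    timer = threading.Timer(timeout_s, _on_timeout)
    timer.daemon = True  # never keep the process alive just for the watchdog
    timer.start()
    try:
        return con.execute(sql)
    except Exception as exc:
        # Only relabel the error if our watchdog caused the interruption.
        if fired.is_set():
            raise QueryTimeout(f"query exceeded {timeout_s}s") from exc
        raise
    finally:
        timer.cancel()
```

With a real connection this would be called as `execute_with_timeout(con, query, 900)`. The interrupt is cooperative: if the engine is genuinely deadlocked and never checks for it, `execute` still will not return, in which case running the query in a separate process and killing that process on timeout is the more robust option.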

To Reproduce

We do not have a minimal example that reproduces the issue. It occurs intermittently and, so far, only with the Python API. We are filing this ticket mainly to report the pattern of a flat memory usage graph while the query hangs.

OS:

Debian (Python slim image running in an Argo container on EKS)

DuckDB Version:

0.9.2

DuckDB Client:

Python

Full Name:

Mustafa Hasan Khan

Affiliation:

Atlan

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have