Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Refactor dataframe where tests, reduce IF conditional compilation blocks #386

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

brendan-ward
Copy link
Member

Split up dataframe read with where test to try and probe at #384

Also tries to remove IF blocks marked as deprecated by Cython, except for those still needed for GDAL version specific functions.

@brendan-ward
Copy link
Member Author

After splicing in changes from #349 to correctly release the Arrow stream, I'm no longer able to easily reproduce the segfaults on repeated reruns of CI. Given that these were intermittent, it's hard to say for certain, but per the GDAL docs, we should have been releasing the stream:

On successful return, and when the stream interfaces is no longer needed, it must must be freed...

@brendan-ward
Copy link
Member Author

Clarification, it seems that the stream.release is not being called (is NULL) - presumably because the stream was used in the pyarrow reader? But free(stream) is getting called. Not sure why either of these would avoid an unpredictable segfault, it seems more likely that initializing the .release method to NULL on the ArrowArrayStream prevents calling a random pointer (per the comment), even though the .release method was not defined prior to these changes (maybe something expecting that interface was expecting a .release anyway).

@jorisvandenbossche
Copy link
Member

I merged #349, so let's see if the failures now disappear on main and other PRs.

Although to be honest I also don't fully understand how the changes would fix the crashes ..

Clarification, it seems that the stream.release is not being called (is NULL) - presumably because the stream was used in the pyarrow reader?

Indeed, that's the idea. When we pass the struct to pyarrow, they will "move" it and take ownership of it (i.e. being responsible of releasing it when done reading the data) by setting the release callback to NULL, to ensure we don't call that anymore.
(https://arrow.apache.org/docs/dev/format/CDataInterface.html#moving-an-array)

@jorisvandenbossche
Copy link
Member

Hmm, still a crash on the merge commit on main .. https://github.com/geopandas/pyogrio/actions/runs/8718790747/job/23916758253

@brendan-ward brendan-ward changed the title Refactor dataframe where tests, reduce IF conditional compilation blocks DO NOT MERGE: Refactor dataframe where tests, reduce IF conditional compilation blocks May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants