Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow c data interface #79

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Alex-PLACET
Copy link
Contributor

@Alex-PLACET Alex-PLACET commented Apr 18, 2024

Fix #69

@Alex-PLACET Alex-PLACET marked this pull request as ready for review April 19, 2024 14:49
@Alex-PLACET Alex-PLACET force-pushed the arrow_c_data_interface branch 2 times, most recently from c17a20d to 65eacf4 Compare April 29, 2024 07:41
include/sparrow/c_interface.hpp Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Show resolved Hide resolved
test/test_c_data_interface.cpp Show resolved Hide resolved
test/test_c_data_interface.cpp Show resolved Hide resolved
Copy link

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Just some notes from writing a few things along these lines in nanoarrow 🙂

include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
*/
template <template <typename> class Allocator>
requires sparrow::allocator<Allocator<char>>
std::unique_ptr<ArrowSchema> make_arrow_schema(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend avoiding std::unique_ptr<ArrowSchema/ArrowArray>, unless used with a deleter that ensures that any unreleased array that it is pointing to is released when the unique pointer is deleted. In cudf they use a unique pointer with a deleter:

https://github.com/rapidsai/cudf/blob/e58838b6cc820fc89f1f67eb9117a3ee6ddeaa47/cpp/src/interop/to_arrow_device.cu#L601-L604

...in nanoarrow we have a UniqueArray/Schema class that wraps a stack-allocated ArrowArray/Schema:

https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow.hpp#L265-L272

..but the idea is the same: always make a home for the struct such that it is guaranteed to be cleaned up, and "move" it between safe homes (e.g., using something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 ).

(Apologies if there's some C++ magic going on here that already ensures this)

test/test_c_data_interface.cpp Outdated Show resolved Hide resolved
child->release(child);
SPARROW_ASSERT_TRUE(child->release == nullptr);
}
delete child;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the implementations of an ArrowArray/ArrowSchema I've read choose to own their own memory for the ArrowArray/ArrowSchema structs. In this case, it might mean an std::move() of the std::vector<std::unique_ptr<ArrowArray>> into the private data and rely on the C++ deleter to call the release callbacks (see my comment below about always making sure that std::unique_ptr<ArrowThing> is guaranteed to call the release callback if present).

Alternatively, one could use new ArrowArray[] and something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 to "move" the struct from foreign memory.

@Alex-PLACET Alex-PLACET marked this pull request as draft May 6, 2024 06:50
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
include/sparrow/c_interface.hpp Outdated Show resolved Hide resolved
@Alex-PLACET Alex-PLACET marked this pull request as ready for review May 17, 2024 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement the Arrow C data interface
4 participants