-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow c data interface #79
base: main
Are you sure you want to change the base?
Arrow c data interface #79
Conversation
92bf9d2
to
83312b4
Compare
c17a20d
to
65eacf4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Just some notes from writing a few things along these lines in nanoarrow 🙂
include/sparrow/c_interface.hpp
Outdated
*/ | ||
template <template <typename> class Allocator> | ||
requires sparrow::allocator<Allocator<char>> | ||
std::unique_ptr<ArrowSchema> make_arrow_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would highly recommend avoiding std::unique_ptr<ArrowSchema/ArrowArray>
, unless used with a deleter that ensures that any unreleased array that it is pointing to is released when the unique pointer is deleted. In cudf they use a unique pointer with a deleter:
...in nanoarrow we have a UniqueArray/Schema
class that wraps a stack-allocated ArrowArray/Schema
:
..but the idea is the same: always make a home for the struct such that it is guaranteed to be cleaned up, and "move" it between safe homes (e.g., using something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 ).
(Apologies if there's some C++ magic going on here that already ensures this)
include/sparrow/c_interface.hpp
Outdated
child->release(child); | ||
SPARROW_ASSERT_TRUE(child->release == nullptr); | ||
} | ||
delete child; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the implementations of an ArrowArray
/ArrowSchema
I've read choose to own their own memory for the ArrowArray
/ArrowSchema
structs. In this case, it might mean an std::move()
of the std::vector<std::unique_ptr<ArrowArray>>
into the private data and rely on the C++ deleter to call the release callbacks (see my comment below about always making sure that std::unique_ptr<ArrowThing>
is guaranteed to call the release callback if present).
Alternatively, one could use new ArrowArray[]
and something like https://github.com/apache/arrow-nanoarrow/blob/a0632177c27cb52880c62533fcd441613f41815b/src/nanoarrow/nanoarrow_types.h#L334-L340 to "move" the struct from foreign memory.
a131967
to
19bf006
Compare
Fix #69