Feature Request: Apache Arrow support #26

Open
charlie430 opened this issue Jun 6, 2022 · 8 comments
Comments

@charlie430

I'm very interested in Apache Arrow being supported for the in-memory scenario.

Is there any information you can provide on when that might be supported?

Thanks!

@Giorgi
Owner

Giorgi commented Jun 26, 2022

It's not implemented at the moment but feel free to send a PR for the API: Arrow Interface

@nazar554

Would it be ok to use the Apache.Arrow package as a dependency?
It has data types for some objects, like CArrowSchema for duckdb_arrow_schema.

@Giorgi
Owner

Giorgi commented Mar 25, 2024

Would that be added to the Binding project or Data project?

@nazar554

Probably the Binding project, but the package might be too heavy for it.
I guess for now I can create a DuckDBArrowSchema with private fields that has a compatible struct layout,
so consumers can just cast the pointer to CArrowSchema*.
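
A minimal sketch of what such a layout-compatible struct could look like, assuming it mirrors the field order of the `ArrowSchema` struct from the Arrow C Data Interface. The field names and the `DuckDBArrowSchemaLayout` helper are hypothetical; this is not code from DuckDB.NET or from the referenced commit.

```csharp
using System;
using System.Runtime.InteropServices;

#pragma warning disable CS0169 // fields exist only to reproduce the native layout

// Hypothetical type: mirrors the field order of the C ArrowSchema struct.
[StructLayout(LayoutKind.Sequential)]
public struct DuckDBArrowSchema
{
    private IntPtr format;       // const char*
    private IntPtr name;         // const char*
    private IntPtr metadata;     // const char*
    private long flags;          // int64_t
    private long nChildren;      // int64_t
    private IntPtr children;     // struct ArrowSchema**
    private IntPtr dictionary;   // struct ArrowSchema*
    private IntPtr release;      // void (*release)(struct ArrowSchema*)
    private IntPtr privateData;  // void*
}

public static class DuckDBArrowSchemaLayout
{
    // Size of the managed mirror as seen by the marshaller.
    public static int Size => Marshal.SizeOf<DuckDBArrowSchema>();
}
```

Keeping the fields private keeps Arrow details out of the public surface of the Binding project, while a consumer that does reference Apache.Arrow can reinterpret a pointer to this struct as `CArrowSchema*`. In a 64-bit process the struct has nine 8-byte fields, so `DuckDBArrowSchemaLayout.Size` is 72.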

@Giorgi
Owner

Giorgi commented Mar 25, 2024

Honestly, I haven't looked much into Arrow and can't tell now for sure. Feel free to join DuckDB Discord, we can discuss it in more detail in the dotnet channel.

nazar554 added a commit to nazar554/DuckDB.NET that referenced this issue Mar 25, 2024
@CurtHagenlocher

CurtHagenlocher commented Apr 25, 2024

An alternative for getting data as Arrow could be to use the C# ADBC implementation with the generic driver importer. This code is not very mature yet, but you can run queries and get the result back as Arrow. Example:

    using System.Collections.Generic;
    using Apache.Arrow.Adbc;
    using Apache.Arrow.Adbc.C;

    // Load the native DuckDB library and initialize its ADBC driver entry point.
    using AdbcDriver duckdb = CAdbcDriverImporter.Load("D:\\testdata\\duckdb.dll", "duckdb_adbc_init");
    using AdbcDatabase db = duckdb.Open(new Dictionary<string, string> { { "path", "d:/testdata/ddbt.db" } });
    using AdbcConnection cn = db.Connect(null);
    using AdbcStatement stmt = cn.CreateStatement();

    // Create and populate a small test table.
    stmt.SqlQuery = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
    stmt.ExecuteUpdate();

    stmt.SqlQuery = "INSERT INTO integers VALUES (3, 4), (5, 6), (7, 8);";
    stmt.ExecuteUpdate();

    stmt.SqlQuery = "SELECT * from integers";
    var results = stmt.ExecuteQuery();

    // results.Stream is an IArrowArrayStream, which lets you get the schema
    // and a set of record batches

NOTE that this code is not super mature and we haven't yet reached a 1.0 release.

@Giorgi
Owner

Giorgi commented Apr 26, 2024

Nice! I think it would be great if there was a way to go from DuckDBConnection (provided by this library) to AdbcConnection. I can expose the underlying pointers to the database and connection (obtained by duckdb_open and duckdb_connect), but it looks like there is no way to convert such a pointer to an AdbcConnection object.

@CurtHagenlocher

I know next to nothing about DuckDB internals, so I have no idea how plausible something like this is. For ADBC, we need an array of function pointers that defines the ADBC driver API -- this is what duckdb_adbc_init is initializing -- and the connection is then roughly an indirected opaque pointer that gets passed to some of these function pointers.
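
The shape described above (a table of function pointers filled in by an init function, plus an opaque handle passed back into those functions) can be illustrated with a simplified, fully managed sketch. Everything here is hypothetical: `DriverTable` and `FakeDriver` are stand-ins for illustration only and are not the ADBC API, where the table is a native struct populated by an entry point such as duckdb_adbc_init.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the table of entry points that defines a driver.
public sealed class DriverTable
{
    public DriverTable(Func<IntPtr> connectionNew,
                       Func<IntPtr, string, int> execute,
                       Action<IntPtr> connectionRelease)
    {
        ConnectionNew = connectionNew;
        Execute = execute;
        ConnectionRelease = connectionRelease;
    }

    public Func<IntPtr> ConnectionNew { get; }
    public Func<IntPtr, string, int> Execute { get; }
    public Action<IntPtr> ConnectionRelease { get; }
}

// Hypothetical driver: callers only ever see opaque handles, never its state.
public static class FakeDriver
{
    private static readonly Dictionary<IntPtr, List<string>> State =
        new Dictionary<IntPtr, List<string>>();
    private static long nextHandle = 1;

    // Plays the role of the init entry point: returns the filled-in table.
    public static DriverTable Init()
    {
        return new DriverTable(
            () =>
            {
                var handle = new IntPtr(nextHandle++);
                State[handle] = new List<string>();
                return handle;
            },
            (handle, sql) =>
            {
                // Record the statement and return how many this connection has run.
                State[handle].Add(sql);
                return State[handle].Count;
            },
            handle => State.Remove(handle));
    }
}
```

The point of the sketch is that a handle only means something to the table that created it. That is why a raw pointer obtained from duckdb_connect could not simply be wrapped as an AdbcConnection: the ADBC side expects its own handle, produced by its own function table.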
