duckplyr with remote tables in a duckdb connection #86
Thanks, good question. I see two components:

New project: No need to bother with connections. Start with data frames, use … Internally, duckplyr uses a DBI connection to duckdb, but this connection is not meant to be accessed by the user. There is currently no way to specify the location of the database file for this internal connection. Do you think we need an option for this, to avoid keeping everything in memory?

Existing project: Because of the internal DBI connection, it is difficult to mix dbplyr code and duckplyr code. I wonder how to make this more seamless. Ideally, …

Sketch (with a dummy relational object and unexported functions):

```r
con <- DBI::dbConnect(duckdb::duckdb(), "foo.db") # db on disk
DBI::dbWriteTable(con, name = "iris", value = iris)

tbl <-
  dplyr::tbl(con, "iris") |>
  dplyr::filter(Petal.Length <= 1.2)

tbl |> dplyr::show_query()
#> <SQL>
#> SELECT iris.*
#> FROM iris
#> WHERE ("Petal.Length" <= 1.2)

# Dummy rel object
rel <- duckdb:::rel_from_df(con, data.frame(a = integer()))
duckdb:::rel_sql(rel, dbplyr::sql_render(tbl))
#> # A tibble: 4 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 4.3 3 1.1 0.1 setosa
#> 2 5.8 4 1.2 0.2 setosa
#> 3 4.6 3.6 1 0.2 setosa
#> 4 5 3.2 1.2 0.2 setosa
```

Created on 2023-11-20 with reprex v2.0.2
Hello @krlmlr,

Yes, I think this would be helpful. Perhaps an optional …
We allow multiple … The opposite may be a bit trickier. @Tmonster: are there any obstacles to combining relational objects that were created from different connections, e.g., with joins? I was also thinking about an option to configure the default connection, but passing the connection object may be the easiest.
I'm a bit confused. To clarify: does "remote" mean a duckdb table in a different duckdb database file, or in a different connection, given that duckplyr maintains its own connection to a duckdb database? I'll have to look into it, but combining relational objects from two different duckdb connections might be difficult. I think it might be easier to integrate duckdb's ATTACH/DETACH functionality. If a user has other existing duckdb database files and wants to use duckplyr functionality without calling … Would this work?
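A minimal sketch of the ATTACH/DETACH idea in plain DBI + duckdb, assuming a single connection `con` as a hypothetical stand-in for duckplyr's internal connection (which duckplyr does not expose, so this does not show how duckplyr itself would do it). The temporary database file and the alias `other` are made up for illustration; the setup part only exists to create an "existing" file to attach.

```r
library(DBI)

# Hypothetical stand-in for duckplyr's internal (in-memory) connection
con <- dbConnect(duckdb::duckdb())

# Setup for illustration: create a separate database file containing `iris`
other_path <- tempfile(fileext = ".duckdb")
dbExecute(con, sprintf("ATTACH '%s' AS other", other_path))  # creates the file
duckdb::duckdb_register(con, "iris_df", iris)  # expose the data frame as a view
dbExecute(con, "CREATE TABLE other.iris AS SELECT * FROM iris_df")
duckdb::duckdb_unregister(con, "iris_df")
dbExecute(con, "DETACH other")

# The ATTACH/DETACH idea: attach the existing file read-only under an alias
dbExecute(con, sprintf("ATTACH '%s' AS other (READ_ONLY)", other_path))

# Tables in the attached file are addressed as `other.<table>`
res <- dbGetQuery(con, 'SELECT * FROM other.iris WHERE "Petal.Length" <= 1.2')
print(res)

# Detach the file again and close the connection
dbExecute(con, "DETACH other")
dbDisconnect(con)
```

Because everything runs through one connection, relational objects built on top of it would not need to be combined across connections; whether duckplyr should surface such an attach step is the open question in this thread.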
If we can't mix and match relational objects from different connections, we should check that the connections are the same for joins and other operations. We'll also take a look into connections and database storage modes.
For joins, we already check whether they are the same. See …
Hello,

Trying out duckplyr with a remote table results in the following error. Is duckplyr compatible with remote tables in a duckdb connection? I'm not sure whether the idea is to use dbplyr when working with remote tables and duckplyr when working with data in memory. Feedback much appreciated. Thank you.

-Ed
Error: