Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add <JDBCConnection> backend #1359

Open
nviraj opened this issue Sep 9, 2023 · 12 comments
Open

Add <JDBCConnection> backend #1359

nviraj opened this issue Sep 9, 2023 · 12 comments
Labels
backend 🕺 feature a feature request or enhancement

Comments

@nviraj
Copy link

nviraj commented Sep 9, 2023

Hi,

I am using “pool” with “RJDBC” and “dbplyr” packages to connect to Athena tables.
However I see the following warning.

Warning: <JDBCConnection> uses an old dbplyr interface
ℹ Please install a newer version of the package or contact the maintainer

This has been slightly tricky to pinpoint. Initially I thought it had something to do with RJDBC but it does not seem to be the case, as seen here.
I subsequently reported the issue to Pool, but it does not seem to be originating from there as well as I could replicate the issue using a DBI connection as well as Pool.

It was easier to generate a reprex using PostgreSQL Docker and JDBC, and the issue does not seem to be database specific. Please let me know if I am doing something incorrectly or need to report this elsewhere.

Steps to reproduce:

1. Create docker image to connect to using the official image:
docker run --network=host --name postgresdb_reprex -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -d postgres

2. Download JAR file from here
Alternatively you can use the attached Project which contains the script and the JAR file for Windows.
pool_issue.zip

3. Run reprex code
reprex.R contains the actual code. Running through
reprex::reprex(input = "reprex.R", venue = "gh", session_info = TRUE)

Reprex Content:

library(DBI)
#> Warning: package 'DBI' was built under R version 4.3.1
library(pool)
#> Warning: package 'pool' was built under R version 4.3.1
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.3.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(RJDBC)
#> Warning: package 'RJDBC' was built under R version 4.3.1
#> Loading required package: rJava

# Connection details for docker image of postgres
connection_url <- "jdbc:postgresql://localhost:5432/postgres"
username <- "postgres"
password <- "mysecretpassword"
# Where is the JAR file stored
# JDBC downloaded through https://jdbc.postgresql.org/download/
jdbc_path <- "postgresql-42.6.0.jar"

# Create a JDBC driver
driver <-
  RJDBC::JDBC(
    driverClass = "org.postgresql.Driver",
    classPath = jdbc_path,
    identifier.quote = "'"
  )

# Establish connection through DBI
conn <- DBI::dbConnect(driver, connection_url,
  user = username,
  password = password
)
# Write table to database using DBI
if (!DBI::dbExistsTable(conn, "mtcars")) {
  DBI::dbWriteTable(conn, "mtcars", mtcars)
} else {
  base::message("Table already exists!")
}

# Create a Pool connection
con <- pool::dbPool(
  drv = driver, url = connection_url,
  user = username,
  password = password
)

# Connect to table
mtcars_db <- dplyr::tbl(con, "mtcars")
#> Warning: <JDBCConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.

# Run sample query
mtcars_db %>%
  dplyr::filter(cyl == 8) %>%
  utils::head()
#> # Source:   SQL [6 x 11]
#> # Database: JDBCConnection
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#> 2  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#> 3  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 4  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> 5  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
#> 6  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4

Created on 2023-09-04 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21 ucrt)
#>  os       Windows 11 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_India.utf8
#>  ctype    English_India.utf8
#>  tz       Asia/Calcutta
#>  date     2023-09-04
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>    blob          1.2.4   2023-03-17 [1] CRAN (R 4.3.1)
#>    cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.1)
#>    DBI         * 1.1.3   2022-06-18 [1] CRAN (R 4.3.1)
#>    dbplyr        2.3.3   2023-07-07 [1] CRAN (R 4.3.1)
#>    digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
#>    dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.1)
#>    evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.1)
#>    fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.1)
#>    fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
#>    fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.1)
#>    generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
#>    glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.1)
#>    htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.1)
#>    knitr         1.43    2023-05-25 [1] CRAN (R 4.3.1)
#>    later         1.3.1   2023-05-02 [1] CRAN (R 4.3.1)
#>    lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.1)
#>    magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
#>    pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
#>    pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
#>    pool        * 1.0.1   2023-02-21 [1] CRAN (R 4.3.1)
#>    purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.1)
#>    R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.1)
#>    R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>    R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>    R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.1)
#>    R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
#>    Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.1)
#>    reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.1)
#>  D rJava       * 1.0-6   2021-12-10 [1] CRAN (R 4.3.0)
#>    RJDBC       * 0.2-10  2022-03-24 [1] CRAN (R 4.3.1)
#>    rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.1)
#>    rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
#>    rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
#>    sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>    styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.1)
#>    tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.1)
#>    tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.1)
#>    utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.1)
#>    vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.1)
#>    withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.1)
#>    xfun          0.40    2023-08-09 [1] CRAN (R 4.3.1)
#>    yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/viraj/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.0/library
#> 
#>  D ── DLL MD5 mismatch, broken installation.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

4. Stop and remove Docker image

docker stop postgresdb_reprex
docker rm -f postgresdb_reprex

Please let me know if you need any further details. Thanks!

@shearerpmm
Copy link

I did some digging. The warning is referring entirely to an issue in dbplyr itself. It essentially saying "the developers of dbplyr haven't yet written a backend for JDBC that is up to their own current standards".

Details

  • The warning is generated here, in dbplyr_fallback.
  • It happens because JDBC has no "edition 2" backend implemented in dbplyr. When the generic dbplyr_edition is called here, ideally there's supposed to be an edition 2 backend implemented. If an edition 2 backend is implemented, for example in MySQL, the S3 method for dbplyr_edition returns 2, and dbplyr_fallback doesn't generate a warning.
  • When no backend is implemented, dbplyr_edition dispatches to the default method which returns 1 and triggers the else in this if-else and gets you the warning.

@mgirlich
Copy link
Collaborator

Currently, dbplyr doesn't support JDBC but only ODBC.
@shearerpmm @nviraj does any of you maybe want to create a PR that adds JDBC support?

@shearerpmm
Copy link

shearerpmm commented Sep 18, 2023

I actually would really like to. I'm not sure how big of a lift it is for me, my experience in developing R packages is pretty limited.

Despite not having a specific backend, I've found that dbplyr already works fantastically well over JDBC. So far the only issue I've found is due to a problem in RJDBC, not dbplyr.

There are some places where the generated SQL doesn't work, but I'm not sure that can be solved with a JDBC backend, because JDBC is just a way to connect to a variety of databases with different SQL dialects. In my recent experience, I find SQL generation glitches can be easily routed around by using sql() and manually coding anything dialect-specific.

From this perspective, I'm not sure there a JDBC backend has anything of value to do other than silence this warning. And maybe warn the user that compute() currently doesn't work due to the RJDBC issue mentioned above.

@hadley
Copy link
Member

hadley commented Nov 2, 2023

@shearerpmm I think those are still valuable contributions 😄

@hadley hadley added feature a feature request or enhancement backend 🕺 labels Nov 2, 2023
@hadley hadley changed the title Warning: <JDBCConnection> uses an old dbplyr interface Add <JDBCConnection> backend Nov 2, 2023
@shearerpmm
Copy link

The little I'm able to do is worth my time. IMHO dbplyr is the best tool available today for building complex data pipelines in SQL databases. I know about the alternatives and the competition is not close.

@juniperlsimonis
Copy link

Big +1 on this request and dbplyr being invaluable

I've had to jump over to RJDBC to handle testing an enterprise db migration from an old DB2 to a new SQL Server. i would have used odbc connections, but the recent issues with odbc and db2 meant we were stuck using R 4.1.1 and thus hit memory limits etc (dealing with many tables that have tens of millions of entries and a dozen rows)

there are definitely still a lot of db2 databases out there and I imagine that other folks have had to do similar transitions if they wanted to update R itself, so would expect that there's likely a pick-up in need compared to a few years ago when you could use any old ODBC to connect to a db2 so wouldn't hit this problem.

so far i've been able to generally just hot-swap the source between the db2 and sqlserver but i've just run into this issue of not having a "proper" backend as i've been trying to push more compute towards the database. in particular, i'm trying to do things like inlining tables to avoid collecting the massive dataset just to do a join (rather, i'd like to inline the table and then join it before running a collect statement), and this is what isn't working without the backend. (getting the same warning as OP posted).

i'm not really adept at the backend coding, although i would be up for learning to support this, but i definitely have multiple use cases and would be happy to test things in the wild or provide reprexes if either would be helpful

@hadley
Copy link
Member

hadley commented Jan 22, 2024

@juniperlsimonis what are the problems with odbc and DB2? Could you file an issue in https://github.com/r-dbi/odbc?

@juniperlsimonis
Copy link

@hadley see ibm's support page there have been multiple issues in odbc that were filed like this one . the solutions folks have been able to implement haven't worked on my end given enterprise limitations.

@hadley
Copy link
Member

hadley commented Feb 22, 2024

To get rJava working on my arm64 mac:

brew install openjdk
# follow sudo ln advice
sudo R CMD javareconf

@hadley
Copy link
Member

hadley commented Feb 22, 2024

I don't think there's much dbplyr can do with RJDBC since it doesn't appear to expose a way to figure out the underlying database type from the driver/connection:

library(RJDBC)
#> Loading required package: DBI
#> Loading required package: rJava

# Download from https://github.com/xerial/sqlite-jdbc
class_path <- c(
  "~/Downloads/sqlite-jdbc-3.45.1.0.jar",
  "~/Downloads/slf4j-api-1.7.36.jar"
)
con <- DBI::dbConnect(
  JDBC("org.sqlite.JDBC", class_path),
  url = "jdbc:sqlite::memory:"
)

dbGetInfo(con)
#> list()

Created on 2024-02-22 with reprex v2.1.0

@shearerpmm
Copy link

Since the user generally knows, could they have the option to tell dbplyr what the database is when the connection is JDBC? I thought all it really affects is the SQL dialect used?

@hadley
Copy link
Member

hadley commented Feb 22, 2024

@shearerpmm I think we could in principle do that, but the idea that the connection defines the database type is deeply intwined in the guts of dbplyr, so fixing that would be quite a lot of work.

Yes, it only affects the SQL dialect used but given that dbplyr is all about converting R code to SQL, that's really important!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend 🕺 feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants