Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple left joins fails in MS Access #1417

Open
dmenne opened this issue Nov 30, 2023 · 10 comments
Open

Multiple left joins fails in MS Access #1417

dmenne opened this issue Nov 30, 2023 · 10 comments
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL

Comments

@dmenne
Copy link

dmenne commented Nov 30, 2023

dbplyr 2.4.0.9000 Microsoft Access 2019, 64 bit
(Edited 1.12.2023 to make case clearer)

I have seen #1130, but I am using the github version of tidyverse

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
a <- lazy_frame(patient = "a", adate = "11.01.2020", con = simulate_access())
b <- lazy_frame(patient = "a", bdate = "12.01.2020", con = simulate_access())
c <- lazy_frame(patient = "a", cdate = "13.01.2020", con = simulate_access())

a  |> left_join(b) |>
  select(patient, adate, bdate )

Works after replacing "FROM df" by FROM 'a' etc.

SELECT `df_LHS`.*, `bdate`
FROM `a` AS `df_LHS`
LEFT JOIN `b` AS `df_RHS`
ON (`df_LHS`.`patient` = `df_RHS`.`patient`)

One more join

a  |> left_join(b) |> left_join(c) |>
  select(patient, adate, bdate, cdate )

Fails after replacing "FROM df"

SELECT `df...1`.*, `bdate`, `cdate`
FROM `a` AS `df...1`
LEFT JOIN `b` AS `df...2`
ON (`df...1`.`patient` = `df...2`.`patient`)
LEFT JOIN `c` AS `df...3`
ON (`df...1`.`patient` = `df...3`.`patient`)

which looks reasonable . Run it or paste it directly into an Access Query, it errors

Syntax error (missing operator) in Query
(`df...1`.`patient` = `df...2`.`patient`)
LEFT JOIN `df` AS `df...3`
  ON (`df...1`.`patient` = `df...3`.`patient`)

The problem is in the periods. This is ok:

SELECT `df1`.* FROM `a` AS `df1`

This fails

SELECT `df.1`.* FROM `a` AS `df.1`

Workaround: Use a collect() between the joins. Ok for smaller data sets.

Cross-test: a manually created join is fine:

SELECT a.patient, a.adate, b.bdate, c.cdate
FROM (a LEFT JOIN b ON a.patient = b.patient) LEFT JOIN c ON b.patient = c.patient;

Sample test database attached

testdb.zip

--- Another cross-test ---

With sqlite, the following works fine

SELECT `a...1`.*, `bdate`, `cdate`
FROM `a` AS `a...1`
LEFT JOIN `b` AS `b...2`
  ON (`a...1`.`patient` = `b...2`.`patient`)
LEFT JOIN `c` AS `c...3`
  ON (`a...1`.`patient` = `c...3`.`patient`)

@dmenne dmenne changed the title Access error for multiple left joins - confused by backticks? Access error for multiple left joins - confused by periods? Dec 1, 2023
@dmenne
Copy link
Author

dmenne commented Dec 2, 2023

This is a working version of SQL closest to dplyr's syntax. Use ___ instead of ..., and change the opening bracket to the location after FROM. See also SO

SELECT `df___1`.*, `bdate`, `cdate`
FROM (`a` AS `df___1`
LEFT JOIN `b` AS `df___2`
ON `df___1`.`patient` = `df___2`.`patient`)
LEFT JOIN `c` AS `df___3`
ON (`df___1`.`patient` = `df___3`.`patient`)

@hadley
Copy link
Member

hadley commented Dec 19, 2023

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

a <- lazy_frame(x = "a", y1 = "1", con = simulate_access())
b <- lazy_frame(x = "a", y2 = "2", con = simulate_access())
c <- lazy_frame(x = "a", y3 = "3", con = simulate_access())

a |> 
  left_join(b) |> 
  left_join(c)
#> Joining with `by = join_by(x)`
#> Joining with `by = join_by(x)`
#> <SQL>
#> SELECT `df...1`.*, `y2`, `y3`
#> FROM `df` AS `df...1`
#> LEFT JOIN `df` AS `df...2`
#>   ON (`df...1`.`x` = `df...2`.`x`)
#> LEFT JOIN `df` AS `df...3`
#>   ON (`df...1`.`x` = `df...3`.`x`)

Created on 2023-12-19 with reprex v2.0.2.9000

It sounds like this is an issue with Access not allowing . in identifiers, even when correctly escaped. These identifiers are generated in generate_join_table_names(), which currently doesn't have any way to customise for different backends.

But you don't get the problem if the tables already have different names:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

a <- lazy_frame(x = "a", y1 = "1", con = simulate_access(), .name = "a")
b <- lazy_frame(x = "a", y2 = "2", con = simulate_access(), .name = "b")
c <- lazy_frame(x = "a", y3 = "3", con = simulate_access(), .name = "c")

a |> 
  left_join(b, by = "x") |> 
  left_join(c, by = "x")
#> <SQL>
#> SELECT `a`.*, `y2`, `y3`
#> FROM `a`
#> LEFT JOIN `b`
#>   ON (`a`.`x` = `b`.`x`)
#> LEFT JOIN `c`
#>   ON (`a`.`x` = `c`.`x`)

Created on 2023-12-19 with reprex v2.0.2.9000

Can you give more details about how you ended up with this problem in actual code? It might be possible to work around it by providing specific names somewhere.

@hadley hadley changed the title Access error for multiple left joins - confused by periods? Multiple left joins fails in MS Access Dec 19, 2023
@hadley hadley added bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL labels Dec 19, 2023
@dmenne
Copy link
Author

dmenne commented Dec 22, 2023

For a realistic example, see the testdb.zip Access file I included.

@hadley
Copy link
Member

hadley commented Dec 22, 2023

@dmenne what dplyr code are you using?

@dmenne
Copy link
Author

dmenne commented Dec 23, 2023


library(odbc)
library(dplyr)

con = dbConnect(odbc::odbc(), .connection_string =
  "Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=./testdb.accdb")
a = tbl(con, "a")
b = tbl(con, "b")
c = tbl(con, "c")

# Works
a  |> left_join(b) |>
  select(patient, adate, bdate )

# Fails
a  |> left_join(b) |> left_join(c) |>
  select(patient, adate, bdate, cdate )

Ok, one should not use c as variable name, but I tried c1, same result

@dmenne
Copy link
Author

dmenne commented Dec 23, 2023

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

Is this still an issue with the dev version, in particular after #1396? I might have seen a similar problem.

@hadley
Copy link
Member

hadley commented Feb 15, 2024

@krlmlr specifically with Acess, or more generally?

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

I was thinking about #1396 (comment):

options(conflicts.policy = list(warn = FALSE))
library(dplyr)
library(dbplyr)

lf1 <- lazy_frame(x1 = 1)
lf2 <- lazy_frame(x = 1, y2 = 1)
lf3 <- lazy_frame(x = 1, y = 1)

lf1 %>% 
  left_join(lf2, by = c(x1 = "x")) %>% 
  left_join(lf3, by = c(x1 = "x", y2 = "y"))
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `sql_table_name_prefix()`:
#> ! `table` must be a single string, not a character vector.

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

It just rang a bell, I wasn't following this issue too closely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

No branches or pull requests

3 participants