
Read in multiple csvs when file paths aren't amenable to glob syntax #146

Open
nicki-dese opened this issue May 2, 2024 · 2 comments


@nicki-dese

nicki-dese commented May 2, 2024

I routinely work with multiple large CSVs whose file paths are too irregular for glob syntax. In DuckDB I can pass them as a list, e.g. SELECT * FROM read_csv(['file_1.csv', 'file_2.csv']), and that works. I can't figure out how to do the equivalent in duckplyr.
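For comparison, the DuckDB approach from the question can be run from R through the DBI and duckdb packages. This is a minimal sketch: the file names, columns, and temp-directory setup are illustrative assumptions, and each path is quoted with DBI::dbQuoteString() before being spliced into DuckDB's list literal, so paths containing quotes don't break the query.

```r
library(DBI)
library(duckdb)

# Illustrative input: two small CSVs in a temp dir (hypothetical stand-ins for the real files)
paths <- file.path(tempdir(), c("file_1.csv", "file_2.csv"))
write.csv(data.frame(id = 1:2, value = c("a", "b")), paths[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")), paths[2], row.names = FALSE)

con <- dbConnect(duckdb())

# Quote each path as an SQL string literal, then build DuckDB's list syntax: ['...', '...']
quoted <- dbQuoteString(con, paths)
sql <- paste0("SELECT * FROM read_csv([", paste(quoted, collapse = ", "), "])")

# read_csv() reads every file in the list and returns all their rows in one result
result <- dbGetQuery(con, sql)

dbDisconnect(con)
```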

I've tried:

file_paths <- c("file_1.csv", "file_2.csv")
# or
file_paths <- list("file_1.csv", "file_2.csv")

duckplyr_df_from_csv(file_paths) %>% do_something

It doesn't error, but it only reads in the first file.

Is this possible? If so, how? If not, I think there should at least be a warning when a list or vector of multiple file paths is passed.
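The warning proposed above could look something like the following guard. check_single_path() is a hypothetical helper written for illustration, not part of duckplyr; it only shows the suggested behavior of warning when more than one path reaches a single-file reader.

```r
# Hypothetical guard (not a duckplyr function): warn when a single-file reader
# is handed several paths, since only the first one would be read.
check_single_path <- function(path) {
  n <- length(path)
  if (n > 1) {
    warning(
      "Received ", n, " file paths; only the first (", path[[1]], ") will be read.",
      call. = FALSE
    )
  }
  # Return the input unchanged so the guard can sit in front of a reader call
  invisible(path)
}
```

Calling check_single_path(c("file_1.csv", "file_2.csv")) or check_single_path(list("file_1.csv", "file_2.csv")) emits the warning; a single path passes through silently.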

@krlmlr
Collaborator

krlmlr commented May 2, 2024

Thanks. Code like file_paths %>% map(duckplyr_df_from_csv) %>% bind_rows() has worked for me in practice, but I agree that this should be streamlined. Would you like to contribute a PR?
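A self-contained version of this map-and-bind workaround might look like the sketch below. It assumes the purrr and dplyr packages are installed and that the installed duckplyr version still provides duckplyr_df_from_csv(), the reader discussed in this thread; the example files are hypothetical. Each path is read separately and bind_rows() stacks the results, so the files should share compatible column names and types.

```r
library(duckplyr)
library(dplyr)
library(purrr)

# Two small example files in a temp dir (hypothetical stand-ins for the real, irregularly named CSVs)
paths <- file.path(tempdir(), c("file_1.csv", "file_2.csv"))
write.csv(data.frame(id = 1:2, value = c("a", "b")), paths[1], row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")), paths[2], row.names = FALSE)

# Read each file on its own, then stack the per-file results row-wise
combined <- paths %>%
  map(duckplyr_df_from_csv) %>%
  bind_rows()
```

Because the paths are just a character vector here, they can be assembled any way you like (for example from a lookup table), which sidesteps the need for glob syntax.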

@nicki-dese
Author

I hadn't thought to use map; thanks for the tip.

I'm sorry, I don't have the experience or knowledge to make a PR :(
