select_ does not allow for spaces in column name, filter_ does not seem to work #1392

Closed
lazarillo opened this issue Sep 8, 2015 · 5 comments

Comments

@lazarillo

These are two separate issues, but I think they may have the same underlying code affecting them both. First, with select_, I cannot use spaces in the name:

> mtcars_tbl <- tbl_df(mtcars)
> mtcars_tbl <- rename(mtcars_tbl, `miles per gallon` = mpg)
> select_(mtcars_tbl, "miles per gallon")

Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per

Same thing if I actually use the variable, as "intended".

> tmp <- "miles per gallon"
> select_(mtcars_tbl, tmp)

Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per

So, whether I pass an actual variable with a string in it, as I presume was the original intent, or if I just want a cleaner way to deal with spaces, select_ still fails. filter_ (and presumably others) also fails:

> filter_(mtcars_tbl, "miles per gallon")
Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per

OK, now onto problem 2:

filter_ does not seem to work at all, even when this whitespace is not an issue. For example:

> filter_(mtcars_tbl, "cyl" > 4) %>% arrange(cyl) %>% head(2)
  miles per gallon cyl  disp hp drat   wt  qsec vs am gear carb
1             22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
2             24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2

versus

> filter(mtcars_tbl, cyl > 4) %>% arrange(cyl) %>% head(2)
  miles per gallon cyl disp  hp drat    wt  qsec vs am gear carb
1               21   6  160 110  3.9 2.620 16.46  0  1    4    4
2               21   6  160 110  3.9 2.875 17.02  0  1    4    4

I played around with it some more, looking at the dim(), etc. It's pretty clear that nothing is happening with filter_. 😦

@romainfrancois
Member

About problem 2, this is user error. You can use either "cyl > 4" or ~cyl > 4, but what happens here is that "cyl" > 4 gets evaluated before filter_ ever sees it:

> "cyl" > 4
[1] TRUE

so you get all the data back.
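
For illustration, the same effect can be reproduced in base R without dplyr. This is only a minimal sketch: `cond`, `all_rows`, `expr`, and `kept` are names made up here, and plain data frame subsetting stands in for whatever filter_ does internally with a string.

```r
# "cyl" > 4 is evaluated before any filtering function sees it.
# R coerces 4 to "4" and compares the two strings, which gives a
# single TRUE in the usual collation order (letters sort after digits).
cond <- "cyl" > 4

# A single TRUE is recycled over every row, so nothing is dropped.
all_rows <- mtcars[rep_len(cond, nrow(mtcars)), ]

# Keeping the whole condition inside the string delays evaluation
# until it can be evaluated against the columns of the data.
expr <- parse(text = "cyl > 4")[[1]]
kept <- mtcars[eval(expr, mtcars), ]

nrow(all_rows)  # 32, every row of mtcars
nrow(kept)      # 21, only the 6- and 8-cylinder cars
```

The quotes have to go around the entire condition, not just the column name.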

@hadley
Member

hadley commented Sep 10, 2015

For problem one, you want:

tmp <- "miles per gallon"
select_(mtcars_tbl, as.name(tmp))
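
Why as.name() helps can be seen in base R alone. The sketch below assumes a small made-up data frame `df` (not the mtcars_tbl from the report); `parsed`, `sym`, and `col` are likewise illustrative names.

```r
# A tiny stand-in data frame with a space in one column name.
df <- data.frame(`miles per gallon` = c(21, 22.8),
                 cyl = c(6, 4),
                 check.names = FALSE)

tmp <- "miles per gallon"

# Parsing the string as R code fails, because "miles per gallon"
# is not a syntactically valid expression.
parsed <- tryCatch(parse(text = tmp), error = function(e) "parse error")

# as.name() skips the parser and builds a symbol directly,
# so spaces in the name are not a problem.
sym <- as.name(tmp)

# The symbol can then be evaluated against the data frame's columns.
col <- eval(sym, df)
```

Here `parsed` ends up holding the fallback string rather than an expression, while `col` holds the column values.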

@hadley hadley closed this as completed Sep 10, 2015
@lazarillo
Author

Romain,

Thank you for catching my error! I guess when I saw that one thing didn't work, I started suspecting the package more than my code.

Hadley,

Thank you for your solution. This will work in the short term, but is it a long term solution? What is the harm of wrapping everything with "as.name" automatically in the "*_" flavors of the package? Does that make something else crash somewhere else?

As I'd mentioned, there are 2 reasons to use the "*_" flavors:

  • to avoid having to deal with backticks or other syntax that I'd rather avoid. Admittedly, this is the less urgent reason.
  • to be able to use the names as variables, the main reason. If "as.name" works for some variable name constructs, and not for others, then this solution is not viable. If it works for all variable name constructs, then can't it just be pulled in as part of the *_ flavors of the functions and methods within dplyr itself?

@hadley
Member

hadley commented Sep 10, 2015

It's a fundamental choice - should "my weird variable" work, or should "starts_with('abc')" work? I decided on the latter, and it's too late to change now. as.name() says that what you have is the name of a variable, which seems pretty reasonable to me.
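
The trade-off can be sketched in base R. This only illustrates the difference between parsing a string and treating it as a name; it is not a description of dplyr's internals, and `as_code` and `as_symbol` are names invented for the example.

```r
helper_string <- "starts_with('abc')"

# Parsed as code, the string becomes a function call that a
# selection helper could later evaluate.
as_code <- parse(text = helper_string)[[1]]
is.call(as_code)

# Treated as a name, the very same string is just one (odd) symbol,
# so it could only ever refer to a column literally named that way.
as_symbol <- as.name(helper_string)
is.name(as_symbol)
```

A string can only be interpreted one way by default, which is why the caller has to say explicitly when it is meant as a name.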

@lazarillo
Author

Hi Hadley,

OK, understood! So there are unfortunately inevitable conflicts in how names are interpreted. I understand that, as it has been the bane of R for many years.

I guess I was just hoping that with all the marvelous things that dplyr, tidyr, ggplot2, etc. have been able to manage, despite relying upon a language that has made some poor choices under the hood, that maybe you found some cool magical way to resolve this issue, too.

Thanks for your response, and all your amazing packages!

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018