[BUG] ArgumentError: malformed expression in formula #507

Open
schlichtanders opened this issue Nov 7, 2023 · 3 comments · May be fixed by #524

@schlichtanders

RCall fails to handle valid R expressions

julia> using RCall
julia> reval("library(tidyverse)")
julia> rcopy(reval("aes(x, y)"))
ERROR: LoadError: ArgumentError: malformed expression in formula ~x
Stacktrace:
  [1] var"@formula"(__source__::LineNumberNode, __module__::Module, ex::Any)
    @ StatsModels ~/.julia/packages/StatsModels/Wzvuu/src/formula.jl:62
  [2] eval
    @ ./boot.jl:370 [inlined]
  [3] rcopy(#unused#::Type{StatsModels.FormulaTerm}, l::Ptr{LangSxp})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/formula.jl:41
  [4] rcopy(s::Ptr{LangSxp}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:14
  [5] rcopy
    @ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:8 [inlined]
  [6] rcopy(#unused#::Type{Any}, s::Ptr{LangSxp})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:21
  [7] rcopy(::Type{OrderedCollections.OrderedDict{Symbol, Any}}, s::Ptr{VecSxp}; normalizenames::Bool)
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:174
  [8] rcopy(::Type{OrderedCollections.OrderedDict{Symbol, Any}}, s::Ptr{VecSxp})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:165
  [9] rcopy(s::Ptr{VecSxp}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:18
 [10] rcopy
    @ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:8 [inlined]
 [11] #rcopy#16
    @ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:6 [inlined]
 [12] rcopy(r::RObject{VecSxp})
    @ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:6
 [13] top-level scope
    @ REPL[50]:1
in expression starting at /home/ssahm/.julia/packages/RCall/gOwEW/src/convert/formula.jl:41

Apparently, StatsModels.@formula is not meant to be used with a bare variable such as ~x.

@schlichtanders schlichtanders changed the title [BUG] [BUG] ArgumentError: malformed expression in formula Nov 7, 2023
@schlichtanders
Author

schlichtanders commented Nov 7, 2023

I am currently using the following workaround/fix:

function RCall.rcopy(::Type{RCall.FormulaTerm}, l::Ptr{RCall.LangSxp})
    expr = RCall.rcopy(Expr, l)
    if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
        # special case of simple variable, like in aes(x, y)
        return expr
    end
    # regular formula: delegate to StatsModels.@formula
    return @eval RCall StatsModels.@formula($expr)
end
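
The condition in the workaround can be tried out on its own, without RCall or R. The following is only a minimal sketch: `is_one_sided_formula` is a hypothetical helper name, not part of RCall or StatsModels, and `Meta.parse` stands in for the `Expr` that `rcopy(Expr, l)` would return.

```julia
# Hypothetical helper (not part of RCall): true for a one-sided formula
# expression such as `~x`, which is a call to `~` with a single argument.
is_one_sided_formula(ex) =
    Meta.isexpr(ex, :call) && length(ex.args) == 2 && ex.args[1] == :~

# Stand-ins for the expressions obtained from R (assumption for illustration).
one_sided = Meta.parse("~x")      # call to `~` with one argument
two_sided = Meta.parse("y ~ x")   # call to `~` with two arguments

println(is_one_sided_formula(one_sided))
println(is_one_sided_formula(two_sided))
```

Only the two-sided case would be passed on to `@formula` under this check; the one-sided case is returned as a plain `Expr`, as in the workaround above.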

@palday
Collaborator

palday commented Nov 14, 2023

I don't know if the example here is artificial, but I'm not sure it makes sense to try to copy the result of aes(x, y) into Julia. Here's my reasoning and my guess at what is going wrong:

  • aes(x, y) is an example of non-standard evaluation in R -- the x and the y are treated as symbols and not as variables and then those symbols are evaluated in the context of the dataframe you pass into the rest of the ggplot2 call. Julia doesn't have non-standard evaluation -- and intentionally so. Non-standard evaluation is really great for some bits of fun syntax (like the tidyverse uses extensively), but it's very hard for humans and compilers to reason about and thus very hard or even impossible to optimize. This is part of why efforts to add a bytecode compiler to R have had very limited success.[1] In other words, the R expression aes(x, y) has no direct Julia analog and not just because aes is an R function and not a Julia one.
  • Julia relies instead on macros to do syntax rewriting and thus implement things like the Wilkinson-Rogers notation, i.e. the formula syntax.
  • If I recall correctly (it's been a while since I messed with ggplot2 internals and much has changed in the meantime), aes and the like are actually rewritten into a mix of formula notation and things like aes_, which doesn't use non-standard evaluation. My guess is that when this happens, x gets turned into the one-sided formula ~ x.
  • RCall sees this formula and says "aha, I know how to translate a formula!" and calls into StatsModels, which has the canonical Julia implementation of the Wilkinson-Rogers notation via its @formula macro. The only problem is that there are no one-sided formulae in this implementation. I haven't talked to @kleinschmidt to know for sure why, but my guess is that this is partly related to the following:
    • there are other ways to construct individual terms in Julia
    • there are other ways in Julia to do the things that R uses one-sided formulae for
    • macros can do syntactic rewriting, but the original input still has to be valid Julia syntax, even if it's not "semantically" correct because Julia parses the expression before the macro gets to manipulate it.
  • If you really need a one-sided formula in Julia, then you can do something like @formula(0 ~ x)
  • Now that we've covered why your example doesn't work, I'm not sure it's a good idea to try it. I can't see how aes(x, y) is useful in Julia -- it's an entity that's meant to be consumed by ggplot2's functions and, as far as I know, there are no functions in Julia that can consume it. So if you just need a reference to the aes-entity to later pass it back into R, then you don't need to call rcopy -- you can just do aes = reval("aes(x, y)") and you'll have a Julia reference to the object in R.
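
To illustrate the point above about macros receiving already-parsed input, here is a small sketch using only Base Julia. The macro name `@formula_parts` is hypothetical and is not how StatsModels implements `@formula`; it only shows that a macro sees an `Expr` and can inspect or rewrite it.

```julia
# Hypothetical macro for illustration only (not the StatsModels implementation).
# By the time this body runs, Julia has already parsed the input into an Expr.
macro formula_parts(ex)
    # Accept only calls to `~`; anything else is rejected at expansion time.
    Meta.isexpr(ex, :call) && ex.args[1] == :~ || error("expected a formula")
    # Return the operands of `~` as a tuple of symbols/expressions.
    return QuoteNode(Tuple(ex.args[2:end]))
end

println(@formula_parts(y ~ x))   # two operands
println(@formula_parts(~x))      # one operand
```

In this sketch both forms reach the macro as calls to `~`, differing only in the number of operands, so whether a one-sided form is accepted is up to the macro's own checks.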

If there's some cool use case I'm missing, please let me know! Then I could provide more guidance. 😄

One final "nit" -- for the example you're using here, you don't need the whole tidyverse, just ggplot2. Trimming the dependency stack can really help track down a problem, so just FYI. ❤️

@schlichtanders
Author

Very impressive, detailed answer. Indeed there is a use case: I am in the process of supporting R in Pluto via RCall. It is quite a special use case, but of course in such a generic "execute some R code via RCall" setting, these cases just happen.

In other words: why should rcopy be left to fail in some known (or unknown) cases? Better to make it work in all cases.

@schlichtanders schlichtanders linked a pull request May 2, 2024 that will close this issue