Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Store the Expr behind each Method object #24347

Open
cstjean opened this issue Oct 26, 2017 · 16 comments
Open

Request: Store the Expr behind each Method object #24347

cstjean opened this issue Oct 26, 2017 · 16 comments

Comments

@cstjean
Copy link
Contributor

cstjean commented Oct 26, 2017

If we had a function code_expr(::Method) that returns :(function foo(...) ...), it would make augmenting existing functions (for instrumenting profiler, tracing, ...) vastly easier to do without digging into the internals or parsing source files (which may have been modified). @davidanthoff has another use for it in Query.jl, and it would kinda solve #2625

@StefanKarpinski
Copy link
Sponsor Member

I would propose storing the original source text instead – we can always reparse it on demand.

@cstjean
Copy link
Contributor Author

cstjean commented Oct 26, 2017

What would you store for function-defining macro expansions like @with_kw struct ... end?

@JeffBezanson
Copy link
Sponsor Member

JeffBezanson commented Oct 26, 2017

Are you sure you really want the AST? ASTs are very complicated, with deeply-nested structure and multiple possible representations of many forms. For program transformations the IR is much easier to use.

@davidanthoff
Copy link
Contributor

I think I want the AST, but I'm certainly not sure ;) My use case is that I want to translate queries into say SQL, and parts of query specification by users comes in the form of anonymous functions. I think that is probably easier if the representation I get is as close to what the user typed. That is the path the LINQ folks took, so it is well trotted. Might also work with IR, but I'm just not familiar enough with that to really be able to tell.

@chakravala
Copy link
Contributor

chakravala commented Oct 28, 2017

Yea, the AST would be great, I really want this feature too. If you only provide the string, then when you parse it, there might be unnecessary extra objects inserted, like :line blocks, which I find annoying and also blows up the size of the new AST (especially if function has a lot of lines) in my situation I would have to make an extra call to a recursive function that deletes all the :line objects from the AST, which will take up even more extra time and effort, in addition to parsing. so AST might be more direct and efficient

However, I could see how if you only store the string, it might be smaller storage.

@timholy
Copy link
Sponsor Member

timholy commented Oct 28, 2017

Perhaps it's not ideal, but for the record here is where we stand now. I can illustrate this most easily with tools in Revise, but of course you might want a lower-level implementation that wouldn't require reading the whole file. I'll choose a method in Base because those are a bit trickier (for a package, m.file contains the absolute path, but for Base methods the path is truncated).

julia> Revise.track(Base)

julia> m = first(methods(svd))
svd(A::BitArray{2}) in Base.LinAlg at linalg/bitarray.jl:90

# This is just to get the full path to the source file, in a way that's guaranteed to match the key
# in the src text cache (`normpath` on the result from `Base.find_in_path` should presumably
# work too)
julia> fn = first(Iterators.filter(str->endswith(str, String(m.file)), keys(Revise.file2modules)))
"/home/tim/src/julia-1.0/base/linalg/bitarray.jl"

julia> src = split(Revise.read_from_cache(Revise.file2modules[fn], fn), '\n');

julia> src[m.line]
"svd(A::BitMatrix) = svd(float(A))"

Or for the expression:

julia> src = Revise.read_from_cache(Revise.file2modules[fn], fn);

julia> i = 0; for j = 1:m.line-1 i = findnext(x->x=='\n', src, i+1); end; i
2409

julia> ex = parse(src, i)
(:(svd(A::BitMatrix) = begin
            #= none:2 =#
            svd(float(A))
        end), 2444)

These would allow someone to implement code_expr(m::Method) using currently available tools and then see where it starts to encounter problems. One that is fairly predictable is

for T in (Int, Float64)
    @eval foo(x::$T) = T
end

@jw3126
Copy link
Contributor

jw3126 commented Oct 28, 2017

If one could access the AST of a method, one could build a Cassette like mechanism that overdubs the AST instead of IR of a method. For some transformations AST is better. For example one could build custom @inbounds like macros that do things like:

  • Remove all assertions from a function
  • Remove all @argcheck from a function

@JeffBezanson
Copy link
Sponsor Member

If you only provide the string, then when you parse it, there might be unnecessary extra objects inserted, like :line blocks

The best way to handle this is to add an option to parse to exclude location info. Our default behavior is to include location info, so if we saved the AST for a method it would include line nodes.

I still strongly recommend against doing program transformations on a representation with probably ~100 forms, instead of basically just assignments, branches, and calls. And even if you're ok with that, how are you going to handle unexpanded macros and nested functions? With nested functions, it's almost incoherent to talk about the "AST of a method", since the outer function contains expressions for multiple methods, and the inner function expressions are missing context needed to interpret them correctly. Do you want to re-implement our scope analysis and closure conversion, or have us hand you all the information on a silver platter via the IR?

@chakravala
Copy link
Contributor

The best way to handle this is to add an option to parse to exclude location info. Our default behavior is to include location info, so if we saved the AST for a method it would include line nodes.

Having an option like this would be good.

@cstjean
Copy link
Contributor Author

cstjean commented Oct 29, 2017

I still strongly recommend against doing program transformations on a representation with probably ~100 forms, instead of basically just assignments, branches, and calls.

Tracing and profiling just involve wrapping the function body. Whether it's done with the AST or the IR doesn't matter much, but the AST is exported, supported and documented.

@JeffBezanson
Copy link
Sponsor Member

The IR is also supported and documented, and it's already possible to do transformations at that level, which e.g. ParallelAccelerator.jl and Casette.jl do.

Whether it's done with the AST or the IR doesn't matter much

Not true. The best example, as I said, is probably inner functions, where the AST doesn't correspond to the actual function in a simple way. But at the IR level it's simple, since everything has been converted to a top-level function with no free variables.

If it's useful to be able to wrap function bodies, it's possible we should have a utility for doing just that, since as you observe it's not even necessary to examine the AST in detail to do it --- letting you see the full AST for that would be overkill.

@cstjean
Copy link
Contributor Author

cstjean commented Oct 29, 2017

@chakravala You might find MacroTools.striplines useful.

@chakravala
Copy link
Contributor

chakravala commented Oct 29, 2017

@cstjean Thanks for the tip, I've already implemented my own Reduce.linefilter actually, since I enjoy writing recursive tree algorithms for fun.

@davidanthoff
Copy link
Contributor

@JeffBezanson could you help me understand what IR I would get? For example, say I have a function like

function foo(x::Function)
    ir = ... # Magic!
end

And then I call it like say this: foo(i->2*i + log(i) == 23).

Would the suggestion be that I work with something that looks like what I would get if I call @code_llvm on the anonymous function? Or something else?

@JeffBezanson
Copy link
Sponsor Member

It would be the result of @code_lowered.

@oxinabox
Copy link
Contributor

The updated version of @timholy 's revise based answer from before is:

Use CodeTracking.jl’s definition function.
https://github.com/timholy/CodeTracking.jl

It is what I do in Arborist
https://github.com/oxinabox/Arborist.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants