Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More extensive record-keeping during module loading/compilation #23898

Merged
merged 3 commits into from
Oct 7, 2017

Conversation

timholy
Copy link
Sponsor Member

@timholy timholy commented Sep 27, 2017

Currently Revise has to try to parse source code to figure out what files a package depends on and which module a file should be evaluated into. This is fragile. The two commits here should allow Revise to work sensibly on Julia 1.0:

  • callbacks for include so that other packages can log these dependencies (important for __precompile__(false) packages)
  • changes to the precompile cache file to also store the evaluating module (useful for __precompile__(true) packages)

EDIT: now it also saves the raw source text (as a string) to the cache file, addressing #23448.

@timholy timholy changed the title More extensive record-keeping during module loading More extensive record-keeping during module loading/compilation Sep 27, 2017
This allows one to obtain the file dependencies of a non-precompiled package.
@timholy
Copy link
Sponsor Member Author

timholy commented Oct 3, 2017

@vtjnash, if possible it would be great to have you look this over, since this makes a number of changes to the format of the cache files.

@timholy timholy force-pushed the teh/include_hook branch 2 times, most recently from 9a1d74c to 15569d9 Compare October 3, 2017 11:55
Copy link
Sponsor Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is time that we document the cachefile format.
IIUC this now also stores the source text of a dependent files in the cache file so that it is easier accessible for down-stream consumers?

@test map(x -> x[1], sort(deps)) == [Foo_file, joinpath(dir, "bar.jl"), joinpath(dir, "foo.jl")]
@test map(x -> x[1], sort(discard_module.(deps))) == [Foo_file, joinpath(dir, "bar.jl"), joinpath(dir, "foo.jl")]
srctxt = Base.read_dependency_src(cachefile, Foo_file)
@test !isempty(srctxt) && srctxt == read(Foo_file, String)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for a manually included dependency. If I understand one could end up parsing the list of dependent files and end up with an ("__external__", "external.so", mtime) and `Base.read_dependency_src(cachefile, "external.so") should fail.

Copy link
Sponsor Member Author

@timholy timholy Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but IMO that's fine. This is an internal method and consumers can make sure they don't try to read "__external__" dependencies. (Reading the src text is not something that's done unless it is requested.) Revise doesn't intend to track external dependencies, because it's hard to know what you're supposed to do about them.

But if you know of packages that are using this feature, I would be happy to go look at how they're using external dependencies; that could end up changing my mind.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything that would lead to a stale cachefile should probably prompt Revise to to reload that file :).
In my use cases I have stateful information from that external dependency (if it exists or not, API version, ...)

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love a link to your use cases, if they are in public repos.

It would be fine to trigger revise(module) when an external dependency updates, but it seems like it can't be robust: what if the external dependency is a single file in another package? How is Revise supposed to know whether it has to revise that package first? (In some cases it wouldn't but in others it would be necessary.)

In either case, I don't think Revise will need access to the source text of external dependencies. If, as you say, it's an .so file, Revise can't do anything with it anyway. In fact it occurs to me that we probably shouldn't store external dependencies in the src-text cache; it would be crazy to cache, say, libkde.so in a *.ji file 😄.

I will change the format so that each source-text file has its name at the beginning, so that one doesn't have to "count" from the list of dependencies while skipping over #__external__ dependencies.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if that might happen ;) yeah that would be crazy.

base/loading.jl Outdated
@@ -230,7 +234,7 @@ This is only needed if your module depends on a file that is not used via `inclu
no effect outside of compilation.
"""
function include_dependency(path::AbstractString)
_include_dependency(path)
_include_dependency("__external__", path)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently __external__ is a perfectly valid module name, albeit odd.

Solutions I see:

  • make __external__ a reserved module name.
  • add a flag to signify a external file.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I change it to #__external__? That seems pretty much impossible unless someone goes to heroic effort to generate it, @eval using $(Symbol("#__external__")).

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good even though I would love a verification step when we write out the module, but maybe that is easier to add to Revise.

This ensures that it's possible to re-evaluate modified code on a file-by-file basis (as with Revise)
@timholy
Copy link
Sponsor Member Author

timholy commented Oct 4, 2017

Barring objections, I'm planning to merge this by the end of the week.

Copy link
Sponsor Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

src/dump.c Outdated
ios_write(&f, jl_string_data(dep), slen);
posfile = ios_pos(&f);
write_uint64(&f, 0); // placeholder for length of this file in bytes
ios_file(&srctext, jl_string_data(dep), 1, 0, 0, 0);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs error handling. Alternatively, might be nice to store the contents as they are used. Also might be nice to use some standard format. For example, LLVM provides llvm::writeArchive (http://llvm.org/doxygen/ArchiveWriter_8h_source.html#l00041) for copying a list of files (paths and/or buffers) into the ar format (although LLVM may need a small patch to disable throwing an error when the symbol table cannot be created when it is not needed)

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added some simple error handling, hopefully it's what you had in mind.

The LLVM interaction seems a little scarier, since I'm less familiar with it. Dumb question: how do you even call out to LLVM from a *.c file?

The thing I like about storing this in the *.ji file is that they are guaranteed to be validated/invalidated together. Otherwise, I agree with you that a standard format has advantages.

@timholy timholy merged commit b9a0d1d into master Oct 7, 2017
@timholy timholy deleted the teh/include_hook branch October 7, 2017 10:43
@stevengj
Copy link
Member

I was looking at whether I should call include_callbacks in NBInclude.jl, but it looks like Revise.jl expects the callback parameter to be a Julia file. Might be nice to pass the include function (e.g. nbinclude) to the callback. Or maybe a parse function? Not sure what would be needed here to handle Julia code that is stored in structured formats like Jupyter notebooks (JSON).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants