Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose type dependencies (call graph) for accurate incremental transpiling #99

Open
tbroyer opened this issue Jul 11, 2020 · 42 comments
Open

Comments

@tbroyer
Copy link
Contributor

tbroyer commented Jul 11, 2020

Is your feature request related to a problem? Please describe.
For accurate incremental transpiling, external tooling (Maven, Gradle, etc.) need to determine which files to re-transpile when a given file changes.
The mapping from source file to generated files is needed too (naming conventions are not enough, because it's not unusual for a Java file to contain multiple types, and can even contain several top-level types; and source files aren't even required to follow the common naming convention based on the package and type names) to accurately cleanup the output directory when a source file is deleted.
J2CL also copies source files (.java and .js) to the output, possibly relocating them, so the mapping from source path to output path is needed as well.

Even better if it could also map to JARs or directories in the -classpath for external dependencies, such that if the JAR (or class in the directory) changes the build tool could re-process all files that depend on it (and possibly on all following path entries in the classpath, in case class is now shadowed, or a class that was shadowed no longer is; if it could map to the actual class file, this could be made even more precise, by detecting those shadowing cases). Otherwise, each change to the classpath would mean reprocessing all files. Gradle, for instance, does such processing for its incremental javac.

For example, given those two source files:

package p;

class A {
  B a() {
    return new B(1);
  }
}

and

package p;

class B {
  B(Number n) {}
}

If src/p/B.java is modified, then we'll want to reprocess both src/p/B.java and src/p/A.java because p.B would have had its API modified in a way that would change p.A's output (e.g. here adding a B(int) constructor overload).

Processing those 2 files will result in 8 files being generated or copied:

  • out/p/A.java (copied from the sources)
  • out/p/A.java.js
  • out/p/A.impl.java.js
  • out/p/A.js.map
  • out/p/B.java (copied from the sources)
  • out/p/B.java.js
  • out/p/B.impl.java.js
  • out/p/B.js.map

If src/p/A.java is deleted, we'll want to delete all the out/p/A.* files.

Now if a src/p/A.native.js is created, then src/p/A.java would have to be reprocessed. And then again if src/p/A.native.js is deleted. But because J2CL uses a naming convention here, this does not need to appear in the mapping (correct me if I'm wrong).

If src/p/B.java is modified to add an inner class, it would generate 3 new files (out/p/B$Inner.java.js, out/p/B$Inner.impl.java.js, and out/p/B$Inner.js.map). If src/p/B.java is then deleted, then all 7 files need to be deleted, and src/p/A.java be reprocessed (which would fail, unless it was also modified to remove its dependency on p.B).

Describe the solution you'd like
Define a stable file format that J2clCommandLineRunner could output for use by external (non-Bazel) tooling.
It needs to contain 2 things:

  • the list of dependencies between source files (src/p/A.java depends on src/p/B.java; this could use types as an indirection, e.g. type p.A comes from src/p/A.java and references type p.B that comes from src/p/B.java, and foo.Bar that comes from jar:foo.jar!/foo/Bar.class)
  • the mapping back to the source file(s) (there could be a .java and a .native.js) for each output file

In an incremental use of the transpiler, the build tool would have to merge the generated mapping file with the previously known mapping (and removing entries from that global mapping when a file is deleted).

Describe alternatives you've considered
J2CL outputs a source map file (.js.map) for each generated file (pair of .java.js and .impl.java.js), containing the path of the source; but that references the source that J2CL copied to the output, not the actual path to the source (as can be seen in bazel-bin/third_party/jbox2d.js.zip vs bazel-bin/third_party/libjbox2d-src.jar after a bazel build //third_party:jbox2d.js.zip: the .js.zip contains a java/lang/StrictMath.js.map referencing, using a relative path, java/lang/StrictMath.java, but the actual source in the -src.jar was external/org_jbox2d/src/main/java/org/jbox2d/gwtemul/java/lang/StrictMath.java), because it's targeted at tools that need to access those .java files.

Additional context
J2CL internally already has all the information; it populates a LibraryInfo protobuf that it can serialize to a file (for later use by RTA) but the flag to do so is not exposed to J2clCommandLineRunner (probably to keep the LibraryInfo as an implementation detail of J2CL). Just like the source map though, the library_info does not contain the actual source path either (modifying third_party/BUILD to pass readable_library_info = True to //third_party:jbox2d generates a library_info_debug.json in the .js.zip but it does not contain the full src/main/java/org/jbox2d/gwtemul/java/lang/StrictMath.java path)
The Kythe indexing metadata includes the information though (I modified build_defs/internal_do_not_use/j2cl_common.bzl to unconditionally pass -generatekytheindexingmetadata), so J2CL has all the needed information to generate such a mapping file.

@mdproctor
Copy link

I created a separate visitor, that produces cache of all graph information, and then built an incremental reference manager around this. I need just a single callback within J2cL to run and maintain this. my first impl, was putting the code inside of J2CL, i'm now separating this into it's own project and will submit a PR to the small change I need to J2cL for this ti work. I was working on the to provide incremental J2CL, that could work with any IDE. Once the code is separated, I'll then integrate this into maven.

ReferenceManager:
https://github.com/mdproctor/j2cl/blob/0bf7a647ca3629c39d89314b9a3abbcf3b54e9d3/transpiler/java/com/google/j2cl/sdm/ReferenceManager.java

UnitTests:
https://github.com/mdproctor/j2cl/blob/0bf7a647ca3629c39d89314b9a3abbcf3b54e9d3/transpiler/javatests/com/google/j2cl/transpiler/sdm/TestJ2CL.java

@gkdn
Copy link
Member

gkdn commented Jul 16, 2020

With J2CL, even compiling whole JRE takes less than 3 secs on a warmed up worker.
On the Bazel land, we avoid giant targets and with the help of ijar/header jars the changes usually will not propagate up and cause additional re-compilations.
Those two together gives us millisecond refresh times so we never had the need for 'incremental' transpilation within a target.

I would assume Gradle has parallel concepts to what we have in Bazel. So is incremental transpiling really necessary in practice for Gradle?

@tbroyer
Copy link
Contributor Author

tbroyer commented Jul 16, 2020

I think 3 seconds for transpiling + many more seconds for Closure is too long for a usable "dev mode".

Gradle does not have ijar/hjar but instead fingerprints the "incoming" classpath of tasks (it knows how to fingerprint a "compile classpath" differently from a "runtime classpath", only taking the ABI into account in the first case), but I'd expect that this leads to the same results eventually 🤞

I have a (non incremental) prototype in Gradle that, when applied to the samples/helloworld project transpiles in 0.5s (closure step in BUNDLE mode is then ~1.2s). That's for 2 Java files only!
Those numbers are for warmed-up workers, but maybe I did something wrong and could achieve better performance (currently using in-process workers with classloader isolation, will try with process isolation but I'm not sure it would actually improve things here); unless it's just the overhead of Gradle (it apparently bears comparison with Bazel though).
Using Gradle's continuous build, this gives me a 2s rebuild whenever I change a Java source file; whereas ibazel build //src/main/java/com/google/j2cl/samples/helloworld:helloworld_dev rebuilds in 0.7s ! (while probably doing everything "twice": strip/javac/j2cl the helloworld_lib, then possibly strip/javac/j2cl helloworld if the ABI of helloworld_lib changed, and the final closure bundle step!)

I need to try with a larger project, and possibly optimize performance of the tasks themselves, but given that "targets" are much bigger in non-Bazel projects (fwiw, my current GWT project has 90.4k sloc in 1275 Java files in the same "target" –i.e. excluding the libraries that it uses, some of them "internal" to our project too–; and SDM still is very usable!), and given that Gradle thought worth it to work on incremental Java compilation despite the awful complexity particularly including annotation processing, then I would assume that this would greatly help for J2CL as well.
And it looks like we could do something "good enough" by only knowing the relationships between source files (by "good enough", I mean that adding a source file could possibly trigger a full recompilation, as it could completely change which type is resolved when using star-imports and/or between inherited inner types and same-package types with the same name; but deleting a file or changing its content could only re-transpile those files and all those that depended on it, transitively).

Let's gather some more numbers.

@gkdn
Copy link
Member

gkdn commented Jul 16, 2020

I think 3 seconds for transpiling + many more seconds for Closure is too long for a usable "dev mode".

3 seconds is for change in JRE itself; which is one of the worstcases.
The numbers for smaller targets are much smaller; it would be less 100 ms. And because of that we actually never bothered profiling/optimizing the transpiler. I'm sure there a low hanging fruits there.

Let's gather some more numbers.

Yeah let's gather some numbers and we can decide on prioritization of this.

@gkdn
Copy link
Member

gkdn commented Jul 16, 2020

And thanks for looking into J2CL Gradle!!

@tbroyer
Copy link
Contributor Author

tbroyer commented Jul 17, 2020

I think 3 seconds for transpiling + many more seconds for Closure is too long for a usable "dev mode".

3 seconds is for change in JRE itself; which is one of the worstcases.

AFAICT, the JRE is 26.5k sloc in 265 files, that's small compared to the 90.4k sloc in 1275 files I'm working with 😉 (and that's not counting all the generated code, as it makes heavy use of UiBinder, the Editor framework, and RequestFactory, and of course ClientBundle and i18n).
Just to put that in perspective 😉

The numbers for smaller targets are much smaller; it would be less 100 ms. And because of that we actually never bothered profiling/optimizing the transpiler. I'm sure there a low hanging fruits there.

👍

@gkdn
Copy link
Member

gkdn commented Jul 17, 2020

@rluble heads up

Ok got it.

For starters; to cover output to source mapping, we could easily have a flag so that the generated source map can point to the original source. I think that's generally useful option.

But list of dependencies of a file though is a more of a unique request. Curious if it is not easy to extract that information from the Java compilation that Gradle is doing?

BTW, as side notes, I would like to get the parallel stripping + javac compilation out of the equation and rely completely on Turbine for that. We never need the byte code other than the ijars/hjars. We can keep separate stripper tool but wanted to share that if it effects your plans.

@tbroyer
Copy link
Contributor Author

tbroyer commented Jul 28, 2020

And thanks for looking into J2CL Gradle!!

Far from being done, but code is now available at https://github.com/tbroyer/gradle-j2cl-plugin
Write up about where I'm at: https://blog.ltgt.net/designing-gradle-j2cl-plugin/

@tbroyer
Copy link
Contributor Author

tbroyer commented Jul 28, 2020

For starters; to cover output to source mapping, we could easily have a flag so that the generated source map can point to the original source. I think that's generally useful option.

That'd at least help us start experimenting 👍

But list of dependencies of a file though is a more of a unique request. Curious if it is not easy to extract that information from the Java compilation that Gradle is doing?

Should be relatively easy to extract using ASM.
So

  • source maps would give us *.java.js/*.impl.java.js to original *.java mapping,
  • with ASM we could extract the dependencies between *.class files
  • the names of the *.java.js would give us the corresponding *.class, so transitively we could find the dependencies between *.java files

This means that the J2clTranspile task would need both the *.java and their *.class counterparts (rather than just the *.java that the J2clTranspiler actually needs), but it could work.

BTW, as side notes, I would like to get the parallel stripping + javac compilation out of the equation and rely completely on Turbine for that. We never need the byte code other than the ijars/hjars. We can keep separate stripper tool but wanted to share that if it effects your plans.

I had long been floating the idea of stripping at the bytecode level using ASM, in addition to stripping at the source level with the GwtIncompatibleStripper. If the stripper is integrated into the transpiler, we could then just remove it, and continue to strip the bytecode with ASM in parallel.
This is in a world where we expect third-party libraries to come with all their sources, vs running javac to also run annotation processors. I'm not quite sure what you're expecting in Bazel with the j2cl_import_external and j2cl_maven_import_external (where you would declare their deps, and those could have exported_plugins): that the source JAR includes everything (including processor generated sources; as is the case when using those JARs with GWT for instance) or that annotation processors would run?

@gkdn
Copy link
Member

gkdn commented Jul 29, 2020

Should be relatively easy to extract using ASM.

I was assuming that since Gradle Java plug producing this information, maybe we can simply access it. Going through ASM might be a an overkill if used only for this purpose.
When we are there, let's talk about it and see how we can supply the information to you.

I'm not quite sure what you're expecting in Bazel with the j2cl_import_external and j2cl_maven_import_external

j2cl_import_external and j2cl_maven_import_external are just mimicking their JVM counterparts so don't really understand from plugins hence effectively requires APT to be already run. That is only better than not having anything :)

I'm not sure how things are with rules_jvm_external but we will very likely keep following whataver the Bazel Java solutions come up with.

@tbroyer
Copy link
Contributor Author

tbroyer commented Sep 18, 2020

For reference (I thought I already posted it, but can't find it), the way Gradle does it for JavaC is here: https://github.com/gradle/gradle/blob/v6.6.1/subprojects/language-java/src/main/java/org/gradle/api/internal/tasks/compile/incremental/ (and a bit in the parent package)
They store:

  • a source→classes mapping file (format)
  • analysis data for each class (including those from the classpath AFAICT), tracking which other class is referenced, either privately (method bodies, or private methods/fields) or accessible (such that if class A is changed, but is only used privately by class B, only class B will need to be recompiled, and not any class that also depends on class B). Analysis itself is done using ASM as a post-processing phase after JavaC has been called.

And it's more complex than J2Cl as the process also has to track generated files by annotation processors.

This (more or less) matches what I described above for J2Cl:

  • the list of dependencies between source files (src/p/A.java depends on src/p/B.java; this could use types as an indirection, e.g. type p.A comes from src/p/A.java and references type p.B that comes from src/p/B.java, and foo.Bar that comes from jar:foo.jar!/foo/Bar.class)
  • the mapping back to the source file(s) (there could be a .java and a .native.js) for each output file

For reference, the meat of the incremental compiler, that makes use of this data, is JavaRecompilationSpecProvider and ClassSetAnalysis

@mdproctor
Copy link

I have added a TypeGraphStore to my fork. As well as dumping the graph info, it records if an update to a class may have a cascading impact - such js class and method renames.
https://github.com/mdproctor/j2cl/blob/incremental/transpiler/java/com/google/j2cl/transpiler/incremental/TypeGraphStore.java

It has a single touch point (see line 151)in J2clTranspiler, as well as the options config to enable it.
https://github.com/mdproctor/j2cl/blob/incremental/transpiler/java/com/google/j2cl/transpiler/J2clTranspiler.java#L151

@gkdn
Copy link
Member

gkdn commented Sep 18, 2020

Thanks for sharing @mdproctor

@tbroyer I think I didn't get a direct response to my earlier question. Assuming you use Gradle to do java compilation, don't you already have access to this generated mapping file?

@tbroyer
Copy link
Contributor Author

tbroyer commented Sep 21, 2020

@tbroyer I think I didn't get a direct response to my earlier question. Assuming you use Gradle to do java compilation, don't you already have access to this generated mapping file?

It might be possible to get access to this information (I haven't tried), but this is internal to Gradle and could change at any time. Some (or all) of it also won't be available (AFAICT) if the compilation output comes from a build-cache.

It also, as you said, assumes that the exact same code has already been compiled by JavaC. Tasks being building blocks, things can be configured slightly differently, breaking the incremental processing with no (easy) way to detect it. In a shared library, one could, for example, only compile (javac) its original sources, and not the ones stripped from @GwtIncompatible annotations (given that it's something you're intending to rework and build into Turbine on your side…), and for downstream modules, use ASM to strip @GwtIncompatible from bytecode (this is something we'll have to do one way or the other anyway for third-party libraries)

(fwiw, I have that ASM-based bytecode-stripper working already; not sure if I'll actually use it but I wanted to learn ASM 😉)

@mdproctor
Copy link

I have submitted a PR of my work to vertispan's repository, Vertispan#12 I'm not expecting this to be merged yet, but it's a good time to circulate for feedback. I'm not sure we need this included in J2CL, but it might be nice to have a hook so a custom use provided visitor can be run.

I was unable to use the LibraryInfo as its information is not complete and anything visible to JS is not exported as it's more aimed at pruning - but it works in a similar way. I also created my on simple, CVS style, format to read/write the information - because it's compact and fast - much smaller than the json. For incremental refreshes, we want to cut every bit of time we can.

For this approach to work it expects the IDE to ensure all compilation errors have been resolved, before being passed to j2cl. Although it gracefully handles when that is not the case. While a subset of .java files are passed for transpiration, it uses .class output from the other classes to ensure everything resolves correctly. We have found this to work fine, with our j2cl Quake port.

The code works in two phases. The first phase takes the incoming changeset (as defined by timestamp changes since last iteration) and uses the graph information, if any, to build the expanded list of files that need to be retranspiled. This first phase is done before calling j2cl, such we do in our maven plugin already.

The second phase is after the j2cl produces it's compilation units, and it outputs the new graph information. Note, as it has all the graph information from the previous call, so even if a subset of java files are passed, it is able to ensure a full graph write out on each iteration.

The visitor builds up a list of all the members, and builds a callee/caller graph. "Impact" tracking is added for a member that calls another member who has as different JS name specified in the annotation - as it's possible to change the js name, and the IDE will not trigger refactor in the caller and thus no timestamp change. This way it is ensured the caller is still added to the transpiration changeset. As other issue are identified, they can be added to impact tracking too.

I have added a unit test, to check the main types of issues. We have used this with the j2cl quake port in our maven toolchain and iterations went from 7s to 1.5s for each iteration.

@gkdn
Copy link
Member

gkdn commented Feb 5, 2021

Could we customize library info with a flag so that you can use the information provided by it instead of introducing completely independent concept? If it doesn't require heavy customization, that would be a better option.

And is the output we would generate going to be useful for both Gradle and Maven incremental compilation? i.e. can they use same thing?

And the final question as a side discussion:
What does all other languages do with Maven and Gradle? Does scala, kotlin, go, dart etc. plugins need to do such this independent analysis during the build to have good performance?

@tbroyer
Copy link
Contributor Author

tbroyer commented Feb 5, 2021

And is the output we would generate going to be useful for both Gradle and Maven incremental compilation? i.e. can they use same thing?

I haven't looked at the PR yet but there's no a priori reason it wouldn't work for Gradle or any other build tool.

And the final question as a side discussion:
What does all other languages do with Maven and Gradle? Does scala, kotlin, go, dart etc. plugins need to do such this independent analysis during the build to have good performance?

Generally speaking Maven does nothing, it's as dumb as Ant or Make; developers generally rely on their IDE for incremental compilation (hoping it does the same thing 🤷).
For Java and Groovy, Gradle analyzes bytecode and traces annotation-processor generated classes to get proper incremental compilation (annotation processors need to opt-in to incremental annotation processing too, this is Gradle-specific).
Scala and Kotlin have incremental compilers on their own (for Scala, it was originally built into SBT but has been extracted as Zinc; Pants uses Zinc for Java incremental compilation too BTW), so they "just" need to be used by the build tool.
Go is fast enough that it doesn't need incremental compilation AFAICT.
I can't tell for other languages.

@mdproctor
Copy link

mdproctor commented Feb 6, 2021

Could we customize library info with a flag so that you can use the information provided by it instead of introducing completely independent concept? If it doesn't require heavy customization, that would be a better option.

And is the output we would generate going to be useful for both Gradle and Maven incremental compilation? i.e. can they use same thing?

And the final question as a side discussion:
What does all other languages do with Maven and Gradle? Does scala, kotlin, go, dart etc. plugins need to do such this independent analysis during the build to have good performance?

It needs to dump the full graph information. All interfaces implemented, all classes extended, all methods and properties referenced. We also need the annotation meta data, that might impact the transpiration graph - i.e. when jsinterop changes the used js name for a class, field or method.

Right now a lot of the information is removed via:
`if (!isPrunableType(referencedType)) {
return;
}

if (isJsAccessible(referencedType)) {
return;
}`

If we can get all that information dumped, that would be great. Also maybe we could configure protobuf so the output could be pure json or a format to a zipped binary? the later would ensure we load this information as quickly as possible, so that each refresh is faster.

The other tricky part is how to keep this data whole-whole and up to date, yet handle on partial incremental transpilations - where only a subset of .java files are passed for translation at any given time - except the first time, when it must receive all. If you look at the code I read in the previous iteration of the graph that was saved to disk. I then remove just the types being updated via transpiration and then re-insert them back into the graph and save the graph as a whole.

@tbroyer
Copy link
Contributor Author

tbroyer commented Feb 6, 2021

The other tricky part is how to keep this data whole-whole and up to date, yet handle on partial incremental transpilations - where only a subset of .java files are passed for translation at any given time - except the first time, when it must receive all. If you look at the code I read in the previous iteration of the graph that was saved to disk. I then remove just the types being updated via transpiration and then re-insert them back into the graph and save the graph as a whole.

This could be done by the plugins as a post-j2cl step, right? (with the text/CVS-style file being plugin-specific then: read the LibraryInfo and merge it into the cache file)

@mdproctor
Copy link

mdproctor commented Feb 8, 2021

This could be done by the plugins as a post-j2cl step, right? (with the text/CVS-style file being plugin-specific then: read the LibraryInfo and merge it into the cache file)

Yes I believe it should be possible to merge a more recent partial graph over the whole world version. It would mean potentially two files on disk. The whole world and the partial world output from j2cl.

@mdproctor
Copy link

  1. I see two methods limiting what is recorded, that should be made optional.
    isJsAccessible
    isPrunableType

  2. I also see the two blocks, I think need to be optional:
    ` if (memberDescriptor.hasJsNamespace()) {
    // Members with an explicit namespace members don't really belong to the type. Skip them
    // here, otherwise they would be an entry point for this type, and the type might be
    // unnecessarily retained by rta.
    continue;
    }

    if (memberDescriptor.getOrigin().isInstanceOfSupportMember()) {
    // InstanceOf support members should not be considered methods that are prunable if there
    // are no references, since the references are hidden by the runtime. In the end
    // InstanceOf support members are live whenever the type is live.
    continue;
    }`

  3. Each TypeInfo and MemberInfo should provide both the java name and js name, so we can see if they differ due to renaming.

  4. Configure two outputs. 3.1 json text (as per now) 3.2 binary + zip

@rluble
Copy link
Collaborator

rluble commented Feb 8, 2021

4. Configure two outputs. 3.1 json text (as per now) 3.2 binary + zip

AFAIK, library info is already emitted in binary format here. The json text is just for debug and used only in golden file tests.

@gkdn
Copy link
Member

gkdn commented Feb 8, 2021

Thanks @mdproctor.

I meant to reply to @tbroyer response and follow up on with my summarization of the options to make sure we are on the same page but I couldn't come back to it yet.

And as Roberto pointed, yes we use proto binary, it is a quite compact and platform independent format.

@mdproctor
Copy link

mdproctor commented Feb 9, 2021

@gkdn if you are able to make the changes in a branch, I can try and port my test suite to it:
https://github.com/Vertispan/j2cl/blob/69c7e6db2f78533f956e29c151c8761b0fe8d1ca/transpiler/javatests/com/google/j2cl/transpiler/incremental/IncrementalTest.java

I did look myself at doing the changes, they seem quite minimal, but not sure I'd do it in the way you'd want. don't forget we will need the qualified java and qualified js name, to determine if a jsinterop annotation changed those.

@mdproctor
Copy link

Looking over the LibraryInfo. I can see it stores the qualified js name, but not the qualified java name. Also we do not have any of the JsInfo. The aim is to be able to determine if any Java class, property or method has it's namespace or type renamed when outputting to javascript. Not just that "getJsInfo().getJsName()" exists, as it could be specified but the same name, so the it will not result in an actual different name.

@tbroyer
Copy link
Contributor Author

tbroyer commented Feb 11, 2021

Correct me if I'm wrong but what's needed is the Java source file name, not much the Java type name. What's needed is a way to compute which other file needs to be j2cl'd when a given file is modified.
See the original issue above.

@mdproctor
Copy link

mdproctor commented Feb 12, 2021

"What's needed is a way to compute which other file needs to be j2cl'd when a given file is modified."
I make the assumption that there has been a successful IDE compile, so I can use the date time stamp to get the list of files, if you refactor a class it will force all other impacted files to be updated and saved, so it gets added to that date time stamp based changeset. What I need in addition to this is classes that did not have their date time stamp changed, because the java member was not refactored, but the jsinfo name was changed which means it's impacted and thus needs to be in the set.

@gkdn
Copy link
Member

gkdn commented Feb 12, 2021

Why do you need JsInfo, qualified js name or changes to namespace in JsType for incremental compile?

@mdproctor
Copy link

mdproctor commented Feb 12, 2021

Why do you need JsInfo, qualified js name or changes to namespace in JsType for incremental compile?

Take the example below, if I refactor the A java name, then B will be changed too, both datetime stamp are updated and thus part of the submitted changeset. So I rely on datetime stamp to build the submitted changeset, rather than scan the entire call graphs. This works fine. But if I change JsType name on A, but do not change the java class name of A, then the generated JS will be stale, as the IDE will not performance any updates on B - so this must be detected and B added to the ChangeSet. So in my code i was checking if the JsInfo was doing any renaming, and if it was different from the previous time this class was visited.

@JsType(name="FakeA")
public class A {
    public void call() { .. do something .. }
}

public class B {
    private A a;
   public void call() { a.call(); }
}

@gkdn
Copy link
Member

gkdn commented Feb 13, 2021

IIUC what @tbroyer was asking for; we will give you the list of things that depend on A (which will include B in this case) so when you see A is updated, you will invalidate B as well.

However if you are saying that A doesn't look updated after the annotation change, then it is probably just related to current retention policy of these annotations - then we can set them accordingly.

@gkdn
Copy link
Member

gkdn commented Feb 13, 2021

(finally had the time to send the reply I promised earlier)

Thanks @tbroyer, you are a gem; a walking Encyclopédie of build tools 🙂

Ok IIUC, we have following options:

  1. A 3rd party tool to generate a call graph and use this data to drive the J2CL compiler (like Gradle Java?).
  2. Same as previous but the tool could be written using J2CL non-public APIs.
  3. Same as previous it is official part of J2CL.
  4. J2CL supports incremental compilation (like Kotlin?).
  5. A 3rd party incremental compiler written using J2CL non-public APIs. (like Zinc?)

Option 1 is not very complicated; as you pointed it can be done with a tool that uses ASM or JDT.

Option 2 doesn't make much sense since using J2CL is not valuable for this purpose.

Option 3 is ok as I said earlier.

Option 4 is probably overkill at the moment. Maybe we can eventually add it by moving some of your gradle code.

@gkdn
Copy link
Member

gkdn commented Feb 13, 2021

And for option 3; we would need to output something like:

message J2clInput {
  string name = 1; // source path
  repeated string deps = 2; // list of other source paths this depends on
  repeated string outputs = 3; // list of output file paths
}

Since this is simple and mostly text anyway, we can go with a simple text file emitted per source instead of using a binary proto.

So the final outputs will look like:

out/p/A.java
out/p/A.java.js
out/p/A.impl.java.js
out/p/A.build.map 
out/p/B.java
out/p/B.java.js
out/p/B.impl.java.js
out/p/b.build.map 

where A.build.map will look like:

 - src
src/p/A.java
- deps
src/p/A.native.js
src/b/B.java
- outputs
out/p/A.java
out/p/A.java.js
out/p/A.impl.java.js
out/p/A.build.map 

We don't have the bandwidth for this but it should be trivial to add.

@mdproctor
Copy link

"Since this is simple and mostly text anyway, we can go with a simple text file emitted per source instead of using a binary proto."
Without the class and member jsinfo annotation information, there is little room to optimise this. It would mean we have to transpile every class that has a dependency, even if it was not impacted.

@gkdn
Copy link
Member

gkdn commented Feb 13, 2021

You need a lot more than jsinfo annotation for that. And that is really hard to pull off correctly in a minimal way, e.g. base class change may result in new bridges, so class A may not have dependency to class B at member level earlier, and may start having one after a change in A. To handle that, for example you probably need to assume all subclass are changed regardlessly and as well as all of their their call sites.

And does it really matter in practice? (considering transpilation time vs. bundling+downloading+parsing)

Regardless, I think we should just start with something simple here.

@mdproctor
Copy link

Regardless, I think we should just start with something simple here.

If we retranspile all direct (not transitive) callers (no optimisations), from what I can see, isn't that information in the LibrayrInfo already?

@gkdn
Copy link
Member

gkdn commented Feb 16, 2021

LibraryInfo is a call-graph. Reverse of it doesn't necessarily give you what to compile when something changes since new APIs can add new edges to call graph.

Look following just for an example:

class A<T> { 
  static foo(Integer o) {...}
}
class B extends A {}
class Zoo {
  void method() { B.foo(5); }
}

Then introduce foo(int) to class B.

So if there is an API change, it is best and also easy to compile all the deps of the file. If you know if there is only a change in body, you don't need to re-compile anything other than that class - that scenario will be the majority of the user changes anyway during an edit-refresh cycle...

@mdproctor
Copy link

ok works for me, happy to start with something simple and robust and expand from there if needed. As you say, we can still capture signature changes ourselves, and avoid retranspilation if they do not occur.

@tbroyer
Copy link
Contributor Author

tbroyer commented Mar 7, 2021

@gkdn I compiled your sample code with JavaC, and it turns out Java puts B.foo in the Zoo class even though the actual method will be A.foo.

$ javap -c build/classes/java/main/test/Zoo.class 
Compiled from "Zoo.java"
public class test.Zoo {
  public test.Zoo();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: iconst_5
       1: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       4: invokestatic  #3                  // Method test/B.foo:(Ljava/lang/Integer;)V
       7: return
}

Could the call graph then contain B rather thanA in LibraryInfo? What would/could be the consequences of such a behavior change?

Regarding the rest of the discussion: you cannot know what part of B has changed without parsing it and diffing the AST against the previous known version, or possibly doing the same using ASM on the compiled class. Easiest is to just see that B has changed and then recompile both B and Zoo, because the call graph has recorded that Zoo depends on B (and not A). This is what Gradle does FWIW, and that's because JavaC records B into the bytecode (Gradle analyses the bytecode just after compilation using ASM to build the call graph; with J2Cl, analysing the generated JS would be hard, and counterproductive given that J2Cl knows about the call graph already).

@gkdn
Copy link
Member

gkdn commented Mar 8, 2021

Could the call graph then contain B rather than A in LibraryInfo?

Specifically for LibraryInfo; RTA works based on normalized code; so generally speaking it is not a good idea to provide more raw data since it would require having more logic and that's something we don't want.

Also generating data that is not normalized is not a very practical option for us since we don't track the origin. That also applies for the proposal that I made in #99 (comment).

you cannot know what part of B has changed without parsing it and diffing the AST against the previous known version, or possibly doing the same using ASM on the compiled class.

Are you talking about minimizing recompiles by detecting that only "method body" has changed right?
If you already know the byte code changed (that's what mdproctor said) and if you have recorded the method signatures and annotations and verified that haven't changed, then you don't need to recompile the dependencies. So you don't need to do a full AST diff.

Easiest is to just see that B has changed and then recompile both B and Zoo, because the call graph has recorded that Zoo depends on B (and not A).

If A is changed, you would need to mark B as dirty as well and that would make also Zoo dirty, right?

@gkdn
Copy link
Member

gkdn commented Mar 8, 2021

@mdproctor @tbroyer
I think at this point, I feel like we are discussing different design options and their consequences/workarounds in parallel and I am making assumptions to connect the dots.

Could you maybe create a document that extracts the different re-compile scenarios (including corner cases) and your proposal to handle them (including the expected output from J2CL) so we can comment on and help you out?
You don't need to think about if this data is provided by J2CL via LibraryInfo or not; we can decide later on on the best way to provide such information.

@mdproctor
Copy link

mdproctor commented Nov 27, 2021

I have a POC working and integrated with the maven plugin, that seems to work.
I wrote a document here:
https://docs.google.com/document/d/1Y2Kuif9Fi5zpmB0vN5Jod1vkuCVNwGfU3Qt-tX30NvQ

The commit can be seen here, please ignore the vertispan maven stuff.
mdproctor@d8d2e2d

It produces a build.map per class (and per inner class), using a format similar to you described:
-sources
:? // latter is empty, if they are the same

  • hierarchy
    :* // latter is 0..n entries
    -innerTypes
    * // 0..n entries
  • interfaces
    * // 0..n entries
  • dependencies
    * // 0..n entries

To make the implementation, for now, I avoided trying to change existing code. I cloned LibraryInfoBuilder and removed pruning and added the code to dump the build-up. I updated the generator class to use this if -buildMap is present. I had to make another addition to the proto to record both exist target enclosing type (which it already does) and also the instance type. It needs the latter, to make sure any class impacted by a change in the hierarchy due to a method override is picked up - when a class is changed, we iterate it's descendants and add each of those to the change set.

Once people are happy with this approach and the format in the build.map file, we can look code and how to best incorporate it. Do we introduce the behaviour directly into LibraryInfoBuilder, or do we keep the two separate but work for more code re-use.

@gkdn
Copy link
Member

gkdn commented Dec 11, 2021

Sorry for the late reply.

Yes it is good idea to iterate on a separate pass for now and we can decide later on how to best incorporate the output.

Why are the hierarchy and interfaces explicitly defined instead of simply added as a dependency?
And related question: do you really need type names? Wouldn't file names good enough as in my original proposal?

In my proposal; I assumed the flow will be;
There is going to be a set of changed files (.java or .native.js). You will mark every file in its depgraph as "dirty" and remove their output files. After that you will send "dirty" files to J2CL for re-transpilation.

This process will be optimized on the plugin side by diffing the structure/signature of classes and not do the dependency travel if they are stable since most changes won't affect them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants