Understanding Incremental Recompilation

Documentation has moved

The documentation for sbt has moved to http://scala-sbt.org. The new location for this page is http://www.scala-sbt.org/0.13/docs/Understanding-Recompilation.html.

Introduction

Compiling Scala code is slow, and SBT often makes it faster. By understanding how, you can learn to make compilation faster still. Modifying a source file with many dependencies might require recompiling only that file, which might take, say, 5 seconds, instead of all the dependencies, which might take, say, 2 minutes. Often you can control which case applies, and make development much faster, through some simple coding practices.

In fact, improving Scala compilation times is one major goal of SBT, and conversely the speedups it gives are one of the major motivations to use it. A significant portion of SBT sources and development efforts deals with strategies for speeding up compilation.

To reduce compile times, SBT uses two strategies:

reduce the overhead for restarting Scalac;
implement smart and transparent strategies for incremental recompilation, so that only modified files and the needed dependencies are recompiled.
SBT always runs Scalac in the same virtual machine. If one compiles source code using SBT, keeps SBT alive, modifies source code and triggers a new compilation, this compilation will be faster because (part of) Scalac will have already been JIT-compiled. In the future, SBT will reintroduce support for reusing the same compiler instance, similarly to FSC.
When a source file A.scala is modified, SBT goes to great effort to recompile other source files depending on A.scala only if required, that is, only if the interface of A.scala was modified. With other build management tools (especially for Java, like ant), a developer who changes a source file in a non-binary-compatible way must manually ensure that dependencies are also recompiled, often by running the clean command to remove existing compilation output; otherwise compilation might succeed even though dependent class files need to be recompiled. Worse, the change to one source might make dependencies incorrect without this being discovered automatically: one might get a compilation success with incorrect source code. Since Scala compile times are so high, running clean is particularly undesirable.

By organizing your source code appropriately, you can minimize the amount of code affected by a change. SBT cannot determine precisely which dependencies have to be recompiled; the goal is to compute a conservative approximation, so that whenever a file must be recompiled, it will be, even though some extra files might be recompiled as well.

SBT heuristics

SBT tracks source dependencies at the granularity of source files. For each source file, SBT tracks files which depend on it directly; if the interface of classes, objects or traits in a file changes, all files dependent on that source must be recompiled. In particular, this currently includes all transitive dependencies, that is, also dependencies of dependencies, dependencies of these and so on to arbitrary depth.
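The transitive invalidation described above can be pictured with a small sketch. This is not SBT's actual implementation; the object name InvalidationSketch, the ReverseDeps map, and the file names in the usage note are all hypothetical, and the sketch assumes the direct dependents of each file are already known.

```scala
object InvalidationSketch {
  // Hypothetical data: source file -> files that depend on it directly.
  type ReverseDeps = Map[String, Set[String]]

  /** Files to recompile when the interfaces of `changed` are modified:
    * the changed files plus everything reachable through direct dependents. */
  def invalidated(changed: Set[String], dependents: ReverseDeps): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        // Direct dependents of the current frontier that were not visited yet.
        val next = frontier.flatMap(f => dependents.getOrElse(f, Set.empty[String])) -- seen
        loop(next, seen ++ next)
      }
    loop(changed, changed)
  }
}
```

For instance, if B.scala depends on A.scala and C.scala depends on B.scala, an interface change in A.scala invalidates all three files, while an interface change in C.scala invalidates only C.scala.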

SBT does not track dependencies on source code at the granularity of individual output .class files, as one might hope. Doing so would be incorrect because of some problems with sealed classes (see below for discussion).

Dependencies on binary files are different - they are tracked both on the .class level and on the source file level. Adding a new implementation of a sealed trait to source file A affects all clients of that sealed trait, and such dependencies are tracked at the source file level.

Moreover, different sources are recompiled together; hence a compile error in one source means that no bytecode is generated for any of them. When a lot of files need to be recompiled and the fix for the compile error is not clear, it might be best to comment out the offending location (if possible) so that the other sources can be compiled, and only then work out how to fix it. This way, trying out a possible solution to the compile error will take less time, say 5 seconds instead of 2 minutes.

What is included in the interface of a Scala class

It is surprisingly tricky to understand which changes to a class require recompiling its clients. The rules valid for Java are much simpler (even if they include some subtle points as well); trying to apply them to Scala will prove frustrating. Here is a list of a few surprising points, just to illustrate the ideas; this list is not intended to be complete.

Since Scala supports named arguments in method invocations, the names of method parameters are part of the method's interface.
Adding a method to a trait requires recompiling all implementing classes. The same is true for most changes to a method signature in a trait.
Calls to super.methodName in traits are resolved to calls to an abstract method called fullyQualifiedTraitName$$super$methodName; such methods only exist if they are used. Hence, adding the first call to super.methodName for a specific methodName changes the interface. At present, this is not yet handled; see issue #466.
Sealed hierarchies of case classes allow the compiler to check the exhaustiveness of pattern matches. Hence pattern matches using case classes must depend on the complete hierarchy. This is one reason why dependencies cannot easily be tracked at the class level (see Scala issue SI-2559 for an example).
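Two of the points above, named arguments and sealed hierarchies, can be illustrated with a minimal sketch. All names here (InterfaceSketch, rectangle, Shape, Circle, Square, describe) are invented for illustration and do not come from SBT.

```scala
object InterfaceSketch {
  // Parameter names are part of the interface: renaming `width` would break
  // the named-argument call below even though the parameter types stay the same.
  def rectangle(width: Int, height: Int): Int = width * height
  val area: Int = rectangle(height = 3, width = 2)

  // A sealed hierarchy: the compiler checks matches against all known cases.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Square(side: Double) extends Shape

  // Adding a third case class to Shape would make this match non-exhaustive,
  // so this client depends on the whole hierarchy, not only on Shape itself.
  def describe(s: Shape): String = s match {
    case Circle(_) => "circle"
    case Square(_) => "square"
  }
}
```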

How to take advantage of SBT heuristics

The heuristics used by SBT imply the following user-visible consequences, which determine whether a change to a class affects other classes.

XXX Please note that this part of the documentation is a first draft; part of the strategy might be unsound, part of it might be not yet implemented.

Adding, removing, or modifying private methods does not require recompilation of client classes. Therefore, suppose you add a method to a class with a lot of dependencies, and that this method is only used in the declaring class; marking it private will prevent recompilation of clients. However, this only applies to methods which are not accessible to other classes, that is, methods marked with private or private[this]; methods which are private to a package, marked with private[name], are part of the API.
Modifying the interface of a non-private method requires recompiling all clients, even if the method is not used.
Modifying one class does require recompiling dependencies of other classes defined in the same file (contrary to what a previous version of this guide said). Hence separating different classes into different source files might reduce recompilations.
Adding a method which did not exist requires recompiling all clients, counterintuitively, due to complex scenarios with implicit conversions. Hence in some cases you might want to start implementing a new method in a separate, new class, complete the implementation, and then cut and paste the complete implementation back into the original source.
Changing the implementation of a method should not affect its clients, unless the return type is inferred, and the new implementation leads to a slightly different type being inferred. Hence, annotating the return type of a non-private method explicitly, if it is more general than the type actually returned, can reduce the code to be recompiled when the implementation of such a method changes. (Explicitly annotating return types of a public API is a good practice in general.)

All the above discussion about methods also applies to fields and members in general; similarly, references to classes also extend to objects and traits.
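As a rough illustration of the visibility rules above, the sketch below marks which members would count as part of the API. The names are hypothetical, and the enclosing object registry merely stands in for a package so that the example fits in one file.

```scala
object registry { // stands in for a package in this sketch
  class Registry {
    private var entries: List[String] = Nil

    // Private helper: under the rules above, changing it should not force
    // clients of Registry to be recompiled.
    private def normalize(name: String): String = name.trim.toLowerCase

    // Qualified-private: visible throughout `registry`, hence part of the API.
    private[registry] def size: Int = entries.length

    // Public methods with explicit result types.
    def register(name: String): Unit = { entries = normalize(name) :: entries }
    def contains(name: String): Boolean = entries.contains(normalize(name))
  }

  // Code elsewhere in `registry` can use the qualified-private member.
  def count(r: Registry): Int = r.size
}
```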

Why changing the implementation of a method might affect clients, and why type annotations help

This section explains why relying on type inference for return types of public methods is not always appropriate. However, this is an important design issue, so we cannot give fixed rules. Moreover, this change is often invasive, and reducing compilation times is often not a good enough motivation. That is why we also discuss some of the implications from the point of view of binary compatibility and software engineering.

Consider the following source file A.scala:

import java.io._
object A {
  def openFiles(list: List[File]) = list.map(name => new FileWriter(name))
}

Let us now consider the public interface of object A. Note that the return type of method openFiles is not specified explicitly, but computed by type inference to be List[FileWriter]. Suppose that after writing this source code, we introduce client code and then modify A.scala as follows:

import java.io._
object A {
  def openFiles(list: List[File]) = Vector(list.map(name => new BufferedWriter(new FileWriter(name))): _*)
}

Type inference will now compute Vector[BufferedWriter] as the result type; in other words, changing the implementation led to a change in the public interface, with two undesirable consequences:

Concerning our topic, client code needs to be recompiled, since changing the return type of a method, in the JVM, is a binary-incompatible interface change.
If our component is a released library, using our new version requires recompiling all client code, changing the version number, and so on. This is often undesirable if you distribute a library for which binary compatibility matters.
More generally, client code might now even be invalid. The following code will for instance become invalid after the change:

val res: List[FileWriter] = A.openFiles(List(new File("foo.input")))

Also the following code will break:

val a: Seq[Writer] = new BufferedWriter(new FileWriter("bar.input")) :: A.openFiles(List(new File("foo.input")))

How can we avoid these problems?

Of course, we cannot solve them in general: if we want to alter the interface of a module, breakage might result. However, often we can remove implementation details from the interface of a module. In the example above, for instance, it might well be that the intended return type is more general, namely Seq[Writer]. It might also not be the case; this is a design choice to be made on a case-by-case basis. In this example, however, I will assume that the designer chooses Seq[Writer], since it is a reasonable choice both in the above simplified example and in a real-world extension of the above code.

The client snippets above will now become

val res: Seq[Writer] = A.openFiles(List(new File("foo.input")))

val a: Seq[Writer] = new BufferedWriter(new FileWriter("bar.input")) +: A.openFiles(List(new File("foo.input")))
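For reference, one possible version of A.scala with the explicit, more general return type might look as follows. This is a sketch assuming the designer settles on Seq[Writer] as discussed; the use of toVector is an illustrative choice, not something the original text prescribes.

```scala
import java.io._

object A {
  // Explicit return type: clients see only Seq[Writer], so switching the
  // implementation between List and Vector, or between FileWriter and
  // BufferedWriter, no longer changes the public interface.
  def openFiles(list: List[File]): Seq[Writer] =
    list.map(name => new BufferedWriter(new FileWriter(name))).toVector
}
```

With this signature, both client snippets above compile unchanged whichever collection or writer class the implementation uses internally.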

XXX the rest of the section must be reintegrated or dropped: In general, changing the return type of a method might be source-compatible, for instance if the new type is more specific, or if it is less specific, but still more specific than the type required by clients (note however that making the type more specific might still invalidate clients in non-trivial scenarios involving for instance type inference or implicit conversions—for a more specific type, too many implicit conversions might be available, leading to ambiguity); however, the bytecode for a method call includes the return type of the invoked method, hence the client code needs to be recompiled.

Hence, adding explicit return types on classes with many dependencies might reduce the occasions where client code needs to be recompiled. Moreover, this is in general a good development practice when the interfaces between different modules become important: specifying such an interface documents the intended behavior and helps ensure binary compatibility, which is especially important when the exposed interface is used by other software components.

Why adding a member requires recompiling existing clients

In Java, adding a member does not require recompiling existing valid source code. It might seem that the same should hold in Scala, but this is not the case: through the pimp-my-library pattern, implicit conversions might enrich class Foo with method bar without modifying class Foo itself (see discussion in issue #288 - XXX integrate more). However, if another method bar is introduced in class Foo, this method should be used in preference to the one added through implicit conversions. Therefore any class depending on Foo should be recompiled. One can imagine more fine-grained tracking of dependencies, but this is currently not implemented.
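The scenario can be made concrete with a small sketch. Foo and bar follow the text above; RichFoo, enrich, EnrichSketch and result are hypothetical names introduced only for this illustration.

```scala
import scala.language.implicitConversions

object EnrichSketch {
  class Foo

  // Pimp-my-library: an implicit conversion adds `bar` to Foo from the outside.
  class RichFoo(foo: Foo) { def bar: String = "bar from RichFoo" }
  implicit def enrich(foo: Foo): RichFoo = new RichFoo(foo)

  // Compiles to enrich(new Foo).bar because Foo itself has no member `bar`.
  // If `def bar` were later added to Foo, the member would take precedence,
  // so this call site has to be recompiled to pick up the new method.
  val result: String = (new Foo).bar
}
```

As long as Foo has no bar of its own, the call resolves through the conversion; the already compiled client would keep calling the conversion until it is recompiled.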

Further references

The incremental compilation logic is implemented in https://github.com/sbt/sbt/blob/0.13/compile/inc/src/main/scala/sbt/inc/Incremental.scala. Some related documentation for SBT 0.7 is available at: https://code.google.com/p/simple-build-tool/wiki/ChangeDetectionAndTesting. Some discussion on the incremental recompilation policies is available in issue #322 and #288.
