XArchived: Improving .sbt format (take 2)

eugene yokota edited this page Sep 10, 2017 · 1 revision

Improving the sbt format (again)

sbt 0.13 drastically improved the .sbt format, giving us great flexibility in how we write build definitions. In particular, sbt currently:

  • Splits the build file on blank lines.
  • Classifies each resulting block as one of three kinds of code:
    • import statements
    • val, lazy val and def "definitions" (new in 0.13)
    • expressions which return Setting[_] or Seq[Setting[_]]

The translation of .sbt files into Scala modules (which are then compiled) is a bit elegant, but relatively straightforward from the above. Here's the basic grammar that is followed:

 ("import" -> scalac.importClause ... )*
 repsep( definitionsOrSetting, blankLine)

 definitionsOrSetting = definition | setting
 definition = (("val"|"lazy val"|"def") -> scalac.nonLocalDefOrDcl ... )+
 setting = scalac.expr

First, sbt looks for lines starting with "import", which may appear only at the top of the file. It parses these with scalac's importClause. They are combined with any automatic imports: some are hard-coded, like sbt._ and Keys._, and others come from plugins.

Then, each definition block is parsed. This is scalac's (nonLocalDefOrDcl ~ acceptStatSepOpt)+. These are combined into a common synthetic module and compiled.

Finally, the settings are compiled with the above imports + an import for the module containing the definitions. Each setting gets its own module.
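As a rough illustration of the three steps above, the following Scala sketch splits a build file on blank lines and classifies each block. It is a toy under stated assumptions: real sbt delegates to scalac's parser, whereas this sketch only checks string prefixes, and the names SbtFileSketch, Chunk and parse are made up for the example.

```scala
object SbtFileSketch {
  // Hypothetical model of the three classes of code found in a .sbt file.
  sealed trait Chunk { def text: String }
  final case class Import(text: String) extends Chunk
  final case class Definition(text: String) extends Chunk
  final case class Setting(text: String) extends Chunk

  // Keywords that start a definition block in the grammar above.
  private val definitionStarts = Seq("val ", "lazy val ", "def ")

  /** Splits `source` on blank lines and classifies each block by its first token. */
  def parse(source: String): List[Chunk] = {
    val lines = source.split("\r?\n").toList

    // Imports are only recognized at the top of the file.
    val (importLines, rest) =
      lines.span(l => l.trim.isEmpty || l.trim.startsWith("import "))
    val imports: List[Chunk] =
      importLines.filter(_.trim.nonEmpty).map(l => Import(l.trim))

    // Group the remaining lines into blocks; a blank line closes the current block.
    val blocks: List[String] = rest
      .foldLeft(List(List.empty[String])) { (acc, line) =>
        if (line.trim.isEmpty) Nil :: acc
        else (line :: acc.head) :: acc.tail
      }
      .filter(_.nonEmpty)
      .map(_.reverse.mkString("\n"))
      .reverse

    // Anything that is not a definition is assumed to be a setting expression.
    val classified: List[Chunk] = blocks.map { text =>
      if (definitionStarts.exists(p => text.startsWith(p))) Definition(text)
      else Setting(text)
    }

    imports ++ classified
  }
}
```

Feeding it a file with one import, one block holding a val and a def, and two settings separated by blank lines yields four chunks: an Import, a Definition and two Settings.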

In sbt, each piece of this process (each definition block and each setting) is cached under a hash of the piece itself, the imports, the compiler settings, etc. Because each setting gets its own module, and thus its own class files, a setting that has already been processed doesn't require starting up the compiler. This avoids compiler startup in many situations, including the most common case where nothing has changed. It also allows the file to be processed incrementally, so that only what has changed needs handling.
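The caching idea can be sketched as a map keyed by a hash of everything that affects compilation. This is a minimal in-memory sketch, not sbt's implementation: sbt caches class files on disk, and the choice of SHA-1 as well as the names ChunkCacheSketch, cacheKey and Cache are assumptions made for the example.

```scala
import java.security.MessageDigest
import scala.collection.mutable

object ChunkCacheSketch {
  /** Hex digest over the chunk, the imports and the compiler options. */
  def cacheKey(chunk: String, imports: Seq[String], compilerOptions: Seq[String]): String = {
    val md = MessageDigest.getInstance("SHA-1")
    // Separator bytes keep ("ab", "c") and ("a", "bc") from hashing alike.
    def feed(parts: Seq[String]): Unit = {
      parts.foreach { p =>
        md.update(p.getBytes("UTF-8"))
        md.update(0.toByte)
      }
      md.update(1.toByte) // end of section
    }
    feed(Seq(chunk))
    feed(imports)
    feed(compilerOptions)
    md.digest().map(b => "%02x".format(b & 0xff)).mkString
  }

  /** Runs `compile` (a stand-in for the expensive compiler call) at most once per key. */
  final class Cache[A](compile: String => A) {
    private val store = mutable.Map.empty[String, A]
    var compilations = 0 // how often the compile step actually ran

    def load(chunk: String, imports: Seq[String], options: Seq[String]): A =
      store.getOrElseUpdate(cacheKey(chunk, imports, options), {
        compilations += 1
        compile(chunk)
      })
  }
}
```

Loading the same chunk twice with the same imports and options runs the compile step once; changing any of the three inputs produces a new key and therefore a new compilation.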

Arising issues

Allowing definitions in .sbt files has made the format more flexible, but it has also caused some confusion among users around conventions, best practices and style. Primarily, when writing multi-line definitions (e.g. helper methods), code style concerns conflict with the requirements of the sbt parser. Here's an example:

def helperMethod(in: Input1) = {
   // Do the thing 1
   val thing1 = doThing1(in)

   // Do the thing 2
   val thing2 = doThing2(in)

   thing1 ++ thing2 // Return the result
}

While this is a valid Scala method definition, it cannot be expressed in the .sbt format because of the blank lines. sbt 0.13 provides a helpful error message, but the user still has to make one of two changes:

  1. remove blank lines
  2. replace blank lines with empty comments.

So, essentially, we're now imposing style concerns on users.
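For illustration, here is roughly what the second workaround looks like when applied to the helper method above. To make the sketch compile on its own, it is wrapped in an object and given hypothetical stand-ins for Input1, doThing1 and doThing2 (with an assumed List[String] result); only the body of helperMethod reflects what a .sbt file would contain.

```scala
object HelperDemo {
  // Hypothetical stand-ins so the sketch compiles outside of a build.
  type Input1 = String
  def doThing1(in: Input1): List[String] = List(in + "-1")
  def doThing2(in: Input1): List[String] = List(in + "-2")

  // Each blank line is replaced by an empty comment, so the whole
  // definition stays in one block when the file is split on blank lines.
  def helperMethod(in: Input1): List[String] = {
    // Do the thing 1
    val thing1 = doThing1(in)
    //
    // Do the thing 2
    val thing2 = doThing2(in)
    //
    thing1 ++ thing2 // Return the result
  }
}
```

The method behaves exactly as before; only its layout has changed to satisfy the parser.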

Possible solutions

There are a few possible solutions to this issue, with various trade-offs:

1. Push non-trivial code into project/*.scala files.

This is generally a good thing to do. We can and should continue to encourage users to push non-trivial configuration into .scala files. However, this does not help the majority of new sbt users, who do not read best practices and simply do "what they feel is right". With sbt 0.13, we've seen this issue pop up for many users as a recurring cause of broken builds when working on .sbt files. While the blank line rule is quite simple, it does not seem to stick in users' memory.

2. Attempt to remove the blank line restrictions via an improved parser.

This solution involves creating a new parser which includes the full Scala expression parser. This subjects .sbt files to the same non-determinism as Scala itself, as things like postfix notation can wreak havoc on semicolon inference and brace healing. The advantage of this solution is that hand-written .sbt files, assuming they follow current Scala style and avoid postfix operators, will be parsable even with blank lines inside expressions. The downside is that it would no longer be "safe" to programmatically alter a .sbt file by removing an expression. This is because, with Scala syntax, there is no way to guarantee that removing or reordering expressions will not alter the parsing of the expressions themselves.
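To make the postfix hazard concrete, here is a small Scala sketch. It relies on my understanding of how scalac's parser treats a single newline after a postfix operator versus a blank line, so treat the commentary as an assumption rather than a specification; PostfixSketch and count are names invented for the example.

```scala
import scala.language.postfixOps

object PostfixSketch {
  def count(xs: List[Int]): Int = {
    // Postfix call: `size` is applied with no dot and no argument list.
    // The blank line that follows ends the expression, so this is `xs.size`.
    val n = xs size

    n + 1
  }

  // Hazard, kept as a comment because it is not expected to compile:
  //
  //   val n = xs size
  //   n + 1
  //
  // With only a single newline, the parser can continue the expression onto
  // the next line and read it roughly as `xs.size(n + 1)`, an infix call.
}
```

This is also why inserting a blank line between two offending expressions tends to restore the intended parse.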

This solution still involves fragmenting the .sbt file into hashable, incremental chunks. This serves not only to avoid unnecessary recompilation, but also to help provide mechanisms for dealing with the fallout of non-determinism in expression parsing.

Josh's opinion: This restriction isn't a huge deal in practice, primarily because:

  1. There is a known subset of mechanisms in Scala that lead to ambiguous expression parsing. Given that sbt hashes expressions (and would have to keep doing so), automated tools can compare the parse before and after a reformat to verify that unrelated expressions are not altered. If the automation fails, you can hand the task to the user and notify them of the offending expressions, and/or try a set of fixes that are likely to succeed, like adding a blank line between offending expressions. Most likely those expressions could be reformatted to better support automation.
  2. The ability to programmatically refactor .sbt files is inherently limited to the subset of .sbt lines which the tool can understand. Removing the blank line requirement may place further restrictions on what can be automated in refactoring, but would not stop any known use case from being viable. (Right now, this ability is mostly used by set <setting>, session save and the sbt-release plugins).
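The verification step in the first point can be sketched as a before/after comparison of chunks. This is a simplified sketch with assumed names (RefactorCheckSketch, chunks, removalIsSafe): it splits on blank lines as a stand-in for the real parser and compares chunk text directly, where sbt would compare hashes.

```scala
object RefactorCheckSketch {
  /** Stand-in for the real parser: chunks are simply separated by blank lines. */
  def chunks(source: String): List[String] =
    source.split("\r?\n\\s*\r?\n").toList.map(_.trim).filter(_.nonEmpty)

  /**
   * True if removing `removed` left every other chunk exactly as it was.
   * A tool could fall back to asking the user when this returns false.
   */
  def removalIsSafe(before: String, after: String, removed: String): Boolean = {
    val expected = chunks(before).filterNot(_ == removed.trim)
    chunks(after) == expected
  }
}
```

If an automated removal accidentally merges two neighbouring settings into one chunk, the comparison fails and the tool can retry, for example by re-inserting a blank line, or report the offending expressions to the user.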