
Archived: Client server split


This is a discussion document about a client-server setup where a per-project sbt daemon would be shared among editors, IDEs, and any other clients a developer may be using.

Some issues will have to be figured out through prototyping, but we are doing our best to frame the overall problem and anticipate solutions.

The concept

Each project-to-be-built should have a “build model,” implemented as a per-project server process, where "view-controllers" (aka clients) may be apps such as:

  • IDEs (Eclipse, ENSIME, IntelliJ IDEA, etc.)
  • Command line tools
  • Standalone tools such as Activator, profilers, etc.
  • The limited on-reload in-browser UI offered by Play
  • Notification tools (using growl, gnome-shell, stuff like that)

Why (what are we solving)?

In brief:

  1. lots of programs need to work with the build, yet
  2. the build needs to be centrally coordinated to avoid breakage.

Longer version:

  • A project’s build is single-process / single-instance; doing otherwise would require greatly complicating how tasks are implemented (e.g. adding cross-process locks).
  • But we also have multiple programs that care about the build: IDE, editor, play run, Activator, command line. Each of these clients wants to trigger tasks and display their output.
  • Even if it were not broken, running many sbt instances or duplicating compiles is wasteful in both CPU and RAM.
  • None of the clients is “primary” or “special”: it makes sense to run any one by itself, and it also makes sense to run many combinations of them at once.

Demo

To think concretely about the design, imagine the following demo:

  1. open an sbt command line console, Eclipse, a “play run,” and Activator, all on a Play app
  2. load localhost:9000/whatever in your browser, triggering a compile of the Play app
  3. the compile logs appear simultaneously in the command line sbt console, Eclipse, and Activator. Errors show in all of them and in the browser window.
  4. There's notification of completed compiles via growl or other platform hooks.

The compile could also be triggered in Eclipse, in the sbt command console, etc., not just on browser load. The point is that all these clients are “views” on the same build instance.

Prior Art

We have a crude prototype in sbt-remote-control (see https://github.com/sbt/sbt-remote-control) where we have learned something about running sbt in a separate process under the control of a graphical user interface.

sbt-remote-control can teach us a lot about requirements. However, the implementation of sbt-remote-control is pretty much one big hack, and it does not allow multiple clients to share the same sbt process.

Some details to think about

Problem: startup, discovery, and lifecycle of the build server

As robustly as possible, we want to be sure we start ONE build server per project, even if multiple clients are launched in quick succession. In broad outline this might be done by having a file such as myproject/.sbt-server/url or otherwise storing the location of the build server inside each project directory. Appropriate locking should be used. The build server should probably self-destruct when all clients are gone.
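A minimal sketch of that handshake, assuming java.nio file locking and the `.sbt-server/url` layout described above (all other names hypothetical). The lock makes the check-then-start sequence atomic across processes, so two clients racing at startup still yield one server:

```scala
import java.io.RandomAccessFile
import java.nio.file.{Files, Path}

// Hypothetical discovery: the first client to win the cross-process file lock
// starts the server and writes its URL; later clients just read the file.
object ServerDiscovery {
  def discoverOrStart(projectDir: Path, startServer: () => String): String = {
    val dir = projectDir.resolve(".sbt-server")
    Files.createDirectories(dir)
    val channel = new RandomAccessFile(dir.resolve("lock").toFile, "rw").getChannel
    val lock = channel.lock() // blocks until this process owns the lock
    try {
      val urlFile = dir.resolve("url")
      if (Files.exists(urlFile))
        // a real implementation would also probe the URL for staleness
        new String(Files.readAllBytes(urlFile), "UTF-8")
      else {
        val url = startServer() // e.g. "http://127.0.0.1:54321"
        Files.write(urlFile, url.getBytes("UTF-8"))
        url
      }
    } finally { lock.release(); channel.close() }
  }
}
```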


Problem: client-server communication machinery

What is the general shape of client-server communication? REST or WebSocket or custom protocol? Is there any "handshake" information? What sorts of messages can be sent?
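To make the question concrete, here is one hypothetical shape, assuming JSON over a WebSocket; every name here is invented for illustration:

```scala
// A minimal message envelope for a hypothetical JSON-over-WebSocket protocol.
sealed trait Message
// A client request; `serial` is unique per client so replies can be paired up.
final case class Request(serial: Long, method: String, params: Map[String, String]) extends Message
// The server's reply to exactly one Request.
final case class Response(replyTo: Long, success: Boolean, body: Map[String, String]) extends Message
// Server-initiated notifications (log lines, compile errors, ...), unpaired.
final case class Event(kind: String, body: Map[String, String]) extends Message
```

A “handshake” could then be just the first Request/Response pair, carrying things like protocol version and client name.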


Problem: long-running tasks block the build server

Some tasks are "infinite" (such as run), hanging around until main() exits or until the developer kills the task. Other tasks are simply long-running, such as compile. During a long-running task, clients may want to cancel/restart the task, or may want to be able to fetch other information (such as the project name, or the classpath).

One possible solution would be to cache (or pre-cache) information when possible. The trouble of course is that we don't know a priori which tasks are safe to cache or quick to execute. Right now, somewhat accidentally, we have memoized settings such as name which are distinct from always-reexecuted tasks, but we may not even want to keep that situation forever.

Another possibility would be to allow tasks to background themselves in some way. For example, the run task might start up a web server and keep track of its pid, then return control to sbt. Another attempt to run would not relaunch the web server, if it was already active. There could be a general way to stop a "backgrounded" task, and a general way to get a list of active backgrounded tasks.
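A minimal sketch of such a registry, assuming backgrounded tasks boil down to child processes (all names hypothetical):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

// Hypothetical registry of backgrounded tasks, keyed by task name.
final class BackgroundRegistry {
  private val running = new ConcurrentHashMap[String, Process]()

  // Start the process only if nothing is already registered under this key,
  // so a second "run" does not relaunch an active web server.
  def startIfAbsent(key: String, start: () => Process): Process =
    running.computeIfAbsent(key, _ => start())

  // The general way to stop a backgrounded task.
  def stop(key: String): Unit =
    Option(running.remove(key)).foreach(_.destroy())

  // The general way to list active backgrounded tasks.
  def active: Set[String] = running.keySet.asScala.toSet
}
```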

Problem: capturing stdout/stderr

We would want to capture output from any child processes and probably from sbt itself (in case someone is using println to debug their sbt plugin). This output then needs to go to all the clients for display.

Problem: supporting multiple sbt versions in clients

Within reason, we'd like our tools to work with any project.

As a practical matter, we may need an API that can use sbt 0.13 as a backend, so tools can start porting to it sooner.

One concept might be two API "layers":

  1. A “coarse” or “portable” layer maps fairly closely to, e.g., the user-visible buttons in Activator and the high-level things Eclipse wants to do. It abstracts across sbt versions, and potentially someday across other, non-sbt build tools if people want. Here we can “compile,” for example, and get some logs: a least-common-denominator kind of thing.
  2. A “fine” layer is an exact mapping to all settings available in a specific sbt version, along with detailed task results, etc., provided by that version of sbt.

The coarse layer would presumably be implemented on top of the fine layer.
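A rough sketch of the two layers, with all names hypothetical:

```scala
import scala.concurrent.Future

// The "coarse" layer: least-common-denominator operations that could be
// implemented against any sbt version (or another build tool someday).
trait CoarseBuildClient {
  def compile(project: String): Future[Seq[CompileProblem]]
  def run(project: String): Future[Unit]
}
final case class CompileProblem(file: String, line: Int, severity: String, message: String)

// The "fine" layer: exact access to one specific sbt version's settings,
// tasks, and detailed results. The coarse layer would be built on top of it.
trait FineBuildClient {
  def setting(scopedKey: String): Future[String]
  def runTask(scopedKey: String): Future[String]
}
```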

Open question: does the coarse-to-fine mapping live in the client jar or in the build server?

Problem: who watches files and takes action accordingly?

Say some source files change; who will detect it and do something about it? We have sbt ~compile now, plus IDEs and Activator have similar functionality.

Watching files for changes kind of sucks to do in every client, because it can be expensive and/or bad for battery life.

So the sbt server side could do the file watch, but then who takes action when a "files changed" event occurs?

Maybe each client registers tasks to run on "sources changed" and we de-dup on the server, or just rely on compile doing nothing if there's nothing to do?
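A sketch of that server-side de-dup, with all names hypothetical (synchronization omitted for brevity):

```scala
import scala.collection.mutable

// Hypothetical server-side registry: each client registers the tasks it wants
// run on "sources changed"; the server de-dups across clients before running.
final class WatchRegistry {
  private val wanted = mutable.Map.empty[String, Set[String]] // clientId -> task keys

  def register(clientId: String, tasks: Set[String]): Unit =
    wanted(clientId) = tasks

  def unregister(clientId: String): Unit = wanted -= clientId

  // The union de-dups: ten clients asking for "compile" yield one compile.
  def onSourcesChanged(runTask: String => Unit): Unit =
    wanted.values.flatten.toSet.foreach(runTask)
}
```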

Problem: Mapping Scala types to wire protocol

Whenever we run a task, we need to be able to deserialize the task's input parameters and serialize the task's result. This will mean some kind of registration of (de)serializers.
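One possible registration shape, with all names hypothetical; a real design would likely use type classes discovered from plugins:

```scala
// Hypothetical (de)serializer for one type, keyed by a type-tag string.
trait WireCodec[A] {
  def write(a: A): String                  // e.g. a JSON fragment
  def read(s: String): Either[String, A]   // Left is a parse error
}

object CodecRegistry {
  private var codecs = Map.empty[String, WireCodec[_]]
  def register[A](typeTag: String, codec: WireCodec[A]): Unit =
    synchronized { codecs += (typeTag -> codec) }
  def lookup(typeTag: String): Option[WireCodec[_]] = codecs.get(typeTag)
}
```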

Problem: Task cancellation

In sbt-remote-control, currently we are stopping the “run” or “compile” tasks by simply killing the sbt process.

We could go that route (killing either the entire build server, or some child of it), or we could build on sbt's current Ctrl+C handling.

Some tasks such as run could get stuck due to bugs in the program being executed, or bugs in sbt plugins.

Problem: State across build server restarts

In general, nothing catastrophic should happen if the build server crashes or gets killed (even kill -9). Otherwise, people will get cranky.

Some possible strategies for this are:

  • lose the work queue, logs, etc. on crash; clients just drop all old tasks and logs from their UI
  • lose the work queue, logs, etc. on crash; clients keep logs themselves and restore their own tasks after a crash
  • persist work queue, logs, etc. and state appears unchanged to clients on reconnect

Making the build server stateful across crashes is probably fragile. It's as hard as a journaling file system or database. Also, relevant state may well change between build server runs (e.g. a git checkout, or whatever).

But if clients have to manually handle errors on every single API call, none of them will do so. So the client API needs to do something better than just throw when it gets a socket error. Potentially the client API should keep the client's work queue locally and try to restore it to the build server if the client has to reconnect.
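A sketch of that reconnect-and-retry behavior, reusing the hypothetical Request/Response envelope from earlier (all names invented; a fuller version would also replay a locally kept work queue):

```scala
import scala.concurrent.{ExecutionContext, Future}

trait RawConnection { def send(req: Request): Future[Response] }

// Hypothetical client-side wrapper: on a socket error it reconnects (which may
// restart the server) and retries, instead of throwing at the caller.
final class ResilientClient(connect: () => RawConnection)(implicit ec: ExecutionContext) {
  @volatile private var conn: RawConnection = connect()

  def submit(req: Request): Future[Response] =
    conn.send(req).recoverWith { case _: java.io.IOException =>
      conn = connect() // reconnect, possibly starting a fresh server
      conn.send(req)
    }
}
```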

Problem: Plugins which use the terminal directly

Interaction with the user needs to change from direct JLine/console usage to some kind of API, which would be proxied through the wire protocol.

Some tasks in sbt itself also do this (for example mainClass asking which main class to run if none has been configured).

Special cases to handle could include:

  • No interactive client is attached.
  • Multiple clients are attached; if the user answers a question in one client, the others should remove their prompt or dialog.
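A sketch of what such an interaction API might look like (all names hypothetical):

```scala
import scala.concurrent.Future

// Hypothetical stand-in for direct JLine usage: a task asks the server, and
// the server forwards the prompt to whichever interactive clients are attached.
trait Interaction {
  // Completes with None when no interactive client is attached. When several
  // are attached, the first answer wins and the others retract their prompts.
  def ask(question: String): Future[Option[String]]
  def confirm(question: String): Future[Boolean]
}
```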

Problem: Completion

A command line client with autocomplete will need to ask the server for completions.

Problem: Notification of build reload

Running commands, or reloading the build, may mean the classpath has changed, or even that the Scala version or sbt version has changed. Clients will have to be notified.

Eclipse should run some paranoia checks every time the build changes, for example ensuring that the Scala version is supported by Eclipse.

Problem: Comprehensive error list

In some cases incremental compilation does not output the complete list of compile errors, because they were shown on a previous compile run. This is confusing for IDEs and editors which would like to show all errors. Should the build server automatically take care of this and guarantee that compile always yields the exhaustive list of errors and warnings for all files?
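If the server does take care of it, one hypothetical approach is to cache the last known problems per file and merge them with each incremental result, reusing the CompileProblem type sketched earlier:

```scala
import scala.collection.mutable

// Hypothetical server-side cache: remember the last problems per file, replace
// entries for recompiled files, and always report the full merged set.
final class ProblemStore {
  private val byFile = mutable.Map.empty[String, Seq[CompileProblem]]

  def update(recompiled: Set[String], fresh: Seq[CompileProblem]): Seq[CompileProblem] = {
    recompiled.foreach(byFile -= _)                  // clear files that were rebuilt
    fresh.groupBy(_.file).foreach { case (f, ps) => byFile(f) = ps }
    byFile.values.flatten.toSeq                      // exhaustive list for clients
  }
}
```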

Problem: How do clients filter what they get over the socket?

For efficiency, maybe clients need to ask for things like copious debug output before we start shoveling it over the socket. However, we should not prematurely optimize this: round trips are bad too, not only bandwidth, and in some cases it can be better to send everything asynchronously rather than encouraging a ton of remote “gets.” It’s not clear exactly what we do here; debug logging may be the only thing that’s really a local-socket bandwidth issue, so maybe it’s just the ability to set a log level per client.

Problem: How long is the history?

It seems useful to keep history of tasks along with their event logs, but we don’t want this to be infinite, so how do we know when to truncate it?

Problem: Tasks and plugins sending events

Tasks may want to broadcast typed events to clients. The types involved may not be known to the sbt core. Examples of events:

  • Structured compile warnings/errors (avoiding log parsing)
  • Notification that "Play is now running at http://localhost:9000"
  • Progress indication (percent complete)
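A sketch of a task-facing publishing API, reusing the hypothetical WireCodec from the serialization section (all names invented):

```scala
// Hypothetical API a task or plugin would use to broadcast typed events;
// the codec serializes types unknown to the sbt core onto the wire.
trait EventBus {
  def publish[A](typeTag: String, event: A)(implicit codec: WireCodec[A]): Unit
}

// Example use from a hypothetical Play plugin (codec assumed in scope):
//   bus.publish("play.started", PlayStarted(url = "http://localhost:9000"))
final case class PlayStarted(url: String)
```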

Problem: Tasks and plugins receiving requests

If a task is long-running or backgrounded, it may want to support some kind of custom typed requests from clients (and replies to same). The main examples we have right now are pretty generic, such as cancellation; do we need custom requests or only some fixed generic ones?

Client API

We would have a jar used to interact with the build server. This jar should:

  • Automatically start and restart the build server.
  • Handle server restarts by reconnecting / reloading.
  • Provide a nice, typesafe interface for interacting with sbt concealing protocol cruft.

Ideas: what could the model provided by the build server look like?

Each client makes requests to the server. Since we already have meanings for "task" and "command," let's just call these requests a ServerRequest for now. A ServerRequest might typically (or always?) indicate a task or command to be executed, but would also include additional data.

Wire protocol requests would sometimes create a ServerRequest, but other times might manipulate them (for example, canceling a ServerRequest) or manage client-server concerns such as which events a client would like to receive.

The server-side model might have these entities:

  • An ordered list of past ServerRequests (shared among all clients, but clients can tell which are “theirs”)
  • An ordered list of future ServerRequests (also shared among all clients, but tagged by client)
  • An optional set of currently-active ServerRequests (plural because we may want to keep some open while running others).
  • “Source files changed” events properly ordered with respect to the list of ServerRequest (i.e. if source files change after the start of compile, the “source files changed” event is after that compile).
  • “Build config files changed” events properly ordered with respect to the list of ServerRequest.
  • Configuration for what to do on source / build files changed events (automatically add compile request to the queue, for example, on source files changed).

Each ServerRequest might have:

  • An id for use in referring to it (per-ServerRequest, so if I send 10 compile requests, each gets a distinct id).
  • The client which submitted it.
  • A log of all events that occurred during the request. The most common event will probably be “log message,” but there are also a few other things like compile errors or “play is now on port 9000” or what-have-you. Of course we also send change notifications out to clients whenever this log has new entries.
  • Whether it has been cancelled, succeeded, or failed; with the result if any.
  • What the ServerRequest actually is (generally “run this task” or “run this command” where the task is specified in the usual way by a ScopedKey).
  • Potentially some view of the “contained” tasks (perhaps for multiple projects, plus the dependency graph of each task). Fetching this detail might require an extra wire protocol request, and only if we end up needing it in one of the clients; we could punt this down the road a bit.
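A hypothetical rendering of this model as data types (all names invented):

```scala
// Lifecycle of a ServerRequest, as described in the list above.
sealed trait RequestState
case object Queued extends RequestState
case object Running extends RequestState
case object Cancelled extends RequestState
final case class Done(succeeded: Boolean, result: Option[String]) extends RequestState

final case class ServerRequest(
  id: Long,                  // unique per request: ten compiles get ten ids
  clientId: String,          // the client which submitted it
  scopedKey: String,         // "run this task/command", named in the usual way
  state: RequestState,
  eventLog: Vector[String]   // log messages, compile errors, "play is on 9000"...
)
```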

Each client might have:

  • A unique ID it can use to identify requests it submitted, etc. (maybe this is just an object reference or the identity of the socket)
  • A human-readable name for display if we have a ServerRequest queue shown in the UI or whatever

A very literal rendering of the model might be a UI like this:

  • Completed Requests
    • root/update (from Eclipse) logs
    • core/compile (from command line) logs
  • Active Requests
    • core/run (from Activator) logs stop
  • Pending requests
    • root/name (from Activator) cancel
    • web/whatever (from browser reload) cancel

In fact, the build server itself could export a little HTML page with a UI like that, as a handy way to check on its state.

At any time, a client needs to be able to 1) load the current model then 2) track changes since the loaded snapshot.

The protocol could work by having change events that move from one snapshot to another. So if I have model version 50 then there’s some change event that allows me to transform it from 50 to 51. This way, any view can sync to the model, by first loading its current state, then receiving change events from that state forward. An example of a change event would be “append this log message to this task,” just any delta that can be applied to the model.
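A sketch of that snapshot-plus-deltas scheme, reusing the hypothetical ServerRequest type from above:

```scala
// A versioned snapshot of the whole model, as loaded by a freshly-connected client.
final case class Snapshot(version: Long, requests: Vector[ServerRequest])

// Deltas carry the version they produce, so a client can apply them in order.
sealed trait Delta { def version: Long }
final case class RequestAdded(version: Long, request: ServerRequest) extends Delta
final case class LogAppended(version: Long, requestId: Long, line: String) extends Delta

object ModelSync {
  // Applying the delta for version N+1 to a version-N snapshot yields N+1.
  def apply(s: Snapshot, d: Delta): Snapshot = d match {
    case RequestAdded(v, r) =>
      Snapshot(v, s.requests :+ r)
    case LogAppended(v, id, line) =>
      Snapshot(v, s.requests.map { r =>
        if (r.id == id) r.copy(eventLog = r.eventLog :+ line) else r
      })
  }
}
```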

Ideas: Backgroundable Tasks

The idea here is that we have a notion of a task that can remain “active” and that sbt would track which tasks are currently active. The canonical example would be the Play “run” task.

When executed, if the task is inactive then it gets a “Start” operation which returns the task’s result type; if the task is already active then it gets a “Continue” operation which also returns the task’s result type. Tasks have a third operation, “Stop,” which also returns the task’s result type. And like any task, an active task can be cancelled (which means the task goes inactive and returns an error). Start, Continue, and Stop would all receive the task’s dependencies as parameters.

This means the task execution engine retains extra state for the task across executions. Specifically, active tasks have three primary functions (sketched in code after this list):

  1. Start the task, loading additional information into the state
  2. Handle a new request for the task, while the task is active
    • Pull state that was stored from the “start” run
    • Pull current values of task dependencies
    • Check to see what work needs to be done
  3. Go inactive (clean up the saved state, stop the forked process, etc.). This needs to happen when reloading a build, or possibly upon client request. E.g. active task requests could optionally block a client (like sbt> run) so that only logs show up; issuing a Ctrl+C would make the task go inactive.
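One way to render these three operations as a trait (all names hypothetical):

```scala
// Hypothetical encoding of a backgroundable task. Deps stands in for the
// task's dependencies; State is whatever the task stored when it started.
trait ActiveTask[Deps, State, Result] {
  def start(deps: Deps): (State, Result)          // task was inactive
  def continue(deps: Deps, state: State): Result  // task already active
  def stop(deps: Deps, state: State): Result      // go inactive, clean up
}
```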

Note: it’s possible (I think) to encode active tasks on top of “storeAs”, etc., which may be OK for clients. This mainly works if access to the task execution engine (and therefore to state transitions) is serialized by the build server.

Ideas: command-line flexibility

The sbt server concept opens up more kinds of command-line clients, if people are interested. Right now, due to startup time, the command-line client has to be its own shell. sbt server enables something more like git's suite of separate commands used directly from the unix shell. Individual commands could in principle be little more than a wget to localhost, so they should have trivial startup time.
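For illustration, a one-shot command could be as small as this sketch (the URL and endpoint are invented; java.net.http requires Java 11+):

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical one-shot command: POST a task request to the local build server
// and print the reply. Startup cost is just the JVM plus one HTTP round trip.
object SbtCompileCommand {
  def main(args: Array[String]): Unit = {
    val serverUrl = "http://127.0.0.1:54321" // read from .sbt-server/url in practice
    val req = HttpRequest.newBuilder(URI.create(serverUrl + "/task/compile"))
      .POST(HttpRequest.BodyPublishers.noBody())
      .build()
    val resp = HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    println(resp.body())
  }
}
```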

Ideas: Wire protocol

Using HTTP/websocket and JSON for the wire protocol would make it much easier to write quick clients or clients in non-JVM frameworks. For example, on Mac or Linux, it might be neat to write a quick little client for notifications (using growl or gnome-shell or whatever). curl and wget may also be useful for testing.

There's also JavaScript to consider; the build server could have a cheesy, small JavaScript app built in for viewing and manipulating the work queue, and tools such as Activator may be partially implemented in JavaScript.

The protocol should have request/response pairing using request serials or request IDs, so that responses may be reordered. That is, it should not rely on responses coming back in the same order that requests were sent.

Other ideas

Many more ideas will no doubt emerge over time; please share yours.
