Making it real

Daniel Selsam edited this page Aug 3, 2017 · 2 revisions

What would it take to develop a real-world, multi-device, performance-critical machine learning system like TensorFlow using our methodology?

Unfortunately, it is difficult to answer this question without additional constraints. Suppose we are trying to build a (certified) compiler in the spirit of XLA. It would be much easier to prove the compiler correct if we ensured that all compiled code has only one possible observable behavior. What would such determinism cost us? Would it be acceptable in a "real-world" machine learning system? More generally, developing a certified system can become dramatically easier when one gives up certain degrees of freedom (e.g. non-determinism), so the cost of developing a system depends heavily on which degrees of freedom one is willing to forgo.

Let's try to refine the question. What would it take for Google to develop (a duplicate of) TensorFlow using our methodology and to prove it correct? This question is still hard to answer, but for a different reason: what would it even mean for TensorFlow to be correct? There is no "out-of-the-box" specification for TensorFlow, and developing one would require some thought. Here are a few quick examples of design decisions that would need to be made:

  • In TensorFlow, the gradient of the absolute value function Abs returns 0 when its input is 0, even though the absolute value function is not differentiable at 0. Should the precondition for Abs be x != 0, or should the specification for TensorFlow claim only that it produces subgradients rather than gradients?

  • TensorFlow builds both mutation and asynchronous evaluation into the core language, and it is not clear what gradients even mean in this context. See https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/kdhnp35QJfs for an example of weird behavior. Should the main specification apply only to programs without pathologies like this? What specifically constitutes a valid TensorFlow program? From a verification perspective, it would be much easier for the language to omit mutation, and for the compiler to introduce mutation whenever it can prove it is safe to do so.
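To make the subgradient option in the first bullet concrete: a subgradient of a convex function f at x0 is any slope g satisfying f(y) >= f(x0) + g * (y - x0) for all y. For Abs at 0, every g in [-1, 1] qualifies, including the g = 0 that TensorFlow returns there. Below is a small sketch in plain Python (the helper name is hypothetical, not part of any TensorFlow API) that checks the inequality on sample points:

```python
def is_subgradient_of_abs_at_zero(g, samples):
    """Check the subgradient inequality for f(x) = |x| at x0 = 0,
    which reduces to |y| >= g * y, on the given sample points."""
    return all(abs(y) >= g * y for y in samples)

samples = [x / 10.0 for x in range(-50, 51)]

print(is_subgradient_of_abs_at_zero(0.0, samples))  # TensorFlow's choice: valid
print(is_subgradient_of_abs_at_zero(1.0, samples))  # the right derivative: also valid
print(is_subgradient_of_abs_at_zero(2.0, samples))  # too steep: not a subgradient
```

Under the subgradient reading of the specification, returning 0 at the kink is correct by definition, so no precondition on the input is needed.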
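The "omit mutation from the language, let the compiler reintroduce it" idea in the second bullet can be sketched as follows. This is a minimal illustration with hypothetical names, not how any real compiler is structured: the source semantics is a pure update that copies, and the compiler is imagined to emit an in-place update only when its analysis proves the old value is never observed again.

```python
def pure_update(xs, i, v):
    """Source-language semantics: return a fresh list with xs[i] replaced by v."""
    out = list(xs)
    out[i] = v
    return out

def compiled_update(xs, i, v, old_value_is_dead):
    """What a compiler might emit: mutate in place only when its analysis
    has proven the old buffer is dead; otherwise fall back to the copy.
    Either way, the observable result is the same."""
    if old_value_is_dead:
        xs[i] = v
        return xs
    return pure_update(xs, i, v)

a = [1, 2, 3]
b = compiled_update(a, 1, 9, old_value_is_dead=False)
print(b)  # [1, 9, 3]
print(a)  # [1, 2, 3] -- the original is preserved when mutation is unsafe
```

The appeal for verification is that correctness is stated once, against pure_update, and each in-place rewrite only needs a proof that it is observationally equivalent.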

Let's try to refine the question once more: what would it take for Google to develop, using our methodology, a system like TensorFlow that may have subtly different semantics from the actual TensorFlow, that has a satisfying specification approaching functional correctness, and that sacrifices none of TensorFlow's flexibility or performance?

TODO(dhs): finish thought experiment
