Mistakes we made

We made many mistakes over the course of developing Certigrad, and we try to document them here so future developers need not repeat them. Some of these design errors were not catastrophic and are still present in the Certigrad codebase, so we strongly encourage you to read this document in addition to the Certigrad codebase before attempting a similar project on your own.

Catastrophic mistakes

We only made one catastrophic mistake during development that led to a brick wall: we omitted the shape information from the type of tensors. Specifically, in our first attempt at building Certigrad, we used the non-dependent type T : Type for tensors. The problem with T : Type is that a T has no algebraic structure. For example, there is no 0 : T, no 1 : T, and no meaningful notion of +. The solution is to store the shape of the tensor in the type (T : list nat -> Type), since for any particular shape, the type of tensors with that shape has nice algebraic properties. If you are developing a similar project on your own, we strongly recommend using the latter representation for tensors.

Non-catastrophic mistakes

Insufficient use of automation

We used heavy automation when proving properties of the specific AEVB models, but we used very little when proving the actual algorithms and when performing the mathematical derivations. The manual style we used for proving theorems such as https://github.com/dselsam/certigrad/blob/master/src/certigrad/compute_grad_slow_correct.lean#L15-L26 seemed easier than writing automation at the time, but we underestimated how many times we would need to apply similar reasoning throughout the development, and in hindsight we think the project would have been easier to build and maintain if we had put more thought into automation up front.

The biggest win would have been to craft confluent rewrite systems for (a) tensor expressions and (b) environment expressions, and to use the simplifier to avoid manual (and often surgical, multi-directional) rewriting. It is not too late to refactor accordingly, but many proofs will need to be patched.

Using algebraic structures from the Lean standard library

We began the project using the typeclass for commutative rings from the Lean standard library, since it provide lemmas that we thought would be useful, but a weird consequence of this decision is that x - y becomes sugar for x + -y. This property may be convenient when proving but is unnecessarily slow when running the program.

Not using a module for tensors

We started by actually implementing tensors in Lean on top of the reals (instead of simply axiomatizing them and wrapping Eigen), and then overriding the implementations with Eigen implementations. This was problematic because tensors were defined as functions, and so the arity of the type T shape differed between the Lean version and the Eigen version. Next we tried wrapping the Lean implementation of tensors inside a structure (T : list nat -> Type) and replacing the structure with the Eigen implementation of tensors. This addressed the original problem but a similar one still persisted: various functions on tensors still had different arities in the two representations. We decided to remove the Lean implementation of tensors entirely but there is a clean way we could have kept both versions: by using modules. Specifically, we could have done the entire development in the context of a structure representing the abstract tensor API, and then separately instantiated the structure with the two different implementations of tensors.

Provide feedback

Saved searches