Formal Methods

Everybody hates buggy software. Most of us just live with it, either because we don’t build the software ourselves, or we’ve experienced enough errors to know that it’s "just not possible" to get rid of all of them. A world without software errors seems like a pipe dream, because it’s so far from what we experience day to day.

But the dream lives on with a small cadre of researchers and programmers. They hold that bugs exist not because they are inevitable, but because our software is too poorly defined for us to truly know whether it is correct. To reach the promised land of error-free code, the programmer must describe exactly what their software is meant to do—and what it's meant not to do—in mathematical terms, so that the logic underlying the program can be proven correct using mathematical proof techniques. Some argue that this is the only way to ensure that a program will never misbehave.

Formal methods, as these techniques are known, have a poor reputation in the broader software community because of their difficulty. Esoteric concepts abound, and for most programmers, who have little formal training in mathematical proof techniques, it's easy to get lost in the complexity of fully verifying even a small program. Tools are unfriendly at best and user-hostile at worst, adding to the pain and hindering adoption.

I have been taking a course on one such tool this semester, fittingly titled "Formal Reasoning About Programs", taught by Adam Chlipala of "Certified Programming with Dependent Types" fame—or, at least, what passes for fame among those interested in such things. The course provides a broad introduction to the major techniques used to model programs as transition systems and finite automata, well-understood objects that lend themselves to formal reasoning, all within the "proof assistant" Coq.

To a first approximation, a program in this system looks like one in any other functional programming language that supports pattern matching: we can define functions and data structures, compute values, and use types to keep everything aligned. It begins to diverge when you find the first Theorem statement, where we transition from one of Coq's inner languages (Gallina) to another (Ltac). You see, Coq has a split personality: in one body it is both a powerful, if standard, functional programming language and a proof assistant whose underlying logic–the abstractly named Calculus of Constructions–is meant to assist you in walking the straight and narrow path to a valid proof of your program's correctness.
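To make the split concrete, here is a minimal sketch of my own (not taken from the course materials): an ordinary Gallina definition, followed by a theorem about it whose body is an Ltac tactic script.

  (* Gallina: an ordinary functional definition. *)
  Definition double (n : nat) : nat := n + n.

  (* The Theorem statement is Gallina; the lines between Proof and Qed
   * are Ltac tactics steering the proof assistant. *)
  Theorem double_two : double 2 = 4.
  Proof.
    reflexivity.  (* both sides compute to 4 *)
  Qed.

Even in this tiny case the two personalities are visible: double is a program you could run, while double_two is a statement the system will only accept once the tactics convince it.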

Borrowing from the course's textbook (freely available online), let's look at a small example of how these two sides come together. The file BasicSyntax.v defines a toy language for demonstrating some of Coq's features: a set of arithmetic operators and a constant type. We can use this toy example to demonstrate Coq's proof style on some basic functions over the language's syntax trees.

Inductive arith : Set :=
  | Const (n : nat)
  | Plus (e1 e2 : arith)
  | Times (e1 e2 : arith).

We first encounter an inductive data type. An inductive definition lists a set of constructors—in our case, Const, Plus, and Times—and the only values allowed to "inhabit" the type are those built from these constructors. (The same mechanism, used in Prop rather than Set, defines what it means for relations to hold between objects.) For example, let's build a few expressions:

  (* Here are a few examples of specific expressions. *)
  Example ex1 := Const 42.
  Example ex2 := Plus (Const 1) (Times (Const 2) (Const 3)).

As you can see, constructors can be applied to values in order to create instances of our type. Beyond this simple application, inductive types are able to include preconditions and other rules which make them more powerful than similar structures in other languages. I won't use such features here, but the book makes extensive use of them as modeling becomes more challenging.
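For a taste of those richer features, here is a small sketch of my own (not from the book): an inductive definition of evenness in which the second constructor carries a precondition as evidence.

  Inductive even : nat -> Prop :=
  | EvenZero : even 0
  | EvenSS (n : nat) (H : even n) : even (S (S n)).

Here even 4 is only inhabited because we can chain EvenSS twice back down to EvenZero; to build even (S (S n)) you must first supply a proof of even n.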

  Fixpoint size (e : arith) : nat :=
    match e with
    | Const _ => 1
    | Plus e1 e2 => 1 + size e1 + size e2
    | Times e1 e2 => 1 + size e1 + size e2
    end.

A Fixpoint defines a recursive function, one which calls itself until it reaches a base case. The name hints at the mathematical underpinnings: a function's fixed point, in mathematics terminology, is a value which emerges unchanged after applying the function to it, and recursive definitions are understood as fixed points of their defining equations. Coq additionally insists that every Fixpoint terminates; here the recursive calls are on structurally smaller subtrees, so the definition is accepted. In this example, the function size counts each node in a program syntax tree, recursively calling itself whenever the value it is given is not a constant.
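We can check size against the example expressions from before; Compute asks Coq to evaluate a term:

  Compute size ex1.  (* = 1 *)
  Compute size ex2.  (* = 5: three Const leaves plus the Plus and Times nodes *)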

Next, we introduce another function on these syntax trees, depth, and prove that a relationship holds between the two. (By the way, everything between parenthesis-star and star-parenthesis, (* ... *), is a comment.)

  Fixpoint depth (e : arith) : nat :=
    match e with
    | Const _ => 1
    | Plus e1 e2 => 1 + max (depth e1) (depth e2)
    | Times e1 e2 => 1 + max (depth e1) (depth e2)
    end.

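As with size, we can evaluate depth on the earlier examples to see how the two functions diverge:

  Compute depth ex1.  (* = 1 *)
  Compute depth ex2.  (* = 3: the Times subtree is the deeper branch *)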
  (* Our first proof!
   * Size is an upper bound on depth. *)
  Theorem depth_le_size : forall e, depth e <= size e.
    induct e.
    (* induct leaves one subgoal per constructor: Const, Plus, and Times.
     * The textbook discharges each with further tactics such as
     * simplify and linear_arithmetic. *)
This is not a proof as most mathematicians would recognize it, despite the trappings. Rather, the program makes an assertion (for all arithmetic expressions e, the result of calling the function depth on e is less than or equal to the result of calling size on it) and then uses a series of "tactics", written in the "Ltac" language, to guide the proof assistant to accepting the conclusion.
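To give a flavor of what such a tactic script looks like step by step, here is a proof of just the Const case, written as my own sketch using plain Coq tactics rather than the course's custom ones:

  Theorem depth_le_size_const : forall n, depth (Const n) <= size (Const n).
  Proof.
    intros n.    (* fix an arbitrary n *)
    simpl.       (* both sides compute to 1; goal becomes 1 <= 1 *)
    apply le_n.  (* <= is reflexive *)
  Qed.

Each tactic transforms the current goal, and the proof is complete when no goals remain; Qed then checks the whole derivation against the underlying logic.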

Of course, it would be impossible to do this blind. A group of researchers at the University of Edinburgh created Proof General, an Emacs interface to a variety of proof assistants, which we can use to track the state of the system as we write the tactic script. For this example, my Emacs window shows me this:

[Screenshot: Proof General in Emacs, showing the proof script in one window and the current goal in another.]

Our current goal, the next statement we must prove, is displayed in one window, along with any hypotheses we may have at this stage in the proof; the program itself is shown in another. This arrangement fosters continuous interaction with the proof assistant as you guide it, via Ltac tactics, through the reasoning process required for the proof. Despite this continuous back-and-forth with the machine, proving a given theorem may require quite a bit of insight and intuition, just as traditional proofs performed on paper do.

Defining these theorems and proving them is what makes Coq, Coq. Proving that a program is correct requires building a logical basis for reasoning about it, establishing theorems and lemmas along the way so that you are always on solid footing. The overhead for such proofs can be quite large, sometimes involving fully embedding the syntax and semantics of other languages within Coq itself. In my limited experience from the coursework, creating this infrastructure is a substantial investment for any non-trivial program.
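As a hint of what "embedding semantics" means, even our toy language supports it: a Fixpoint can assign each syntax tree a meaning, here the number it evaluates to. This is my own sketch in the style of the textbook, not code from it:

  Fixpoint interp (e : arith) : nat :=
    match e with
    | Const n => n
    | Plus e1 e2 => interp e1 + interp e2
    | Times e1 e2 => interp e1 * interp e2
    end.

  Compute interp ex2.  (* = 7: 1 + 2 * 3 *)

With an interpreter like this in hand, one can state and prove theorems relating a program's meaning to transformations on its syntax, which is exactly the kind of infrastructure the paragraph above describes.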

And therein lies the rub: we want programs which are free of errors, but to reach that point we may need to double or triple the effort needed to write them in the first place. As things stand now, heavyweight formal methods proofs such as the ones that I have been doing for my class seem undesirable for the majority of software projects today. Only niche applications where errors are extremely costly justify their use. For now, we just have to keep dreaming.