🔴 🟡 🟢Pattern Matching

Pattern Matching

I've been having great fun in the Lean dependent type system, so it's high time I wrote about pattern matching!

What is pattern matching?

Pattern matching is a mechanism for deconstructing data types by directly matching against their structure and binding components to variables. For production software, native pattern matching support in a language helps with type safety, reducing defensive checks and making illegal/not-well-defined states unreachable. For formal verification, this is how we prove propositions -- each case is a proof obligation.

By C++17 we'd gotten std::variant, std::visit, and structured bindings as an approximation for a non-native way to pattern match.

#include <variant>
#include <iostream>

struct Zero {};
struct Succ {
    int value;
};

using Nat = std::variant<Zero, Succ>;

template<class... Ts>
struct overloaded : Ts... {
    using Ts::operator()...;
};

template<class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

void printNat(const Nat& n) {
    std::visit(overloaded{
        [](const Zero&) {
            std::cout << "Zero\n";
        },
        [](const Succ& s) {
            std::cout << "Succ with value: " << s.value << "\n";
        }
    }, n);
}

int main() {
    Nat zero = Zero{};
    Nat succ = Succ{42};

    printNat(zero); // Output: Zero
    printNat(succ); // Output: Succ with value: 42
}

The std::variant is a discriminated union/sum type that allows runtime polymorphism without requiring inheritance or virtual functions. std::visit dispatches the right callable (lambda in this example) based on the active type in the std::variant. Since std::visit only takes a single callable object but we want to use different lambdas based on the different possible underlying types (Zero/Succ), we use the overloaded struct to combine multiple lambdas together into a single callable. This is cleverly achieved by inheriting from all the lambdas and using template parameter pack expansion (Ts... ), and then bringing all the operator() overloads together into the struct via using Ts::operator()...;. Then we use CTAD in overloaded(Ts...) -> overloaded<Ts...>; to let the compiler automatically deduce the types of the lambdas passed to overloaded.

Since we wrote this without a generic fallback, exhaustiveness is enforced -- we get a (pretty gnarly) compilation error if we add in some other type to the variant without adding a corresponding case in the std::visit. We don't have exhaustiveness checking provided natively, and we also don't have matching on multiple fields or nested matching.

In Lean this is easier on the eyes:

inductive Nat : Type where
  | zero : Nat
  | succ : Nat → Nat

def isZero (n : Nat) : Bool :=
  match n with
  | Nat.zero => true
  | Nat.succ _ => false
  
#eval isZero Nat.zero -- `true`
#eval isZero (Nat.succ Nat.zero) -- `false`

We inductively define a Nat type with a base case Nat.zero and an inductive successor case Nat.succ. Then isZero pattern matches on the structure of Nat. The native language support in Lean means handling of all possible cases is enforced, and values are automatically destructured and bound.

Dependent type systems ...and match?

Actually, Lean takes it a step further by supporting full dependent types. A dependent type is a type that can depend on runtime values.

def BoolToType x :=
  match x with
  | true => Nat
  | false => String
  
axiom n : Nat

axiom x : BoolToType (n % 2 == 0)

This convoy example is from stepchowfun's Lean tutorial series! See Sources

Here the type of x depends on whether the value of n is even or odd. And let's say we want to do different things with x depending on the different types it could be. You might try naively pattern matching:

-- #check
--   match n % 2 == 0 with
--   | true => x + 1
--   | false => x.length

But this is an issue -- Lean can't infer the specific type of x based on what it knows about n. Naive pattern matching doesn't work because it doesn't do type refinement on the result for each case such that each branch only needs to consider the specific value corresponding to that case. You'll get compiler errors on both match cases above. Specifically, in both branches x still has type BoolToType (n % 2 == 0), which is why we can't use it as Nat or String.

The convoy pattern and dependent pattern matching

Instead, let's use dependent pattern matching and the convoy pattern to "lift" the runtime value into the compile time type system. We construct a function which accepts an arbitrary value of the dependent type. Then we immediately call that function on the value we have, x. Dependent pattern matching means when we match on a value, it causes types mentioning that value to get refined.

#check
  (
    match n % 2 == 0 with
    | true => fun y => @id Nat y + 1
    | false => fun y => y.length
    : BoolToType (n % 2 == 0) → Nat
  ) x

The key here is Lean creates an equality proof in each branch. For the true case Lean knows n % 2 == 0 = true, so BoolToType (n % 2 == 0) reduces to BoolToType true . Then we can use @id Nat to force y's BoolToTrue type to be reduced to Nat type, which is needed for type class resolution of which + operator to use (@ id means the identity function with implicit arguments disabled). In the false case, Lean knows n % 2 == 0 = false, and the .length method directly tells Lean y 's type should be reduced down to a String via method resolution.

And using the equality proof to carry information learned about each case of x is quite neat! Indeed, that's the essence of the convoy pattern: that information is convoyed/escorted into each branch to be refined, which enables expressiveness while maintaining type safety. (To be clear, naive pattern matching does maintain type safety too, by being conservative and rejecting.)

Why can't C++ do something like this?

Stepping back, the answer lies in the differences in how C++ and Lean are compiled, and what each language wants to achieve.

In general, C++ has a hard wall between compile and run time. Types are checked at compile time, and values are computed at runtime. Types have to be erased before runtime for performance. C++ guarantees zero-cost abstraction, which means no runtime overhead for type safety. But by definition, dependent types depend on runtime values. On the other hand, Lean doesn't mind adding compilation time -- as an interactive theorem prover, Lean wants that compile-time correctness.

Sources

https://lean-lang.org/theorem_proving_in_lean4/Dependent-Type-Theory/#dependent-type-theory

https://lean-lang.org/functional_programming_in_lean/Programming-with-Dependent-Types/#Functional-Programming-in-Lean--Programming-with-Dependent-Types

https://github.com/stepchowfun/proofs/tree/main/proofs_lean/tutorial