Inductive Type

Roughly speaking, inductive type is a family of types that emphasizes on the inductive (or recursive) structure of its values. So that we could reason about the properties of those values by performing structural induction (or recursion).

Induction on natural number

We can define natural number inductively according to Peano Arithmetic:

data Nat = Z
         | S Nat

which states a natural number is either:

  • zero (Z)
  • a “successor” (S) of another natural number

so that any natural number can be represented by a finite linked structure, e.g. 0 is represented by Z, 1 is represented by S Z, 5 is represented by S(S(S(S(S Z)))) etc.

And the induction on naturual number is naturally followed by mathematical induction, which state: to proof a property \(P\) with respect to all natrual numbers, we just show the following two things:

  • Base case: 0 (Z) satisfies the property, i.e. \(P(Z)\) holds
  • Inductive step: for all natural number \(n\), by assuming \(P(n)\) holds, we can show \(P(S\ n)\) holds

As such, we have \(P(Z)\), and we can proof \(P(S\ Z)\) by induction step, and all the way down to all natural numbers.

We can formulate this idea with a dependently typed language (idris, in this case).

natRec : {P : Nat -> Type}             -- some predicate
      -> P Z                           -- base case
      -> ((k : Nat) -> P k -> P (S k)) -- induction step
      -> (n : Nat) -> P n

The computational behavior of the induction is simply applying pattern matching and recursion of natRec

natRec b _    Z    = b
natRec b ind (S n) = ind n (natRec b ind n)

A non-trivial observation is that, the recursion on natRec is guaranteed to terminate:

Since, each step of the recursion, we “strip away” a layer of S and recurse with a strictly smaller structure (n than S n). And at last the “base case” of the inductive type is hit, namely Z, and the recursion returns.

And we can proof a trivial proposition of natural number with natRec

plusRightIdentity : (n : Nat) -> n + Z = n
plusRightIdentity = natRec {P = \n => n + Z = n}
  Refl                         -- 0 = 0
  (\k, p => rewrite p in Refl) -- n + 0 = n => S n + 0 = S n

Induction on binary tree

According to the definition of Nat (and our general knowledge), a non-zero natural number has exactly one predecessor. Howerver, for an arbitrary inductive type, the non-base case may have an arbitrary amount of predecessors. Our inductively define the binary tree as follow:

data Tree a = Leaf
            | Branch a (Tree a) (Tree a)

In this case, the Branch of binary tree has two predecessors representing its left and right subtrees.

Also, we can formulate the structural induction rule of Tree:

treeRec : {A : Type} -> {P : Tree A -> Type}
       -> P Leaf
       -> ((a : A) -> (l: Tree A) -> (r: Tree A) ->
            P l -> P r -> P (Branch a l r))
       -> (t : Tree A) -> P t
treeRec b _    Leaf          = b
treeRec b ind (Branch a l r) =
  ind a l r (treeRec b ind l) (treeRec b ind r)

It looks a lot messier but the idea is the same, the major differences lie in the induction step:

  • Most obviously, we need proofs of P from both of its predecessors instead of only one.
  • We should take the value a in the tree branch into consideration.

The first difference may seem straightforward, the reason of the second is that, if we want to show some property of some tree, what we want to show may be some property of the “elements” in the tree. So when we construct the proof of a Branch from its predecessors, the element of the branch should be taken into account.

W Type

Generalizing Inductive Type

Generally speaking, an inductive type consists of several cases, each may contain some data and predecessors. In other words, the number of predecessors is relying on the “case” and “data”. e.g. the Leaf and Z has 0 predecessors while Branch has 2, S has 1.

We can of course characterize this with a dependently typed system (and GADT syntax for inductive data):

data TreeData a = LeafD | BranchD a

nTreePreds : TreeData -> Nat
nTreePreds LeafD       = 0
nTreePreds (BranchD _) = 2

data Inductive : (A : Type) -> (A -> Nat) -> Type where
  -- ...

Tree : Type -> Type
Tree a = Inductive (TreeData a) nTreePreds

It seems that the type of Inductive decribe the situation quite well. But the problem comes, we would like a way to access the predecessors of the actual value of the inductive type.

-- Wrong implementation
data Inductive : (A : Type) -> (A -> Nat) -> Type where
  MkInductive : {A : Type} -> {B : A -> Nat}
             -> (a : A) -> (Nat -> Inductive A B)
             -> Inductive A B

We cannot simply characterize this as “indexing” function of Nat -> Inductive, what if we supply a number greater than the number of predecessors the case supposed to has? Either the indexing function is untotal or the abstraction strays away from what we originally mean.

One option is that we can do some reasoning with the number of predecessors a case suppose to have:

either by adding a “less than” proof to the indexing function…

data Inductive : (A : Type) -> (A -> Nat) -> Type where
  MkInductive : {A : Type} -> {B : A -> Nat}
             -> (a : A) -> ((n : Nat) -> LT n (B a) -> Inductive A B)
             -> Inductive A B

or we just explicitly limit the amount of predecessors…

data Inductive : (A : Type) -> (A -> Nat) -> Type where
  MkInductive : {A : Type} -> {B : A -> Nat}
             -> (a : A) -> Vect (B a) (Inductive A B)
             -> Inductive A B

These should work, but we can do better.

W Type: Type Inhabitants as Index

The idea is, we need some way to access the corresponding predecessor of a certain index, but the index doesn’t have to be a number. Previously, “the number of predecessor” is abstracted by a number, but what we actually want is just “some limit” to the number of indices to the predecessor. We want to limit the number of predecessors of Leaf to 0, and Branch to 2.

Instead of mapping a case to a number, we can map a case to a finite set of indices, which are used to index into the corresponding predecessor. So … how to abstract a set? With Type, of course.

So we achieve the following implementation:

data W : (A : Type) -> (A -> Type) -> Type where
  Sup : {A : Type} -> {B: A -> Type}
     -> (a : A) -> (f : B a -> W A B) -> W A B

And we adjust the implementation of Tree to:

wTreePreds : TreeData a -> Type
wTreePreds Leaf       = Void
wTreePreds (Branch _) = Bool

WTree : Type -> Type
WTree a = W (TreeData a) wTreePreds

We want to say, the Leaf case has no predecessors, which is equivalent to there’s no index to the predecessor of Leaf, so we use a “index set” Void with no element (value). Likewise, there are two indices to the predecessors of Branch, and Bool is a type with exactly two values, which serves our purpose.

So we can define our constructor of WLeaf and WBranch.

WLeaf : WTree a
WLeaf = Sup Leaf void -- void : Void -> a

WBranch : a -> WTree a -> WTree a -> WTree a
WBranch a l r = Sup (Branch a) branchF
  where branchF True  = l
        branchF False = r

An interesting detail is, if we are constructing a value with no predecessor, what should the indexing function be? Observing that the index set of a value with no predecessor is Void, the indexing function is Void -> W A B, which unifies with void : Void -> a. In other words, the indexing function of a case with no predecessor is trivial.

The implementation of WBranch is sort of as expected, as we are providing two predecessors of branch, we also have two indices to access them, True => l and False => r.

Induction over W

W type doesn’t explicitly distinguish base case from inductive case (or it considers base case as a special case of inductive case).

The inductive step should accept a proof that the property holds for all its predecessors, how do we describe this? Well, recall that some predecessor is represented by applying the indexing function with some index, to range over all predecessors, we range over all indices.

So the induction of W type looks like this:

wRec : {A : Type} -> {B : A -> Type} -> {P : W A B -> Type}
    -> ((a : A) -- the data of W
      -- a proof of: for all indices, P (f b) holds
      -> (f : B a -> W A B) -> ((b : B a) -> P (f b))
      -> P (Sup a f))
    -> (w : W A B) -> P w

Again, for the base case, B a is expected to be Void (or any uninhabited type), so the proof of P holds for all predecessors is (b : Void) -> P (f b) which is again trivial. So the type of whole inductive step is (a : A) -> (Void -> ..) -> (Void -> ..) -> P (Sup a f) which is essentially equivalent to (a : A) -> P (Sup a f).

The implementation of wRec is straightforward by following the types.

wRec ind (Sup a f) = ind a f $ \b => wRec ind (f b)

And we can calculate the size of the tree with the recursor, since the result type doesn’t depends on the WTree, so P is just const Nat.

treeSize : WTree a -> Nat
treeSize = wRec {P = const Nat} sizeRec
  where sizeRec  Leaf      _ _     = 0
        sizeRec (Branch a) _ preds = 1 + preds True + preds False

-- test it in REPL
> treeSize (WBranch 1 (WBranch 2 (WBranch 3 WLeaf WLeaf) WLeaf) (WBranch 4 WLeaf WLeaf))
4 : Nat

Epilogue

This post is just a learning note of mine and it’s not particularly precise theory-wise and leans more on my thoughts during my learning.

There are still a lot of related topics I havn’t learned in detail just yet therefore not discussed in this post, including but not restricted to:

  • Equality reasoning of W type
  • Coinduction and M type
  • Other approaches to generalizing inductive types?
  • Relation to well-found induction(?)