> {-# LANGUAGE GADTs, KindSignatures #-} > module Week1403_2 where > import Week1403_1 > import Data.Char * Generalized Algebraic Datatypes (GADTs) Here is a version of the Fun datatype that enforces the restriction that Id has type Fun a a. Its syntax generalizes the data definition by allowing the specification of a full type signature for each constructor. The only restriction on this signature is that the return type of the constructor must be an instance of the declared data type. This syntax is only enabled with the language extension GADTS (see first line of file). > data Fun' a b where > Fun' :: (a -> b) -> Fun' a b > Id' :: Fun' a a > Const' :: b -> Fun' a b Pattern matching against a constructor of a GADT reifies the constraint on the return type. In particular, pattern matching on the Id' constructor introduces the constraint a ~ b for the right side of the Id' equation. Hence, its type must be (a -> a). The remaining constructors do not require the GADT extension. > runFun' :: Fun' a b -> (a -> b) > runFun' (Fun' f) = f > runFun' Id' = id > runFun' (Const' b) = const b Using the GADT also enables writing the typesafe (symbolic) composition function. > composeFun :: Fun' a b -> Fun' b c -> Fun' a c > composeFun (Fun' f) (Fun' g) = Fun' (g . f) > composeFun Id' h = h > composeFun f Id' = f > composeFun (Const' b) (Fun' f) = Const' (f b) > composeFun f (Const' c) = Const' c Original motivation for GADTs: Efficient implementation of typed interpreters Consider an interpreter with two base types, Int and Bool. The traditional approach requires a value representation for the return type of the interpreter. First, we consider a very simple language with just integers and booleans. > data Value = Int Int | Bool Bool > deriving Show > data Exp0 = Num0 Int | Add0 Exp0 Exp0 | Truth0 Bool | If0 Exp0 Exp0 Exp0 > eval0 :: Exp0 -> Value > eval0 (Num0 i) = Int i > eval0 (Add0 e1 e2) = case eval0 e1 of > Int i1 -> case eval0 e2 of > Int i2 -> Int (i1+i2) > eval0 (Truth0 t) = Bool t > eval0 (If0 e1 e2 e3) = case eval0 e1 of > Bool b1 -> if b1 then (eval0 e2) else (eval0 e3) This interpreter crashes when we try to run type incorrect programs like the following examples. | eval0 (Add0 (Num0 42) (Truth0 False)) | eval0 (If0 (Num0 21) (Num 0) (Num 1)) With a GADT we can internalize the notion of a typed expression. This way, a type-incorrect expression is already rejected by Haskell. > data Exp a where > Num :: Int -> Exp Int > Add :: Exp Int -> Exp Int -> Exp Int > Truth :: Bool -> Exp Bool > If :: Exp Bool -> Exp a -> Exp a -> Exp a The above examples for type incorrect programs are rejected outright. Furthermore, a so-called "tag-free" version of the interpreter can be written that elides the Value datatype. > eval :: Exp a -> a > eval (Num i) = i > eval (Add e1 e2) = eval e1 + eval e2 > eval (Truth b) = b > eval (If e1 e2 e3) = if eval e1 then eval e2 else eval e3 So far, we have punted around the function datatype because this approach is not easily generalizable to functions: | data FExp a where | Var :: FExp a | Lam :: FExp b -> FExp (a -> b) | App :: FExp (a -> b) -> FExp a -> FExp b This definition does not work because it does not provide sufficient (type) information for the Lam and Var constructors. Where does the type a in the Var constructor come from? What matches its type to, say, a surrounding lambda abstraction? How does a Lam impose the type a on the abstracted variable? Intermezzo: the App constructor uses a feature called "existential dataypes". Its presence can be detected by observing that the type variable a occurs in the arguments of the constructor, but not in its result type. Such a type variable is "existentially bound" and it can be used to implement data abstraction: > data Package where > Package :: (a -> Int) -> a -> Package The type variable is existentially quantified because it does not occur in the return type of the constructor. We may construct packages that instantiate a differently, but they all have the same type Package (and may be put together in a list, for instance). > package1 = Package (+1) 41 > package2 = Package length [1,2,3] > package3 = Package ord 'C' Unpackaging via pattern matching guarantees that f is a function applicable to a and returning an Int, but nothing more is known about the type of a. This type is not compatible (equal) to any other type. > unpack (Package f a) = f a Back to GADTs: To generalize to functions, we need to use a more refined type for expressions. It has to keep track of the result type and of the types of its free variables. > data FExp :: * -> * -> * where > App :: FExp e (a -> b) -> FExp e a -> FExp e b > Lam :: FExp (a, e) b -> FExp e (a -> b) > Var :: Nat e a -> FExp e a > data Nat e a where > Zero :: Nat (a, b) a > Succ :: Nat e a -> Nat (b, e) a ---------------------------------------------------------------------- This representation of lambda expressions is called de Bruijn representation. A lambda does not bind an explicit variable name. The number in a variable refers to number of enclosing lambdas that need to be skipped to find the binding lambda. That is, the variable 0 refers is bound by the next enclosing lambda. The variable 1 is bound by the second enclosing lambda, and so on. A program is a closed term, that is, a term without free variables. > type Program a = FExp () a > ex1 = Lam (Var Zero) -- \lambda x.x -- the I combinator > ex2 = Lam (Lam (Var (Succ Zero))) -- \x\y.x -- the K combinator > ex3 = Lam (Lam (App (Var (Succ Zero)) (Var Zero))) -- \f\x. f x > ex4 = Lam (Lam (Lam (App (App (Var (Succ (Succ Zero))) (Var Zero)) > (App (Var (Succ Zero)) (Var Zero))))) the S combinator S x y z = (x z) (y z) > lkup :: Nat e a -> e -> a > lkup Zero (a, b) = a > lkup (Succ p) (_, b) = lkup p b > lkup_pf :: Nat e a -> e -> a > lkup_pf Zero = fst > lkup_pf (Succ p) = lkup_pf p . snd > feval :: e -> FExp e a -> a > feval e (App f x) = (feval e f) (feval e x) > feval e (Lam b) = \x -> feval (x, e) b > feval e (Var p) = lkup p e > feval' :: FExp gamma a -> gamma -> a > feval' (App f x) = \gamma -> ((feval' f) gamma) ((feval' x) gamma) -- S (feval'f) (feval' x) gamma > feval' (Lam b) = \gamma -> \x -> feval' b (x, gamma) > feval' (Var p) = lkup p This interpretation corresponds to the interpretation of the simply typed lambda calculus in a CCC (cartesian closed category). Types T are represented by objects [[T]] in the CCC. The empty environment 0 is represented by the terminal object of the CCC. [[0]] = () The extended environment x:T, Gamma is represented by the product [[T]] x [[Gamma]]. A term of type Gamma |- e : T is represented by an arrow [[Gamma]] -> [[T]] A variable is mapped to a projection arrow. (cf. lkup_pf) [[Zero]] = proj_1 [[Succ v]] = [[v]] . proj_2 Function application is mapped as follows: [[App f x]] = eval . ([[f]] x [[x]]) . Delta id [[f]] : [[Gamma]] -> [[S->T]] = [[Gamma]] -> [[T]]^[[S]] [[x]] : [[Gamma]] -> [[S]] Delta : A -> A x A (diagonal) eval : B^A x A -> B Lambda abstraction is mapped thus: [[Lam b]] = iso . [[b]] [[b]] : [[S]] x [[Gamma]] -> [[T]] [[Lam b]] : [[Gamma]] -> [[S->T]] = [[Gamma]] -> [[T]]^[[S]] These two types are connected by a natural isomorphism iso in a CCC! S x G -> T = G x S -> T G -> T^S = G -> S -> T At top level: [[S]] x [[0]] -> [[T]] = [[S]] x () -> [[T]] = [[S]] -> [[T]]