> {-# LANGUAGE ExistentialQuantification, GADTs, KindSignatures #-} > module Week1403_2 where > import Week1403_1 > import Data.Char * Implementing arrows Arrows may carry static information. That's an advantage above monads. This static information can be exploited in an arrow for parsing, where the static information might be the lookahead set of a parser, which enables efficient implementation of alternatives. Furthermore, an implementation of arrows may exploit static information to optimize, for instance, by applying arrow laws on the fly. As an example, consider the arrow law: | arr f >>> arr g = arr (f >>> g) To implement that law, we might represent arrows constructed with arr such that composition recognizes it and can take advantage. > data Opt arrow a b = Arr (a -> b) > | Lift (arrow a b) Running such an optimized arrow means to interpret the constructors. > runOpt (Arr f) = arr f > runOpt (Lift f) = f But now we can implement the Arrow operations in an optimizing way. > instance Arrow0 arrow => Arrow0 (Opt arrow) where > arr = Arr > Arr f >>> Arr g = Arr (f >>> g) > f >>> g = Lift (runOpt f >>> runOpt g) > instance Arrow arrow => Arrow (Opt arrow) where > first (Arr f) = Arr (first f) > first (Lift f) = Lift (first f) | Arr f >>> (Arr g >>> Lift h) But this optimizer has a significant shortcoming. Its optimization depends on the way in which arrows are composed. In particular, associativity matters. For example, (Arr f >>> Arr g) >>> Lift h simplifies to Arr (f >>> g) >>> Lift h but Arr f >>> (Arr g >>> Lift h) becomes Arr f >>> Lift (arr g >>> h) which misses the optimization. This problem may be overcome by a more elaborate representation that recognizes compositions consisting of two static arrows and a lifted one: > data Opt' arrow a b = Arr' (a -> b) > | forall c d. ALA (a -> c) (arrow c d) (d -> b) | ALA f g h === arr f >>> g >>> arr h This data definition uses an interesting twist ... > instance Arrow0 arrow => Arrow0 (Opt' arrow) where > arr = Arr' > Arr' f >>> Arr' g = Arr' (f >>> g) > Arr' f >>> ALA gl h gr = ALA (f >>> gl) h gr > ALA fl h fr >>> Arr' g = ALA fl h (fr >>> g) > ALA fl h1 fr >>> ALA gl h2 gr = ALA fl (h1 >>> arr (fr >>> gl) >>> h2) gr > instance Arrow arrow => Arrow (Opt' arrow) where > first (Arr' f) = Arr' (first f) > first (ALA fl h fr) = ALA (first fl) (first h) (first fr) The good thing about this definition is that composition in the underlying arrow is only used when composing ALA with ALA in the expression | (h1 >>> arr (fr >>> gl) >>> h2) The bad thing is that it introduces a lot of fluff with each nontrivial arrow, because lifting it into the Opt' arrow type requires combining it with two identity functions. These functions give rise to extra compositions that could be elided. These extra compositions may be removed in at least two different ways. 1. Three further constructors could be added to the Opt' type representing the expressions Lift f Lift f >>> arr g arr g >>> lift f Then the composition operator for Opt' could be further specialized to deal with these constructors. 2. Instead of using the function type directly, specific representations could be introduced for select functions. For example: > data Fun a b = Fun (a -> b) | Id | Const b The datatype Fun would represent general function, identity functions, and constant function is a recognizable way. Composition of elements of the Fun datatype could take advantage of the laws for composing identity functions and constant functions. There is, however, a catch with this definition of the Fun datatype. The type of the Id constructor is insufficiently constrained, so that `Id :: Fun Bool Int' is accepted. Unfortunately, this typing deficiency stops us from defining a typesafe composition function and an interpreting function like runOpt for the Fun type. Here is an attempt: | composeFun :: Fun a b -> Fun b c -> Fun a c | composeFun Id (Fun h) = Fun h This line implements that Id is a left identity, but it is not accepted! To see why, consider the following typings Id :: Fun Bool Int h :: Int -> Int Fun h :: Fun Int Int composeFun Id (Fun h) :: Fun Bool Int Clearly, given just h :: Int -> Int it is impossible to produce a function of type Bool -> Int. The solution is to give the Id constructor a more restricted typing. This gives rise to generalized algebraic datatypes (GADTs). See next lecture. Closing remark: Opt is an example of an arrow transformer analogous to a monad transformer. A type class may be defined capturing this concept... * Generalized Algebraic Datatypes (GADTs) Here is a version of the Fun datatype that enforces the restriction that Id has type Fun a a. Its syntax generalizes the data definition by allowing the specification of a full type signature for each constructor. The only restriction on this signature is that the return type of the constructor must be an instance of the declared data type. This syntax is only enabled with the language extension GADTS (see first line of file). > data Fun' a b where > Fun' :: (a -> b) -> Fun' a b > Id' :: Fun' a a > Const' :: b -> Fun' a b Pattern matching against a constructor of a GADT reifies the constraint on the return type. In particular, pattern matching on the Id' constructor introduces the constraint a ~ b for the right side of the Id' equation. Hence, its type must be (a -> a). The remaining constructors do not require the GADT extension. > runFun' :: Fun' a b -> (a -> b) > runFun' (Fun' f) = f > runFun' Id' = id > runFun' (Const' b) = const b Using the GADT also enables writing the typesafe (symbolic) composition function. > composeFun :: Fun' a b -> Fun' b c -> Fun' a c > composeFun (Fun' f) (Fun' g) = Fun' (g . f) > composeFun Id' h = h > composeFun f Id' = f > composeFun (Const' b) (Fun' f) = Const' (f b) > composeFun f (Const' c) = Const' c Original motivation for GADTs: Efficient implementation of typed interpreters Consider an interpreter with two base types, Int and Bool. The traditional approach requires a value representation for the return type of the interpreter. First, we consider a very simple language with just integers and booleans. > data Value = Int Int | Bool Bool > deriving Show > data Exp0 = Num0 Int | Add0 Exp0 Exp0 | Truth0 Bool | If0 Exp0 Exp0 Exp0 > eval0 :: Exp0 -> Value > eval0 (Num0 i) = Int i > eval0 (Add0 e1 e2) = case eval0 e1 of > Int i1 -> case eval0 e2 of > Int i2 -> Int (i1+i2) > eval0 (Truth0 t) = Bool t > eval0 (If0 e1 e2 e3) = case eval0 e1 of > Bool b1 -> if b1 then (eval0 e2) else (eval0 e3) This interpreter crashes when we try to run type incorrect programs like the following examples. | eval0 (Add0 (Num0 42) (Truth0 False)) | eval0 (If0 (Num0 21) (Num 0) (Num 1)) With a GADT we can internalize the notion of a typed expression. This way, a type-incorrect expression is already rejected by Haskell. > data Exp a where > Num :: Int -> Exp Int > Add :: Exp Int -> Exp Int -> Exp Int > Truth :: Bool -> Exp Bool > If :: Exp Bool -> Exp a -> Exp a -> Exp a The above examples for type incorrect programs are rejected outright. Furthermore, a so-called "tag-free" version of the interpreter can be written that elides the Value datatype. > eval :: Exp a -> a > eval (Num i) = i > eval (Add e1 e2) = eval e1 + eval e2 > eval (Truth b) = b > eval (If e1 e2 e3) = if eval e1 then eval e2 else eval e3 So far, we have punted around the function datatype because this approach is not easily generalizable to functions: | data FExp a where | Var :: FExp a | Lam :: FExp b -> FExp (a -> b) | App :: FExp (a -> b) -> FExp a -> FExp b This definition does not work because it does not provide sufficient (type) information for the Lam and Var constructors. Where does the type a in the Var constructor come from? What matches its type to, say, a surrounding lambda abstraction? How does a Lam impose the type a on the abstracted variable? Intermezzo: the App constructor uses a feature called "existential dataypes". Its presence can be detected by observing that the type variable a occurs in the arguments of the constructor, but not in its result type. Such a type variable is "existentially bound" and it can be used to implement data abstraction: > data Package where > Package :: (a -> Int) -> a -> Package The type variable is existentially quantified because it does not occur in the return type of the constructor. We may construct packages that instantiate a differently, but they all have the same type Package (and may be put together in a list, for instance). > package1 = Package (+1) 41 > package2 = Package length [1,2,3] > package3 = Package ord 'C' Unpackaging via pattern matching guarantees that f is a function applicable to a and returning an Int, but nothing more is known about the type of a. This type is not compatible (equal) to any other type. > unpack (Package f a) = f a Back to GADTs: To generalize to functions, we need to use a more refined type for expressions. It has to keep track of the result type and of the types of its free variables. > data FExp :: * -> * -> * where > App :: FExp e (a -> b) -> FExp e a -> FExp e b > Lam :: FExp (a, e) b -> FExp e (a -> b) > Var :: Nat e a -> FExp e a > data Nat e a where > Zero :: Nat (a, b) a > Succ :: Nat e a -> Nat (b, e) a