AST是用来表示语法构造的数据结构,而在大多数语言中都有“变量”的概念。
- 那么应在AST中用什么方式表示一个“变量”呢?
- 怎么进行变量的替换呢?
- 怎么进行变量作用域的检查呢?
First-Order Abstract Syntax
最简单直接的方法就是直接用字符串保存变量名,以lambda calculus为例:
type Name = String
data Expr
= Var Name
| App Expr Expr
| Lam Name Expr
这种AST被称为FOAS,一般是parse完得到的。但这种裸的AST几乎不包含任何信息,变量和作用域之间没有直接的关系,bindings也只是由匹配字符串来表示。
在这个层面上对AST操作是十分不安全的,稍不注意就可能改变了语义。比如说不带更名的substitution:(λy.λx.y)x
→λx.x
。
若要写出正确的substitution还得花点小心思,这是wiki上lambda calculus的substitution的定义(少了一种情况):
x[x := N] ≡ N
y[x := N] ≡ y, if x ≠ y
(M1 M2)[x := N] ≡ (M1[x := N]) (M2[x := N])
(λx.M)[x := N] ≡ λx.M
(λy.M)[x := N] ≡ λy.(M[x := N]), if x ≠ y, provided y ∉ FV(N)
“翻译到”haskell:
freeVars :: Expr -> [Name]
freeVars (Var v) = [v]
freeVars (App f x) = freeVars f `union` freeVars x
freeVars (Lam n b) = freeVars b \\ [n]
allVars :: Expr -> [Name]
allVars (Var v) = [v]
allVars (App f x) = allVars f `union` allVars x
allVars (Lam n b) = allVars b
subst :: Name -> Expr -> Expr -> Expr
subst x s b = sub vs0 b where
sub _ e@(Var v)
| v == x = s
| otherwise = e
sub vs e@(Lam v e')
| v == x = e
| v `elem` fvs = Lam v' (sub (v':vs) e'')
| otherwise = Lam v (sub vs e') where
v' = newName vs
e'' = subst v (Var v') e'
sub vs (App f x) = sub vs f `App` sub vs x
fvs = freeVars s
vs0 = fvs `union` allVars b
newName :: [Name] -> Name
newName vs = head (names \\ vs)
names :: [Name]
names = ['x' : show j | j <- [1..]]
同时,这种替换效率很低。
我认为AST里面带上scope的信息是很必要的(本身字符串就表达不了变量名“指代”这样的语义),或许可以为之后的分析和优化提供更多的信息,主要是防止AST变换导致语义发生改变。
High-Other Abstract Syntax
(typo: Higher-Order Abstract Syntax)
HOAS一种十分简洁的带有scope信息的AST:
data Expr
= Lam (Expr -> Expr)
| App Expr Expr
它直接将meta language的binding借了过去:
s = Lam (\f -> Lam (\g -> Lam (\x -> App (App f x) (App g x))))
k = Lam (\x -> Lam (\y -> x))
i = Lam (\x -> x)
omega = let self = Lam (\x -> App x x) in App self self
zero = Lam (\f -> Lam (\x -> x))
suc = Lam (\n -> Lam (\s -> Lam (\z -> App s (App (App n s) z))))
甚至将求值器也借了过来。substitution直接就是apply。
whnf :: Expr -> Expr
whnf (App f x) = case whnf f of
Lam b -> whnf (b x)
f' -> App f' x
whnf e = e
nf :: Expr -> Expr
nf (Lam f) = Lam (nf . f)
nf (App f x) = case whnf f of
Lam b -> nf (b x)
f' -> nf f' `App` nf x
nf v = v
而且HOAS的这种substitution特别的快。
不过HOAS的缺点也是十分明显的,我们不能“看进”一颗HOAS里,于是就不能对其做很多操作,比如pretty print、优化等。当然现在有一个比较简单的方法,为HOAS开个“洞”,通过这个“洞”看到里面的东西:
data Expr a
= Hole a -- store some meta objects
| Lam (Expr a -> Expr a)
| App (Expr a) (Expr a)
现在就可以写一个pretty print了:
pretty :: Expr String -> String
pretty = go 0 where
go ident (Hole var) = var
go ident (Lam f) =
let var = 'x':show ident in
"(lam " ++ var ++ ". " ++ go (succ ident) (f (Hole var)) ++ ")"
go ident (App f x) = "(" ++ go ident f ++ " " ++ go ident x ++ ")"
但就算是这样,HOAS的性质还是极差的(顺便还有PHOAS这样的改版,连evaluator都写不顺):
- 无法实现Functor, Travesable, Foldable等typeclass,更不要说Monad了
- 在有dt的语言里,HOAS过不了strictly positive check(Agda, Coq),也过不了total check。
所以一般只把HOAS用临时要求值的情况下,比如dt的type checker里可以用到,HOAS可以拿到normal form。
De Bruijn Indices
在LC(lambda calculus)中,这是一种用索引来指代变量的表示法:
代表一个变量的索引 为 距离 该变量所定义的 作用域之间 的距离(之间嵌套了多少个作用域),比如:
λx. x
就是λ 1
λx. λy. λz. x z (y z)
就是λ λ λ 3 1 (2 1)
这种表示法不仅干掉了α-conversion,还干掉了shadow和capture,处理起来会方便不少(疑似)。当然,这种表示法不单单可以用在LC上,在所有有“变量”/“作用域”的语言中都可以用这样一种方式来表示变量。
(其实用索引代表变量是一个很常见的操作)(对了 除了dbi,还有co-dbi)
将DBI应用在AST上,我们将变量区分为“绑定”/“自由”变量:
data Expr
= FV String
| BV !Int
| Lam Expr
| App Expr Expr
稍微抽象一下:
newtype Scope f a = Scope (f a) deriving (Eq,Show,Read,Functor,Foldable,Traversable)
-- 表示有一个Bound Var的作用域
data Expr a
= FV a -- 自由变量不一定用String表示
| BV !Int
| App (Exp a) (Exp a)
| Lam (Scope Exp a)
deriving (Eq,Show,Read,Functor,Foldable,Traversable)
顺便定义两个组合子,一个用于将自由变量提出,一个用于替换:
abstract :: Eq a => a -> Expr a -> Scope Expr a
abstract var expr = Scope (go 0 expr) where
go dbi (FV fv) | fv == var = BV dbi
| otherwise = FV fv
go dbi (BV bv) = BV bv
go dbi (App f x) = go dbi f `App` go dbi x
go dbi (Lam (Scope body)) = Lam $ Scope $ go (succ dbi) body
instantiate :: Expr a -> Scope Expr a -> Expr a
instantiate val (Scope body) = go 0 body where
go dbi (BV bv) | dbi == bv = val
| otherwise = BV bv
go dbi (FV fv) = FV fv
go dbi (App f x) = go dbi f `App` go dbi x
go dbi (Lam (Scope body)) = Lam $ Scope $ go (succ dbi) body
比如可以用来定义一些smart constructor和evaluator:
var :: a -> Expr a
var = FV
infixl 9 @@
(@@) :: Expr a -> Expr a -> Expr a
(@@) = App
lam :: Eq a => a -> Expr a -> Expr a
lam var = Lam . abstract var
whnf :: Expr a -> Expr a
whnf (App f a) = case whnf f of
Lam b -> whnf (instantiate a b)
f' -> App f' a
whnf e = e
nf :: Expr a -> Expr a
nf fv@(FV _) = fv
nf bv@(BV _) = bv
nf (Lam b) = Lam $ Scope $ nf $ unScope b
nf (App f x) = case whnf f of
Lam b -> eval (instantiate x b)
f' -> App (nf f') (nf x)
现在这个AST可以看做一个放着FV的容器,并实现了Functor,Foldable,Traversable(甚至可以实现Monad),可以通过一些通用的函数对AST进行操作,比如说判定一颗语法树对应的项是否是闭项:
isClosed :: Expr a -> Bool
isClosed = all (const False)
isClosed (lam "x" (var "x")) -- True
isClosed (lam "x" (var "y")) -- False
DBI是一个很好的让变量带上作用域信息的方案,但是就上面定义的AST的定义来说还不够安全,还无法禁止构造像Lam (BV 2)
这样的不合法项。
DBI in dependent type
如果在有dependent type的语言下,上面的问题就很好解决了,只需要限制DBI在每个项的大小就好:
Index : Nat -> Type
Index = Fin
-- 新的作用域 变量的索引可以加一
Scope : Nat -> (Nat -> Type -> Type) -> Type -> Type
Scope n f a = f (S n) a
-- 是否可以实现 Monad (Scope n f)?
infixl 9 :@
data Expr : Nat -> Type -> Type where
FV : a -> Expr n a
-- Bound Variable, 限制index小于n
BV : Index n -> Expr n a
Lam : Scope n Expr a -> Expr n a
(:@) : Expr n a -> Expr n a -> Expr n a
-- 闭项
Closed : Type
Closed = Expr 0 Void
-- demo
demo1 : Expr 0 String
demo1 = Lam $ Lam $ BV 1 :@ BV 0 :@ FV "z"
demo2 : Closed
demo2 = Lam (BV 0)
demo3 : Expr 1 Void
demo3 = BV 0
加上那两个重要的组合子,但这里需要点证明:
weakenDBI : Expr n a -> Expr (S n) a
weakenDBI (FV fv) = FV fv
weakenDBI (BV dbi) = BV (weaken dbi)
weakenDBI (Lam body) = Lam (weakenDBI body)
weakenDBI (f :@ x) = weakenDBI f :@ weakenDBI x
abstract : Eq a => a -> Expr n a -> Scope n Expr a
abstract var expr = go 0 expr where
go : Fin (S n) -> Expr n a -> Expr (S n) a
go dbi (FV fv) with (fv == var)
| True = BV dbi
| False = FV fv
go dbi (BV bv) = BV (weaken bv)
go dbi (f :@ x) = go dbi f :@ go dbi x
go dbi (Lam body) = Lam (go (FS dbi) body)
instantiate : Expr n a -> Scope n Expr a -> Expr n a
instantiate val (BV bv) with (last `decEq` bv)
| (Yes _) = val
| (No cf) = BV (believe_me bv)
-- 抱歉,我偷懒了
instantiate val (FV fv) = FV fv
instantiate val (f :@ x) = instantiate val f :@ instantiate val x
instantiate val (Lam body) = Lam (instantiate (weakenDBI val) body)
这是一种简单直接的方法,非要说缺点的话,就是有dt的语言实在是太少了。不过用上了dt,限制也会很大,这里就很难进一步抽象,abstract和instantiate就只能依赖于Expr了。
一个类型index上一个值之后,就很难实现其它的typeclass了。
DBI as a nested datatype
有些dt是可以编码进数据结构里的:
data Vec : Nat -> Type -> Type where
Nil : Vec Z a
(::) : a -> Vec n a -> Vec (S n) a
可以编码为:
newtype S a = S a deriving(Show, Eq, Functor, Foldable, Traversable)
infixr :-
data Vec a where
Nil :: Vec a
(:-) :: a -> Vec a -> Vec (S a)
-- 长度为3的vec
vec :: Vec (S (S (S Int)))
vec = (S . S) 1 :- S 2 :- 3 :- Nil
当然,下面这种要麻烦得多,且一些关于长度的性质会很难表达(要上TypeFamilies,DataKinds等等扩展)。但是用于限制长度来说,这种足够了。
可以受此启发,dbi也可以编码到结构中:
-- Nothing => 0
-- Just Nothing => 1
-- Just (Just Nothing) => 2
-- Just (Just (Just a)) => Free var
-- :: Maybe (Maybe (Maybe a)) -- < 3
data Expr a
= Var a
| Lam (Expr (Maybe a))
| Expr a :@ Expr a
deriving(Eq, Show, Read, Functor, Foldable, Traversable)
infixl 9 :@
-- λx.λy.x
-- Lam (Lam (BV 1))
-- Lam (Lam (Var (Just Nothing)))
-- λx.λy.x y z
-- Lam (Lam (BV 1 @@ BV 0 @@ FV "z"))
-- Lam (Lam (Var (Just Nothing) @@ Var Nothing @@ Var (Just (Just "z"))))
-- Lam _
-- hole 的类型是 Expr (Maybe a),能填Var Nothing,也就是BV 0
-- Lam (Lam _)
-- hole 的类型是 Expr (Maybe (Maybe a))
-- 能填Var (Just Nothing),也就是BV 1
-- 或者填 Var Nothing,也就是BV 0
(注:forall a. Expr a
或者Expr Void
是闭项)
像之前那样稍作抽象,将Scope拿出来:
newtype Scope f a = Scope { unScope :: f (Maybe a) }
deriving (Functor, Foldable, Traversable)
instance Monad f => Applicative (Scope f) where
pure = Scope . return . Just
(<*>) = ap
instance Monad f => Monad (Scope f) where
return = pure
Scope m >>= f = Scope $ m >>= maybe (return Nothing) (unScope . f)
instance MonadTrans Scope where
lift = Scope . liftM Just
现在,那两个组合子也只需要依赖于Scope就可以了:
abstract :: (Functor f, Eq a) => a -> f a -> Scope f a
abstract x = Scope . fmap go where
go y = y <$ guard (x /= y)
instantiate :: Monad f => f a -> Scope f a -> f a
instantiate x (Scope xs) = xs >>= go where
go Nothing = x
go (Just y) = return y
那么AST就可以这样定义了:
data Expr a
= Var a
| Expr a :@ Expr a
| Lam (Scope Expr a)
deriving (Functor, Foldable, Traversable)
infixl 9 :@
instance Applicative Expr where
pure = Var
(<*>) = ap
-- 这个monad可用来遍历Expr里的Free var
instance Monad Expr where
return = pure
Var a >>= f = f a
l :@ r >>= f = (l >>= f) :@ (r >>= f)
Lam body >>= f = Lam (body >>= lift . f)
-- smart constructor
lam :: Eq a => a -> Expr a -> Expr a
lam v = Lam . abstract v
var :: a -> Expr a
var = Var
就很inductive,很abstract,很safe,很nice。(来自于paper)
到目前为止,我们就得到了一个比较完整的在AST中表示“变量”/“作用域”的方案(并不)。haskell的bound包就有很完整的实现。
Scope Check
我们当然可以这样scope check:
type Name = String
data Expr
= Var Name
| App Expr Expr
| Lam Name Expr
isClosed :: Expr -> Bool
isClosed = go [] where
go vs (Var v) = v `elem` vs
go vs (Lam v b) = go (v:vs) b
go vs (App f x) = go vs f && go vs x
但这样操作稍微low了一点。如果能将裸的AST转换为上面介绍的几种带Scope的AST,相当于“证明了”原表达式是个闭项,也就是过了scope check:
data Expr
= VarE String
| AppE Expr Expr
| LamE String Expr
deriving(Show, Eq)
data Term a
= VarT a
| AppT (Term a) (Term a)
| LamT (Scope Term a)
deriving (Functor, Foldable, Traversable)
-- 为原AST附加上scope信息,也相当于证明了原表达式是闭项
fromExpr :: Expr -> Maybe (Term a)
fromExpr = go HM.empty where
go :: HM.HashMap String a -> Expr -> Maybe (Term a)
go m (VarE v) = VarT <$> HM.lookup v m
go m (LamE v b) = LamT <$> Scope <$> go (HM.insert v Nothing $ fmap Just m) b
go m (AppE f x) = AppT <$> go m f <*> go m x
我们还可以进一步从dbi的AST的到HOAS。因为HOAS和dbi带有的信息是一致的,所以它们之间还能互转。这里留作习题。
当一个AST过了scope check之后,就可以着手type check了,接下来的事情也是十分有趣的,先给自己挖个坑,再写一篇AST with Type。
有一个很有意思的项目,讲的是dt的type check和elaboration,有兴趣可以去看一下。
更新:
(冰带逛)
data In : (x : a) -> (ctx : List a) -> Type where
-- -----------
-- Ctx, x |- x
Z : x `In` (x :: ctx)
-- Ctx |- x
-- -----------
-- Ctx, y |- x
S : x `In` ctx -> x `In` (y :: ctx)
data Term : (ctx : List a) -> Type where
-- x ∈ Ctx
-- --------
-- Ctx |- x
Var : x `In` ctx -> Term ctx
-- Ctx, x |- M
-- ------------
-- Ctx |- λx. M
Lam : (m : Term (x :: ctx)) -> Term ctx
-- Ctx |- M Ctx |- N
-- ------------------
-- Ctx |- M N
App : (m : Term ctx) -> (n : Term ctx) -> Term ctx
demo1 : Expr ["x", "y"]
demo1 = App (Var Z) (Var (S Z))
demo2 : Expr ["y"]
demo2 = Lam {x = "x"} (Var Z {ctx = ["x", "y"]})
nothingInEmpty : In x [] -> Void
nothingInEmpty Z impossible
nothingInEmpty (S _) impossible
isIn : DecEq a => (x : a) -> (xs : List a) -> Dec (In x xs)
isIn x [] = No nothingInEmpty
isIn x (y :: ctx) with (decEq x y)
isIn x (x :: ctx) | (Yes Refl) = Yes Z
isIn x (y :: ctx) | (No xneqy) with (isIn x ctx)
isIn x (y :: ctx) | (No xneqy) | (Yes xinctx) = Yes (S xinctx)
isIn x (y :: ctx) | (No xneqy) | (No xninctx) = No (lemma xneqy xninctx) where
lemma : {x, y : a} -> {ctx : List a} -> Not (x = y) -> Not (In x ctx) -> Not (In x (y :: ctx))
lemma xneqy xninctx Z = xneqy Refl
lemma xneqy xninctx (S xinctx) = xninctx xinctx
data Expr
= VarE String
| AppE Expr Expr
| LamE String Expr
-- fromExpr : Expr -> (fvs ** Term fvs) 我又偷懒了呢
fromExpr : Expr -> Maybe (Term [] {a = String})
fromExpr = go [] where
go : (ctx : List String) -> Expr -> Maybe (Term ctx)
go ctx (VarE v) with (v `isIn` ctx)
go ctx (VarE v) | (Yes vinctx) = Just (Var vinctx)
go ctx (VarE v) | (No vninctx) = Nothing
go ctx (LamE v b) = Lam <$> go (v::ctx) b
go ctx (AppE f x) = App <$> go ctx f <*> go ctx x
也就是冰冰在评论提到https://bentnib.org/binding-universe-journal.pdf
(事实上我也就马虎写一写)