AST是用来表示语法构造的数据结构,而在大多数语言中都有“变量”的概念。

  • 那么应在AST中用什么方式表示一个“变量”呢?
  • 怎么进行变量的替换呢?
  • 怎么进行变量作用域的检查呢?

First-Order Abstract Syntax

最简单直接的方法就是直接用字符串保存变量名,以lambda calculus为例:

type Name = String
data Expr
  = Var Name 
  | App Expr Expr
  | Lam Name Expr

这种AST被称为FOAS,一般是parse完得到的。但这种裸的AST几乎不包含任何信息,变量和作用域之间没有直接的关系,bindings也只是由匹配字符串来表示。

在这个层面上对AST操作是十分不安全的,稍不注意就可能改变了语义。比如说不带更名的substitution:(λy.λx.y)xλx.x

若要写出正确的substitution还得花点小心思,这是wiki上lambda calculus的substitution的定义(少了一种情况):

x[x := N]       ≡ N
y[x := N]       ≡ y, if x ≠ y
(M1 M2)[x := N] ≡ (M1[x := N]) (M2[x := N])
(λx.M)[x := N]  ≡ λx.M
(λy.M)[x := N]  ≡ λy.(M[x := N]), if x ≠ y, provided y ∉ FV(N)

“翻译到”haskell:

freeVars :: Expr -> [Name]
freeVars (Var v) = [v]
freeVars (App f x) = freeVars f `union` freeVars x 
freeVars (Lam n b) = freeVars b \\ [n]

allVars :: Expr -> [Name]
allVars (Var v) = [v]
allVars (App f x) = allVars f `union` allVars x
allVars (Lam n b) = allVars b

subst :: Name -> Expr -> Expr -> Expr
subst x s b = sub vs0 b where
  sub _ e@(Var v)
    | v == x    = s
    | otherwise = e
  sub vs e@(Lam v e')
    | v == x       = e
    | v `elem` fvs = Lam v' (sub (v':vs) e'')
    | otherwise    = Lam v (sub vs e') where
    v' = newName vs
    e'' = subst v (Var v') e'
  sub vs (App f x) = sub vs f `App` sub vs x

  fvs = freeVars s
  vs0 = fvs `union` allVars b 

newName :: [Name] -> Name
newName vs = head (names \\ vs)

names :: [Name]
names = ['x' : show j | j <- [1..]]

同时,这种替换效率很低。

我认为AST里面带上scope的信息是很必要的(本身字符串就表达不了变量名“指代”这样的语义),或许可以为之后的分析和优化提供更多的信息,主要是防止AST变换导致语义发生改变。

High-Other Abstract Syntax

(typo: Higher-Order Abstract Syntax)

HOAS一种十分简洁的带有scope信息的AST:

data Expr 
  = Lam (Expr -> Expr)
  | App Expr Expr

它直接将meta language的binding借了过去:

s     = Lam (\f -> Lam (\g -> Lam (\x -> App (App f x) (App g x))))
k     = Lam (\x -> Lam (\y -> x))
i     = Lam (\x -> x)
omega = let self = Lam (\x -> App x x) in App self self
zero  = Lam (\f -> Lam (\x -> x))
suc   = Lam (\n -> Lam (\s -> Lam (\z -> App s (App (App n s) z))))

甚至将求值器也借了过来。substitution直接就是apply。

whnf :: Expr -> Expr
whnf (App f x) = case whnf f of
  Lam b -> whnf (b x)
  f'    -> App f' x
whnf e = e

nf :: Expr -> Expr
nf (Lam f) = Lam (nf . f)
nf (App f x) = case whnf f of
  Lam b -> nf (b x)
  f' -> nf f' `App` nf x
nf v = v

而且HOAS的这种substitution特别的快。

不过HOAS的缺点也是十分明显的,我们不能“看进”一颗HOAS里,于是就不能对其做很多操作,比如pretty print、优化等。当然现在有一个比较简单的方法,为HOAS开个“洞”,通过这个“洞”看到里面的东西:

data Expr a
  = Hole a -- store some meta objects
  | Lam (Expr a -> Expr a)
  | App (Expr a) (Expr a)

现在就可以写一个pretty print了:

pretty :: Expr String -> String
pretty = go 0 where
  go ident (Hole var) = var
  go ident (Lam f)    = 
    let var = 'x':show ident in 
    "(lam " ++ var ++ ". " ++ go (succ ident) (f (Hole var)) ++ ")"
  go ident (App f x)  = "(" ++ go ident f ++ " " ++  go ident x ++ ")"

但就算是这样,HOAS的性质还是极差的(顺便还有PHOAS这样的改版,连evaluator都写不顺):

  • 无法实现Functor, Travesable, Foldable等typeclass,更不要说Monad了
  • 在有dt的语言里,HOAS过不了strictly positive check(Agda, Coq),也过不了total check。

所以一般只把HOAS用临时要求值的情况下,比如dt的type checker里可以用到,HOAS可以拿到normal form

De Bruijn Indices

在LC(lambda calculus)中,这是一种用索引来指代变量的表示法:

λx. (λy. y (λz. z)) (λy. x y)

代表一个变量的索引 为 距离 该变量所定义的 作用域之间 的距离(之间嵌套了多少个作用域),比如:

  • λx. x就是λ 1
  • λx. λy. λz. x z (y z)就是λ λ λ 3 1 (2 1)

这种表示法不仅干掉了α-conversion,还干掉了shadow和capture,处理起来会方便不少(疑似)。当然,这种表示法不单单可以用在LC上,在所有有“变量”/“作用域”的语言中都可以用这样一种方式来表示变量。

(其实用索引代表变量是一个很常见的操作)(对了 除了dbi,还有co-dbi)

将DBI应用在AST上,我们将变量区分为“绑定”/“自由”变量:

data Expr
  = FV String
  | BV !Int
  | Lam Expr
  | App Expr Expr

稍微抽象一下:

newtype Scope f a = Scope (f a) deriving (Eq,Show,Read,Functor,Foldable,Traversable)
-- 表示有一个Bound Var的作用域
data Expr a 
  = FV a -- 自由变量不一定用String表示
  | BV !Int
  | App (Exp a) (Exp a)
  | Lam (Scope Exp a)
  deriving (Eq,Show,Read,Functor,Foldable,Traversable)

顺便定义两个组合子,一个用于将自由变量提出,一个用于替换:

abstract :: Eq a => a -> Expr a -> Scope Expr a
abstract var expr = Scope (go 0 expr) where
  go dbi (FV fv) | fv == var = BV dbi
                 | otherwise = FV fv
  go dbi (BV bv)             = BV bv
  go dbi (App f x)           = go dbi f `App` go dbi x
  go dbi (Lam (Scope body))  = Lam $ Scope $ go (succ dbi) body

instantiate :: Expr a -> Scope Expr a -> Expr a
instantiate val (Scope body) = go 0 body where
  go dbi (BV bv) | dbi == bv = val
                 | otherwise = BV bv
  go dbi (FV fv)             = FV fv
  go dbi (App f x)           = go dbi f `App` go dbi x
  go dbi (Lam (Scope body))  = Lam $ Scope $ go (succ dbi) body

比如可以用来定义一些smart constructor和evaluator:

var :: a -> Expr a
var = FV

infixl 9 @@
(@@) :: Expr a -> Expr a -> Expr a
(@@) = App

lam :: Eq a => a -> Expr a -> Expr a
lam var = Lam . abstract var

whnf :: Expr a -> Expr a
whnf (App f a) = case whnf f of
  Lam b -> whnf (instantiate a b)
  f'    -> App f' a
whnf e = e

nf :: Expr a -> Expr a
nf fv@(FV _)   = fv
nf bv@(BV _)   = bv
nf (Lam b) = Lam $ Scope $ nf $ unScope b
nf (App f x) = case whnf f of
  Lam b -> eval (instantiate x b)
  f' -> App (nf f') (nf x)

现在这个AST可以看做一个放着FV的容器,并实现了Functor,Foldable,Traversable(甚至可以实现Monad),可以通过一些通用的函数对AST进行操作,比如说判定一颗语法树对应的项是否是闭项:

isClosed :: Expr a -> Bool
isClosed = all (const False)

isClosed (lam "x" (var "x")) -- True
isClosed (lam "x" (var "y")) -- False

DBI是一个很好的让变量带上作用域信息的方案,但是就上面定义的AST的定义来说还不够安全,还无法禁止构造像Lam (BV 2)这样的不合法项。

DBI in dependent type

如果在有dependent type的语言下,上面的问题就很好解决了,只需要限制DBI在每个项的大小就好:

Index : Nat -> Type
Index = Fin

-- 新的作用域 变量的索引可以加一
Scope : Nat -> (Nat -> Type -> Type) -> Type -> Type
Scope n f a = f (S n) a
-- 是否可以实现 Monad (Scope n f)?

infixl 9 :@
data Expr : Nat -> Type -> Type where
  FV : a -> Expr n a
  -- Bound Variable, 限制index小于n
  BV : Index n -> Expr n a
  Lam : Scope n Expr a -> Expr n a
  (:@) : Expr n a -> Expr n a -> Expr n a

-- 闭项
Closed : Type
Closed = Expr 0 Void

-- demo
demo1 : Expr 0 String
demo1 = Lam $ Lam $ BV 1 :@ BV 0 :@ FV "z"

demo2 : Closed
demo2 = Lam (BV 0)

demo3 : Expr 1 Void
demo3 = BV 0

加上那两个重要的组合子,但这里需要点证明:

weakenDBI : Expr n a -> Expr (S n) a
weakenDBI (FV fv)     = FV fv
weakenDBI (BV dbi)    = BV (weaken dbi)
weakenDBI (Lam body)  = Lam (weakenDBI body)
weakenDBI (f :@ x)    = weakenDBI f :@ weakenDBI x

abstract : Eq a => a -> Expr n a -> Scope n Expr a
abstract var expr = go 0 expr where
  go : Fin (S n) -> Expr n a -> Expr (S n) a
  go dbi (FV fv) with (fv == var)
                  | True  = BV dbi
                  | False = FV fv
  go dbi (BV bv)            = BV (weaken bv)
  go dbi (f :@ x)           = go dbi f :@ go dbi x
  go dbi (Lam body)         = Lam (go (FS dbi) body)

instantiate : Expr n a -> Scope n Expr a -> Expr n a
instantiate val (BV bv) with (last `decEq` bv)
                        | (Yes _) = val
                        | (No cf) = BV (believe_me bv) 
                        -- 抱歉,我偷懒了
instantiate val (FV fv)           = FV fv
instantiate val (f :@ x)          = instantiate val f :@ instantiate val x
instantiate val (Lam body)        = Lam (instantiate (weakenDBI val) body)

这是一种简单直接的方法,非要说缺点的话,就是有dt的语言实在是太少了。不过用上了dt,限制也会很大,这里就很难进一步抽象,abstract和instantiate就只能依赖于Expr了。

一个类型index上一个值之后,就很难实现其它的typeclass了。

DBI as a nested datatype

有些dt是可以编码进数据结构里的:

data Vec : Nat -> Type -> Type where
  Nil  : Vec Z a
  (::) : a -> Vec n a -> Vec (S n) a

可以编码为:

newtype S a = S a deriving(Show, Eq, Functor, Foldable, Traversable)
infixr :-
data Vec a where 
  Nil  :: Vec a
  (:-) :: a -> Vec a -> Vec (S a)

-- 长度为3的vec
vec :: Vec (S (S (S Int)))
vec = (S . S) 1 :- S 2 :- 3 :- Nil

当然,下面这种要麻烦得多,且一些关于长度的性质会很难表达(要上TypeFamilies,DataKinds等等扩展)。但是用于限制长度来说,这种足够了。

可以受此启发,dbi也可以编码到结构中:

-- Nothing             => 0
-- Just Nothing        => 1
-- Just (Just Nothing) => 2
-- Just (Just (Just a)) => Free var
-- :: Maybe (Maybe (Maybe a)) -- < 3

data Expr a
  = Var a
  | Lam (Expr (Maybe a))
  | Expr a :@ Expr a
  deriving(Eq, Show, Read, Functor, Foldable, Traversable)
infixl 9 :@

-- λx.λy.x
-- Lam (Lam (BV 1)) 
-- Lam (Lam (Var (Just Nothing)))

-- λx.λy.x y z
-- Lam (Lam (BV 1 @@ BV 0 @@ FV "z"))
-- Lam (Lam (Var (Just Nothing) @@ Var Nothing @@ Var (Just (Just "z"))))

-- Lam _ 
-- hole 的类型是 Expr (Maybe a),能填Var Nothing,也就是BV 0

-- Lam (Lam _)
-- hole 的类型是 Expr (Maybe (Maybe a))
-- 能填Var (Just Nothing),也就是BV 1
-- 或者填 Var Nothing,也就是BV 0

(注:forall a. Expr a或者Expr Void是闭项)

像之前那样稍作抽象,将Scope拿出来:

newtype Scope f a = Scope { unScope :: f (Maybe a) }
  deriving (Functor, Foldable, Traversable)

instance Monad f => Applicative (Scope f) where
    pure = Scope . return . Just
    (<*>) = ap

instance Monad f => Monad (Scope f) where
  return = pure
  Scope m >>= f = Scope $ m >>= maybe (return Nothing) (unScope . f)

instance MonadTrans Scope where
  lift = Scope . liftM Just

现在,那两个组合子也只需要依赖于Scope就可以了:

abstract :: (Functor f, Eq a) => a -> f a -> Scope f a
abstract x = Scope . fmap go where
  go y = y <$ guard (x /= y)

instantiate :: Monad f => f a -> Scope f a -> f a
instantiate x (Scope xs) = xs >>= go where
  go Nothing  = x
  go (Just y) = return y

那么AST就可以这样定义了:

data Expr a
  = Var a
  | Expr a :@ Expr a
  | Lam (Scope Expr a)
  deriving (Functor, Foldable, Traversable)
infixl 9 :@

instance Applicative Expr where
    pure = Var
    (<*>) = ap

-- 这个monad可用来遍历Expr里的Free var
instance Monad Expr where
  return = pure
  Var a >>= f    = f a
  l :@ r >>= f   = (l >>= f) :@ (r >>= f)
  Lam body >>= f = Lam (body >>= lift . f)

-- smart constructor
lam :: Eq a => a -> Expr a -> Expr a
lam v = Lam . abstract v 

var :: a -> Expr a
var = Var

就很inductive,很abstract,很safe,很nice。(来自于paper

到目前为止,我们就得到了一个比较完整的在AST中表示“变量”/“作用域”的方案(并不)。haskell的bound包就有很完整的实现。

Scope Check

我们当然可以这样scope check:

type Name = String
data Expr
  = Var Name 
  | App Expr Expr
  | Lam Name Expr

isClosed :: Expr -> Bool
isClosed = go [] where
  go vs (Var v)   = v `elem` vs
  go vs (Lam v b) = go (v:vs)  b
  go vs (App f x) = go vs f && go vs x

但这样操作稍微low了一点。如果能将裸的AST转换为上面介绍的几种带Scope的AST,相当于“证明了”原表达式是个闭项,也就是过了scope check:

data Expr 
  = VarE String
  | AppE Expr Expr
  | LamE String Expr
  deriving(Show, Eq)

data Term a
  = VarT a
  | AppT (Term a) (Term a)
  | LamT (Scope Term a)
  deriving (Functor, Foldable, Traversable)

-- 为原AST附加上scope信息,也相当于证明了原表达式是闭项
fromExpr :: Expr -> Maybe (Term a)
fromExpr = go HM.empty where
  go :: HM.HashMap String a -> Expr -> Maybe (Term a)
  go m (VarE v)   = VarT <$> HM.lookup v m
  go m (LamE v b) = LamT <$> Scope <$> go (HM.insert v Nothing $ fmap Just m) b
  go m (AppE f x) = AppT <$> go m f <*> go m x

我们还可以进一步从dbi的AST的到HOAS。因为HOAS和dbi带有的信息是一致的,所以它们之间还能互转。这里留作习题。

当一个AST过了scope check之后,就可以着手type check了,接下来的事情也是十分有趣的,先给自己挖个坑,再写一篇AST with Type。

有一个很有意思的项目,讲的是dt的type check和elaboration,有兴趣可以去看一下。


更新:

(冰带逛)

data In : (x : a) -> (ctx : List a) -> Type where
  -- -----------
  -- Ctx, x |- x
  Z : x `In` (x :: ctx)
  --  Ctx |- x
  -- -----------
  -- Ctx, y |- x
  S : x `In` ctx -> x `In` (y :: ctx) 


data Term : (ctx : List a) -> Type where
  -- x ∈ Ctx
  -- --------
  -- Ctx |- x
  Var : x `In` ctx -> Term ctx
  -- Ctx, x |- M
  -- ------------
  -- Ctx |- λx. M
  Lam : (m : Term (x :: ctx)) -> Term ctx
  -- Ctx |- M  Ctx |- N
  -- ------------------
  --    Ctx |- M N
  App : (m : Term ctx) -> (n : Term ctx) -> Term ctx


demo1 : Expr ["x", "y"]
demo1 = App (Var Z) (Var (S Z))

demo2 : Expr ["y"]
demo2 = Lam {x = "x"} (Var Z {ctx = ["x", "y"]})

nothingInEmpty : In x [] -> Void
nothingInEmpty Z impossible
nothingInEmpty (S _) impossible 

isIn : DecEq a => (x : a) -> (xs : List a) -> Dec (In x xs)
isIn x [] = No nothingInEmpty
isIn x (y :: ctx) with (decEq x y)
  isIn x (x :: ctx) | (Yes Refl) = Yes Z
  isIn x (y :: ctx) | (No xneqy) with (isIn x ctx)
    isIn x (y :: ctx) | (No xneqy) | (Yes xinctx) = Yes (S xinctx)
    isIn x (y :: ctx) | (No xneqy) | (No xninctx) = No (lemma xneqy xninctx) where
      lemma : {x, y : a} -> {ctx : List a} -> Not (x = y) -> Not (In x ctx) -> Not (In x (y :: ctx))
      lemma xneqy xninctx Z         = xneqy Refl
      lemma xneqy xninctx (S xinctx) = xninctx xinctx
      
data Expr
  = VarE String
  | AppE Expr Expr
  | LamE String Expr

-- fromExpr : Expr -> (fvs ** Term fvs) 我又偷懒了呢
fromExpr : Expr -> Maybe (Term [] {a = String})
fromExpr = go [] where
  go : (ctx : List String) -> Expr -> Maybe (Term ctx)
  go ctx (VarE v) with (v `isIn` ctx)
    go ctx (VarE v) | (Yes vinctx) = Just (Var vinctx)
    go ctx (VarE v) | (No vninctx) = Nothing
  go ctx (LamE v b) = Lam <$> go (v::ctx) b
  go ctx (AppE f x) = App <$> go ctx f <*> go ctx x

也就是冰冰在评论提到https://bentnib.org/binding-universe-journal.pdf

(事实上我也就马虎写一写)