90分钟实现一门编程语言(极简解释器教程)

本文介绍了如何使用 c# 实现一个简化 scheme——ischeme 及其解释器。

如果你对下面的内容感兴趣：

实现基本的词法分析，语法分析并生成抽象语法树。实现嵌套作用域和函数调用。解释器的基本原理。以及一些 c# 编程技巧。

那么请继续阅读。

如果你对以下内容感兴趣：

高级的词法/语法分析技术。类型推导/分析。目标代码优化。

本文则过于初级，你可以跳过本文，但欢迎指出本文的错误 :-)

代码样例

public static int add( int a, int b) {

return a + b;

}

>> add(3, 4)

>> 7

>> add(5, 5)

>> 10

这段代码定义了 add 函数，接下来的 >> 符号表示对 add(3, 4) 进行求值，再下一行的 >> 7 表示上一行的求值结果，不同的求值用换行分开。可以把这里的 >> 理解成控制台提示符（即terminal中的ps）。

什么是解释器

解释器（interpreter）是一种程序，能够读入程序并直接输出结果，如上图。相对于编译器（compiler），解释器并不会生成目标机器代码，而是直接运行源程序，简单来说：

解释器是运行程序的程序。

计算器就是一个典型的解释器，我们把数学公式（源程序）给它，它通过运行它内部的]解释器]给我们答案。

ischeme 编程语言

ischeme 是什么？

scheme 语言的一个极简子集。虽然小，但变量，算术|比较|逻辑运算，列表，函数和递归这些编程语言元素一应俱全。非常非常慢——可以说它只是为演示本文的概念而存在。

ok，那么 scheme 是什么？

一种函数式程序设计语言。一种 lisp 方言。麻省理工学院程序设计入门课程使用的语言（参见 mit 6.001 和）。

使用波兰表达式（polish notation）。
更多的介绍参见 [scheme编程语言]。
以计算阶乘为例：

c#版阶乘

public static int factorial( int n) {

if (n == 1) {

return 1;

} else {

return n * factorial(n - 1);

}

ischeme版阶乘

(def factorial (lambda (n) (

if (= n 1)

(* n (factorial (- n 1))))))

数值类型

由于 ischeme 只是一个用于演示的语言，所以目前只提供对整数的支持。ischeme 使用 c# 的 int64 类型作为其内部的数值表示方法。

定义变量

ischeme使用`def`关键字定义变量

>> (def a 3)

>> 3

>> a

>> 3

算术|逻辑|比较操作

与常见的编程语言（c#, java, c++, c）不同，scheme 使用波兰表达式，即前缀表示法。例如：

c#中的算术|逻辑|比较操作

// arithmetic ops

a + b * c

a / (b + c + d)

// logical ops

(cond1 && cond2) || cond3

// comparing ops

a == b

1 < a && a < 3

对应的ischeme代码

; arithmetic ops

(+ a (* b c))

(/ a (+ b c d))

; logical ops

(or (and cond1 cond2) cond3)

; comparing ops

(= a b)

(< 1 a 3)

需要注意的几点：

ischeme 中的操作符可以接受不止两个参数——这在一定程度上控制了括号的数量。
ischeme 逻辑操作使用 and , or 和 not 代替了常见的 && , || 和 ! ——这在一定程度上增强了程序的可读性。
顺序语句

ischeme使用 begin 关键字标识顺序语句，并以最后一条语句的值作为返回结果。以求两个数的平均值为例：

c#的顺序语句

int a = 3;

int b = 5;

int c = (a + b) / 2;

ischeme的顺序语句

(def c (begin

(def a 3)

(def b 5)

(/ (+ a b) 2)))

控制流操作

ischeme 中的控制流操作只包含 if 。

if语句示例

>> (define a (if (> 3 2) 1 2))

>> 1

>> a

>> 1

列表类型

ischeme 使用 list 关键字定义列表，并提供 first 关键字获取列表的第一个元素；提供 rest 关键字获取列表除第一个元素外的元素。

ischeme的列表示例

>> (define alist (list 1 2 3 4))

>> (list 1 2 3 4)

>> (first alist)

>> 1

>> (rest alist)

>> (2 3 4)

定义函数

ischeme 使用 func 关键字定义函数：

ischeme的函数定义

(def square (func (x) (* x x)))

(def sum_square (func (a b) (+ (square a) (square b))))

对应的c#代码

public static int square ( int x) {

return x * x;

}

public static int sumsquare( int a, int b) {

return square(a) + square(b);

}

递归

由于 ischeme 中没有 for 或 while 这种命令式语言（imperative programming language）的循环结构，递归成了重复操作的唯一选择。

以计算最大公约数为例：

ischeme计算最大公约数

(def gcd (func (a b)

(if (= b 0)

(func (b (% a b))))))

对应的c#代码

public static int gcd ( int a, int b) {

if (b == 0) {

return a;

} else {

return gcd(b, a % b);

}

高阶函数
和 scheme 一样，函数在 ischeme 中是头等对象，这意味着：
可以定义一个变量为函数。函数可以接受一个函数作为参数。函数返回一个函数。
ischeme 的高阶函数示例

?

; defines a multiply function.

(def mul (func (a b) (* a b)))

; defines a list map function.

(def map (func (f alist)

   (if (empty? alist)

     (list )

     (append (list (f (first alist))) (map f (rest alist)))

     )))

; doubles a list using map and mul.

>> (map (mul 2) (list 1 2 3))

>> (list 2 4 6)

小结

对 ischeme 的介绍就到这里——事实上这就是 ischeme 的所有元素，会不会太简单了？ -_-

接下来进入正题——从头开始构造 ischeme 的解释程序。

解释器构造
ischeme 解释器主要分为两部分，解析（parse）和求值（evaluation）：

1、解析（parse）：解析源程序，并生成解释器可以理解的中间（intermediate）结构。这部分包含词法分析，语法分析，语义分析，生成语法树。
2、求值（evaluation）：执行解析阶段得到的中介结构然后得到运行结果。这部分包含作用域，类型系统设计和语法树遍历。
词法分析

词法分析负责把源程序解析成一个个词法单元（lex），以便之后的处理。

ischeme 的词法分析极其简单——由于 ischeme 的词法元素只包含括号，空白，数字和变量名，因此c#自带的 string#split 就足够。

ischeme的词法分析及测试

?

public static string[] tokenize(string text) {

   string[] tokens = text.replace("(", " ( ").replace(")", " ) ").split(" \t\r\n".toarray(), stringsplitoptions.removeemptyentries);

   return tokens;

}

// extends string.join for a smooth api.

public static string join(this string separator, ienumerable<object> values) {

   return string.join(separator, values);

}

// displays the lexes in a readable form.

public static string prettyprint(string[] lexes) {

   return "[" + ", ".join(lexes.select(s => "'" + s + "'") + "]";

}

// some tests

>> prettyprint(tokenize("a"))

>> ['a']

>> prettyprint(tokenize("(def a 3)"))

>> ['(', 'def', 'a', '3', ')']

>> prettyprint(tokenize("(begin (def a 3) (* a a))"))

>> ['begin', '(', 'def', 'a', '3', ')', '(', '*', 'a', 'a', ')', ')']

注意个人不喜欢 string.join 这个静态方法，所以这里使用c#的（extension methods）对string类型做了一个扩展。相对于linq syntax，我个人更喜欢linq extension methods，接下来的代码也都会是这种风格。不要以为词法分析都是这么离谱般简单！vczh的给出了一个完整编程语言的词法分析教程。

语法树生成
得到了词素之后，接下来就是进行语法分析。不过由于 lisp 类语言的程序即是语法树，所以语法分析可以直接跳过。

以下面的程序为例：

程序即语法树

?

;

(def x (if (> a 1) a 1))

; 换一个角度看的话：

(

   def

   x

   (

     if

     (

       >

       a

     )

     a

   )

)

更加直观的图片：

这使得抽象语法树（abstract syntax tree）的构建变得极其简单（无需考虑操作符优先级等问题），我们使用 sexpression 类型定义 ischeme 的语法树（事实上s expression也是lisp表达式的名字）。

抽象语法树的定义

?

public class sexpression {

   public string value { get ; private set ; }

   public list<sexpression> children { get ; private set ; }

   public sexpression parent { get ; private set ; }

   public sexpression( string value, sexpression parent) {

     this .value = value;

     this .children = new list<sexpression>();

     this .parent = parent;

   }

   public override string tostring() {

     if ( this .value == "(" ) {

       return "(" + " " .join(children) + ")" ;

     } else {

       return this .value;

     }

   }

}

然后用下面的步骤构建语法树：
碰到左括号，创建一个新的节点到当前节点（ current ），然后重设当前节点。碰到右括号，回退到当前节点的父节点。否则把为当前词素创建节点，添加到当前节点中。
抽象语法树的构建过程

?

public static sexpression parseasischeme( this string code) {

   sexpression program = new sexpression(value: "" , parent: null );

   sexpression current = program;

   foreach (var lex in tokenize(code)) {

     if (lex == "(" ) {

       sexpression newnode = new sexpression(value: "(" , parent: current);

       current.children.add(newnode);

       current = newnode;

     } else if (lex == ")" ) {

       current = current.parent;

     } else {

       current.children.add( new sexpression(value: lex, parent: current));

     }

   }

   return program.children[0];

}

注意使用（auto property），从而避免重复编写样版代码（boilerplate code）。使用（named parameters）提高代码可读性： new sexpression(value: "", parent: null) 比 new sexpression("", null) 可读。使用  提高代码流畅性： code.tokenize().parseasischeme 比 parseasischeme(tokenize(code)) 流畅。大多数编程语言的语法分析不会这么简单！如果打算实现一个类似c#的编程语言，你需要更强大的语法分析技术：如果打算手写语法分析器，可以参考 ll(k), precedence climbing 和top down operator precedence。如果打算生成语法分析器，可以参考 antlr 或 bison。

作用域
作用域决定程序的运行环境。ischeme使用嵌套作用域。

以下面的程序为例

?

>> (def x 1)

>> 1

>> (def y (begin (def x 2) (* x x)))

>> 4

>> x

>> 1

利用c#提供的 dictionary<tkey, tvalue> 类型，我们可以很容易的实现 ischeme 的作用域 sscope ：

ischeme的作用域实现

?

public class sscope {

   public sscope parent { get; private set; }

   private dictionary<string, sobject> variabletable;

   public sscope(sscope parent) {

     this.parent = parent;

     this.variabletable = new dictionary<string, sobject>();

   }

   public sobject find(string name) {

     sscope current = this;

     while (current != null) {

       if (current.variabletable.containskey(name)) {

         return current.variabletable[name];

       }

       current = current.parent;

     }

     throw new exception(name + " is not defined.");

   }

   public sobject define(string name, sobject value) {

     this.variabletable.add(name, value);

     return value;

   }

}

类型实现

ischeme 的类型系统极其简单——只有数值，bool，列表和函数，考虑到他们都是 ischeme 里面的值对象（value object），为了便于对它们进行统一处理，这里为它们设置一个统一的父类型 sobject ：

public class sobject { }

数值类型

ischeme 的数值类型只是对 .net 中 int64 （即 c# 里的 long ）的简单封装：

?

public class snumber : sobject {

   private readonly int64 value;

   public snumber(int64 value) {

     this .value = value;

   }

   public override string tostring() {

     return this .value.tostring();

   }

   public static implicit operator int64(snumber number) {

     return number.value;

   }

   public static implicit operator snumber(int64 value) {

     return new snumber(value);

   }

}

注意这里使用了 c# 的隐式操作符重载，这使得我们可以：

?

snumber foo = 30;

snumber bar = 40;

snumber foobar = foo * bar;

而不必：

?

snumber foo = new snumber(value: 30);

snumber bar = new snumber(value: 40);

snumber foobar = new snumber(value: foo.value * bar.value);

为了方便，这里也为 sobject 增加了隐式操作符重载（尽管 int64 可以被转换为 snumber 且 snumber 继承自 sobject ，但 .net 无法直接把 int64 转化为 sobject ）：

?

public class sobject {

   ...

   public static implicit operator sobject(int64 value) {

     return (snumber)value;

   }

}

bool类型

由于 bool 类型只有 true 和 false，所以使用静态对象就足矣。

?

public class sbool : sobject {

   public static readonly sbool false = new sbool();

   public static readonly sbool true = new sbool();

   public override string tostring() {

     return ((boolean)this).tostring();

   }

   public static implicit operator boolean(sbool value) {

     return value == sbool.true;

   }

   public static implicit operator sbool(boolean value) {

     return value ? true : false;

   }

}

这里同样使用了 c# 的，这使得我们可以：

?

sbool foo = a > 1;

if (foo) {

   // do something...

}

而不用

?

sbool foo = a > 1 ? sbool. true : sbool. false ;

if (foo == sbool. true ) {

   // do something...

}

同样，为 sobject 增加：

?

public class sobject {

   ...

   public static implicit operator sobject(boolean value) {

     return (sbool)value;

   }

}

列表类型

ischeme使用.net中的 ienumberable<t> 实现列表类型 slist ：

?

public class slist : sobject, ienumerable<sobject> {

   private readonly ienumerable<sobject> values;

   public slist(ienumerable<sobject> values) {

     this .values = values;

   }

   public override string tostring() {

     return "(list " + " " .join( this .values) + ")" ;

   }

   public ienumerator<sobject> getenumerator() {

     return this .values.getenumerator();

   }

   ienumerator ienumerable.getenumerator() {

     return this .values.getenumerator();

   }

}

实现 ienumerable<sobject> 后，就可以直接使用linq的一系列扩展方法，十分方便。
函数类型
ischeme 的函数类型（ sfunction ）由三部分组成：
函数体：即对应的 sexpression 。参数列表。作用域：函数拥有自己的作用域
sfunction的实现

?

public class sfunction : sobject {

   public sexpression body { get ; private set ; }

   public string [] parameters { get ; private set ; }

   public sscope scope { get ; private set ; }

   public boolean ispartial {

     get {

       return this 测试数据putefilledparameters().length.inbetween(1, this .parameters.length);

     }

   }

   public sfunction(sexpression body, string [] parameters, sscope scope) {

     this .body = body;

     this .parameters = parameters;

     this .scope = scope;

   }

   public sobject evaluate() {

     string [] filledparameters = this 测试数据putefilledparameters();

     if (filledparameters.length < parameters.length) {

       return this ;

     } else {

       return this .body.evaluate( this .scope);

     }

   }

   public override string tostring() {

     return string .format( "(func ({0}) {1})" ,

       " " .join( this .parameters.select(p => {

         sobject value = null ;

         if ((value = this .scope.findintop(p)) != null ) {

           return p + ":" + value;

         }

         return p;

       })), this .body);

   }

   private string [] computefilledparameters() {

     return this .parameters.where(p => scope.findintop(p) != null ).toarray();

   }

}

需要注意的几点

ischeme 支持部分求值（partial evaluation），这意味着：
部分求值

?

>> (def mul (func (a b) (* a b)))

>> (func (a b) (* a b))

>> (mul 3 4)

>> 12

>> (mul 3)

>> (func (a:3 b) (* a b))

>> ((mul 3) 4)

>> 12

也就是说，当 sfunction 的实际参数（argument）数量小于其形式参数（parameter）的数量时，它依然是一个函数，无法被求值。

这个功能有什么用呢？生成高阶函数。有了部分求值，我们就可以使用

?

(def mul (func (a b) (* a b)))

(def mul3 (mul 3))

>> (mul3 3)

>> 9

而不用专门定义一个生成函数：

?

(def times (func (n) (func (n x) (* n x)) ) )

(def mul3 (times 3))

>> (mul3 3)

>> 9

sfunction#tostring 可以将其自身还原为源代码——从而大大简化了 ischeme 的理解和测试。
内置操作

ischeme 的内置操作有四种：算术|逻辑|比较|列表操作。

我选择了表达力（expressiveness）强的 lambda 方法表来定义内置操作：

首先在 sscope 中添加静态字段 builtinfunctions ，以及对应的访问属性 builtinfunctions 和操作方法 buildin 。

?

public class sscope {

   private static dictionary< string , func<sexpression[], sscope, sobject>> builtinfunctions =

     new dictionary< string , func<sexpression[], sscope, sobject>>();

   public static dictionary< string , func<sexpression[], sscope, sobject>> builtinfunctions {

     get { return builtinfunctions; }

   }

   // dirty hack for fluent api.

   public sscope buildin( string name, func<sexpression[], sscope, sobject> builtinfuntion) {

     sscope.builtinfunctions.add(name, builtinfuntion);

     return this ;

   }

}

注意：
func<t1, t2, tresult> 是 c# 提供的委托类型，表示一个接受 t1 和 t2 ，返回 tresult 这里有一个小 hack，使用实例方法（instance method）修改静态成员（static member），从而实现一套流畅的 api（参见fluent interface）。
接下来就可以这样定义内置操作：

?

new sscope(parent: null)

   .buildin("+", addmethod)

   .buildin("-", submethod)

   .buildin("", mulmethod)

   .buildin("/", divmethod);

一目了然。

断言（assertion）扩展

为了便于进行断言，我对 boolean 类型做了一点点扩展。

?

public static void orthrows( this boolean condition, string message = null ) {

   if (!condition) { throw new exception(message ?? "wtf" ); }

}

从而可以写出流畅的断言：

(a < 3).orthrows("value must be less than 3.");
而不用

?

if (a < 3) {

   throw new exception("value must be less than 3.");

}

算术操作

ischeme 算术操作包含 + - / % 五个操作，它们仅应用于数值类型（也就是 snumber ）。

从加减法开始：

?

.buildin("+", (args, scope) => {

   var numbers = args.select(obj => obj.evaluate(scope)).cast<snumber>();

   return numbers.sum(n => n);

})

.buildin("-", (args, scope) => {

   var numbers = args.select(obj => obj.evaluate(scope)).cast<snumber>().toarray();

   int64 firstvalue = numbers[0];

   if (numbers.length == 1) {

     return -firstvalue;

   }

   return firstvalue - numbers.skip(1).sum(s => s);

})

注意到这里有一段重复逻辑负责转型求值（cast then evaluation），考虑到接下来还有不少地方要用这个逻辑，我把这段逻辑抽象成扩展方法：

?

public static ienumerable<t> evaluate<t>(this ienumerable<sexpression> expressions, sscope scope)

where t : sobject {

   return expressions.evaluate(scope).cast<t>();

}

public static ienumerable<sobject> evaluate(this ienumerable<sexpression> expressions, sscope scope) {

   return expressions.select(exp => exp.evaluate(scope));

}

然后加减法就可以如此定义：

?

.buildin("+", (args, scope) => (args.evaluate<snumber>(scope).sum(s => s)))

.buildin("-", (args, scope) => {

   var numbers = args.evaluate<snumber>(scope).toarray();

   int64 firstvalue = numbers[0];

   if (numbers.length == 1) {

     return -firstvalue;

   }

   return firstvalue - numbers.skip(1).sum(s => s);

})

乘法，除法和求模定义如下：

?

.buildin("", (args, scope) => args.evaluate<snumber>(scope).aggregate((a, b) => a b))

.buildin("/", (args, scope) => {

   var numbers = args.evaluate<snumber>(scope).toarray();

   int64 firstvalue = numbers[0];

   return firstvalue / numbers.skip(1).aggregate((a, b) => a * b);

})

.buildin("%", (args, scope) => {

   (args.length == 2).orthrows("parameters count in mod should be 2");

   var numbers = args.evaluate<snumber>(scope).toarray();

   return numbers[0] % numbers[1];

})

逻辑操作
ischeme 逻辑操作包括 and ， or 和 not ，即与，或和非。

需要注意的是 ischeme 逻辑操作是短路求值（short-circuit evaluation），也就是说：
(and conda condb) ，如果 conda 为假，那么整个表达式为假，无需对 condb 求值。 (or conda condb) ，如果 conda 为真，那么整个表达式为真，无需对 condb 求值。
此外和 + - * / 一样， and 和 or 也可以接收任意数量的参数。

需求明确了接下来就是实现，ischeme 的逻辑操作实现如下：

?

.buildin("and", (args, scope) => {

   (args.length > 0).orthrows();

   return !args.any(arg => !(sbool)arg.evaluate(scope));

})

.buildin("or", (args, scope) => {

   (args.length > 0).orthrows();

   return args.any(arg => (sbool)arg.evaluate(scope));

})

.buildin("not", (args, scope) => {

   (args.length == 1).orthrows();

   return args[0].evaluate(scope);

})

比较操作
ischeme 的比较操作包括 = < > >= <= ，需要注意下面几点：
= 是比较操作而非赋值操作。
同算术操作一样，它们应用于数值类型，并支持任意数量的参数。

= 的实现如下：

?

.buildin("=", (args, scope) => {

   (args.length > 1).orthrows("must have more than 1 argument in relation operation.");

   snumber current = (snumber)args[0].evaluate(scope);

   foreach (var arg in args.skip(1)) {

     snumber next = (snumber)arg.evaluate(scope);

     if (current == next) {

       current = next;

     } else {

       return false;

     }

   }

   return true;

})

可以预见所有的比较操作都将使用这段逻辑，因此把这段比较逻辑抽象成一个扩展方法：

?

public static sbool chainrelation(this sexpression[] expressions, sscope scope, func<snumber, snumber, boolean> relation) {

   (expressions.length > 1).orthrows("must have more than 1 parameter in relation operation.");

   snumber current = (snumber)expressions[0].evaluate(scope);

   foreach (var obj in expressions.skip(1)) {

     snumber next = (snumber)obj.evaluate(scope);

     if (relation(current, next)) {

       current = next;

     } else {

       return sbool.false;

     }

   }

   return sbool.true;

}

接下来就可以很方便的定义比较操作：

?

.buildin("=", (args, scope) => args.chainrelation(scope, (s1, s2) => (int64)s1 == (int64)s2))

.buildin(">", (args, scope) => args.chainrelation(scope, (s1, s2) => s1 > s2))

.buildin("<", (args, scope) => args.chainrelation(scope, (s1, s2) => s1 < s2))

.buildin(">=", (args, scope) => args.chainrelation(scope, (s1, s2) => s1 >= s2))

.buildin("<=", (args, scope) => args.chainrelation(scope, (s1, s2) => s1 <= s2))

注意 = 操作的实现里面有 int64 强制转型——因为我们没有重载 snumber#equals ，所以无法直接通过 == 来比较两个 snumber 。

列表操作

ischeme 的列表操作包括 first ， rest ， empty? 和 append ，分别用来取列表的第一个元素，除第一个以外的部分，判断列表是否为空和拼接列表。

first 和 rest 操作如下：

?

.buildin("first", (args, scope) => {

   slist list = null;

   (args.length == 1 && (list = (args[0].evaluate(scope) as slist)) != null).orthrows("<first> must apply to a list.");

   return list.first();

})

.buildin("rest", (args, scope) => {

   slist list = null;

   (args.length == 1 && (list = (args[0].evaluate(scope) as slist)) != null).orthrows("<rest> must apply to a list.");

   return new slist(list.skip(1));

})

又发现相当的重复逻辑——判断参数是否是一个合法的列表，重复代码很邪恶，所以这里把这段逻辑抽象为扩展方法：

?

public static slist retrieveslist(this sexpression[] expressions, sscope scope, string operationname) {

   slist list = null;

   (expressions.length == 1 && (list = (expressions[0].evaluate(scope) as slist)) != null)

     .orthrows("<" + operationname + "> must apply to a list");

   return list;

}

有了这个扩展方法，接下来的列表操作就很容易实现：

?

.buildin("first", (args, scope) => args.retrieveslist(scope, "first").first())

.buildin("rest", (args, scope) => new slist(args.retrieveslist(scope, "rest").skip(1)))

.buildin("append", (args, scope) => {

   slist list0 = null, list1 = null;

   (args.length == 2

     && (list0 = (args[0].evaluate(scope) as slist)) != null

     && (list1 = (args[1].evaluate(scope) as slist)) != null).orthrows("input must be two lists");

   return new slist(list0.concat(list1));

})

.buildin("empty?", (args, scope) => args.retrieveslist(scope, "empty?").count() == 0)

测试

ischeme 的内置操作完成之后，就可以测试下初步成果了。

首先添加基于控制台的分析/求值（parse/evaluation）循环：

?

public static void keepinterpretinginconsole( this sscope scope, func< string , sscope, sobject> evaluate) {

   while ( true ) {

     try {

       console.foregroundcolor = consolecolor.gray;

       console.write( ">> " );

       string code;

       if (! string .isnullorwhitespace(code = console.readline())) {

         console.foregroundcolor = consolecolor.green;

         console.writeline( ">> " + evaluate(code, scope));

       }

     } catch (exception ex) {

       console.foregroundcolor = consolecolor.red;

       console.writeline( ">> " + ex.message);

     }

   }

}

然后在 sexpression#evaluate 中补充调用代码：

?

public override sobject evaluate(sscope scope) {

   if ( this .children.count == 0) {

     int64 number;

     if (int64.tryparse( this .value, out number)) {

       return number;

     }

   } else {

     sexpression first = this .children[0];

     if (sscope.builtinfunctions.containskey(first.value)) {

       var arguments = this .children.skip(1).select(node => node.evaluate(scope)).toarray();

       return sscope.builtinfunctions[first.value](arguments, scope);

     }

   }

   throw new exception( "this is just temporary!" );

}

最后在 main 中调用该解释/求值循环：

?

static void main( string [] cmdargs) {

   new sscope(parent: null )

     .buildin( "+" , (args, scope) => (args.evaluate<snumber>(scope).sum(s => s)))

     // 省略若干内置函数

     .buildin( "empty?" , (args, scope) => args.retrieveslist( "empty?" ).count() == 0)

     .keepinterpretinginconsole((code, scope) => code.parseasscheme().evaluate(scope));

}

运行程序，输入一些简单的表达式：

看样子还不错 :-)

接下来开始实现ischeme的执行（evaluation）逻辑。

执行逻辑

ischeme 的执行就是把语句（sexpression）在作用域（sscope）转化成对象（sobject）并对作用域（sscope）产生作用的过程，如下图所示。

ischeme的执行逻辑就在 sexpression#evaluate 里面：

?

public class sexpression {

   // ...

   public override sobject evaluate(sscope scope) {

     // todo: todo your ass.

   }

}

首先明确输入和输出：
处理字面量（literals）： 3 ；和具名量（named values）： x 处理 if ：(if (< a 3) 3 a) 处理 def ：(def pi 3.14) 处理 begin ：(begin (def a 3) (* a a)) 处理 func ：(func (x) (* x x)) 处理内置函数调用：(+ 1 2 3 (first (list 1 2))) 处理自定义函数调用：(map (func (x) (* x x)) (list 1 2 3))
此外，情况1和2中的 sexpression 没有子节点，可以直接读取其 value 进行求值，余下的情况需要读取其 children 进行求值。

首先处理没有子节点的情况：
处理字面量和具名量

?

if ( this .children.count == 0) {

   int64 number;

   if (int64.tryparse( this .value, out number)) {

     return number;

   } else {

     return scope.find( this .value);

   }

}

接下来处理带有子节点的情况：

首先获得当前节点的第一个节点：

sexpression first = this.children[0];
然后根据该节点的 value 决定下一步操作：

处理 if

if 语句的处理方法很直接——根据判断条件（condition）的值判断执行哪条语句即可：

?

if (first.value == "if" ) {

   sbool condition = (sbool)( this .children[1].evaluate(scope));

   return condition ? this .children[2].evaluate(scope) : this .children[3].evaluate(scope);

}

处理 def

直接定义即可：

?

else if (first.value == "def" ) {

   return scope.define( this .children[1].value, this .children[2].evaluate( new sscope(scope)));

}

处理 begin

遍历语句，然后返回最后一条语句的值：

?

else if (first.value == "begin" ) {

   sobject result = null ;

   foreach (sexpression statement in this .children.skip(1)) {

     result = statement.evaluate(scope);

   }

   return result;

}

处理 func

利用 sexpression 构建 sfunction ，然后返回：

?

else if (first.value == "func" ) {

   sexpression body = this .children[2];

   string [] parameters = this .children[1].children.select(exp => exp.value).toarray();

   sscope newscope = new sscope(scope);

   return new sfunction(body, parameters, newscope);

}

处理 list

首先把 list 里的元素依次求值，然后创建 slist ：

?

else if (first.value == "list" ) {

   return new slist( this .children.skip(1).select(exp => exp.evaluate(scope)));

}

处理内置操作

首先得到参数的表达式，然后调用对应的内置函数：

?

else if (sscope.builtinfunctions.containskey(first.value)) {

   var arguments = this .children.skip(1).toarray();

   return sscope.builtinfunctions[first.value](arguments, scope);

}

处理自定义函数调用
自定义函数调用有两种情况：
非具名函数调用：((func (x) (* x x)) 3) 具名函数调用：(square 3)
调用自定义函数时应使用新的作用域，所以为 sfunction 增加 update 方法：

?

public sfunction update(sobject[] arguments) {

   var existingarguments = this .parameters.select(p => this .scope.findintop(p)).where(obj => obj != null );

   var newarguments = existingarguments.concat(arguments).toarray();

   sscope newscope = this .scope.parent.spawnscopewith( this .parameters, newarguments);

   return new sfunction( this .body, this .parameters, newscope);

}

为了便于创建自定义作用域，并判断函数的参数是否被赋值，为 sscope 增加 spawnscopewith 和 findintop 方法：

?

public sscope spawnscopewith( string [] names, sobject[] values) {

   (names.length >= values.length).orthrows( "too many arguments." );

   sscope scope = new sscope( this );

   for (int32 i = 0; i < values.length; i++) {

     scope.variabletable.add(names[i], values[i]);

   }

   return scope;

}

public sobject findintop( string name) {

   if (variabletable.containskey(name)) {

     return variabletable[name];

   }

   return null ;

}

下面是函数调用的实现：

?

else {

   sfunction function = first.value == "(" ? (sfunction)first.evaluate(scope) : (sfunction)scope.find(first.value);

   var arguments = this .children.skip(1).select(s => s.evaluate(scope)).toarray();

   return function.update(arguments).evaluate();

}

完整的求值代码

综上所述，求值代码如下

?

public sobject evaluate(sscope scope) {

   if ( this .children.count == 0) {

     int64 number;

     if (int64.tryparse( this .value, out number)) {

       return number;

     } else {

       return scope.find( this .value);

     }

   } else {

     sexpression first = this .children[0];

     if (first.value == "if" ) {

       sbool condition = (sbool)( this .children[1].evaluate(scope));

       return condition ? this .children[2].evaluate(scope) : this .children[3].evaluate(scope);

     } else if (first.value == "def" ) {

       return scope.define( this .children[1].value, this .children[2].evaluate( new sscope(scope)));

     } else if (first.value == "begin" ) {

       sobject result = null ;

       foreach (sexpression statement in this .children.skip(1)) {

         result = statement.evaluate(scope);

       }

       return result;

     } else if (first.value == "func" ) {

       sexpression body = this .children[2];

       string [] parameters = this .children[1].children.select(exp => exp.value).toarray();

       sscope newscope = new sscope(scope);

       return new sfunction(body, parameters, newscope);

     } else if (first.value == "list" ) {

       return new slist( this .children.skip(1).select(exp => exp.evaluate(scope)));

     } else if (sscope.builtinfunctions.containskey(first.value)) {

       var arguments = this .children.skip(1).toarray();

       return sscope.builtinfunctions[first.value](arguments, scope);

     } else {

       sfunction function = first.value == "(" ? (sfunction)first.evaluate(scope) : (sfunction)scope.find(first.value);

       var arguments = this .children.skip(1).select(s => s.evaluate(scope)).toarray();

       return function.update(arguments).evaluate();

     }

   }

}

去除尾递归
到了这里 ischeme 解释器就算完成了。但仔细观察求值过程还是有一个很大的问题，尾递归调用：
处理 if 的尾递归调用。处理函数调用中的尾递归调用。
alex stepanov 曾在 elements of programming 中介绍了一种将严格尾递归调用（strict tail-recursive call）转化为迭代的方法，细节恕不赘述，以阶乘为例：

?

// recursive factorial.

int fact ( int n) {

   if (n == 1)

     return result;

   return n * fact(n - 1);

}

// first tranform to tail recursive version.

int fact ( int n, int result) {

   if (n == 1)

     return result;

   else {

     result = n;

     n -= 1;

     return fact(n, result); // this is a strict tail-recursive call which can be omitted

   }

}

// then transform to iterative version.

int fact ( int n, int result) {

   while ( true ) {

     if (n == 1)

       return result;

     else {

       result = n;

       n -= 1;

     }

   }

}

应用这种方法到 sexpression#evaluate ，得到转换后的版本：

?

public sobject evaluate(sscope scope) {

   sexpression current = this ;

   while ( true ) {

     if (current.children.count == 0) {

       int64 number;

       if (int64.tryparse(current.value, out number)) {

         return number;

       } else {

         return scope.find(current.value);

       }

     } else {

       sexpression first = current.children[0];

       if (first.value == "if" ) {

         sbool condition = (sbool)(current.children[1].evaluate(scope));

         current = condition ? current.children[2] : current.children[3];

       } else if (first.value == "def" ) {

         return scope.define(current.children[1].value, current.children[2].evaluate( new sscope(scope)));

       } else if (first.value == "begin" ) {

         sobject result = null ;

         foreach (sexpression statement in current.children.skip(1)) {

           result = statement.evaluate(scope);

         }

         return result;

       } else if (first.value == "func" ) {

         sexpression body = current.children[2];

         string [] parameters = current.children[1].children.select(exp => exp.value).toarray();

         sscope newscope = new sscope(scope);

         return new sfunction(body, parameters, newscope);

       } else if (first.value == "list" ) {

         return new slist(current.children.skip(1).select(exp => exp.evaluate(scope)));

       } else if (sscope.builtinfunctions.containskey(first.value)) {

         var arguments = current.children.skip(1).toarray();

         return sscope.builtinfunctions[first.value](arguments, scope);

       } else {

         sfunction function = first.value == "(" ? (sfunction)first.evaluate(scope) : (sfunction)scope.find(first.value);

         var arguments = current.children.skip(1).select(s => s.evaluate(scope)).toarray();

         sfunction newfunction = function.update(arguments);

         if (newfunction.ispartial) {

           return newfunction.evaluate();

         } else {

           current = newfunction.body;

           scope = newfunction.scope;

         }

       }

     }

   }

}

一些演示

基本的运算

高阶函数

回顾

小结
除去注释（貌似没有注释-_-），ischeme 的解释器的实现代码一共 333 行——包括空行，括号等元素。

在这 300 余行代码里，实现了函数式编程语言的大部分功能：算术|逻辑|运算，嵌套作用域，顺序语句，控制语句，递归，高阶函数，部分求值。

与我两年之前实现的 scheme 方言 lucida相比，ischeme 除了没有字符串类型，其它功能和lucida相同，而代码量只是前者的八分之一，编写时间是前者的十分之一（lucida 用了两天，ischeme 用了一个半小时），可扩展性和易读性均秒杀前者。这说明了：
代码量不能说明问题。不同开发者生产效率的差别会非常巨大。这两年我还是学到了一点东西的。-_-

一些设计决策使用扩展方法提高可读性
例如，通过定义 orthrows

?

public static void orthrows( this boolean condition, string message = null ) {

   if (!condition) { throw new exception(message ?? "wtf" ); }

}

写出流畅的断言：

?

(a < 3).orthrows("value must be less than 3.");

声明式编程风格

以 main 函数为例：

?

static void main(string[] cmdargs) {

   new sscope(parent: null)

     .buildin("+", (args, scope) => (args.evaluate<snumber>(scope).sum(s => s)))

     // other build

     .buildin("empty?", (args, scope) => args.retrieveslist("empty?").count() == 0)

     .keepinterpretinginconsole((code, scope) => code.parseasischeme().evaluate(scope));

}

非常直观，而且
如果需要添加新的操作，添加写一行 buildin 即可。如果需要使用其它语法，替换解析函数 parseasischeme 即可。如果需要从文件读取代码，替换执行函数 keepinterpretinginconsole 即可。
不足

当然ischeme还是有很多不足：

语言特性方面：
缺乏实用类型：没有 double 和 string 这两个关键类型，更不用说复合类型（compound type）。没有io操作，更不要说网络通信。效率低下：尽管去除尾递归挽回了一点效率，但ischeme的执行效率依然惨不忍睹。错误信息：错误信息基本不可读，往往出错了都不知道从哪里找起。不支持延续调用（call with current continuation，即call/cc）。没有并发。各种bug：比如可以定义文本量，无法重载默认操作，空括号被识别等等。
设计实现方面：
使用了可变（mutable）类型。没有任何注释（因为觉得没有必要 -_-）。糟糕的类型系统：lisp类语言中的数据和程序可以不分彼此，而ischeme的实现中确把数据和程序分成了 sobject 和 sexpression ，现在我依然没有找到一个融合他们的好办法。
这些就留到以后慢慢处理了 -_-（todo your ass）

延伸阅读
书籍
compilers: priciples, techniques and tools principles: http://HdhCmsTestamazon.co.uk/compilers-principles-techniques-v-aho/dp/1292024348/ language implementation patterns: http://HdhCmsTestamazon.co.uk/language-implementation-patterns-domain-specific-programming/dp/193435645x/ the definitive antlr4 reference: http://HdhCmsTestamazon.co.uk/definitive-antlr-4-reference/dp/1934356999/ engineering a compiler: http://HdhCmsTestamazon.co.uk/engineering-compiler-keith-cooper/dp/012088478x/ flex & bison: http://HdhCmsTestamazon.co.uk/flex-bison-john-levine/dp/0596155972/ writing compilers and interpreters: http://HdhCmsTestamazon.co.uk/writing-compilers-interpreters-software-engineering/dp/0470177071/ elements of programming: http://HdhCmsTestamazon.co.uk/elements-programming-alexander-stepanov/dp/032163537x/
注：带*号的没有中译本。

文章
大多和编译前端相关，自己没时间也没能力研究后端。-_-

为什么编译技术很重要？看看 Steve Yegge（没错，就是被王垠黑过的 Google 高级技术工程师）是怎么说的（需要翻墙）。

http://steve-yegge.blogspot.co.uk/2007/06/rich-programmer-food.html

本文重点参考的 Peter Norvig 的两篇文章：
How to write a lisp interpreter in Python: http://norvig测试数据/lispy.html An even better lisp interpreter in Python: http://norvig测试数据/lispy2.html
几种简单实用的语法分析技术：
LL(k) Parsing： http://eli.thegreenplace.net/2008/09/26/recursive-descent-ll-and-predictive-parsers/ http://eli.thegreenplace.net/2009/03/20/a-recursive-descent-parser-with-an-infix-expression-evaluator/ http://eli.thegreenplace.net/2009/03/14/some-problems-of-recursive-descent-parsers/ Top Down Operator Precendence：http://javascript.crockford测试数据/tdop/tdop.html Precendence Climbing Parsing：http://en.wikipedia.org/wiki/Operator-precedence_parser

原文链接：http://lucida.me/blog/how-to-implement-an-interpreter-in-csharp/

dy("nrwz");

查看更多关于90分钟实现一门编程语言(极简解释器教程)的详细内容...

声明：本文来自网络，不代表【好得很程序员自学网】立场，转载请注明出处：http://haodehen.cn/did56663

更新时间：2022-09-26 阅读：48次