Stork, An Example Programming Language, Lesson 3: Expression Evaluation

  • strict warning: Non-static method view::load() should not be called statically in /home/sigpboo7/public_html/modules/views/views.module on line 842.
  • strict warning: Declaration of views_handler_field_comment::init() should be compatible with views_handler_field::init(&$view, $options) in /home/sigpboo7/public_html/modules/views/modules/comment/views_handler_field_comment.inc on line 50.
  • strict warning: Declaration of views_plugin_row::options_validate() should be compatible with views_plugin::options_validate(&$form, &$form_state) in /home/sigpboo7/public_html/modules/views/plugins/views_plugin_row.inc on line 135.
  • strict warning: Declaration of views_plugin_row::options_submit() should be compatible with views_plugin::options_submit(&$form, &$form_state) in /home/sigpboo7/public_html/modules/views/plugins/views_plugin_row.inc on line 135.

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 3 in a series of 10, so if you’re just joining now, you may want to check out lessons 1 and 2 to gear up a bit for this post.

Lesson 3 covers the basics of compiler design (front end versus back end) and types, plus a very brief preview of static analysis. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions.

The code for this lesson is available on github under the tag lesson3, and you can follow the discussion about this lesson on reddit.

(Basic) Compiler Theory

Most developers are familiar with the use of compilers like javac and gcc — instant program, just add source code — but aren’t familiar with their inner workings. Stork is intended to dispel some of the mystery around compilers, and its far enough along now to start discussing Stork in the greater context of compiler design.

In the most general sense, compilers are simply translators that turn program source code into executable instructions. There are many compilers: javac, the Java compiler, turns Java code into Java bytecode; gcc, the Gnu C Compiler, turns C code into native instructions, and so on. There are also similar programs called interpreters that execute program source code directly without first compiling them down to instructions, like ruby or the subject of this course, Stork. While interpreters are technically different from compilers, the same design principles apply, so the Stork interpreter will serve nicely as a platform for exploring simple compiler design.

Compiler Design: Front End, Middle End, and Back End

At a high level, compilers look like this:

Basic Compiler Design

Interpreters work much the same, but do not emit source code and have a Runtime in place of a back end. While this discussion will focus on compilers, note the similarity between compiler and interpreter design:

Basic Interpreter Design

The Reason for Three-Part Design

While a compiler is a translator from source code to executable instructions, it is not necessarily a translator from one kind of source code to one kind of executable instructions. That is, a single compiler (or compiler framework, like GCC) may translate programs written in any one of many source languages — C, C++, Fortran, etc. — into a program using any one of many executable instruction setsx86, ARM, MIPS, etc.

To this end, a compiler is typically divided into three parts: the front end, the middle end, and the backend. The front end is programming language-specific and translates program source code into an “intermediate representation” (IR) within the compiler, and the backend is instruction set-specific and translates the intermediate representation into the desired instructions. The middle end, on the other hand, focuses on language- and architecture-agnostic concerns like optimization and operates on the intermediate representation, which allows it to work across any programming language or instruction set.

This approach to compiler design promotes a great deal of code re-use, which is of tremendous benefit to compiler writers, particularly for compiler frameworks like GCC. A new programming language can quickly support multiple instruction sets by implementing a new frontend to a popular compiler, and an instruction set can support multiple programming languages by implementing a new backend. In addition, any new frontend or backend can make instant use of the existing, high-quality optimizer already in that compiler.

(Basic) Static Analysis

When a compiler or interpreter builds or executes source code, it necessarily performs analysis on that source. Some parts of this analysis — tokenization and parsing — are syntactic in nature, while other parts, like type analysis and optimization, are semantic in nature and focus on things like determining program correctness and improving performance. These semantic analytical steps are called “static analysis,” which is an umbrella term for any analysis of a program that is done without actually executing that program.

For now, the only kind of static analysis Stork applies to programs is type analysis, although more will be added in subsequent lessons.

Type Analysis

Stork is a statically-typed programming language, which means that the types of all values are (approximately) known at compile time. Since these approximate types — called “static types” — are known at compile time, compilers analyze these types to make sure that program expressions “make sense.” In its current state, Stork supports only two types: int, and float. Stork will have several more types of things by the end of the course, but for now it keeps type analysis simple. The difference between compile-time static types and runtime types will be discussed in later lessons, too.

Because there are only two types and they’re both numerical, most expressions that it is currently possible to spell in Stork make sense. The expressions 1+2, 1.0+2.0, 1+2.0*(4-7.5)*6.1234, and 5%3 make sense, or more formally are “type consistent.” However, some expressions, like 1.0%5.0, do not make sense, here because the % modulus operator is (correctly) defined only on ints and not on floats. Stork’s static typing engine identifies those expressions that don’t make sense and raises an error when it sees them.

Besides recognizing type errors, static typing also allows a compiler to identify required data conversions at compile time. For example, using type analysis, Stork can determine that evaluating 1+2.0 requires the int 1 to be coerced, or automatically converted, to the float 1.0 before it can evaluate the expressions properly as 1.0+2.0. Such coercion rules are a good example of syntactic sugar, or technically unnecessary but in practice useful automatic changes to (usually) mundane details of program semantics to make the programmer’s life easier. (Editorial note: If you’re not convinced things like coercion are a good idea, I invite you to give Scheme a try!)

The Stork Interpreter

So far, the Stork implementation has consisted entirely of front end components. However, in this lesson, the Stork implementation finally gets a proper runtime engine, which makes it a full interpreter, although admittedly not a very interesting one yet.

To keep things simple, Stork has no middle end for the time being. Again, this is where things like optimization would happen, but for now Stork is focusing on adding features, so it’s not the right time to be thinking about optimization yet. This lesson won’t discuss the middle end further.

Stork’s Front End

The first two front end components of an interpreter should look pretty familiar from Stork lessons 1 (tokenization) and 2 (parsing). This lesson covers the third step in the front end, the translation of parsed program text into its corresponding intermediate representation.

The translation piece is implemented by a new class, Translator, which translates ASTs into new IR Expr objects using the visitor pattern. This required adding an abstract method to the ExprAST class:

public abstract class ExprAST extends AST {
    public BinaryOperatorExprAST asBinaryOperator() {
        return (BinaryOperatorExprAST) this;
    }

    public UnaryOperatorExprAST asUnaryOperator() {
        return (UnaryOperatorExprAST) this;
    }

    public IntExprAST asInt() {
        return (IntExprAST) this;
    }

    public FloatExprAST asFloat() {
        return (FloatExprAST) this;
    }
    
    // NEW METHOD
    public abstract Expr translate(Translator translate);
}

And each ExprAST subclass simply calls back into the given Translator to implement the visitor pattern and return the resulting IR Expr, all of which is necessary because Java doesn’t support double dispatch:

public class IntExprAST extends ExprAST {
    private long value;
    
    public IntExprAST(long value) {
        this.value = value;
    }

    public long getValue() {
        return value;
    }

    // NEW METHOD
    public Expr translate(Translator translator) {
        return translator.compile(this);
    }
}

The Translator itself has specialized method for each kind of ExprAST that actually translates an arbitrary ExprAST to its corresponding Expr:

public class Translator {
    // ...

    public Expr translate(BinaryOperatorExprAST expr) {
        // ...
    }

    public Expr translate(UnaryOperatorExprAST expr) {
        // ...
    }

    public Expr translate(IntExprAST expr) {
        // ...
    }
    
    public Expr translate(FloatExprAST expr) {
        // ...
    }
}

The IntExprAST and FloatExprAST translate() methods are very simple:

    public Expr translate(IntExprAST expr) {
        return new IntExpr(expr.getValue());
    }
    
    public Expr translate(FloatExprAST expr) {
        return new FloatExpr(expr.getValue());
    }

While the BinaryOperatorExprAST translate() look more complex, but is actually pretty simple:

    private static final Map<Type,BinaryOperatorExpr.Operator> binplusRuntimeOperators=new HashMap<Type,BinaryOperatorExpr.Operator>() {{
        put(Type.FLOAT, BinaryOperatorExpr.Operator.FADD);
        put(Type.INT, BinaryOperatorExpr.Operator.IADD);
    }};
    
    private static final Map<Type,BinaryOperatorExpr.Operator> binminusRuntimeOperators=new HashMap<Type,BinaryOperatorExpr.Operator>() {{
        put(Type.FLOAT, BinaryOperatorExpr.Operator.FSUB);
        put(Type.INT, BinaryOperatorExpr.Operator.ISUB);
    }};
    
    private static final Map<Type,BinaryOperatorExpr.Operator> bintimesRuntimeOperators=new HashMap<Type,BinaryOperatorExpr.Operator>() {{
        put(Type.FLOAT, BinaryOperatorExpr.Operator.FMUL);
        put(Type.INT, BinaryOperatorExpr.Operator.IMUL);
    }};
    
    private static final Map<Type,BinaryOperatorExpr.Operator> bindivideRuntimeOperators=new HashMap<Type,BinaryOperatorExpr.Operator>() {{
        put(Type.FLOAT, BinaryOperatorExpr.Operator.FDIV);
        put(Type.INT, BinaryOperatorExpr.Operator.IDIV);
    }};
    
    private static final Map<Type,BinaryOperatorExpr.Operator> binmodulusRuntimeOperators=new HashMap<Type,BinaryOperatorExpr.Operator>() {{
        put(Type.INT, BinaryOperatorExpr.Operator.MOD);
    }};
    
    private static final Map<BinaryOperatorExprAST.Operator,Map<Type,BinaryOperatorExpr.Operator>> binaryOperators=new EnumMap<BinaryOperatorExprAST.Operator,Map<Type,BinaryOperatorExpr.Operator>>(BinaryOperatorExprAST.Operator.class) {{
        put(BinaryOperatorExprAST.Operator.PLUS, binplusRuntimeOperators);
        put(BinaryOperatorExprAST.Operator.MINUS, binminusRuntimeOperators);
        put(BinaryOperatorExprAST.Operator.TIMES, bintimesRuntimeOperators);
        put(BinaryOperatorExprAST.Operator.DIVIDE, bindivideRuntimeOperators);
        put(BinaryOperatorExprAST.Operator.MOD, binmodulusRuntimeOperators);
    }};

    protected static Type computeBinaryOperatorResultType(Type left, Type right) {
        Type result;
        if(left.equals(right))
            result = left;
        else
        if(left instanceof NumericType && right instanceof NumericType) {
            NumericType nleft=(NumericType) left;
            NumericType nright=(NumericType) right;
            if(nleft.getPrecision() > nright.getPrecision())
                result = nleft;
            else
                result = right;
        }
        else
            throw new StorkException("Unrecognized type: "+left);
        return result;
    }
    
    protected static Expr coerce(Expr expr, Type type, Type target) {
        Expr result;

        if(type.equals(target))
            result = expr;
        else
        if(type instanceof NumericType && target instanceof NumericType) {
            NumericType ntype=(NumericType) type;
            NumericType ntarget=(NumericType) target;
            if(ntype.getPrecision() < ntarget.getPrecision()) {
                if(ntype.equals(Type.INT) && ntarget.equals(Type.FLOAT)) {
                    result = new IntToFloatExpr(expr);
                }
                else
                    throw new InternalCompilationStorkException("Asked to coerce unrecognized numeric type(s) `"+type.getText()+"' to `"+target.getText()+"'"); 
            }
            else
                throw new InternalCompilationStorkException("Asked to coerce type `"+type.getText()+"' to less precise type `"+target.getText()+"'");
        }
        else
            throw new InternalCompilationStorkException("Asked to perform impossible coercion of type `"+type.getText()+"' to type `"+target.getText()+"'");
        
        return result;
    }
    
    public Expr translate(BinaryOperatorExprAST expr) {
        Expr result;
        
        Type ltype=computeType(expr.getLeft());
        Type rtype=computeType(expr.getRight());
        Type resultType=computeBinaryOperatorResultType(ltype, rtype);
        
        Map<Type,BinaryOperatorExpr.Operator> runtimeOperators=binaryOperators.get(expr.getOperator());
        if(runtimeOperators == null)
            throw new InternalCompilationStorkException("Unrecognized BinaryOperatorExpr.Operator: "+expr.getOperator());
        
        if(runtimeOperators.containsKey(resultType)) {
            BinaryOperatorExpr.Operator operator=runtimeOperators.get(resultType);
            if(operator != null) {
                result = new BinaryOperatorExpr(
                    operator,
                    coerce(expr.getLeft().translate(this), ltype, resultType),
                    coerce(expr.getRight().translate(this), rtype, resultType));
            }
            else
                throw new InternalCompilationStorkException("Implicit binary operator not allowed: "+expr.getOperator());
        }
        else
            throw new NoSuchOperatorException(expr.getOperator().getText(), ltype, rtype);

        return result;
    }

Per its name, a BinaryOperatorExprAST has two operands, left and right. The translate() method computes the type for each operand, and then determines what the type of the result should be, per coercion. Then, it determines if (a) the given operator is defined; and (b) is defined on the given result type. (All the different defined operators and types are encoded in the given static data structures.) As long as the given operator is defined on the given result type, the resulting is a BinaryOperatorExpr using the corresponding runtime operator, with the translated left and right operands promoted to the given result type. If the given operator is not defined on the given result type, that indicates that the given expression is not type consistent, and an error is raised.

These corresponding Expr objects look — and are! — very similar to their corresponding ExprAST. However, there is one important difference: ExprAST objects only understand the syntactic relationships among the tokens in program text; Expr objects understand the semantic relationships among the different expressions in the program text. In particular, Expr objects understand types, which means (at this point in Stork’s implementation) that the Exprs understand whether they’re working on floating-point or integer objects.

This translated Expr format is what the back end will execute. (This format is also the intermediate representation that the middle end will work on.)

Stork’s Back End

The IR Expr objects are very simple in their construction:

public abstract class Expr {
    public abstract Object eval();
}

Runtime values in Stork are representing using native Java objects, which is useful for both simplicity and debugging during implementation. (Not that I ever make mistakes that need debugging.)

Currently, every ExprAST class has a corresponding Expr class:

public class IntExpr extends Expr {
    private long value;
    
    public IntExpr(long value) {
        this.value = value;
    }
    
    public long getValue() {
        return value;
    }

    public Object eval() {
        return Long.valueOf(getValue());
    }
}

public class FloatExpr extends Expr {
    private double value;
    
    public FloatExpr(double value) {
        this.value = value;
    }
    
    public double getValue() {
        return value;
    }

    public Object eval() {
        return Double.valueOf(getValue());
    }
}

…and so on. However, there are some additional expression types, like IntToFloatExpr:

public class IntToFloatExpr extends Expr {
    private Expr inner;
    
    public IntToFloatExpr(Expr inner) {
        this.inner = inner;
    }
    
    public Expr getInner() {
        return inner;
    }
    
    public Object eval() {
        Long value=(Long) getInner().eval();
        return Double.valueOf(value.doubleValue());
    }
}

This is because while AST classes correspond to syntactical constructs, Expr objects correspond to evaluation constructs.

Downloading and Running Stork

Stork is designed to be easy to download, build, and run. As along as you have Git, Maven, and a JDK installed, you can get this lesson of Stork up and running on your computer like this:

$ git clone https://github.com/sigpwned/stork.git
$ cd stork
$ git checkout lesson3
$ mvn compile
$ cd target/classes
$ java com.sigpwned.stork.Stork
>>> 1+2*(3+4)
15
>>> 5%3.0
ERROR: No operator `%' that accepts types: Int, Float
>>> ^D
$

Next Lesson: Variables

The next step in building Stork will be to add named, typed variables to Stork. This lesson will also cover some interesting static analysis to guarantee the correctness of variable references, so it will include some more interesting static analysis as well.

Questions

  1. Why is there a IntToFloatExpr, but no corresponding IntToFloatExprAST?
  2. Is the result of the expression 1+2.0 an int or a float? Why?