Stork, An Example Programming Language, Lesson 4: Variables

  • strict warning: Non-static method view::load() should not be called statically in /home/sigpboo7/public_html/modules/views/views.module on line 842.
  • strict warning: Declaration of views_handler_field_comment::init() should be compatible with views_handler_field::init(&$view, $options) in /home/sigpboo7/public_html/modules/views/modules/comment/views_handler_field_comment.inc on line 50.
  • strict warning: Declaration of views_plugin_row::options_validate() should be compatible with views_plugin::options_validate(&$form, &$form_state) in /home/sigpboo7/public_html/modules/views/plugins/views_plugin_row.inc on line 135.
  • strict warning: Declaration of views_plugin_row::options_submit() should be compatible with views_plugin::options_submit(&$form, &$form_state) in /home/sigpboo7/public_html/modules/views/plugins/views_plugin_row.inc on line 135.

Welcome back!

For those of you just joining, Stork is an example programming language “course” designed to demonstrate the principles of programming language implementation in 10 “lessons.” This is Lesson 4 in a series of 10, so if you’re just joining now, you may want to take a peek at lessons 1, 2, and 3 to gear up a bit for this post.

Lesson 4 adds variables to Stork, which involves adding statements in addition to the expressions already in the language. The addition of variables provides fodder for some additional (and more interesting) static analysis as well. At the end of this lesson, Stork will be a working interpreter for simple numerical expressions with support for variables. (The variables will become much more interesting over the course of the next couple of lessons, which will add support for functions and control structures.)

The code for this lesson is available on github under the tag lesson4, and you can follow the discussion about this lesson on reddit. As a quick preview, though, Stork is getting cool:

$ java com.sigpwned.stork.Stork
>>> var x:Int
>>> x
ERROR: Variable may not have been initialized: x
>>> x = 1+2*(3+4)
15
>>> var y:Float
>>> y = x
15.0
>>> x = y
ERROR: Will not coerce to less precise type: Float -> Int
>>> x = (cast Int) y
15
>>> x+y
30.0
>>> ^D
$

Compile-Time Statement Support

Lesson 2 covered the first version of Stork’s parser, which only parsed simple numerical expressions. This lesson adds support for a different kind of syntactic construct, statements. Whereas expressions are the units of computation that represent simple mathematical equations, statements indicate the sequential execution order of expressions and other more sophisticated program semantics, like defining a new variable or (in later lessons) conditions and loops.

Statement Terminators: Significant Whitespace

Statements are separated by a statement terminator; in Stork, that delimiter can be either a semicolon (;) or a newline. Supporting newlines as statement terminators requires that some whitespace (namely newlines) be treated as significant. To this end, the Token class was changed slightly by adding a flags field that tracks the kinds of whitespace that precede a Token:

public class Token {
    public static enum Flag {
        NEWLINE;
    }

    // ... 
    
    private Set<Flag> flags;
    
    public Token(Type type, int offset, String text) {
        // ...
        this.flags = EnumSet.noneOf(Flag.class);
    }

    // ...

    protected Set<Flag> getFlags() {
        return flags;
    }
    
    public void setFlag(Flag flag) {
        getFlags().add(flag);
    }
    
    public boolean hasFlag(Flag flag) {
        return getFlags().contains(flag);
    }
}

Next, the Tokenizer whitespace-consuming method w() was modified to track the types of whitespace that were consumed, and the peekToken() method uses that data to determine if the parsed Token is preceded by whitespace including a newline:

    public Token peekToken() throws IOException, ParseException {
        if(next == null) {
            int w=w();

            // ...            
            
            if((w&NEWLINE) != 0)
                next.setFlag(Token.Flag.NEWLINE);
        }
        return next;
    }

    private static int SPACE=  1 << 0;
    private static int TAB=    1 << 1;
    private static int NEWLINE=1 << 2;
    protected int w() throws IOException {
        int result=0;
        while(Character.isWhitespace(getInput().peek())) {
            int ch=getInput().read();
            if(ch == ' ')
                result = result | SPACE;
            else
            if(ch == '\t')
                result = result | TAB;
            else
            if(ch=='\n' || ch=='\r')
                result = result | NEWLINE;
        }
        return result;
    }

Parsing Statements

A stmt() method was also added to the Parser that actually parses the statements. Fortunately, parsing statements is very similar to parsing expressions. (On the whole, parsers are parsers.) This lesson defines two types of statements: a variable declaration statement of the form var name:type=value, and an evaluation expression that is simply an expression. The statement parser implementation added a few keywords and a new stmt() method that parses a statement as indicated by the next token in the stream:

    public StmtAST stmt() throws IOException, ParseException {
        StmtAST result;
        
        if(getTokens().peekType() == Token.Type.VAR) {
            getTokens().consumeType(Token.Type.VAR);
            String name=getTokens().consumeType(Token.Type.SYMBOL).getText();
            getTokens().consumeType(Token.Type.COLON);
            TypeExpr type=type();
            
            ExprAST init;
            if(getTokens().peekType() == Token.Type.EQ) {
                getTokens().consumeType(Token.Type.EQ);
                init = expr();
            }
            else
                init = null;
            
            term();
            
            result = new DeclareStmtAST(name, type, init);
        }
        else {
            ExprAST expr=expr();
            term();
            result = new EvalStmtAST(expr);
        }
        
        return result;
    }

If the next token is the var keyword, then stmt() parses a variable declaration statement; otherwise, it parses an evaluation expression.

The term() method referenced above parses a statement terminator by looking for either the end of the file, a semicolon, or a newline (which is detected per the preceding discussion) preceding the token after the statement’s tokens:

    protected void term() throws IOException, ParseException {
        if(getTokens().peekType() == Token.Type.EOF) {
            // EOF qualifies as a statement terminator
        } else
        if(getTokens().peekType() == Token.Type.SEMICOLON) {
            // Semicolons are explicit terminators; consume it.
            getTokens().consumeType(Token.Type.SEMICOLON);
        } else
        if(getTokens().peekToken().hasFlag(Token.Flag.NEWLINE)) {
            // The next token has a newline before it. That's a term.
        }
        else {
            // We need a term and we didn't get one. Just go ahead
            // and consume a `;', which will generate an error since
            // we know the next token is not a `;'.
            getTokens().consumeType(Token.Type.SEMICOLON);
        }
    }

Parsing and Resolving Types

Variable declaration syntax is the first place in Stork that type names are written out explicitly. As a result, a new method type() was added that parses TypeExpr, or a compile-time reference to a runtime type. Currently, only a small number of symbolic types is supported, so that method simply looks for a SYMBOL token:

    protected TypeExpr type() throws IOException, ParseException {
        return new TypeExpr(getTokens().consumeType(Token.Type.SYMBOL).getText());
    }

TypeExpr objects have an eval() method that is used to resolve type names (and later, more complex constructed types) to actual types at compile type. Because the only TypeExpr currently defined is a simple by-name reference, that method is currently just a simple lookup into the current scope Gamma:

    public Type eval(Translator translate) {
        return translate.eval(this);
    }

Other Language Additions

An assignment operator (=) was also added to the BinaryOperatorExprAST to represent both variable initialization and assignment. A new CastExprAST of the form (cast type) expr was also added, which converts a value of one type to another (compatible) type. Currently, it’s only useful to cast a value of type Float to a value of type Int, which does not have an automatic conversion because such a conversion may result in a loss of precision.

Compiling Statements

While parsing these statements is reasonably straightforward, compiling them is more complex. In particular, the Translator visitor class gets a lot more complex with the addition of support for variables.

First, a new class Gamma has been added that tracks what variables are currently defined plus some associated metadata about each variable, namely its type and whether or not it’s been assigned to yet, since referencing an unassigned variable is illegal syntax:

public class Gamma {
    public static class Slot {
        public static enum Flag {
            INITIALIZED;
        }
        
        private Type type;
        private Set<Flag> flags;
        
        protected Slot(Type type) {
            this.type = type;
            this.flags = EnumSet.noneOf(Flag.class);
        }

        public Type getType() {
            return type;
        }
        
        protected Se<Flag> getFlags() {
            return flags;
        }
        
        public boolean hasFlag(Flag flag) {
            return getFlags().contains(flag);
        }
        
        public void setFlag(Flag flag) {
            getFlags().add(flag);
        }
    }
    
    private Map<String,Slot> slots;
    private Map<String,Type> types;
    
    public Gamma() {
        this.slots = new HashMap<String,Slot>();
        this.types = new HashMap<String,Type>();
    }
    
    protected Map<String,Slot> getSlots() {
        return slots;
    }
    
    public Set<String> listSlots() {
        return getSlots().keySet();
    }
    
    public Slot getSlot(String name) {
        Slot result=getSlots().get(name);
        if(result == null)
            throw new InternalCompilationStorkException("No such slot: "+name);
        return result;
    }
    
    public void addSlot(String name, Type type) {
        getSlots().put(name, new Slot(type));
    }
    
    protected Map<String,Type> getTypes() {
        return types;
    }
    
    public Set<String> listTypes() {
        return getTypes().keySet();
    }
    
    public Type getType(String name) {
        Type result=getTypes().get(name);
        if(result == null)
            throw new InternalCompilationStorkException("No such type: "+name);
        return result;
    }
    
    public void addType(String name, Type type) {
        getTypes().put(name, type);
    }
}

The Translator class has been updated in three important ways:

  1. A Gamma instance globe has been added, which represents the global scope
  2. translate() methods have been added for the new ExprAST and StmtAST objects
  3. Two new methods, typeOf and assign, have been added for each ExprAST type that return the type of the ExprAST and handle an assignment to that ExprAST, respectively

The translate() methods for the new ExprAST objects are the best example of static analysis in Stork so far:

    public Expr translate(VarExprAST expr) {
        Expr result;

        if(!getGlobe().listSlots().contains(expr.getName()))
            throw new UndefinedVariableException(expr.getName());
        else
        if(!getGlobe().getSlot(expr.getName()).hasFlag(Gamma.Slot.Flag.INITIALIZED))
            throw new UninitializedVariableException(expr.getName());
        else
            result = new VarExpr(expr.getName());

        return result;
    }

    public Expr translate(CastExprAST expr) {
        Type castType=expr.getType().eval(this);

        Type valueType=expr.getExpr().typeOf(this);

        Expr value=expr.getExpr().translate(this);

        return coerce(value, valueType, castType, true);
    }

The VarExprAST method is a particularly good example, as it detects references to both undefined and uninitialized variables.

The typeOf() methods are also simple, and in some cases familiar from the last lesson’s discussion of type coercion:

    public Type typeOf(BinaryOperatorExprAST expr) {
        Type ltype=expr.getLeft().typeOf(this);
        Type rtype=expr.getRight().typeOf(this);
        return computeBinaryOperatorResultType(ltype, rtype);        
    }

    public Type typeOf(VarExprAST expr) {
        Type result;
        if(getGlobe().listSlots().contains(expr.getName()))
            result = getGlobe().getSlot(expr.getName()).getType();
        else
            throw new UndefinedVariableException(expr.getName());
        return result;
    }

    public Type typeOf(UnaryOperatorExprAST expr) {
        return expr.getChild().typeOf(this);
    }

    public Type typeOf(IntExprAST expr) {
        return Type.INT;
    }

    public Type typeOf(FloatExprAST expr) {
        return Type.FLOAT;
    }

    public Type typeOf(CastExprAST expr) {
        return expr.getType().eval(this);
    }

The assign() methods are essentially trivial because most expressions currently in Stork are not valid l-values. An l-value is any value which is valid as the left-hand side of an assignment. (More intuitively, it’s any value that can receive an assigned value.) In Stork, the only l-value currently defined is a simple variable reference, like x. Examples of l-values from other languages include array index references (x[0]), pointer dereferences (*x), and so on. Stork will receive more valid l-values in later lessons.

For now, though, the assign() methods are exceptionally simple because there’s one “umbrella” method that handles any ExprAST by simply raising an error, and a second method for the only (current) valid l-value, VarExprAST:

    public Expr assign(ExprAST left, ExprAST right) {
        throw new NotAnLValueException(left);
    }

    public Expr assign(VarExprAST left, ExprAST right) {
        Expr result;

        Expr rvalue=right.translate(this);

        if(left instanceof VarExprAST) {
            VarExprAST lvalue=(VarExprAST) left;
            if(!getGlobe().listSlots().contains(lvalue.getName()))
                throw new UndefinedVariableException(lvalue.getName());
            else {
                Type ltype=getGlobe().getSlot(lvalue.getName()).getType();
                result = new VarAssignExpr(lvalue.getName(), coerce(rvalue, right.typeOf(this), ltype, false));
                getGlobe().getSlot(lvalue.getName()).setFlag(Gamma.Slot.Flag.INITIALIZED);
            }
        }
        else
            throw new NotAnLValueException(left);

        return result;
    }

Once these changes have been made, compiling the StatementAST objects themselves is easy:

    public Stmt translate(EvalStmtAST ast) {
        return new EvalStmt(ast.getExpr().translate(this));
    }

    public Stmt translate(DeclareStmtAST ast) {
        String name=ast.getName();
        if(getGlobe().listSlots().contains(name))
            throw new DuplicateVariableException(name);

        Type type=ast.getType().eval(this);
        getGlobe().addSlot(name, type);

        Expr init=ast.getInit()!=null ? ast.getInit().translate(this) : null;
        if(init != null)
            getGlobe().getSlot(name).setFlag(Gamma.Slot.Flag.INITIALIZED);

        return new DeclareStmt(ast.getName(), init);
    }

Translating an EvalStmtAST is essentially the same thing as translating an ExprAST, and translating a DeclareStmtAST is mostly an exercise in making sure the variable declaration is valid in the context of the current Gamma.

Executing Statements

A great deal of work is done during the compilation phase to make executing Statement objects quick and straightforward. The Gamma class has a runtime equivalent, the Scope class:

public class Scope {
    private Map<String,Object> vars;

    public Scope() {
        this.vars = new HashMap<String,Object>();
    }

    protected Map<String,Object> getVars() {
        return vars;
    }

    public Set<String> listVars() {
        return getVars().keySet();
    }

    public void defineVar(String name, Object value) {
        if(listVars().contains(name))
            throw new InternalRuntimeStorkException("Cannot re-define variable: "+name);
        getVars().put(name, value);
    }

    public void setValue(String name, Object value) {
        if(!listVars().contains(name))
            throw new InternalRuntimeStorkException("Cannot assign to undefined variable: "+name);
        getVars().put(name, value);
    }

    public Object getValue(String name) {
        Object result;
        if(!listVars().contains(name))
            throw new InternalRuntimeStorkException("No variable with name: "+name);
        else {
            result = getVars().get(name);
            if(result == null)
                throw new InternalRuntimeStorkException("Variable has no value: "+name);
        }
        return result;
    }
}

The individual Statement objects have very short exec() methods:

    // DeclareStmt
    public Object exec(Scope scope) {
        Object value;
        if(getInit() != null)
            value = getInit().eval(scope);
        else
            value = null;
        scope.defineVar(getName(), value);
        return value;
    }

    // EvalStmt
    public Object exec(Scope scope) {
        return getExpr().eval(scope);
    }

Those few changes represent the core of variable support in Stork.

Downloading and Running Stork

Stork is designed to be easy to download, build, and try out. As along as you have Git, Maven, and a JDK installed, getting this lesson of Stork up and running on your computer is as simple as this:

$ git clone https://github.com/sigpwned/stork.git
$ cd stork
$ git checkout lesson4
$ mvn compile
$ cd target/classes
$ java com.sigpwned.stork.Stork
>>> 1+2*(3+4)
15
>>> ^D
$

The Stork REPL is a great way to try out the new features from this lesson, especially the static analysis:

$ java com.sigpwned.stork.Stork
>>> 1+2*(3+4)
15
>>> var x:Int
>>> x
ERROR: Variable may not have been initialized: x
>>> x = 1+2*(3+4)
15
>>> var x:Int
ERROR: Variable with name already defined: x
>>> var y:Float
>>> y = x
15.0
>>> x = y
ERROR: Will not coerce to less precise type: Float -> Int
>>> x = (cast Int) y
15
>>> x+y
30.0
>>> ^D
$

Next Lesson: Functions

The next step in building Stork will be to add named functions to Stork. This lesson will also cover Stork’s first constructed types (function types), so it will include some more interesting static analysis as well.

Questions

  1. How are expressions and statements different?
  2. Why is there an automatic coercion from Int to Float, but not from Float to Int?
  3. Why isn’t the value 4 an l-value?