Tsonnet #15 - Teaching Tsonnet to remember: adding variables
A deep dive into implementing variable bindings
Welcome to the Tsonnet series!
If you're just joining, you can check out how it all started in the first post of the series.
In the previous post, we added parsing error tracing.
Now we are prepared to expand the syntax and add other constructs to augment the language. It's been a while since the last post, but development didn't halt. Let's resume by adding one special feature: variables, called `local` bindings in Jsonnet.
This will be a nice addition that paves the way for more interesting stuff.
Let's get started with the AST
Naming is hard, and I felt the AST `value` type wasn't expressing its role properly. Some things were being treated as values, like `BinOp`, that are not really values, but expressions.
So I opted to represent everything as an `expr`, as we had right at the beginning.
Since locals can be declared in multiple ways, we need to make it flexible enough to support:
1. Single declaration: `local x = 1; x`
2. Multiple declarations, inline: `local x = 1, y = 2; x + y`
3. Multiple declarations, multiline: each `local` on its own line, as in `local x = 1; local y = 2; x + y`
4. Scoped declarations: wrapped in parentheses, as in `(local x = 1; x) + 1`
Plus, we have been dealing with a single `expr` in our programs, and this new set of requirements asks for a sequence of `expr`. Besides that, we need a new `Local` variant to implement locals.
The updated `expr` type looks like this:
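A sketch of the shape it could take follows; the constructor names and the `position` record here are my assumptions, not necessarily the exact Tsonnet definitions:

```ocaml
(* A sketch of the revised AST. Everything is an expression now,
   including things previously treated as values. *)
type position = { line : int; column : int }

(* A fake position for values that have no meaningful source location. *)
let dummy_pos = { line = 0; column = 0 }

type expr =
  | Unit
  | Number of float * position
  | String of string * position
  | Ident of string * position
  (* [Local] stores a list of identifiers and their bound expressions. *)
  | Local of (string * expr) list * position
  (* A program is now a sequence of expressions. *)
  | Seq of expr list
```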
`Local` will store a list of identifiers and their respective expressions.
OCaml is expressive enough to allow us to have recursive types, which pair well with pattern matching.
The `dummy_pos` will come in handy later. It's a fake position for cases when this value doesn't matter.
Lexing local(s)
The lexer change is trivial -- just 3 more tokens:
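In ocamllex terms, the addition might look something like the fragment below; the token names are my assumption, not necessarily the ones Tsonnet uses:

```
(* Hypothetical ocamllex fragment: the three new rules, added to the
   existing tokenizer rule. *)
| "local" { LOCAL }
| '='     { EQUAL }
| ';'     { SEMICOLON }
```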
Oh My Parser!
The parser was the one with the most changes. Check it out in its entirety (I explain below):
The parser added the new tokens, but it got revamped in the process:
The `prog` rule now accepts an `exp_seq` since a program is composed of one or more sequential expressions;
Rules such as `number`, `bin_op`, and `unary_op` will be inlined from where they are being used. This helps avoid shift/reduce ambiguities in certain cases.
The `literal` rule is just nice to have, as it helps with readability.
The new `scoped_expr` rule takes care of expressions or expression sequences wrapped in parentheses, which is required for handling local and object attribute assignments.
Now we also need to differentiate `assignable_expr` from `expr`: variable declarations cannot appear on the right-hand side of an assignment unless they are scoped, and `assignable_expr` captures this distinction cleanly. The use of recursive rule calls makes the grammar quite expressive and easy to handle.
Last but not least, the `var` and `vars` rules take care of parsing locals.
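To make those bullet points concrete, here is a hypothetical Menhir sketch of how the rules could fit together. Rule and token names are assumptions, and `to_position` is a made-up helper converting Menhir's `$startpos` into the AST's position type:

```
(* Hypothetical Menhir fragment, not the exact Tsonnet grammar. *)
prog:
  | es = expr_seq; EOF { Seq es }

expr_seq:
  | e = expr { [e] }
  | e = expr; SEMICOLON; rest = expr_seq { e :: rest }

expr:
  | LOCAL; vs = vars { Local (vs, to_position $startpos) }
  | e = assignable_expr { e }

(* Anything that can appear on the right-hand side of an assignment:
   a literal, an identifier, or a parenthesized scope. *)
assignable_expr:
  | l = literal { l }
  | e = scoped_expr { e }

scoped_expr:
  | LPAREN; es = expr_seq; RPAREN { Seq es }

vars:
  | v = var { [v] }
  | v = var; COMMA; rest = vars { v :: rest }

var:
  | id = ID; EQUAL; e = assignable_expr { (id, e) }
```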
Evaluating the environment
The interpreter needs to be tackled in parts. It's a big one.
The entry point (the `run` function) needs to initialize an empty global environment that will hold the variables during interpretation:
Each recursive call inherits its local `env`.
Here's the `Env` module implementation:
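As a minimal sketch, assuming the environment maps variable names (strings) to unevaluated expressions, OCaml's functorized `Map` gets us most of the way:

```ocaml
(* A minimal sketch: an immutable environment as a string-keyed map.
   In Tsonnet the values would be unevaluated [expr]s; an int is used
   here just to keep the demonstration small. *)
module Env = Map.Make (String)

let () =
  let env = Env.empty in
  let env' = Env.add "x" 1 env in
  (* [Env.add] returns a new map; the original is untouched. *)
  assert (Env.mem "x" env');
  assert (not (Env.mem "x" env))
```

Because each `Env.add` returns a fresh map, an inner scope can extend its own copy and the outer scope's environment survives unchanged.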
It implements a dictionary using `Map`, carefully chosen for its immutability. Since we are going to pass this environment around, that's an important trait to have: if we updated the environment in place, say with `Hashtbl`, we would need to undo changes as we leave each scope, which complicates the implementation and increases the likelihood of bugs A LOT. With an immutable map, each scope simply extends its own copy. Neat!
Fun fact: functional programming languages tend to implement immutable data structures as persistent data structures. Since values are (generally) immutable by default, it is safe to share references between versions. This brings some nice properties, like thread safety and easier reasoning, to name a few.
Let's get back to the interpreter.
Here's the meatier `interpret` function (split by parts for ease of reasoning):
When evaluating `Ident`, we look up the name in the local environment. If found, we retrieve and `interpret` it. If not found, we raise an undefined variable error.
Here's the new entry, the `Local` variant. Since it carries a list of variables, we iterate over each entry and add its `expr` to the environment:
The semantics here are important: each item we iterate over accumulates into a new `env`, which is passed on to the next iteration, adding variables to the scope one by one.
We need to be attentive here to one aspect of Jsonnet: it is lazy. The `expr` is added to the environment unevaluated, and it will be evaluated if and only if it is eventually needed to produce a result. This is an important characteristic that we need to comply with.
In the end, the accumulated environment is returned, and the program continues.
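Putting those pieces together, here is a self-contained toy sketch of the semantics described above. Constructor names are assumptions, error handling is simplified to an exception, and the real Tsonnet interpreter is richer than this:

```ocaml
(* Toy sketch: environment threading and lazy bindings. *)
module Env = Map.Make (String)

type expr =
  | Unit
  | Number of float
  | Ident of string
  | Local of (string * expr) list
  | Seq of expr list

exception Undefined_variable of string

let rec interpret (env : expr Env.t) (e : expr) : expr Env.t * expr =
  match e with
  | Unit | Number _ -> (env, e)
  | Ident name -> (
      (* Look the name up; only now is the bound expr evaluated. *)
      match Env.find_opt name env with
      | Some bound -> interpret env bound
      | None -> raise (Undefined_variable name))
  | Local binds ->
      (* Each binding is added *unevaluated*; the env accumulates
         and is returned so the program can continue with it. *)
      let env' =
        List.fold_left (fun acc (name, bound) -> Env.add name bound acc) env binds
      in
      (env', Unit)
  | Seq exprs ->
      (* Interpret one by one, threading the environment through. *)
      List.fold_left (fun (env, _) e -> interpret env e) (env, Unit) exprs

let () =
  match interpret Env.empty (Seq [ Local [ ("x", Number 41.) ]; Ident "x" ]) with
  | _, Number n -> assert (n = 41.)
  | _ -> assert false
```

Note that `Number 41.` is only evaluated when `Ident "x"` is reached; a `local` binding that is never referenced is never interpreted at all.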
We have two other new match cases:
`Unit` simply returns itself -- nothing to do.
Sequences are interpreted one by one until they end.
The function `interpret_arith_op` is omitted -- it simply does arithmetic operations as before.
In `interpret_concat_op`, we don't need to do any crazy `expr` tricks when concatenating strings anymore:
The resulting string has no position in the source code, so `dummy_pos` it is. Another possibility could be having a type to represent an evaluated string, but we are going to be just fine with it as is -- it's simpler for now.
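As a tiny illustration of that `dummy_pos` choice (the type and function names here are assumptions, trimmed down to strings only):

```ocaml
(* Sketch: concatenation produces a string that exists nowhere in the
   source, so it carries a fake position. *)
type position = { line : int; column : int }

let dummy_pos = { line = 0; column = 0 }

type expr = String of string * position

let interpret_concat_op (a : expr) (b : expr) : expr =
  match (a, b) with
  | String (s1, _), String (s2, _) -> String (s1 ^ s2, dummy_pos)

let () =
  match
    interpret_concat_op
      (String ("foo", { line = 1; column = 0 }))
      (String ("bar", { line = 1; column = 8 }))
  with
  | String (s, pos) ->
      assert (s = "foobar");
      assert (pos = dummy_pos)
```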
We can't forget the tests
Here's a new cram test file diff with the new samples and their results:
We can't get away without testing mistyped locals, so here's a sample and its cram test:
Menhir explain and grammar ambiguities
Menhir has a great feature that can be enabled with the `--explain` flag when using its standalone CLI, rather than the dune integration we've been using.
It outputs debugging information, notably about ambiguities in the grammar, which I used extensively while adding locals. So much so that it gained its own place in the Makefile:
Conclusion
Phew, what a big write-up! Bindings are stepping stones in any programming language, so they deserve careful attention to get them right!
The diff in its entirety can be seen here (if you want to follow the precise change).
I still want to write about late binding, but I'll leave it to another post.