Welcome to the Tsonnet series!
If you're just joining, you can check out how it all started in the first post of the series.
In the previous post, Tsonnet has been augmented with property-based tests:
This allows us to add new features to the language while being relatively certain that things will continue working in a myriad of scenarios.
Now it's time to take the first step towards type safety in Tsonnet.
Before starting
Before continuing, I want to clean up lib/tsonnet.ml
. There's too much code in there and I prefer that it stays as a clean orchestrator, therefore, I'm moving the interpreter related functions to its own module.
I won't bother copying the change here, but you can see the diff here.
The code now lives in a new file: lib/interpreter.ml
. This makes lib/tsonnet.ml
much more comprehensible:
Semantic analysis
While Tsonnet's feature set is still modest -- primitive types, locals, and binary/unary operations -- there are semantic issues we need to address that go beyond simple type verification.
The most pressing issue is detecting invalid binding cycles. Currently, Tsonnet can get stuck in infinite loops during evaluation -- something a proper semantic analysis phase should catch. Consider this example:
// samples/semantics/valid_binding_cycle.jsonnet
local a = b;
local b = d;
local c = a;
local d = 1;
a
Jsonnet evaluates top to bottom, and if a variable is not found predefined earlier, it fails immediately:
$ jsonnet samples/semantics/valid_binding_cycle.jsonnet
samples/semantics/valid_binding_cycle.jsonnet:1:11-12 Unknown variable: b
local a = b;
As we explored in Tsonnet 16 - Late binding and Jsonnet inconsistency, Jsonnet's evaluation model prevents it from resolving forward references, despite claiming lazy evaluation.
Tsonnet supports late binding, and this binding cycle, although seemingly confusing, is valid:
dune exec -- tsonnet samples/semantics/valid_binding_cycle.jsonnet
1
Here's an actually invalid case:
// samples/semantics/invalid_binding_cycle.jsonnet
local a = c;
local b = d;
local c = a;
local d = 1;
a
Lazy evaluation cannot help here, because there's an infinite cycle.
Jsonnet does behave correctly here, due to its eagerly evaluated binding nature:
jsonnet samples/semantics/invalid_binding_cycle.jsonnet
samples/semantics/invalid_binding_cycle.jsonnet:1:11-12 Unknown variable: c
local a = c;
In Tsonnet though, we have an infinite loop -- this should be fixed by semantic analysis performed by the type checker.
A minimal type checker
Before evaluating our code, we'll type-check our program:
See how clean it is to add intermediate steps to the compiler now?!
The refactored architecture makes adding compiler phases remarkably clean -- this is why separating concerns matters.
I'm also adding a new helper function to Error
module to reuse across different modules. This is a pretty common pattern we've encountered so far in the interpreter and now will be shared with the type checker:
let error_at pos = fun msg -> trace msg pos >>= error
The Type.check
function serves two purposes: it translates expressions to their corresponding types and performs semantic validation. Here's the core type system:
These types mirror Tsonnet's values but exist purely for compile-time analysis. The Lazy
constructor is particularly important -- it represents unevaluated expressions that we'll resolve when needed, preserving our lazy evaluation semantics during type checking.
We need a bucket to store our types, our environment. Here we are re-using the Env
module, but instead of having Env of string * expr
, we'll have instead Env of string * tsonnet_type
.
The translate
function recursively descents expressions until it reaches the bottom -- the concrete (primitive) types.
This code is straightforward to understand. A crucial detail: we can't immediately type-check local bindings because of lazy evaluation. Instead, we store them as Lazy
expressions in the environment. Only when an Ident
forces evaluation do we recursively type-check the referenced expression.
This is enough to make the valid binding case to pass:
dune exec -- tsonnet samples/semantics/valid_binding_cycle.jsonnet
1
But this actually breaks existing code, because we aren't type-checking some literals like arrays and operations yet:
dune exec -- tsonnet samples/literals/array.jsonnet
Not yet implemented
dune exec -- tsonnet samples/binary_operations.jsonnet
Not yet implemented
Recursing on the rest of types
We need a Tany
type for cases where precise type inference isn't possible or necessary. This commonly occurs with heterogeneous arrays -- while JSON allows mixed-type arrays, we'll track them as Tany
rather than creating complex union types for now:
And a to_string
function to make error messages more user-friendly:
Most of the types are direct translations. Product types such as array and object are the ones carrying more complexity:
All primitive types are now covered, and invalid binary and unary operations can now be checked by the type checker too.
We take a pragmatic approach to array typing: if all elements have the same type, we infer a homogeneous array type. Otherwise, we fall back to Tany
.
This works great for all tests but the invalid cyclic reference.
Checking cyclic references
Now for the main event: detecting those infinite cycles that could crash our interpreter.
After adding all local bindings to the environment, we need to validate that none create circular dependencies. The algorithm walks the dependency graph, maintaining a "seen" list to detect cycles:
The check_cyclic_refs
function will retrieve the variables from the environment and check if they've been seen before, if yes it means we have a cyclic reference:
The cycle detection must recursively examine every expression because lazy evaluation means cycles can hide deep within complex expressions. A variable might reference another variable buried inside an array or object literal, creating indirect cycles.
Now the invalid binding is caught by the compiler:
dune exec -- tsonnet samples/semantics/invalid_binding_cycle.jsonnet
samples/semantics/invalid_binding_cycle.jsonnet:1:10 Cyclic reference found for c
1: local a = c;
^^^^^^^^^^^^
And many other invalid cases. Such as:
Binding to itself:
// samples/semantics/invalid_binding_itself.jsonnet
local a = a;
a
dune exec -- tsonnet samples/semantics/invalid_binding_itself.jsonnet
samples/semantics/invalid_binding_itself.jsonnet:1:10 Cyclic reference found for a
1: local a = a;
^^^^^^^^^^^^
Invalid cyclic binding in array:
// samples/semantics/invalid_binding_cycle_array.jsonnet
local a = b,
b = c,
c = (local d = 1, e = 2; [a, b, c, d, e]);
b
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_array.jsonnet
samples/semantics/invalid_binding_cycle_array.jsonnet:3:30 Cyclic reference found for a
3: c = (local d = 1, e = 2; [a, b, c, d, e]);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Invalid cyclic binding in object:
// samples/semantics/invalid_binding_cycle_object.jsonnet
local obj = { a: 1, b: 2, c: obj };
obj
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_object.jsonnet
samples/semantics/invalid_binding_cycle_object.jsonnet:1:29 Cyclic reference found for obj
1: local obj = { a: 1, b: 2, c: obj };
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Invalid cyclic binding in binary operation:
// samples/semantics/invalid_binding_cycle_binop.jsonnet
local a = b;
local b = a + 1;
a
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_binop.jsonnet
samples/semantics/invalid_binding_cycle_binop.jsonnet:1:10 Cyclic reference found for b
1: local a = b;
^^^^^^^^^^^^
Invalid cyclic binding in unary operation:
// samples/semantics/invalid_binding_cycle_unaryop.jsonnet
local a = b;
local b = ~a;
a
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_unaryop.jsonnet
samples/semantics/invalid_binding_cycle_unaryop.jsonnet:1:10 Cyclic reference found for b
1: local a = b;
^^^^^^^^^^^^
Conclusion
As of the time of writing this, Tsonnet remains a relatively small language, featuring primitive types, local bindings, binary and unary operations. The type checker represents a significant improvement that will prove even more valuable as we introduce upcoming features.
So far, when it comes to type-checking, we need only one environment -- the value environment -- to translate inferred types from expressions. At some point, type annotations will be added, thus requiring a second environment: the type environment. I'm deferring this complexity until we actually need it -- it will become relevant for functions, typed objects, and likely any disambiguation scenarios when the compiler can't infer the type.
I've been enjoying OCaml since I started Tsonnet; however, introducing the type checker made me realize the language was truly an excellent choice for writing a compiler. The pattern matching and algebraic data types make expressing type relationships and transformations remarkably clean and intuitive.