Debugging shift-reduce conflicts: Lessons learned building Tsonnet's parser

Sometimes the best way to understand parsing conflicts is to watch your parser make mathematically wrong decisions.

Sep 01, 2025

Building Tsonnet has been a wild ride of compiler adventures, and one thing that kept tripping me up was those cryptic shift-reduce conflict warnings that menhir would throw at me. You know the ones:

Warning: 2 shift/reduce conflicts were arbitrarily resolved.

Arbitrarily resolved? That sounds threatening. What exactly is menhir doing to my beautiful grammar, and why should I care?

Turns out, understanding these conflicts is crucial for anyone building parsers. Let me walk you through what I learned by building a simple arithmetic parser that demonstrates exactly what goes wrong -- and how to fix it.

two white arrows pointing in opposite directions on asphalt — Photo by Claudio Schwarz on Unsplash

What is shifting and reducing anyway?

Before we dive into conflicts, let's understand what shifting and reducing actually mean. When your menhir-generated parser is running, it maintains two key data structures:

A stack (holding parsing states and partially-built AST nodes)
An input stream (the tokens from ocamllex)

Shifting means: "Take the next token from the input stream and push it onto the stack, along with a new parser state."

Let's trace through parsing 2 + 3 * 4. Our ocamllex tokenizer produces: [INT(2); PLUS; INT(3); TIMES; INT(4)]

Here's what happens during shifting:

(* Initial state *)
Stack: [State0]
Input: [INT(2); PLUS; INT(3); TIMES; INT(4)]

(* Shift INT(2) *)
Stack: [State0; INT(2), StateX]  
Input: [PLUS; INT(3); TIMES; INT(4)]

(* Shift PLUS *)
Stack: [State0; INT(2), StateX; PLUS, StateY]
Input: [INT(3); TIMES; INT(4)]

The exact state numbers depend on how menhir generates the state machine for your specific grammar.

The shift action says: "I'm in the middle of parsing some rule(s), and I need to consume this next token to make progress."

Reducing is the flip side: "I've collected enough tokens to complete a grammar rule, so let me pop items from the stack and build an AST node."

(* After seeing INT(2), we could reduce immediately *)
Stack: [State0; INT(2), StateX] -> [State0; expr(Int 2), StateX]

Building a conflict-prone parser

Let's create a complete example that shows these conflicts in action. I'll give you all the code needed to reproduce this:

First, the dependencies (install with opam install menhir).

Then, our files:

dune-project:

(lang dune 3.0)

(using menhir 3.0)

(package
 (name arithmetic)
 (depends ocaml dune menhir))

lib/ast.ml:

type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

let rec string_of_expr = function
  | Int n -> string_of_int n
  | Add (e1, e2) -> Printf.sprintf "(%s + %s)" (string_of_expr e1) (string_of_expr e2)
  | Mul (e1, e2) -> Printf.sprintf "(%s * %s)" (string_of_expr e1) (string_of_expr e2)

let rec eval = function
  | Int n -> n
  | Add (e1, e2) -> eval e1 + eval e2
  | Mul (e1, e2) -> eval e1 * eval e2

lib/lexer.mll:

{
  open Parser

  exception SyntaxError of string
}

let white = [' ' '\t']
let newline = '\r' | '\n' | "\r\n"
let digit = ['0'-'9']
let int = digit+

rule token = parse
  | white+     { token lexbuf }
  | newline    { Lexing.new_line lexbuf; token lexbuf }
  | int        { INT (int_of_string (Lexing.lexeme lexbuf)) }
  | '+'        { PLUS }
  | '*'        { TIMES }
  | '('        { LPAREN }
  | ')'        { RPAREN }
  | eof        { EOF }
  | _          { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }

lib/parser.mly (the conflict-prone version):

%{
  open Ast
%}

%token <int> INT
%token PLUS TIMES
%token LPAREN RPAREN
%token EOF

%start <Ast.expr> main

%%

main:
  | expr EOF { $1 }

(* This grammar has shift/reduce conflicts! *)
expr:
  | INT { Int $1 }
  | expr PLUS expr { Add ($1, $3) }
  | expr TIMES expr { Mul ($1, $3) }
  | LPAREN expr RPAREN { $2 }

lib/dune:

(library
 (public_name arithmetic)
 (name arithmetic))

(ocamllex lexer)

(menhir
 (modules parser)
 (flags --dump))

bin/main.ml:

open Arithmetic

let parse_string s =
  let lexbuf = Lexing.from_string s in
  try
    Ok (Parser.main Lexer.token lexbuf)
  with
  | Lexer.SyntaxError msg -> Error ("Lexer error: " ^ msg)
  | Parser.Error ->
    let pos = Lexing.lexeme_start_p lexbuf in
    Error (Printf.sprintf "Parser error at line %d, character %d" 
           pos.pos_lnum (pos.pos_cnum - pos.pos_bol))

let test_expressions = [
  "2 + 3 * 4";
  "2 * 3 + 4";
  "(2 + 3) * 4";
  "2 + (3 * 4)";
  "1 + 2 + 3";
  "1 * 2 * 3";
]

let () =
  Printf.printf "=== Arithmetic Expression Parser ===\n\n";
  List.iter (fun expr ->
    Printf.printf "Input: %s\n" expr;
    match parse_string expr with
    | Ok ast ->
      Printf.printf "  Parsed as: %s\n" (Ast.string_of_expr ast);
      Printf.printf "  Evaluates to: %d\n" (Ast.eval ast);
    | Error msg ->
      Printf.printf "  Error: %s\n" msg;
    Printf.printf "\n"
  ) test_expressions

The moment of truth

Build and run this:

dune build
Warning: 2 states have shift/reduce conflicts.
Warning: 4 shift/reduce conflicts were arbitrarily resolved.

Here's what you'll get:

dune exec ./bin/main.exe
Input: 2 + 3 * 4
  Parsed as: (2 + (3 * 4))
  Evaluates to: 14

Input: 2 * 3 + 4
  Parsed as: (2 * (3 + 4))
  Evaluates to: 14

Wait, what?! That second result is mathematically wrong. It should be ((2 * 3) + 4) = 10, not (2 * (3 + 4)) = 14.

Breaking down the conflict

When parsing 2 * 3 + 4, after seeing 2 * 3, menhir faces a choice:

Shift the +: Keep building, eventually getting 2 * (3 + 4)
Reduce the *: Finish multiplication first, eventually getting (2 * 3) + 4

Menhir consistently chose to shift, making everything right-associative. It resolved the conflict by picking a strategy, but not necessarily the correct one for arithmetic.

The conflict report in _build/default/lib/parser.conflicts shows exactly what happened:

** Conflict (shift/reduce) in state 8.
** Token involved: PLUS
** This state is reached from main after reading:

    expr TIMES expr

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

    main
    expr EOF
    expr (?)

** In state 8, looking ahead at PLUS, reducing production
** expr -> expr TIMES expr
** is permitted because of the following sub-derivation:

    expr PLUS expr
    expr PLUS expr TIMES expr // lookahead token appears

** In state 8, looking ahead at PLUS, shifting is permitted
** because of the following sub-derivation:

    expr PLUS expr
    expr TIMES expr PLUS expr // lookahead token appears

Menhir is telling us: "I saw expr TIMES expr and now I see a PLUS. I could either finish the multiplication (reduce) or keep reading (shift). I chose to shift."

The fix: precedence declarations

Here's where the magic happens. Update your parser with precedence rules:

%{
  open Ast
%}

%token <int> INT
%token PLUS TIMES
%token LPAREN RPAREN
%token EOF

%start <Ast.expr> main

(* Precedence declarations: lower precedence first *)
%left PLUS
%left TIMES

%%

main:
  | expr EOF { $1 }

expr:
  | INT { Int $1 }
  | expr PLUS expr { Add ($1, $3) }
  | expr TIMES expr { Mul ($1, $3) }
  | LPAREN expr RPAREN { $2 }

The precedence declarations tell menhir:

%left PLUS: + is left-associative with precedence level 1
%left TIMES: * is left-associative with precedence level 2 (higher)

Higher precedence = tighter binding, so * binds tighter than +.

Now rebuild and run:

dune build  # No more conflict warnings!

dune exec ./bin/main.exe
Input: 2 + 3 * 4
  Parsed as: (2 + (3 * 4))    # Still correct
  Evaluates to: 14

Input: 2 * 3 + 4
  Parsed as: ((2 * 3) + 4)    # Now correct!
  Evaluates to: 10            # Right answer!

Conclusion

This is what I wish someone had told me when I started working on Tsonnet: shift-reduce conflicts aren't bugs -- they're missing specifications.

Our grammar was ambiguous, and precedence declarations make it unambiguous. When menhir encounters our earlier conflict, instead of arbitrarily choosing, it now compares precedence:

TIMES has higher precedence than PLUS
So reduce the TIMES first: ((2 * 3) + 4)

While building Tsonnet, I ran into these conflicts constantly. Object field(s) evaluation, variable declaration(s), array indexing -- they all create similar ambiguities. The lesson? Don't ignore those menhir warnings. Each conflict represents a place where your language's semantics are unclear.

Sometimes the "arbitrary" resolution happens to work by accident (like our 2 + 3 * 4 case), but sometimes it produces completely wrong results (like our 2 * 3 + 4 case). Better to be explicit about what you want.

The next time menhir tells you about shift-reduce conflicts, don't panic. It's not breaking your parser -- it's asking for clarification about your language's design. Take the time to understand what the conflicts mean and resolve them explicitly with precedence declarations.

Building a compiler? Check out the Tsonnet series where I document the complete journey from zero to a working Jsonnet-compatible interpreter. It's been quite the parsing adventure!

Bit Maybe Wise

Discussion about this post