A New Kind of Parser Method
Source: Dev.to
Introduction
Pratt parsing is a powerful technique, but concepts such as binding power and NUD can be hard to grasp.
The following describes an alternative parsing method that aims to be simpler while offering comparable capabilities.
General Approach
- Tokenise the expression into a flat list of terms, operators, and grouping symbols.
- Identify operator precedence (e.g.,
*and/have higher precedence than+and-). - Iteratively combine adjacent terms according to precedence, building a tree structure.
- Handle special token types (unary vs. binary operators, numeric literals, member access, function calls) in a separate merging phase before the main tree construction.
Example
Consider the expression:
2 + foo(4, 5) * 4 / 23
Token List
2, foo(4, 5), *, 4, /, 23, +
Tree Construction
-
First pass (multiplication)
- Combine
foo(4, 5)and4with*→[foo(4, 5), 4]{*}
- Combine
-
Second pass (division)
- Combine the result of step 1 with
23using/→[[foo(4, 5), 4]{*}, 23]{/}
- Combine the result of step 1 with
-
Final pass (addition)
- Combine
2with the result of step 2 using+→[2, [[foo(4, 5), 4]{*}, 23]{/}]{+}
- Combine
Resulting AST (in pseudo‑code)
add(
2,
divide(
multiply(
foo(4, 5),
4
),
23
)
)
Handling Unary/Binary Operators and Special Tokens
The method distinguishes several core token types:
| Symbol | Meaning |
|---|---|
-u | Unary minus |
-b | Binary minus |
.n | Numeric/decimal point |
.m | Member access (a.b) |
(n | Numeric/grouping opening parenthesis |
(c | Function‑call opening parenthesis |
Merging Phase
Before building the AST, adjacent tokens are merged to form higher‑level constructs:
- Unary minus with its operand:
-u x→-x - Member access:
a .m b→a.b - Function call with member access:
foo .m bar (c 23, 24 )→foo.bar(23, 24)
This mini‑parser deals only with functions and a few mix‑fix operators, leaving the rest of the expression to the main tree‑building algorithm.
Final Remarks
After the merging phase, the parser works on a list containing terms, operators, and grouping symbols.
Recursive parsing of parentheses yields a fully‑formed abstract syntax tree (AST), which can then be used for evaluation, code generation, or further analysis.