This is a great article that I've recommended many times for learning about precedence climbing, but I found it much more intuitive and concise to derive precedence climbing as a two-step refactoring of a recursive-descent parser. This is succinctly describable as:
1. Notice how all the functions in an RD parser are almost the same: call the next-higher-level function, then loop, repeating that call while the current token is at this level. The topmost level consumes a terminal or a parenthesised subexpression. That can easily be turned into one parameterised function plus a table lookup for the tokens.
2. The first set of recursive calls always goes "to the top", so why not call it directly? This is the 'P' function in the article. The second set of recursive calls does nothing more than find the right level to process the current token, so jump directly to that level by calling the function with that level number instead of repeatedly descending via recursive calls until the current level matches the token. With the current token being of level n, the parameterised RD parser calls expr(1) -> expr(2) -> expr(3) -> ... expr(n) when it could just call expr(n).
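The two steps above land on the familiar precedence-climbing shape. Here's a minimal Python sketch of what I mean (my own illustration, not code from the article; the token list and the names are hypothetical):

```python
# Binding powers: the table lookup replacing one function per precedence level.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def next_tok():
        t = tokens[pos[0]]
        pos[0] += 1
        return t

    def primary():
        # Topmost level: a terminal or a parenthesised subexpression.
        t = next_tok()
        if t == "(":
            e = expr(1)
            assert next_tok() == ")"
            return e
        return int(t)

    def expr(level):
        # One parameterised function instead of one function per level.
        left = primary()
        while peek() in PREC and PREC[peek()] >= level:
            op = next_tok()
            # Jump directly to the next level rather than descending one
            # level at a time.
            right = expr(PREC[op] + 1)
            left = (op, left, right)
        return left

    return expr(1)
```

Calling parse(["2", "+", "3", "*", "4"]) yields ("+", 2, ("*", 3, 4)), i.e. the multiplication binds tighter, as expected.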
I think this is such an elegantly simple algorithm that anyone learning computer science should have implemented at least once along with the usual study of recursive data structures. It's also a very good step towards understanding how a compiler works.
'''Left recursive grammars, such as G, are unsuitable for recursive-descent parsing because a left-recursive production leads to an infinite recursion. While the parser may be partially correct, it may not terminate.'''
I don't see why a left-recursive solution doesn't terminate. Does anyone have an insight to offer, or an example grammar and text on which a left-recursive production doesn't terminate?
> an example grammar and text on which a left-recursive production doesn't terminate
All grammars with left recursion have this problem. The parser has no lookahead (and even if it did, the required lookahead to avoid this is unboundedly large) -- you can't tell, given your input, whether you should recurse or not.
Here's a pretty simple grammar:
S -> A
A -> a
A -> A B (this is left recursion, a rule for producing A that itself begins with A)
B -> b
The language is the same set of strings as the regular expression (a|ab+), that is, a single a followed by zero or more b's.
So, suppose you're the parser. You begin with the start rule, which tells you to expect an A. For the A, you have two choices: recur via the rule A -> A B, or stop recurring and process a literal a. You see the input "a". How can you know which to do?
Compare the analogous right-recursive grammar, which matches (a|b+a):
S -> A
A -> a
A -> B A
B -> b
Here, you start out expecting an A. When you see the input "a", you know that the only way to make sense of it is to invoke the rule "A -> a", which is a terminal rule. And when you see the input "b", you know the only thing you can do is invoke the rule "A -> B A", since that is the only rule allowing an A to begin with "b".
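In code, that decision procedure is just one character of lookahead, and every recursive call happens only after consuming input, so it terminates. An illustrative sketch:

```python
def parse_A(s, pos=0):
    if pos < len(s) and s[pos] == "b":
        # Only A -> B A lets an A begin with "b".
        b_pos = pos + 1                  # B -> b consumes the "b"
        rest, end = parse_A(s, b_pos)    # recursion happens after consuming input
        return ("b", rest), end
    if pos < len(s) and s[pos] == "a":
        return "a", pos + 1              # terminal rule A -> a
    raise SyntaxError("expected 'a' or 'b'")
```

For example, parse_A("bba") returns (("b", ("b", "a")), 3): each "b" consumed a character before recursing.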
edit: replaced regexes involving * with uglier regexes using disjunction and +, since HN doesn't allow asterisks in most normal uses.
I've implemented parsing of left recursion with a memoized recursive-descent parser generator. It requires hints to tell the generator that a rule is left-recursive[1] - but even with this inconvenience, I think it's more convenient than having to refactor a left-recursive grammar into a right-recursive one and then flip the precedence back. I'm still finishing the first useful parser built with it, so there may still be bugs.
It works by dynamically refactoring the grammar at each recursion. [2] The dynamic refactoring only happens if left recursion is hinted for that rule.
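This sounds similar in spirit to the "seed-growing" approach to left recursion in memoized parsers: seed the memo entry for a left-recursive rule with failure, then re-run the rule, replacing the memo entry as long as the match keeps getting longer. A rough Python sketch of that idea for the grammar A -> A "b" | "a" (my names and structure, not the commenter's actual generator):

```python
def parse_A(s, pos, memo):
    key = ("A", pos)
    if key in memo:
        return memo[key]
    memo[key] = None                # seed: the left-recursive call fails at first
    while True:
        result = alt_A(s, pos, memo)
        best = memo[key]
        if result is None or (best is not None and result[1] <= best[1]):
            return best             # the match stopped growing
        memo[key] = result          # grow the seed and try again

def alt_A(s, pos, memo):
    # Alternative A -> A "b": the left-recursive call hits the memo entry
    # instead of recursing forever.
    left = parse_A(s, pos, memo)
    if left is not None:
        node, p = left
        if p < len(s) and s[p] == "b":
            return ("Ab", node), p + 1
    # Alternative A -> "a".
    if pos < len(s) and s[pos] == "a":
        return "a", pos + 1
    return None
```

On "abb" this grows the match from "a" to "ab" to "abb" and then stops, returning a left-leaning tree, which is exactly the associativity a left-recursive rule is supposed to give you.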