S → E $
E → E + T | T
T → T * F | F
F → ( E ) | Id
Here is a representative parsing function
bool A() { for(each production A → X1X2⋯Xk in G) { n = current input position for(i = 1 to k) { if(Xi is a terminal) if( Xi appears at the front of the input) advance the input past Xi else break; else if(!Xi()) break; } if(i == k + 1) return true; else set the current input position back to n } return false; }
The scheme shown above will fail if the grammar G contains a left-recursive rule such as
E → E + T
This causes an infinite recursion as E() tries to call E()
This problem can be resolved by replacing all left-recursive grammar rules with non-recursive equivalents. Here is an algorithm that does this.
Arrange the non-terminals in some order A1, A2, …, An for(i = 1 to n) { for(j = 1 to i-1) { replace each production of the form Ai→Aj γ by the productions Ai → δ1γ | δ2γ |⋯| δkγ where Aj→δ1 | δ2 |⋯| δk are all productions for Aj } eliminate the immediate left recursions among productions for Ai }
Here is an example of how to remove an immediate left recursion. Consider the left-recursive grammar rule
A → A α | β
this can be replaced with the rules
The resulting productions are still recursive, but they are not left-recursive. Top-down parsing techiques handle these productions without a problem.
The grammar shown at the beginning is left-recursive. After applying this algorithm to it we come to a non-left-recursive equivalent.
S → E$
F → ( E ) | id
The top-down parsing function shown above typically has to do a lot of backtracking. Backtracking occurs when the parsing function A() calls another parsing function Xi() recursively and Xi() fails. We can optimize this process by refusing to call Xi() whenever it is clear that the non-terminal Xi can not possibly derive a string whose first symbol is the symbol currently at the front of the input. This technique is called predictive parsing, and represents a significant optimization of the top-down parsing process.
Predictive parsing is typically implemented as a table-driven, iterative algorithm that uses a stack. To construct the parsing table we use the following two functions to construct FIRST and FOLLOW sets for every non-terminal.
Set FIRST(X) { if( X is a terminal ) return {X} I = {}; for( each production X → Y1Y2⋯Yk in G ) I = I ⋃ FIRST(Y1); if( X → ε is in G ) I = I ⋃ ε; return I; } Set FOLLOW(X) { I = {}; for( each production A → αXβ in G ) I = I ⋃ (FIRST(β) - {ε}); for( each production A → αX in G ) I = I ⋃ FOLLOW(A); return I; }
The following procedure constructs the predictive parsing table.
If we construct a parser table for a grammar and discover that all non-error entries contain a single production, then we say that the grammar is LL(1). The parser table for an LL(1) grammar allows us to use the next symbol in the input to select a single production to explore for every non-terminal we encounter. This is a considerable optimization over the original top-down parsing algorithm, which attempts to explore every possible production for each non-terminal it encounters.
Here is an example of a parsing table created by this algorithm. For the grammar
F → ( E ) | id
we have
Here now is the complete LL(1) parsing algorithm:
push $ and S on the stack a = (first symbol of w) X = S while(X != $) { if(X == a) pop the stack and let a = (next symbol of w) else if(X is a terminal) error(); else if( M[X,a] is empty ) error(); else { output the production M[X,a] pop X from stack push the right hand side of M[X,a] on the stack in reverse order } X = (symbol on top of the stack) }