LL Parsing

A representative grammar

S → E $

E → E + T | T

T → T * F | F

F → ( E ) | Id

Top-down parsing

For every non-terminal in the grammar we construct a boolean function whose name is that non-terminal. The function returns true if and only if the given non-terminal derives some prefix of the remaining input.
The input string w$ is a valid input if and only if the function S() returns true.
This strategy effectively performs a top-down, depth-first search of the set of all derivations until it discovers a derivation for the input string or determines that no such derivation exists.

Here is a representative parsing function

bool A() {
  for(each production A → X₁X₂⋯X_k in G) {
    n = current input position
    for(i = 1 to k) {
      if(X_i is a terminal)
        if( X_i appears at the front of the input)
          advance the input past X_i
        else
          break;
      else if(!X_i())
          break;
      }
    if(i == k + 1)
      return true;
    else
      set the current input position back to n
    }
  return false;
  }

Eliminating left recursion

The scheme shown above will fail if the grammar G contains a left-recursive rule such as

E → E + T

This causes an infinite recursion as E() tries to call E()

This problem can be resolved by replacing all left-recursive grammar rules with non-recursive equivalents. Here is an algorithm that does this.

Arrange the non-terminals in some order A₁, A₂, …, A_n
for(i = 1 to n) {
  for(j = 1 to i-1) {
    replace each production of the form A_i→A_j γ by the productions
    A_i → δ₁γ | δ₂γ |⋯| δ_kγ where A_j→δ₁ | δ₂ |⋯| δ_k are all productions for A_j
    }
  eliminate the immediate left recursions among productions for A_i
}

Here is an example of how to remove an immediate left recursion. Consider the left-recursive grammar rule

A → A α | β

this can be replaced with the rules

The resulting productions are still recursive, but they are not left-recursive. Top-down parsing techiques handle these productions without a problem.

The grammar shown at the beginning is left-recursive. After applying this algorithm to it we come to a non-left-recursive equivalent.

S → E$

F → ( E ) | id

Optimizing top-down parsing

The top-down parsing function shown above typically has to do a lot of backtracking. Backtracking occurs when the parsing function A() calls another parsing function X_i() recursively and X_i() fails. We can optimize this process by refusing to call X_i() whenever it is clear that the non-terminal X_i can not possibly derive a string whose first symbol is the symbol currently at the front of the input. This technique is called predictive parsing, and represents a significant optimization of the top-down parsing process.

Predictive parsing is typically implemented as a table-driven, iterative algorithm that uses a stack. To construct the parsing table we use the following two functions to construct FIRST and FOLLOW sets for every non-terminal.

Set FIRST(X) {
  if( X is a terminal )
    return {X}
  I = {};
  for( each production X → Y₁Y₂⋯Y_k in G )
    I = I ⋃ FIRST(Y₁);
  if( X → ε is in G )
    I = I ⋃  ε;
  return I;
}

Set FOLLOW(X) {
  I = {};
  for( each production A → αXβ in G )
    I = I ⋃ (FIRST(β) - {ε});
  for( each production A → αX in G )
    I = I ⋃ FOLLOW(A);
  return I;
}

The following procedure constructs the predictive parsing table.

The entries of the table take the form M[A,a] where A is a non-terminal and a is a terminal.
For each production A → α in the grammar do

For each terminal a in FIRST(α), add A → α to M[A,a]
If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A,b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].

All table entries without productions are considered error entries.

If we construct a parser table for a grammar and discover that all non-error entries contain a single production, then we say that the grammar is LL(1). The parser table for an LL(1) grammar allows us to use the next symbol in the input to select a single production to explore for every non-terminal we encounter. This is a considerable optimization over the original top-down parsing algorithm, which attempts to explore every possible production for each non-terminal it encounters.

Here is an example of a parsing table created by this algorithm. For the grammar

F → ( E ) | id

we have

LL(1) parsing algorithm

Here now is the complete LL(1) parsing algorithm:

push $ and S on the stack
a = (first symbol of w)
X = S
while(X != $) {
  if(X == a) pop the stack and let a = (next symbol of w)
  else if(X is a terminal) error();
  else if( M[X,a] is empty ) error();
  else {
    output the production M[X,a]
    pop X from stack
    push the right hand side of M[X,a] on the stack in reverse order
    }
  X = (symbol on top of the stack)
  }