A representative grammar

SE $

EE + T | T

TT * F | F

F → ( E ) | Id

Top-down parsing

  1. For every non-terminal in the grammar we construct a boolean function whose name is that non-terminal. The function returns true if and only if the given non-terminal derives some prefix of the remaining input.
  2. The input string w$ is a valid input if and only if the function S() returns true.
  3. This strategy effectively performs a top-down, depth-first search of the set of all derivations until it discovers a derivation for the input string or determines that no such derivation exists.

Here is a representative parsing function

bool A() {
  for(each production AX1X2Xk in G) {
    n = current input position
    for(i = 1 to k) {
      if(Xi is a terminal)
        if( Xi appears at the front of the input)
          advance the input past Xi
        else
          break;
      else if(!Xi())
          break;
      }
    if(i == k + 1)
      return true;
    else
      set the current input position back to n
    }
  return false;
  }

Eliminating left recursion

The scheme shown above will fail if the grammar G contains a left-recursive rule such as

EE + T

This causes an infinite recursion as E() tries to call E()

This problem can be resolved by replacing all left-recursive grammar rules with non-recursive equivalents. Here is an algorithm that does this.

Arrange the non-terminals in some order A1, A2, …, An
for(i = 1 to n) {
  for(j = 1 to i-1) {
    replace each production of the form AiAj γ by the productions
    Aiδ1γ | δ2γ |⋯| δkγ where Ajδ1 | δ2 |⋯| δk are all productions for Aj
    }
  eliminate the immediate left recursions among productions for Ai
}

Here is an example of how to remove an immediate left recursion. Consider the left-recursive grammar rule

AA α | β

this can be replaced with the rules

The resulting productions are still recursive, but they are not left-recursive. Top-down parsing techiques handle these productions without a problem.

The grammar shown at the beginning is left-recursive. After applying this algorithm to it we come to a non-left-recursive equivalent.

SE$

F → ( E ) | id

Optimizing top-down parsing

The top-down parsing function shown above typically has to do a lot of backtracking. Backtracking occurs when the parsing function A() calls another parsing function Xi() recursively and Xi() fails. We can optimize this process by refusing to call Xi() whenever it is clear that the non-terminal Xi can not possibly derive a string whose first symbol is the symbol currently at the front of the input. This technique is called predictive parsing, and represents a significant optimization of the top-down parsing process.

Predictive parsing is typically implemented as a table-driven, iterative algorithm that uses a stack. To construct the parsing table we use the following two functions to construct FIRST and FOLLOW sets for every non-terminal.

Set FIRST(X) {
  if( X is a terminal )
    return {X}
  I = {};
  for( each production XY1Y2Yk in G )
    I = I ⋃ FIRST(Y1);
  if( Xε is in G )
    I = I ⋃  ε;
  return I;
}

Set FOLLOW(X) {
  I = {};
  for( each production AαXβ in G )
    I = I ⋃ (FIRST(β) - {ε});
  for( each production AαX in G )
    I = I ⋃ FOLLOW(A);
  return I;
}

The following procedure constructs the predictive parsing table.

  1. The entries of the table take the form M[A,a] where A is a non-terminal and a is a terminal.
  2. For each production Aα in the grammar do
    1. For each terminal a in FIRST(α), add Aα to M[A,a]
    2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add Aα to M[A,b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add Aα to M[A,$].
  3. All table entries without productions are considered error entries.

If we construct a parser table for a grammar and discover that all non-error entries contain a single production, then we say that the grammar is LL(1). The parser table for an LL(1) grammar allows us to use the next symbol in the input to select a single production to explore for every non-terminal we encounter. This is a considerable optimization over the original top-down parsing algorithm, which attempts to explore every possible production for each non-terminal it encounters.

Here is an example of a parsing table created by this algorithm. For the grammar

F → ( E ) | id

we have

LL(1) parsing algorithm

Here now is the complete LL(1) parsing algorithm:

push $ and S on the stack
a = (first symbol of w)
X = S
while(X != $) {
  if(X == a) pop the stack and let a = (next symbol of w)
  else if(X is a terminal) error();
  else if( M[X,a] is empty ) error();
  else {
    output the production M[X,a]
    pop X from stack
    push the right hand side of M[X,a] on the stack in reverse order
    }
  X = (symbol on top of the stack)
  }