Context-free Grammar

A context-free grammar is a four-tuple (V, Σ, R, S) where

  1. V is a finite set of variables,
  2. Σ is a finite set, disjoint from V, called the terminals,
  3. R is a finite set of rules, with each rule being a variable and a string of variables and terminals,
  4. SV is the start variable.

An example grammar

Here is an example grammar

S → 0S0

S → 1S1

Sε

Abbreviated notation

If a grammar offers multiple options for what a variable can expand to, we can consolidate all of those options into a single rule that offers several alternatives on the right hand side separated by | symbols.

For example,

S → 0S0

S → 1S1

Sε

becomes

S → 0S0 | 1S1 | ε

Derivations

A derivation is a sequence of applications of grammar rules, starting from the string S and continuing until there are only terminal symbols left in the string.

Here is an example derivation using the previous example grammar.

S → 0S0 →01S10→011110

Context-free languages

A context-free language is the set of all strings that can be derived from a particular context-free grammar.

To demonstrate that a particular string is in the language, we have to present a derivation for that string.

To show that a particular language is a context-free language, we must construct a grammar for the language and then argue that the grammar generates exactly L.

An example

Consider the language

L = { x | x contains at least two 1s }

I claim that the following grammar generates L:

SAAB

A → 0A | A0 | 1A | A1 | 1

B → 0B | 1B | ε

The intuition behind this grammar is that any string in L can be decomposed into three parts: a substring containing at least one 1, a second substring containing at least one 1, and a third substring containing 0 or more 1s. The rules for the variable A generate substrings that have to have at least one 1. The rules for the variable B generate substrings that contain 0 or more 1s.

Chomsky normal form

A context-free grammar is in Chomsky normal form if all of its rules obey the following constraints:

  1. The start variable does not appear on the right hand side of any rule.
  2. If a right hand side of a rule contains any variables, the right hand side contains exactly two variables.
  3. If a right hand side of a rule contains a terminal symbol, that terminal symbol is the only symbol appearing on the right hand side.
  4. The only rule allowed to contain an ε on its right hand side is the rule Sε

Converting grammars to Chomsky normal form

See https://en.wikipedia.org/wiki/Chomsky_normal_form

An example

We convert the previous example grammar

SAAB

A → 0A | A0 | 1A | A1 | 1

B → 0B | 1B | ε

into Chomsky normal form.

1) Eliminate start symbol from right-hand sides: not needed

2) Elimate rules with nonsolitary terminals:

SAAB

AZA | AZ | OA | AO | O

BZB | OB | ε

Z → 0

O → 1

3) Eliminate long right hand sides:

SAC

CAB

AZA | AZ | OA | AO | O

BZB | OB | ε

Z → 0

O → 1

4) Eliminate ε rules

SAC

CAB | A

AZA | AZ | OA | AO | O

BZB | OB | Z | O

Z → 0

O → 1

5) Eliminate unit rules

SAC | AA | OC | AO | OA | OO

CAB | OB | AZ | AO | OZ | OO

AZA | AZ | OA | AO | ZO | OZ | OO

BZB | OB | ZZ | ZO | OO | OZ

Z → 0

O → 1