Some languages are not regular languages

Later in this lecture we are going to prove that the following example language can not possibly be decided by a finite automaton:

L = { x | x contains an equal number of 0s and 1s }

Pumping Lemma

If A is a regular language, then there is a number p where if s is any string in A of length at least p, then s may be divided into three pieces, s = xyz, satisfying the following conditions:

  1. for each i ≥ 0, xyizA.
  2. |y| > 0, and
  3. |xy| ≤ p.

Proof of the pumping lemma

  1. If A is a regular language, then some DFA M decides the language.
  2. Let p be the number of states in M.
  3. Consider any string sA with length greater than or equal to p. The characters in s will cause the machine to transition through a number of states, eventually reaching an accepting state after the last character in s.
  4. Because the number of characters in s equals or exceeds the number of states in M, the machine will pass through at least one state more than once as M reads s. That is, the path that s traces through the machine has at least one loop in it.
  5. For this argument we will use the first repeated state along the path: we will circle back around to that state for the first time before we have processed p characters.
  6. We divide our path into three parts: x is the portion of s that leads from the start state to the first occurance of the repeated state. y is the part of s that traces the loop back around to the repeated state. z is the final part of s that takes us eventually to the accepting state.
  7. M will now accept any string of the form xyiz for i ≥ 0, because such a string takes M from start to the repeated state, goes i times around the loop back to the repeated state, and then runs from the repeated state on to the accepting state.

How to use the pumping lemma

The pumping lemma essentially says

If A is a regular language then all long strings in A can be pumped

Like almost all theorems, the pumping lemma takes the logical form

If this then that

In logic, a logical statement in that general form is logically equivalent to that statement's contrapositive:

If not that then not this

This is the form of the pumping lemma that is most useful to us:

If not all long strings in A can be pumped, then A is not a regular language

How to show that not all long strings can be pumped

To use the contrapositive form of the pumping lemma successfully we have to carry out the following steps:

  1. Assume, by way of contradiction, that A is regular. Let p be the number from the pumping lemma.
  2. Construct a counter-example string s which is in A and has length greater than or equal to p.
  3. Show that for all ways to decompose s into substrings (s = xyz) with |y| ≥ 0 and |xy| ≤ p there exists an i such that the pumped string xyiz is not in A.

An example

Consider the language

L = { x | x contains an equal number of 0s and 1s }

Our counterexample string s = 0p1p.

s is clearly in L and has length greater than p.

Consider what happens when we decompose s into substrings x, y, and z while obeying the constraints that |y| ≥ 0 and |xy| ≤ p. The second constraint forces y to consist only of 0s.

Now pick i = 2 and consider the string xy2z. This string contains more 0s than s did, while keeping the same number of 1s. This string has an unequal number of 0s and 1s, and is not in L.

Another example

Consider the language

L = { ww | wΣ* }

Our counterexample string s = 0p10p1.

s is clearly in L and has length greater than p.

Consider what happens when we decompose s into substrings x, y, and z while obeying the constraints that |y| ≥ 0 and |xy| ≤ p. The second constraint forces y to consist only of 0s.

Now pick i = 2 and consider the string xy2z. This string essentially takes the form 0p+k10p1, where k = |y|. There is no way to decompose this string into two equal substrings. Hence, the pumped string is not in L.