Another way to describe sets of character strings is with finite automata. Finite automata are also known as state machines, and we will use “automaton” and “machine” interchangeably.
As a simple example, here is a machine recognizing the set of strings matched by the regular expression
a(bb)+a
: A finite automaton is always in one of its states, represented in the diagram by circles. (The numbers inside the circles are labels to make this discussion easier; they are not part of the machine's operation.) As it reads the string, it switches from state to state. This machine has two special states: the start state s0 and the matching state s4. Start states are depicted with lone arrowheads pointing at them, and matching states are drawn as a double circle.
The machine reads an input string one character at a time, following arrows corresponding to the input to move from state to state. Suppose the input string is
abbbba
. When the machine reads the first letter of the string, the a
, it is in the start state s0. It follows the a
arrow to state s1. This process repeats as the machine reads the rest of the string: b
to s2
, b
to s3
, b
to s2
, b
to s3
, and finally a
to s4
. The machine ends in s4, a matching state, so it matches the string. If the machine ends in a non-matching state, it does not match the string. If, at any point during the machine's execution, there is no arrow for it to follow corresponding to the current input character, the machine stops executing early.
The machine we have been considering is called a deterministic finite automaton (DFA), because in any state, each possible input letter leads to at most one new state. We can also create machines that must choose between multiple possible next states. For example, this machine is equivalent to the previous one but is not deterministic:
The machine is not deterministic because if it reads a
b
in state s2, it has multiple choices for the next state: it can go back to s1 in hopes of seeing another bb
, or it can go on to s3 in hopes of seeing the final a
. Since the machine cannot peek ahead to see the rest of the string, it has no way to know which is the correct decision. In this situation, it turns out to be interesting to let the machine always guess correctly. Such machines are called non-deterministic finite automata (NFAs or NDFAs). An NFA matches an input string if there is some way it can read the string and follow arrows to a matching state. Sometimes it is convenient to let NFAs have arrows with no corresponding input character. We will leave these arrows unlabeled. An NFA can, at any time, choose to follow an unlabeled arrow without reading any input. This NFA is equivalent to the previous two, but the unlabeled arrow makes the correspondence with
a(bb)+a
clearest: