Lunes, Abril 25, 2011

Designing Finite Automata


The difference between Deterministic finite Automata and Non-Deterministic finita Automata

Deterministic Finite Automata
 
A DFA represents a finite state machine that recognizes a RE. For example, the following DFA:
 
 
dfa1.gif
recognizes (abc+)+. A finite automaton consists of a finite set of states, a set of transitions (moves), one start state, and a set of final states (accepting states). In addition, a DFA has a unique transition for every state-character combination. For example, the previous figure has 4 states, state 1 is the start state, and state 4 is the only final state.
A DFA accepts a string if starting from the start state and moving from state to state, each time following the arrow that corresponds the current input character, it reaches a final state when the entire input string is consumed. Otherwise, it rejects the string.
The previous figure represents a DFA even though it is not complete (ie, not all state-character transitions have been drawn). The complete DFA is:
dfa2.gif
but it is very common to ignore state 0 (called the error state) since it is implied. (The arrows with two or more characters indicate transitions in case of any of these characters.) The error state serves as a black hole, which doesn't let you escape.
A DFA is represented by a transition table T, which gives the next state T[s, c] for a state s and a character c. For example, the T for the DFA above is:
 

abc
0000
1200
2030
3004
4204
 
Suppose that we want to build a scanner for the REs:


for - keyword=for
identifier=[a - z][a - z0 - 9]*

The corresponding DFA has 4 final states: one to accept the for-keyword and 3 to accept an identifier:
dfa3.gif
(the error state is omitted again). Notice that for each state and for each character, there is a single transition.
A scanner based on a DFA uses the DFA's transition table as follows:
 

state = initial_state;
current_character = get_next_character();
while ( true )
{   next_state = T[state,current_character];
    if (next_state == ERROR)
       break;
    state = next_state;
    current_character = get_next_character();
    if ( current_character == EOF )
       break;
};
if ( is_final_state(state) )
   `we have a valid token'
else `report an error'
 
This program does not explicitly take into account the longest match disambiguation rule since it ends at EOF. The following program is more general since it does not expect EOF at the end of token but still uses the longest match disambiguation rule.
 

state = initial_state;
final_state = ERROR;
current_character = get_next_character();
while ( true )
{   next_state = T[state,current_character];
    if (next_state == ERROR)
       break;
    state = next_state;
    if ( is_final_state(state) )
       final_state = state;
    current_character = get_next_character();
    if (current_character == EOF)
       break;
};
if ( final_state == ERROR )
   `report an error'
else if ( state != final_state )
   `we have a valid token but we need to backtrack
         (to put characters back into the input stream)'
else `we have a valid token'
 
Is there any better (more efficient) way to build a scanner out of a DFA? Yes! We can hardwire the state transition table into a program (with lots of gotos). You've learned in your programming language course never to use gotos. But here we are talking about a program generated automatically, which no one needs to look at. The idea is the following. Suppose that you have a transition from state s1 to s2 when the current character is c. Then you generate the program:
 

s1: current_character = get_next_character();
    ...
    if ( current_character == 'c' )
       goto s2;
    ...
s2: current_character = get_next_character();
    ...


Non-Deterministic Finite Automata


An NFA is typically described using a directed graph as shown in Figure 11.8b, and is considered as a special kind of finite state machine. Each vertex of the graph represents a state, and edges represent possible transitions. An input string of finite length is read by the machine. Typically, the input string is a binary sequence of 0's and $ 1$'s. The initial state is designated by an inward arrow that has no source vertex, as shown pointing into state $ a$ in Figure 11.8b. The machine starts in this state and reads the first symbol of the input string.

Based on its value, it makes appropriate transitions. For a DFA, the next state must be specified for each of the two inputs 0 and $ 1$ from each state. From a state in an NFA, there may be any number of outgoing edges (including zero) that represent the response to a single symbol. For example, there are two outgoing edges if 0 is read from state $ c$ (the arrow from $ c$ to $ b$ actually corresponds to two directed edges, one for 0 and the other for $ 1$). There are also edges designated with a special $ \epsilon$ symbol. If a state has an outgoing $ \epsilon$, the state may immediately transition along the edge without reading another symbol. This may be iterated any number of times, for any outgoing $ \epsilon$ edges that may be encountered, without reading the next input symbol. The nondeterminism arises from the fact that there are multiple choices for possible next states due to multiple edges for the same input and $ \epsilon$ transitions. There is no sensor that indicates which state is actually chosen.
 
The interpretation often given in the theory of computation is that when there are multiple choices, the machine clones itself and one copy runs each choice. It is like having multiple universes in which each different possible action of nature is occurring simultaneously. If there are no outgoing edges for a certain combination of state and input, then the clone dies. Any states that are depicted with a double boundary, such as state $ a$ in Figure 11.8, are called accept states. When the input string ends, the NFA is said to accept the input string if there exists at least one alternate universe in which the final machine state is an accept state.
The formulation usually given for NFAs seems very close to Formulation 2.1 for discrete feasible planning. Here is a typical NFA formulation [891], which formalizes the ideas depicted in Figure 11.8:

Formulation 11..2 (Nondeterministic Finite Automaton)  
  1. A finite state space $ X$.
  2. A finite alphabet $ \Sigma$ which represents the possible input symbols. Let $ \Sigma_\epsilon = \Sigma \cup \{\epsilon\}$.
  3. A transition function, . For each state and symbol, a set of outgoing edges is specified by indicating the states that are reached.
  4. A start state $ x_0 \in X$.
  5. A set $ A \subseteq X$ of accept states.
An NFA, similar to a DFA, consumes a string of input symbols. For each input symbol it transitions to a new state until all input symbols have been consumed.
Unlike a DFA, it is non-deterministic in that, for any input symbol, its next state may be any one of several possible states. Thus, in the formal definition, the next state is an element of the power set of states. This element, itself a set, represents some subset of all possible states to be considered at once.

An extension of the NFA is the NFA-lambda (also known as NFA-epsilon or the NFA with epsilon moves), which allows a transformation to a new state without consuming any input symbols. For example, if it is in state 1, with the next input symbol an a, it can move to state 2 without consuming any input symbols, and thus there is an ambiguity: is the system in state 1, or state 2, before consuming the letter a? Because of this ambiguity, it is more convenient to talk of the set of possible states the system may be in. Thus, before consuming letter a, the NFA-epsilon may be in any one of the states out of the set {1,2}. Equivalently, one may imagine that the NFA is in state 1 and 2 'at the same time': and this gives an informal hint of the powerset construction: the DFA equivalent to an NFA is defined as the one that is in the state q={1,2}. Transformations to new states without consuming an input symbol are called lambda transitions or epsilon transitions. They are usually labeled with the Greek letter λ or ε.

The notion of accepting an input is similar to that for the DFA. When the last input symbol is consumed, the NFA accepts if and only if there is some set of transitions that will take it to an accepting state. Equivalently, it rejects, if, no matter what transitions are applied, it would not end in an accepting state.

 Formal definition

Two similar types of NFAs are commonly defined: the NFA and the NFA with ε-moves. The ordinary is defined as a 5-tuple, (Q, Σ, T, q0, F), consisting of
  • a finite set of states Q
  • a finite set of input symbols Σ
  • a transition function T : Q × Σ → P(Q).
  • an initial (or start) state q0Q
  • a set of states F distinguished as accepting (or final) states FQ.
Here, P(Q) denotes the power set of Q. The NFA with ε-moves (also sometimes called NFA-epsilon or NFA-lambda) replaces the transition function with one that allows the empty string ε as a possible input, so that one has instead
T : Q × (Σ ∪{ε}) → P(Q).
It can be shown that ordinary NFA and NFA with epsilon moves are equivalent, in that, given either one, one can construct the other, which recognizes the same language.

Linggo, Abril 3, 2011

Markov Chains

A Markov chain, named for Andrey Markov, is a mathematical system that transits from one state to another (out of a finite or countable number of possible states) in a chainlike manner. It is a random process endowed with the Markov property: that the next state depends only on the current state and not on the past. Markov chains have many applications as statistical models of real-world processes.



 A Markov chain is a discrete (discrete-time) random process with the Markov property. Often, the term "Markov chain" is used to mean a Markov process which has a discrete (finite or countable) state-space. Usually a Markov chain would be defined for a discrete set of times (i.e. a discrete-time Markov chain)[1] although some authors use the same terminology where "time" can take continuous values.[2][3] Also see continuous-time Markov process. The use of the term in Markov chain Monte Carlo methodology covers cases where the process is in discrete-time (discrete algorithm steps) with a continuous state space. The following concentrates on the discrete-time discrete-state-space case.

Markov chain is the so-called "drunkard's walk", a random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability. From any position there are two possible transitions, to the next or previous integer. The transition probabilities depend only on the current position, not on the way the position was reached. For example, the transition probabilities from 5 to 4 and 5 to 6 are both 0.5, and all other transition probabilities from 5 are 0. These probabilities are independent of whether the system was previously in 4 or 6.
Another example is the dietary habits of a creature who eats only grapes, cheese or lettuce, and whose dietary habits conform to the following (artificial) rules: it eats exactly once a day; if it ate cheese yesterday, it will not today, and it will eat lettuce or grapes with equal probability; if it ate grapes yesterday, it will eat grapes today with probability 1/10, cheese with probability 4/10 and lettuce with probability 5/10; finally, if it ate lettuce yesterday, it won't eat lettuce again today but will eat grapes with probability 4/10 or cheese with probability 6/10. This creature's eating habits can be modeled with a Markov chain since its choice depends on what it ate yesterday, not additionally on what it ate 2 or 3 (or 4, etc.) days ago. One statistical property one could calculate is the expected percentage of the time the creature will eat grapes over a long period.
A series of independent events—for example, a series of coin flips—does satisfy the formal definition of a Markov chain. However, the theory is usually applied only when the probability distribution of the next step depends non-trivially on the current state.

State Diagram

A state diagram is a type of diagram used in computer science and related fields to describe the behavior of systems. State diagrams require that the system described is composed of a finite number of states; sometimes, this is indeed the case, while at other times this is a reasonable abstraction. There are many forms of state diagrams, which differ slightly and have different semantics.


State diagram are used to give an abstract description of the behavior of a system. This behavior is analyzed and represented in series of events, that could occur in one or more possible states. Hereby "each diagram usually represents objects of a single class and track the different states of its objects through the system.


State diagrams can be used to graphically represent finite state machines. This was introduced by TaylorBooth in his 1967 book "Sequential Machines and Automata Theory". Another possible representation is the State transition table.



State diagrams are used to describe the behavior of a system.  State diagrams describe all of the possible states of an object as events occur.  Each diagram usually represents objects of a single class and track the different states of its objects through the system. 


State diagrams have very few elements.  The basic elements are rounded boxes representing the state of the object and arrows indicting the transition to the next state.  The activity section of the state symbol depicts what activities the object will be doing while it is in that state.   
All state diagrams being with an initial state of the object.  This is the state of the object when it is created.  After the initial state the object begins changing states.  Conditions based on the activities can determine what the next state the object transitions to.
Below is an example of a state diagram might look like for an Order object.  When the object enters the Checking state it performs the activity "check items."  After the activity is completed the object transitions to the next state based on the conditions [all items available] or [an item is not available].  If an item is not available the order is canceled.  If all items are available then the order is dispatched.  When the object transitions to the Dispatching state the activity "initiate delivery" is performed.  After this activity is complete the object transitions again to the Delivered state.
State diagrams can also show a super-state for the object. A super-state is used when many transitions lead to the a certain state.  Instead of showing all of the transitions from each state to the redundant state a super-state can be used to show that all of the states inside of the super-state can transition to the redundant state.  This helps make the state diagram easier to read.
The diagram below shows a super-state.  Both the Checking and Dispatching states can transition into the Canceled state, so a transition is shown  from a super-state named Active to the state Cancel.  By contrast, the state Dispatching can only transition to the Delivered state, so we show an arrow only from the Dispatching state to the Delivered state.  

Biyernes, Abril 1, 2011

Regular Expression

Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.
The simplest regular expression is a single literal character. Except for the special metacharacters *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character. 

Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.
The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which match e1; e1+ matches one or more; e1? matches zero or one.
The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. Explicit parentheses can be used to force different meanings, just as in arithmetic expressions. Some examples: ab|cd is equivalent to (ab)|(cd); ab* is equivalent to a(b*).
The syntax described so far is a subset of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression facilities (notably Perl and those that have copied it) have added many new operators and escape sequences. These additions make the regular expressions more concise, and sometimes more cryptic, but usually not more powerful: these fancy new regular expressions almost always have longer equivalents using the traditional syntax.
One common regular expression extension that does provide additional power is called backreferences. A backreference like \1 or \2 matches the string matched by a previous parenthesized expression, and only that string: (cat|dog)\1 matches catcat and dogdog but not catdog nor dogcat. As far as the theoretical term is concerned, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses. Perl (and the other languages) could not now remove backreference support, of course, but they could employ much faster algorithms when presented with regular expressions that don't have backreferences, like the ones considered above. This article is about those faster algorithm

Non- Deterministic Fenite Automata

Non-deterministic finite automata (NFA) have transition functions which may assign several or no states to a given state and an input symbol. They accept a word if there is any possible transition from the one of initial states to one of the final states. It is important to note that although NFAs have a non-determistic transition function, it can always be determined whether or not a word belongs to its language ($ w\in L(A)$). 
An interesting connection lies between the ideas of this chapter and the theory of finite automata, which is part of the theory of computation (see [462,891]). In Section 2.1, it was mentioned that determining whether there exists some string that is accepted by a DFA is equivalent to a discrete feasible planning problem. If unpredictability is introduced into the model, then a nondeterministic finite automaton (NFA) is obtained, as depicted in Figure 11.8. This represents one of the simplest examples of nondeterminism in theoretical computer science. Such nondeterministic models serve as a powerful tool for defining models of computation and their associated complexity classes. It turns out that these models give rise to interesting examples of information spaces.  
An NFA is typically described using a directed graph as shown in Figure 11.8b, and is considered as a special kind of finite state machine. Each vertex of the graph represents a state, and edges represent possible transitions. An input string of finite length is read by the machine. Typically, the input string is a binary sequence of 0's and $ 1$'s. The initial state is designated by an inward arrow that has no source vertex, as shown pointing into state $ a$ in Figure 11.8b. The machine starts in this state and reads the first symbol of the input string. Based on its value, it makes appropriate transitions. For a DFA, the next state must be specified for each of the two inputs 0 and $ 1$ from each state. From a state in an NFA, there may be any number of outgoing edges (including zero) that represent the response to a single symbol. For example, there are two outgoing edges if 0 is read from state $ c$ (the arrow from $ c$ to $ b$ actually corresponds to two directed edges, one for 0 and the other for $ 1$). There are also edges designated with a special $ \epsilon$ symbol. If a state has an outgoing $ \epsilon$, the state may immediately transition along the edge without reading another symbol. This may be iterated any number of times, for any outgoing $ \epsilon$ edges that may be encountered, without reading the next input symbol. The nondeterminism arises from the fact that there are multiple choices for possible next states due to multiple edges for the same input and $ \epsilon$ transitions. There is no sensor that indicates which state is actually chosen.