|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Why use finite automata?
This is the third page of our tutorial,
and it may seem somewhat late now
to ask this
question. Indeed, this is the
question that a student may (and should) justly demand an answer to
even before starting to
read the tutorial! But the only reason that we have strained the
reader's patience so far is: had we listed the possible uses of finite
automata earlier, it would most possibly have made him/her think that finite
automata are very difficult things to understand. Indeed, there does not
seem to be another belief that is so wrong, yet so popular, as that
"brilliant
concepts are difficult concepts". Well, we for one do not believe in this
motto, and hope to find the reader on our side once he/she completes
reading the tutorial.
An internal state machine is a machine with a memory. We go on a bit
further and say, that any machine with a memory is an internal state
machine. The finiteness condition in the definition of finite automata is
the natural restriction imposed by the fact that computers cannot handle
infinite sets. (However, there are some important variants of finite
automata that sort of allow infinite sets. We shall learn about some such
variants in a later page.) Viewed in this light, the above question
reduces to "Why use machines with memories?"
To the modern reader the answer should be more than
obvious. What is a computer but a machine with a memory? Almost every
application that runs on a computer is some form of finite automata or
other. Here is a list that makes no pretence of being comprehensive:
The compilers that compile your codes in C, ForTran, Java or Lisp or
whatever language we may be using --- are all made up of (variants
of) finite automata. The thick books about compiler
designs essentially
tell you how to build the various finite automata
components in a
compiler.
Do you use LaTeX to typeset your documents? How does it work ? How
does it remember to indent things properly inside a block that you
have started ten lines back? Only because the LaTeX compiler
has a memory!
The very HTML file that you are reading requires finite automata for
parsing it.
Ever thought of writing a program to make a calculator that accepts
mathematical formulas like 34.7*67^(4+9) , and computes the value
keeping track of precedences? If not, try your hand at it! And you
will see how confusing it is to write it without finite automata,
and how easy with it!
Any editor that you use has lots of finite automata running behind
the screen.
Ever played games like Quake or Diablo, the so-called 3D role
playing games where you control some character to fight against a
host of monsters and robots? All those robots and monsters are made
up of finite automata, much like the simple robot that we saw in the
first page of this tutorial.
Finite automata are indispesible in hardware also! The mother board
of your computer is implementing a huge finite automaton. Also, most
of the "intelligent" electronic gadget that you see around you (eg,
automatic traffic signals, radio controlled toys, automatic elevator
controls) use finite automata at various levels.
Well, if we go on like this the list will extend up to infinity.
So we shall pause here,
and show one very important class of
application namely language recognition. This application will show how
finite automata are used in editors and in the first phase of compilers.
A concrete application : Lexical analysis
Consider a huge C-code with lots of embedded comments. As the reader
should know already, C-comments are delimited by a starting
/* and a trailing */ , and comments cannot be
nested. When a compiler reads the code it strips all the comments
off the code. How is this stripping done? To understand this we shall write
a that portion of the compiler which strips comments. Our compiler will
read the program file
character by character. Since the program may be very huge, it makes no
attempt to store huge chunks of the code at a time. Keeping this
restriction in mind we know that our compiler (which will presently emerge
as a finite automaton) has input set consisting of all the possible
characters that may be present in C code. Now let us delve into the mind
of the compiler.
When it starts reading the file it is outside any comment. It goes
on reading character after character from the file (and tossing the
characters in the
output file) until it meets the
first / . Then it knows that t has encountered a possible
start of
comment. To make sure it reads the
next character. If it is a * then the compiler is certain
that it is inside a comment. In this case the compiler keeps
on reading
the file, but now throws away the characters in the garbage bin. If it was
not a * then of course the compiler understands that it is
outside any comment and keeps on copying the characters read into
the output file. In our imaginary voyage through the mind of the compiler,
let us again imagine ourselves in the situation where the compiler is
inside a comment.
If it happens to meet a
* it knows that it may be a possible end of
comment. So after trashing the * it reads the next
character. If it is a / then the comment has indeed ended and
the compiler is again in a region that is outside any comment.
Let us take one more look at the above confusing description paying
particular attention to the bold phrases. There are four of them:
outside any comment
possible start of comment
inside a comment
possible end of comment
The behaviour of the compiler depends which of the four states ie is
currently in. Well, there is no hiding after we have already used the word
"state". Yes, these are the 4 states of the finite automaton.
What are the inputs? We have already said that the input set consists of
all possible characters that may be found in a C-file. But that's a pretty
large set. One further look at the lat paragraph will, however, convince
the reader that all that the automaton really has to know is
whether the character is a * or / or something
else. But is that all? Not quite!
Our automaton needs some way to know when the
end of file has been reached. You surely cannot leave the machine to
imitate poor Oliver, "always asking for more"! So we need another special
input symbol which we shall call EOF (End Of File). Thus we shall have
four inputs:
*
/
EOF
<anything else>
How many outputs shall we have?
Well, we shall need 7 of them:
PC : Print the Currently read character in the output file
T : Trash the currently read character
S: Store the currently read character in a temporary storage.
D : issue the message "Done".
E : issue the message "Error".
PSC : Print the Stored character followed by the Current character.
PSD : Print the Stored character and say "Done".
Without further ado we give the two tables corresponding to the output
controlling and the state controlling functions:
|
Inputs |
States |
|
*
|
/
|
EOF
|
<anything else>
|
outside any comment
|
PC
|
S
|
D
|
PC
|
possible start of comment
|
T
|
PSC
|
PSD
|
PSC
|
inside comment
|
S
|
T
|
E
|
T
|
possible end of comment
|
T
|
T
|
D
|
T
|
[Thanks to Eimantas Vaiciunas for helping me correct a mistake in this table.]
|
The state controlling function is equally obvious.
|
Inputs |
States |
|
*
|
/
|
EOF
|
<anything else>
|
outside any comment
|
outside
|
start
|
outside
|
outside
|
possible start of comment
|
inside
|
outside
|
outside
|
outside
|
inside comment
|
end
|
inside
|
inside
|
inside
|
possible end of comment
|
inside
|
outside
|
inside
|
inside
|
|
Here, in the second table, "start" is an abbreviation for "possible start of
comment", "inside" is an abbreviation for "inside comment", and so on.
The reader is requested to try his/her hand at implementing the above
finite automaton in
some standard computer language like C or ForTran.
In the next page we shall continue with a more sophisticated version of
the same example.
|