www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com
Free Guestbook
My Guestbook

Why use finite automata?
This is the third page of our tutorial, and it may seem somewhat late now to ask this question. Indeed, this is the question that a student may (and should) justly demand an answer to even before starting to read the tutorial! But the only reason that we have strained the reader's patience so far is: had we listed the possible uses of finite automata earlier, it would most possibly have made him/her think that finite automata are very difficult things to understand. Indeed, there does not seem to be another belief that is so wrong, yet so popular, as that "brilliant concepts are difficult concepts". Well, we for one do not believe in this motto, and hope to find the reader on our side once he/she completes reading the tutorial.

An internal state machine is a machine with a memory. We go on a bit further and say, that any machine with a memory is an internal state machine. The finiteness condition in the definition of finite automata is the natural restriction imposed by the fact that computers cannot handle infinite sets. (However, there are some important variants of finite automata that sort of allow infinite sets. We shall learn about some such variants in a later page.) Viewed in this light, the above question reduces to "Why use machines with memories?"

To the modern reader the answer should be more than obvious. What is a computer but a machine with a memory? Almost every application that runs on a computer is some form of finite automata or other. Here is a list that makes no pretence of being comprehensive:
  • The compilers that compile your codes in C, ForTran, Java or Lisp or whatever language we may be using --- are all made up of (variants of) finite automata. The thick books about compiler designs essentially tell you how to build the various finite automata components in a compiler.
  • Do you use LaTeX to typeset your documents? How does it work ? How does it remember to indent things properly inside a block that you have started ten lines back? Only because the LaTeX compiler has a memory!
  • The very HTML file that you are reading requires finite automata for parsing it.
  • Ever thought of writing a program to make a calculator that accepts mathematical formulas like 34.7*67^(4+9) , and computes the value keeping track of precedences? If not, try your hand at it! And you will see how confusing it is to write it without finite automata, and how easy with it!
  • Any editor that you use has lots of finite automata running behind the screen.
  • Ever played games like Quake or Diablo, the so-called 3D role playing games where you control some character to fight against a host of monsters and robots? All those robots and monsters are made up of finite automata, much like the simple robot that we saw in the first page of this tutorial.
  • Finite automata are indispesible in hardware also! The mother board of your computer is implementing a huge finite automaton. Also, most of the "intelligent" electronic gadget that you see around you (eg, automatic traffic signals, radio controlled toys, automatic elevator controls) use finite automata at various levels.
Well, if we go on like this the list will extend up to infinity. So we shall pause here, and show one very important class of application namely language recognition. This application will show how finite automata are used in editors and in the first phase of compilers.
A concrete application : Lexical analysis
Consider a huge C-code with lots of embedded comments. As the reader should know already, C-comments are delimited by a starting /* and a trailing */, and comments cannot be nested. When a compiler reads the code it strips all the comments off the code. How is this stripping done? To understand this we shall write a that portion of the compiler which strips comments. Our compiler will read the program file character by character. Since the program may be very huge, it makes no attempt to store huge chunks of the code at a time. Keeping this restriction in mind we know that our compiler (which will presently emerge as a finite automaton) has input set consisting of all the possible characters that may be present in C code. Now let us delve into the mind of the compiler.

When it starts reading the file it is outside any comment. It goes on reading character after character from the file (and tossing the characters in the output file) until it meets the first /. Then it knows that t has encountered a possible start of comment. To make sure it reads the next character. If it is a * then the compiler is certain that it is inside a comment. In this case the compiler keeps on reading the file, but now throws away the characters in the garbage bin. If it was not a * then of course the compiler understands that it is outside any comment and keeps on copying the characters read into the output file. In our imaginary voyage through the mind of the compiler, let us again imagine ourselves in the situation where the compiler is inside a comment. If it happens to meet a * it knows that it may be a possible end of comment. So after trashing the * it reads the next character. If it is a / then the comment has indeed ended and the compiler is again in a region that is outside any comment.

Let us take one more look at the above confusing description paying particular attention to the bold phrases. There are four of them:
  1. outside any comment
  2. possible start of comment
  3. inside a comment
  4. possible end of comment
The behaviour of the compiler depends which of the four states ie is currently in. Well, there is no hiding after we have already used the word "state". Yes, these are the 4 states of the finite automaton. What are the inputs? We have already said that the input set consists of all possible characters that may be found in a C-file. But that's a pretty large set. One further look at the lat paragraph will, however, convince the reader that all that the automaton really has to know is whether the character is a * or / or something else. But is that all? Not quite! Our automaton needs some way to know when the end of file has been reached. You surely cannot leave the machine to imitate poor Oliver, "always asking for more"! So we need another special input symbol which we shall call EOF (End Of File). Thus we shall have four inputs:
  1. *
  2. /
  3. EOF
  4. <anything else>
How many outputs shall we have? Well, we shall need 7 of them:
  1. PC : Print the Currently read character in the output file
  2. T : Trash the currently read character
  3. S: Store the currently read character in a temporary storage.
  4. D : issue the message "Done".
  5. E : issue the message "Error".
  6. PSC : Print the Stored character followed by the Current character.
  7. PSD : Print the Stored character and say "Done".
Without further ado we give the two tables corresponding to the output controlling and the state controlling functions:
Inputs
States
* / EOF <anything else>
outside any comment PC S D PC
possible start of comment T PSC PSD PSC
inside comment S T E T
possible end of comment T T D T
[Thanks to Eimantas Vaiciunas for helping me correct a mistake in this table.]
The state controlling function is equally obvious.
Inputs
States
* / EOF <anything else>
outside any comment outside start outside outside
possible start of comment inside outside outside outside
inside comment end inside inside inside
possible end of comment inside outside inside inside
Here, in the second table, "start" is an abbreviation for "possible start of comment", "inside" is an abbreviation for "inside comment", and so on. The reader is requested to try his/her hand at implementing the above finite automaton in some standard computer language like C or ForTran.

In the next page we shall continue with a more sophisticated version of the same example.

PrevNext
© Arnab Chakraborty (2008)

free html visitor counters
hit counter