|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Syntax analysis
The output of lexical analysis, as we have already seen, is a
sequence
of tokens. Now, the token sequence of a valid AdvLang program
must conform to some structural rules, or syntax,
as they are called.
Here is an example. A typical exit from a location looks like
north flag
meaning "if you go north from the current location you will go to the
location called flag".
In terms of tokens, this is
tok_DIRN tok_IDENT
Here the tok_IDENT must come after the tok_DIRN. It
is a mistake to write
flag north
So we have the syntax rule:
An exit consists of a TOK_DIRN and a TOK_IDENT. The
TOK_IDENT must come after the TOK_DIRN.
We specify this rule as
exit : tok_DIRN tok_IDENT
This is called a grammar rule or production rule written in the
Backus-Naur Form (BNF). Read it as
An exit is made of a tok_DIRN followed by a tok_IDENT.
The colon (:) means "is made of." A production rule defines the structure of some
part of a program. Here we are defining the structure of an
exit. Such parts are called non-terminals. Thus
exit is a non-terminal. It is written to the left of the
colon. So we call it the left hand side of the rule.
How to ride a bison
Syntax analysis is more complex than lexical analysis. However, just
as we used the free utility program flex to write a lexical analyser,
we can use another free utility called bison to generate the syntax
analyser.
The bison software, like flex, is a free utility program available from
www.gnu.org. It is an
improved version of a similar utility program called yacc.
yacc or bison is part of many unix-like systems and also
of Mac OS X.
|
Recall that we had put our token-lexeme table in a file called
adv3.lex (after solving the exercise
in the last page) and then had fed the file to flex.
Similarly,
here we shall specify the syntax rules inside a file called
adv1.y and invoke bison on it as
bison adv1.y
The file adv1.y is called the bison input file,
and it
must have extension .y.
So far we have discussed only one grammar rule. The adv1.y file containing only this rule looks
as follows.
adv1.y:
%{
#include <stdio.h>
%}
%token tok_DIRN tok_IDENT
%%
exit : tok_DIRN tok_IDENT {printf("Found an exit!\n");}
%%
int yyerror(char *errMessage) {
printf("Trouble: %s\n",errMessage);
}
main() {
yyparse();
}
|
Let us explain what this file means. A bison input file has 4 parts.
The parts are delimited by special symbols:
%{, %} and %%. All these
special symbols must start from the first column.
In the first part we write any #include's that our compiler may
need. We shall use printf later, so we have included
stdio.h.
In the second part we list the tokens that we shall
need. For our rule we need only two. These are declared after
%token. (Bison will convert these into #define's to produce a file like our compile.h
in the last page.)However, since our lex file refers to all the tokens so we have added another %token line listing the other tokens. [Thanks again, Danny, for reminding me to add this line!] We can have as many %token lines as we want.
In the third part we list the production rules. Note that after the rule
we have a printf statement. This printf will be
executed whenever the rule is matched. This will help us to track
when a rule matches.
In the fourth part we
write any C function that we may need. We must always write the
yyerror(char *errMessage) function. This function is called if the
compiler finds any error in the AdvLang program.
We need a main
function to start the entire thing.
It calls
the function
yyparse(). Just as flex wrote the function
yylex(), bison writes the function
yyparse(). This function does everything for us. It
calls yylex() repeatedly to extract the tokens, and
then performs syntax analysis. If there is any syntax error, it calls
the function yyerror(char *errMessage), with
some appropriate error message.
At this stage (after solving the exercise in the last page) you
should have a flex input file
adv3.lex,
and a bison input file
adv1.y.
Now give the following commands to the computer. The comments in
red tell you the results of the commands.
flex adv3.lex
[lex.yy.c is created]
bison -d -o compile.c adv1.y
[compile.c and compile.h are created]
cc -o compile compile.c lex.yy.c
[compile is created]
(If you are using Windows, then you would get a file called
compile.exe instead of compile.)
To try out our compiler let us create a sample AdvLang file
test1.al containing a single exit:
test1.al:
north flag
|
Compile it as
compile < test1.al
You should get the output
Found an exit!
Now modify the file test1.al so that it not a valid exit
anymore (for example, change the "north" to "northwest", which is not a
direction that AdvLang understands). Then the output will be
Trouble: Syntax error.
Syntax rules are all in terms of the tokens only, and
not the lexemes. Thus the rule that "a location cannot have two
different north exits" is not a syntax rule, since it involves the lexeme
north. Such rules are called semantic rules and will
be dealt with during semantic analysis.
|
More rules
We have already defined the grammar for one part of an AdvLang
program, namely exits. The same technique can be applied to some
other parts as well. For example, we specify the name of a location as
NAME "House"
In terms of tokens this is
tok_NAME tok_STRING
So we have the rule
A name speification is a made of a tok_NAME followed by a tok_STRING.
Using BNF we write
nameSpec : tok_NAME tok_STRING
Here we have introduced a new non-terminal called
nameSpec. Similarly, for the desctiptions we have the rule
descrSpec : tok_DESCR tok_STRING
The specification of the
starting location is made of the token tok_START
followed by tok_IDENT. In BNF we write this as:
startSpec : tok_START tok_IDENT
Making the list of exits
Expressing lists in BNF is slightly tricky. Be prepared
to go through
the following explanation more than once. Once learned, this will be
a valuable tool in your arsenal.
Each list needs at least two BNF rules to define it. The first rule tells us
what the shortest possible list looks like.
The next rule tells us how to create a new list from an old list by the
minimum possible addition.
The shortest possible exitList is a list with no exits!
(For instance, there was no exit from the Marsh in the sample game).
So the first rule is:
exitList :
There is nothing to the right of the colon. The second rule
extends an existing
exitList by adding one more exit to it:
exitList : exitList exit
Notice that by virtue of this rule we can keep on adding exit-s to get as
long an exitList as we want.
Trying it out
Change the adv1.y file to adv2.y as follows.
adv2.y:
<First and second parts as before>
%%
exitList : {printf("Starting with an empty list!\n");}
exitList : exitList exit {printf("Adding an exit!\n");}
exit : tok_DIRN tok_IDENT {printf("Found an exit!\n");}
%%
<Fourth part as before>
|
Apply bison to it as before. Now change test1.al to the
file test2.al containing a
list of exits:
test2.al:
north flag
south house
|
Then apply your compiler on this file: (the red lines are the computer outputs)
compile < test2.al
Starting with an empty list!
Found an exit!
Adding an exit!
Found an exit!
Adding an exit!
Combining rules
We have already defined different parts of a location (e.g.,
nameSpec, descrSpec, exitList). Now we shall
combine these into a single rule for specifying a location.:
locnSpec : tok_LOCN tok_IDENT
nameSpec descrSpec exitList
This means: A location specification starts with the token
tok_LOCN followed by a tok_IDENT. Then comes
the name specification followed by the
location description which is followed by a list of exits.
After adding this rule and the rules for nameSpec and
descrSpec,
the file adv2.y becomes adv3.y that looks like the following.
adv3.y:
<First part as before>
%token tok_NAME tok_DIRN tok_IDENT tok_DESCR tok_LOCN tok_STRING
%%
locnSpec : tok_LOCN tok_IDENT nameSpec descrSpec exitList
{printf("Completed a location!\n");}
nameSpec : tok_NAME tok_STRING
{printf("Name specification!\n");}
descrSpec : tok_DESCR tok_STRING
{printf("Description specification!\n");}
exitList : {printf("Starting with an empty list!\n");}
exitList : exitList exit {printf("Adding an exit!\n");}
exit : tok_DIRN tok_IDENT {printf("Found an exit!\n");}
%%
<Fourth part as before>
|
To test this we shall use test3.al:
test3.al:
LOCATION house
NAME "Your house"
DESCRIPTION "This is your house."
east forest
|
Create the compiler using bison as before. Apply the compiler on
test3.al as:
compile < test3.al
Name specification
Description specification
Starting with an empty list!
Found an exit!
Adding an exit!
Completed a location!
The order in which the computer prints the above lines, depends on the
order in which the production rules are applied. The "Completed a location!"
message comes at the very end. This is because the compiler must first
recognise all the different parts before it can recognise the location.
The order in which production rules are used is of great importance, and
will be discussed at more length later.
Making the list of locations
An AdvLang contains a list of
location specifications. Making the list of locnSpec is much
like making exitList.
Each list needs at least two BNF rules to define it. The first rule tells us
what the shortest possible list looks like. In case of
locnSpecList, the shortest list consists of a single element
(locnSpec.) We write this in BNF as
locnSpecList : locnSpec
The next rule tells us how to create a new list from an old list by the
minimum possible addition.
This may be done by appending one more
locnSpec to an existing locnSpecList.
locnSpecList : locnSpecList locnSpec
Wrapping it up
An AdvLang program consists of a list of locations followed by the
specification of the starting location. So we have the rule:
program : locnSpecList startSpec
So finally we have 10 production rules that define all the
non-terminals. These production rules together form the grammar of
AdvLang:
(Rule 1) program : locnSpecList startSpec
(Rule 2) startSpec : tok_START tok_IDENT
(Rule 3) locnSpec : tok_LOCN tok_IDENT
nameSpec descrSpec exitList
(Rule 4) nameSpec : tok_NAME tok_STRING
(Rule 5) descrSpec : tok_DESCR tok_STRING
(Rule 6) exit : tok_DIRN tok_IDENT
(Rule 7) locnSpecList : locnSpec
(Rule 8) locnSpecList : locnSpecList locnSpec
(Rule 9) exitList :
(Rule 10) exitList: exitList exit
|