|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Using the attributes
In the last page we have learned that bison allows a C attribute
with each occurence of each token, and flex puts values in
the attributes during lexical analysis. The next question is: How to
use the attributes?
Recall that during syntax analysis bison constructs a derivation
of the program, ie, step by step grouping of non terminals and tokens until
we arrive at the non terminal program. Also recall that each step
in the dervation is either a shift or a reduce. In a
shift step we read the next token, in a reduce step
we apply some grammer rule. Whenever a reduce step takes place
bison can execute some C code of our choice. We have to write
these C codes after the appropriate grammer rule in the third part
of the bison input file.
Consider the line
startSpec : tok_START tok_IDENT
{
printf("startSpec : tok_START tok_IDENT(%s)", $2);
}
;
This is a grammer rule followed by some C code. The C code inside the braces will be executed whenever this rule is used in a
reduce step. Note the following points. First, the code is
enclosed in a pair of braces. Second, the opening brace must not
start from the first column. Third, the code is just plain C code except
for the $2.
The $2 refers to the attribute of the 2nd entity in the righthand side
of the BNF rule. Here this entity is the token tok_IDENT,
which has an attribute of type char*. So here we have to
treat the $2 as a char* variable.
Every entity in a grammer rule can have
its own attribute, including non terminals and the lefthand side. They
are all referred by a $ name. The attributes of the
righthand side entities are denoted by $1, $2, etc. $$ refers to
the attribute of the lefthand side type.
|
The C code segment associated with a BNF rule is called the
action for that rule. To see how actions behave we have
attached actions to each of the BNF rules in adv5.y.
Each action consists of a printf just like the
example above. Whenever a token in the righthand side has an
attribute, we also print the attribute.
Follow the usual steps to compile and run our program:
flex adv4.lex
[lex.yy.c is created]
bison -d -o compile.c adv5.y
[compile.c and compile.h are created]
cc -o compile compile.c lex.yy.c
[compile is created]
compile < toy.al
The output will consist of a list of BNF rules with the attribute
values printed beside token the names that have them. Here are
the first few lines from the output:
nameSpec : tok_NAME tok_STRING(Your house)
descrSpec : tok_DESCR tok_STRING(You are standing\nin front of your house.\nPaths lead towards east and west.)
exitList :
exit : tok_DIRN(2) tok_IDENT(flag)
exitList : exitList exit
exit : tok_DIRN(3) tok_IDENT(forest)
exitList : exitList exit
locnSpec : tok_LOCN tok_IDENT(house) nameSpec descrSpec exitList
locnSpecList : locnSpec
Observe
that each line in the output corresponds to a reduce step
in the right most derivative of the program in toy.al.
Intermediate representation
An advlang program contains information. Eventually,
the entire information should get translated into the
final low-level output. However, except for absolutely trivial languages,
this is not done in a single step. We strore the entire information
in some intermediate form first, and then generate the
low-level output based on that. The intermediate storage of
information is used only internally by the compiler. For our case,
it consists of the C data structures shown in the green box to the
left.
typedef struct {
char* name;
char* descr;
int exits[4];
}
locStruct;
locStruct locList[50];
int nLoc, startLoc;
|
The structure locStruct stores information about a
single location. The array locList stores the list
of all the locations. We have arbitrarily chosen 50 as its length.
You may later replace it by, say, a linked list. The exit
array is of size 4, since there are as many directions. We code
north as 0, south as 1, east 2, and west 3. Thus, if the north exit
from location 3 leads to location 9 then we shall have
locList[3].exits[0] = 9
If there is no exit to the east from location 6, then we shall have
locList[6].exits[2] = -1
During semantic analysis we shall need some temporary variables,
as well. These are listed in the red box to the right.
char tmp_idList[50][10];
char tmp_name[30], tmp_descr[80];
int tmp_exit[4];
|
For the temporary variables we have used names starting with
tmp_. Here tmp_idList stores the identifiers,
we can store upto 50 of them, each of length at most 9 (=10-1,
to allow for the trailing null character).
tmp_name, tmp_descr and tmp_exit store
the name, description and exits of a location while the location
is being created. tmp_nId keeps track of the number
of identifiers encountered so far.
int find(char *id) {
int i;
for(i=0;i<nLoc;i++) {
if(!strcmp(id,tmp_idList[i]))
return i;
}
strcpy(tmp_idList[nLoc],id);
nLoc++;
return (nLoc-1);
}
|
We shall need a C function (say, find(char *id))
to look up identifiers. It will check if id is already
in the array tmp_idList. If so, it will return its
index, else it will add id to the array, and then
return its index. In the latter
case
nLoc will increase by 1.
A simple implementation is shown in the box to the left.
In order to use this function in the actions in the bison
input file, we have to write the function in the fourth part of
that file. Also, the function uses strcmp to compare
character strings, and strcpy to copy from one
character string to another. These functions are declared in the C
header string.h. So we need to insert the
line
#include <string.h>
in the first part of the bison input file.
In the next page will show us how to use actions to obtain the intermediate representation
of the information.
|