Arnab Chakraborty's compiler tutorial: page 10 of 11

www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

My Guestbook

Home >

... tutorials >

... ... Compiler

Using the attributes

In the last page we have learned that bison allows a C attribute with each occurence of each token, and flex puts values in the attributes during lexical analysis. The next question is: How to use the attributes?

Recall that during syntax analysis bison constructs a derivation of the program, ie, step by step grouping of non terminals and tokens until we arrive at the non terminal program. Also recall that each step in the dervation is either a shift or a reduce. In a shift step we read the next token, in a reduce step we apply some grammer rule. Whenever a reduce step takes place bison can execute some C code of our choice. We have to write these C codes after the appropriate grammer rule in the third part of the bison input file.

Consider the line


startSpec : tok_START tok_IDENT
  {
    printf("startSpec : tok_START tok_IDENT(%s)", $2);
  }
  ;

This is a grammer rule followed by some C code. The C code inside the braces will be executed whenever this rule is used in a reduce step. Note the following points. First, the code is enclosed in a pair of braces. Second, the opening brace must not start from the first column. Third, the code is just plain C code except for the $2.

The $2 refers to the attribute of the 2nd entity in the righthand side of the BNF rule. Here this entity is the token tok_IDENT, which has an attribute of type char*. So here we have to treat the $2 as a char* variable.

Every entity in a grammer rule can have its own attribute, including non terminals and the lefthand side. They are all referred by a $ name. The attributes of the righthand side entities are denoted by $1, $2, etc. $$ refers to the attribute of the lefthand side type.

The C code segment associated with a BNF rule is called the action for that rule. To see how actions behave we have attached actions to each of the BNF rules in adv5.y. Each action consists of a printf just like the example above. Whenever a token in the righthand side has an attribute, we also print the attribute.

Follow the usual steps to compile and run our program:


flex adv4.lex
   [lex.yy.c is created]
bison -d -o compile.c adv5.y
   [compile.c and compile.h are created]
cc -o compile compile.c lex.yy.c
   [compile is created]
compile < toy.al

The output will consist of a list of BNF rules with the attribute values printed beside token the names that have them. Here are the first few lines from the output:


 nameSpec : tok_NAME tok_STRING(Your house)

 descrSpec : tok_DESCR tok_STRING(You are standing\nin front of your house.\nPaths lead towards east and west.)
exitList :

 exit : tok_DIRN(2) tok_IDENT(flag)
exitList : exitList exit

 exit : tok_DIRN(3) tok_IDENT(forest)
exitList : exitList exit


locnSpec : tok_LOCN tok_IDENT(house) nameSpec descrSpec exitList
locnSpecList : locnSpec

Observe that each line in the output corresponds to a reduce step in the right most derivative of the program in toy.al.

Intermediate representation

An advlang program contains information. Eventually, the entire information should get translated into the final low-level output. However, except for absolutely trivial languages, this is not done in a single step. We strore the entire information in some intermediate form first, and then generate the low-level output based on that. The intermediate storage of information is used only internally by the compiler. For our case, it consists of the C data structures shown in the green box to the left.


typedef struct {
    char* name;
    char* descr;
    int exits[4]; 
}
  locStruct;

locStruct locList[50];
int nLoc, startLoc;

The structure locStruct stores information about a single location. The array locList stores the list of all the locations. We have arbitrarily chosen 50 as its length. You may later replace it by, say, a linked list. The exit array is of size 4, since there are as many directions. We code north as 0, south as 1, east 2, and west 3. Thus, if the north exit from location 3 leads to location 9 then we shall have


locList[3].exits[0] = 9

If there is no exit to the east from location 6, then we shall have


locList[6].exits[2] = -1

During semantic analysis we shall need some temporary variables, as well. These are listed in the red box to the right.


char tmp_idList[50][10];
char tmp_name[30], tmp_descr[80];
int  tmp_exit[4];

For the temporary variables we have used names starting with tmp_. Here tmp_idList stores the identifiers, we can store upto 50 of them, each of length at most 9 (=10-1, to allow for the trailing null character). tmp_name, tmp_descr and tmp_exit store the name, description and exits of a location while the location is being created. tmp_nId keeps track of the number of identifiers encountered so far.


int find(char *id) {
 int i;
 for(i=0;i<nLoc;i++) {
  if(!strcmp(id,tmp_idList[i]))
   return i;
 }
   
 strcpy(tmp_idList[nLoc],id);
 nLoc++;
 return (nLoc-1);
}

We shall need a C function (say, find(char *id)) to look up identifiers. It will check if id is already in the array tmp_idList. If so, it will return its index, else it will add id to the array, and then return its index. In the latter case nLoc will increase by 1.

A simple implementation is shown in the box to the left. In order to use this function in the actions in the bison input file, we have to write the function in the fourth part of that file. Also, the function uses strcmp to compare character strings, and strcpy to copy from one character string to another. These functions are declared in the C header string.h. So we need to insert the line


#include <string.h>

in the first part of the bison input file.

In the next page will show us how to use actions to obtain the intermediate representation of the information.

hit counter