Arnab Chakraborty's advanced compiler tutorial: page 2 of 3

www.angelfire.com/dragon/letstry `cwave04 at yahoo dot com`
Home > ... tutorials > ... ... Compiler (advanced)

Information flow

We have mentioned that parsing effectively extracts the information present in a program, and presents it in a different (more easily digestable) form. In all practical situations this means filling in some global data structures. The information contained in the original program is naturally divided into two parts:

the information in the sequencing of tokens
the information in the lexemes

The following AdvLang program, for example,


LOCATION house
DESCRIPTION "You are in front of a house"
NORTH forest
SOUTH hill

becomes the following list of tokens after lexical analysis


tok_LOCN tok_IDENT
tok_DESCR tok_STRING
tok_DIRN tok_IDENT
tok_DIRN tok_IDENT

From just the sequence of tokens we can gather that there is one location with two exits. The exact name, description and the exits are not avaialable from the tokens. Those come from the lexemes.

While parsing we construct a parse tree. The structure of the tree itself gives us the information contained in the tokens, while the information from the lexemes are associated with the leaf nodes. These sundry pieces of information are to be "pumped up" the tree and gradually fed into the global structures.

The structure of the tree dictates which piece should fit where in the global structures. Consider the rule


locSpec : tok_LOCN tok_IDENT

When we encounter this rule we already have the global structures partially filled up. In particular, we know how many locations we have already encountered. If this number is n, then the current location is being stored in locList[n] (the earlier ones already occupying postions 0 to n-1). So we know that the lexeme associated with the token tok_IDENT must be stored in the variable locList[n].id. So we summarily dispatch the lexeme to this variable by associating the following action with this rule:


locSpec : tok_LOCN tok_IDENT 
   {locList[n].name = $2;}

[Incidentally, this is slightly different from what we had done originally in our AdvLang example, because there we were also converting the name into a number. But the difference does not change the main point being discussed here.]

This is an example where we could send the information directly from a leaf node to the global structures. However, this is not always possible as we see in the next example.

Example: Consider a language that allows expressions made by adding and multiplying numbers like
1 + 2 * 3 + 6 * 4 * 5
The compiler is required to evaluate such expressions keeping precedence in mind (i.e., multiplications should be done before additions). One possible grammar is
1) expr : prod 2) expr : expr PLUS prod 3) prod : N 4) prod : prod TIMES N
Since the final answer is going to be just a number, all the global structure that we shall need here is a single variable to store that number. Let us think about the action that we should associate with rule 3. We have a single token in the r.h.s., the lexeme for which is a number. Is there any way for us to know at this stage how this number is going to contribute the end result? No! Its contribution to the final answer will depend on the other numbers (some of which we have not even read before coming to this rule)! In such a situation we have no option but to store the number in some temporary location to be retrieved later. Bison provides a simple mechanism to do this. It allows us to attach a variable with each nonterminal. The variable can be of any data type we want (and hence if we want to associate many variables to a single nonterminal we can simply pack them into a single structure). This is done in exactly the same way as attaching lexemes to tokens. However, in the context of nonterminals, we use the term attribute instead of lexeme. Before taking a look at how this is done in a Bison file we shall first see how the attributes help information flow up the parse tree.

It should not be too difficult to see how the information is "pumped" up the tree. The numbers in the parantheses are the temporary values associated with the nodes. The blue number are are the lexemes (extracted from the input), the red one the attributes (computed stepwise starting from the lexemes). Here the final answer is stored in the attribute corresponding to the highest node in the tree. It is only this number that needs to be stored in the global variable. Now we shall see how to achieve this using actions in a Bison file. First we need to tell Bison that we want to associate attributes to the non terminals. In our simple example all the attributes are just numbers (say integers).
%union { int val; } %token <val> N %type <val> expr prod
In fact, Bison provides an easier alternative for such simple situations where all the attributes and lexemes are of the same type. However, we do not discuss that alternative here, since in all but trivial cases we need attributes of different types for different nonterminals.
Next we add the actions.
expr : prod { $$ = $1; } expr : expr PLUS prod { $$ = $1 + $3; } prod : N { $$ = $1; } prod : prod TIMES N { $$ = $1 * $3; }
This is much like what we have already seen in the AdvLang example, except for the new symbol $$. It stands for the attribute of the l.h.s. Also, recall that $i stands for the attribute/lexeme associated with the i-th symbol (nonterminal/token) in the r.h.s.
There is just one little thing that needs fixing here. Notice that none of the actions modify the global variable. Yet we must put the final value in the global variable! We have already learned the trick to do this: use synonymous nonterminals as in the first rule below:
finalExpr : expr { globalAns = $1; } expr : prod { $$ = $1; } expr : expr PLUS prod { $$ = $1 + $3; } prod : N { $$ = $1; } prod : prod TIMES N { $$ = $1 * $3; }
What is the attribute of the new nonterminal finalExpr? Well, it does not need an attribute, because when we come to the first rule (which, by the way, happens at the very last step) we already know the final answer. So we simply put that in the global variable (which we have called globalAns in this example). No more "pumping up" is needed at this point, so no attribute is needed for finalExpr.