Arnab Chakraborty's advanced compiler tutorial: page 1 of 3

www.angelfire.com/dragon/letstry `cwave04 at yahoo dot com`
Home > ... tutorials > ... ... Compiler (advanced)

"Useless" production rules

Compiling a program, as we have learned it so far, it basically grouping the tokens into nonterminals, and the nonterminals into higher level nonterminals, and so on. The rules of the grammar control this grouping. But sometimes we use rules that are "useless" in the sense that they do not help in this grouping. The language remains unchanged even if these rules are removed. But still they are useful for a different reason. In this page we shall explore these

An example

Consider the following grammar:


body : line 
body : body SEMICOLON line
line : ITEM 
line : line ITEM

The purpose of this should be obvious: we have a SEMICOLON-separated list of lines, each line being a list of ITEMs. It usually the case that you want to process each line in some identical way, and once a line is processed the output is dumped to some place, and we can reuse the variables to process the next line. The question is "Where do we put this processing code?".

We cannot put the action after rules 3 or 4, because we can never be sure at this point if we have come to the end of the line. We can put the action after rules 1 or 2, but then we need to put the action after both the rules, since we do not know if it is the first line or not. This is not a good solution, because if you have to modify the action later, you have to do so at two places. A better solution is to add a "useless" rule:


body : processedLine 
body : body SEMICOLON processedLine

processedLine : line

line : ITEM 
line : line ITEM

We have introduced a new nonterminal processedLine and a "useless" rule defining it. We call it "useless" because apparently, it merely defines processedLine as a synonym of the existing nonterminal line. This clearly has no effect on the syntax of the language. However, it has one advantage: it provides a convenient checkout point through which all lines must pass. So any action to process complete lines may be attached with this rule.

Empty rules

An empty rule is one whose r.h.s. is empty like the first rule below.


list : 
list : list ITEM

We have already encountered such rules in two different contexts. One is where we wanted to have lists with a nonnegative (possibly zero) number of items (as in the two rules above). The other context was in providing defaults as in


optComma : 
optComma : COMMA

where optComma denotes an optional comma.

Now we shall learn a third use for empty rules, for performing scratch computations. Consider a language that allows tables like the following


Index Age 
12    23
14    56
7     24

Such a table may be considered as a header followed by a body, where the body is a list of lines (separated by EOL, some end-of-line marker). Each line in its turn is a list of numbers. The header is also a list, where each item is an identifier. So we can use the following grammar.


1) table : header body
2) header : ID
3) header : header : ID
4) body : line
5) body : body EOL line
6) line : N
7) line : line N

Now let us suppose that we want to store each line in an array, process it, and update some global structure accordingly. How long should the array be? We do not know the answer a priori. Once we have finished processing a line we do not need the array anymore, but can reuse it for the next line. So this is what we want to do. We want to use a linked list (or any form of stretchable list) for the header (as we do not know its length beforehand). Once we come to the end of the header, we know this length for sure, and so want to allocate an array (once and for all) that can be used for all the lines. The question that we want to address here is

To which rule do we associate the action to allocate the memory for the array?

Let us check the rules one by one:

Rule 1 is not right place to put the action, because we need the array much before we come to this rule.
We cannot associate it with rule 2, because the header has just started, and so the length is still unknown.
We cannot associate it with rule 3, because the header may not be complete yet, and so the length is still unknown.
Rules 4 and 5 are used only after we have completely read a line. We already need the array during reading a line!
Rules 6 and 7 are also not appropriate, because these rules are fired once for each number, and we surely do not want to allocate fresh memory each time!

So appararently we have no place for the memory allocation action! Ideally we should put it somewhere between the header and the body, but unfortunately we are not allowed to insert actions between rules! There are a couple of remedies for this, as we discuss next.

Using an empty rule

Here we modify the grammar as follows.


1) table : header memalloc body
2) header : ID
3) header : header : ID
*) memalloc : 
4) body : line
5) body : body EOL line
6) line : N
7) line : line N

The introduction of the dummy nonterminal memalloc (with its empty rule *) does not affect the underlying syntax. However, it does provide a convenient place to attach the memory allocation action. Since the empty rule is used exactly once after the header is completely read, and before the body has started.

Using synonymous nonterminals

We need to take the memory allocation action after the header is completely read. So it can be considered as part of postprocessing the header. So accordingly we may use the following grammar.


1) table : processedHeader body
*) processedHeader : header
2) header : ID
3) header : header : ID
4) body : line
5) body : body EOL line
6) line : N
7) line : line N

Again this does not change the underlying syntax, but provides a rule to hold the memory allocation action.