|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
"Useless" production rules
Compiling a program, as we have learned it so far, it basically
grouping the tokens into nonterminals, and the nonterminals into
higher level nonterminals, and so on. The rules of the grammar
control this grouping. But sometimes we use rules that are
"useless" in the sense that they do not help in this
grouping. The language remains unchanged even if these rules are
removed. But still they are useful for a different reason. In
this page we shall explore these
An example
Consider the following grammar:
body : line
body : body SEMICOLON line
line : ITEM
line : line ITEM
The purpose of this should be obvious: we have a SEMICOLON-separated
list of lines, each line being a list of ITEMs. It usually the
case that you want to process each line in some identical way,
and once a line is processed the output is dumped to some place,
and we can reuse the variables to process the next line. The
question is "Where do we put this processing code?".
We cannot put the action after rules 3 or 4, because we can never
be sure at this point if we have come to the end of the line. We can put the
action after rules 1 or 2, but then we need to put the action
after both the rules, since we do not know if it is the
first line or not. This is not a good solution, because if you
have to modify the action later, you have to do so at two
places. A better solution is to add a "useless" rule:
body : processedLine
body : body SEMICOLON processedLine
processedLine : line
line : ITEM
line : line ITEM
We have introduced a new nonterminal processedLine and a
"useless" rule defining it. We call it "useless" because
apparently, it merely defines processedLine as a synonym
of the existing nonterminal line.
This clearly has no effect on the syntax of the
language. However, it has one advantage: it provides a convenient checkout point through which all
lines must pass. So any action to process complete lines may be
attached with this rule.
Empty rules
An empty rule is one whose r.h.s. is empty like the first rule below.
list :
list : list ITEM
We have already encountered such rules in two different
contexts. One is where we wanted to have lists with a nonnegative
(possibly zero) number of items (as in the two rules above). The
other context was in providing defaults as in
optComma :
optComma : COMMA
where optComma denotes an optional comma.
Now we shall learn a third use for empty rules, for performing
scratch computations. Consider a language that allows tables like
the following
Index Age
12 23
14 56
7 24
Such a table may be considered as a header followed by a body,
where the body is a list of lines (separated by EOL, some
end-of-line marker). Each line in its turn is a list of
numbers. The header is also a list, where each item is an
identifier.
So we can use the following grammar.
1) table : header body
2) header : ID
3) header : header : ID
4) body : line
5) body : body EOL line
6) line : N
7) line : line N
Now let us suppose that we want to store each line in an array,
process it, and update some global structure accordingly. How
long should the array be? We do not know the answer a
priori. Once we have finished processing a line we
do not need the array anymore, but can reuse it for the next
line. So this is what we want to do. We want to use a linked list
(or any form of stretchable list)
for the header (as we do not know its length beforehand). Once we
come to the end of the header, we know this length for sure, and
so want to allocate an array (once and for all) that can be used
for all the lines. The question that we want to address here is
To which rule do we associate the action to allocate the memory for the array?
Let us check the rules one by one:
- Rule 1 is not right place to put the action, because we need
the array much before we come to this rule.
- We cannot associate it with rule 2, because the header has just
started, and so the length is still unknown.
- We cannot associate it with rule 3, because the header may not
be complete yet, and so the length is still unknown.
- Rules 4 and 5 are used only after we have completely read a
line. We already need the array during reading a line!
- Rules 6 and 7 are also not appropriate, because these rules are
fired once for each number, and we surely do not want to
allocate fresh memory each time!
So appararently we have no place for the memory allocation
action! Ideally we should put it somewhere between the header and
the body, but unfortunately we are not allowed to insert
actions between rules! There are a couple of remedies for
this, as we discuss next.
Using an empty rule
Here we modify the grammar as follows.
1) table : header memalloc body
2) header : ID
3) header : header : ID
*) memalloc :
4) body : line
5) body : body EOL line
6) line : N
7) line : line N
The introduction of the dummy nonterminal memalloc (with
its empty rule *) does not affect the underlying syntax. However,
it does provide a convenient place to attach the memory
allocation action. Since the empty rule is used exactly once
after the header is completely read, and before the body has
started.
Using synonymous nonterminals
We need to take the memory allocation action after the header is
completely read. So it can be considered as part of
postprocessing the header. So accordingly we may use the
following grammar.
1) table : processedHeader body
*) processedHeader : header
2) header : ID
3) header : header : ID
4) body : line
5) body : body EOL line
6) line : N
7) line : line N
Again this does not change the underlying syntax, but provides a
rule to hold the memory allocation action.
|