|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Debugging a grammar
Recall from the elementary tutorial that parsing basically
consists of grouping terminals into non-terminals, and the
non-terminals into still higher level non-terminals until we
reach the topmost non-terminal. We have also mentioned how this
hierarchical grouping may be visually presented as a parse
tree. If it is possible to group the terminals in more than one
way to arrive at the topmost non-terminal, then we have
ambiguity. In this conection we had mentioned that the
English language is ambuguous since it allows sentences like
I saw a girl with a telescope,
that allows two diffrerent possible groupings leading to two
different meanings:
I saw (a girl with a telescope),
and
(I saw a girl) with a telescope.
Finally we had remarked that we shall always steer clear of
ambuguous grammars. Indeed bison has built-in checks to detect
ambiguity in the grammar file given to it.
Unfortunately, the world is far from perfect. Though we would
love to shun ambuguous grammars by all means, we often end up
writing such grammars by mistake! Bison never fails to detect the
ambiguity, and prints some diagnostic messages trying to tell you the
reason behind the trouble, it needs some expertise to interpret
the messages, and to remove the ambiguity. This expertise is
essential for writing a grammar on one's own, just as debugging
ability is essential for software development.
Here we shall learn some useful techniques for debugging a
grammar when bison complains about it.
Making bison verbose
Bisons, as any zoologist will confirm, are not particularly
talkative beasts. The bison software also is rather terse in its
output. However, it is possible to make it more verbose by using
the commandline option -v as follows:
bison -v test.y
where test.y is the name of your grammar file. This
causes bison to output an extra file called test.output
(it just appends .output after your file root). This
file contains valuable debugging information (though in a
somewhat cryptic way). We shall now learn how o read this file.
| Example:
We shall work with the following grammar for lists of numbers
(denoted by the terminal N).
list : N
list : list N
|
The idea behind the grammar should be already familiar. A single
number is a valid
list. If we append a number after an existing list then we get a
new list.
Let us write the grammar in a file called test.y :
test.y
%{
%}
%token N
%%
list : N
| list N
%%
|
Now issue the command
bison -v test.y
Open the geenrated test.output file with your
favourite editor.
The file starts with a detailed description of the grammar and
the terminals and non-terminals used in it. Each rule is given a
reference number, and each terminal is also given a reference
number.
Next comes the most important part: the list of states. We shall
learn in a moment what a "state" is in this context.
A quick glance through test.output shows that there
are 6 states in all (more complicated grammars will have more
states). Each state has a typical structure. Consider state 2 as
an example
Part of test.output
state 2
| 1 |
list -> list . N (rule 2)
| 2 |
$ go to state 4
N shift, and go to state 3
| 3 |
|
Explanation:
This is the state number. The numbers start from 0.
This is a grammar rule with a dot somewhere in the
r.h.s. In this example there is only one such rule. It is
possible to have a longer list. This part could be absent also.
The thrird part contains a list of "what to do next"
actions. Each action starts with a terminal/non-terminal, followed by what to
do if we find that terminal. We shall explain the actions
soon. The dollar symbol ($) is a special terminal denoting the
end-of-file character.
The second part is optional. It is missing for states 0, 4 and 5,
in our example.
We shall start with the third part. This part tells the parser
how to make the parse tree.
| Example:
Consider the program
N N N
We already know that the parse tree is
Let us see how we arrive at this parse tree using the third parts
of the states.
We shall read the terminals one by one, shifting or reducing at
each step. At each step we shall be in a certain state. The
inituial state is state 0.
0
Now look at state 0 in test.y . It says if the next
symbol is the terminal N then shift and go to state 1. The
next symbol for us is indeed an N, so we shift:
0 N 1
So now we have to look up state 1 in
test.output . Here it says always reduce by rule
1. The symbol $default means "always", as can be
guessed). So we throw away the N (and the state 1 after
it) from our list and write list instead:
0 list
So it is as if we have got list at state 0. So we need
to consult state 0 again. There it said that if the next symbol
is a list then go to state 2. So we have
0 list 2
State 2 says that if the next symbol is N then shift and
go to state 3:
0 list 2 N 3
State 3 asks us to always reduce the list and N
into a list. This means throwing away the list
and N (along with the states sanwitched between and after
them) and write list instead:
0 list
So we back to the old situation: getting a list at state
0. We already know the prescription: go to state 2.
0 list 2
Now we are again in a familiar situation where the same old steps
repeat:
0 list 2 N 3
0 list 2
We have consumed the enture program. So the next terminal is the
end-of-file charcater or $. It is meaningless to read this (since
we cannot make the reading head cross the end-of-file). So state
2 simply advises us to go to 4, i.e., change the 2 to 4:
0 list 4
Remember that the next terminal is still $. So state 4 bids us to
go to state 5:
0 list 5
and state 5 tells us to accept the program as a valid one.
|
Well, it was a long example. But the process of looking up the
states in the
test.output file to decide the next step can be
simplified considerably if we represent the states in a tabular
form. The table will have one row for each state, and one column
for each symbol (terminal/non-terminal/end-of-file). The table
for our test.output is shown below.
| N | $ | list |
0 | s1 | | 2 |
1 | r1 | r1 | r1 |
2 | s3 | 4 | |
3 | r2 | r2 | |
4 | | 5 | |
5 | acc | acc | |
Each cell in this table has one of 5 types:
-
s followd by a state number: this means shoft and
goto this state.
-
r followed by a rule number: this means reduce by
this rule.
-
just a state number: this means go to this state.
-
acc: this means accept the program as a valid one.
-
empty: if you ever come to such a cell, then your
program has some syntax error.
| Example:
Let us see how the parser behaves when we feed a wrong program to
it. Well, in our super-simple grammar there is exactly one
wrong program: the empty file! This means we face the $ right at
state 0. The table shows that the corresponding cell is empty,
indicating a syntax error.
It is possible to show that the other empty cells are unreachable
(even by wrong programs).
|
| Exercise 3.1:
Generate the .output file for our AdvLang language
using bison. Then construct the table from it. Use the table to
hand parse a
syntactically correct AdvLang program and then a syntactically wrong
one.
|
|