|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
What do the states mean?
The states encode some summary information about the current
scenario, like what we have read so far, what reductions have
been done etc. The states do not contain the entire history, just
the barest minimum to continue the parsing. To debug a parser it
is important to know what each state means. The
.output file gives brief hints about the meaning of
each state. These constitute the second part in the description
of each state. Let us take a look at state 2 from
test.output once again.
Part of test.output
state 2
|
list -> list . N (rule 2)
| 2 |
$ go to state 4
N shift, and go to state 3
|
|
Explanation:
This means we have just now finished reading a
list and expect to see an N. Thus the reading
head is just after a list. We do not know if indeed the
next symbol is going to be an N, but that is what the
parser is expecting to see.
Bison does not include the
second parts for states 0 and the last two states. This is
because their interpretations are trivial:
- The first state expects to see a valid program (goes without
saying!)
- The last two states are reached when we have finished reading
the program (and are facing the $). So they do not expect
to see anything. They know that they are standing bang in
front of the $.
Ambiguous grammars
We are learning debugging a grammar. But all our grammars so far are
lacking one vital ingerdient for debugging: they do not have any
bug! Let us rectify this inconveneint state of affairs by
deliberately introducing one.
| Example:
We shall work with the following grammar for lists of numbers
(denoted by the terminal N).
list : N
list : list list
The idea behind the grammar is simple. A single number is a valid
list. If we append one list after another then we get a new list.
However, this grammar is ambiguous because the program
N N N
has two possible parse trees shown below.
Let us see how bison deals with
this grammar. First, we feed the following file into bison:
amb.y
%{
%}
%token N
%%
list : N
| list list
%%
|
Bison prints the following cryptic message:
amb.y contains 1 shift/reduce conflict.
So something has gone wrong. But what? To find out we have to
use the
commandline option -v :
bison -v amb.y
and then look at the generated amb.output file.
The file starts with a summary of the problem:
State 3 contains 1 shift/reduce conflict.
We still do not know what a "shift/reduce conflict" is, but we
can guess that bison is unable to decide whether to shift or to
reduce in state 3. Here is the description of this state:
Part of test.output
state 3
|
list -> list . list (rule 2)
list -> list list . (rule 2)
| 2 |
N shift, and go to state 1
N [reduce using rule 2 (list)]
$default reduce using rule 2 (list)
list go to state 3
| 3 |
|
Explanation:
This part now contains two dotted rules. This means
state 3 is reached when either we have finished reading a
list
and expecting another; or when we have finished
reading two lists. The fact that there are two dotted rules
in this state is not a danger signal by itself.
It is in the thrid part that we see the trouble. If
the next symbol is an N, then bison gets confused: whether
to shift or reduce by rule 2.
|
Well, that is as much as we can expect from bison by way of
diagnosis. Now we have to track the cause behind this error using
some trial-and-error and educated guessing.
Remember that ambiguity is a flaw of the grammar. It means that
there is at least one program that is syntactically correct
according to this grammar, but that still has more
than one parse tree. Ambiguity is not a flaw of an
individual program. It is quite possible that some
program written according to an ambiguous grammar has exactly one
parse tree.
This may sound confusing. Let us compare debugging a grammar with
debugging a C software. Suppose that you have written a C
software, and sold it. A user sends you the feedback that for
some input your software generates a segmentaton fault at line
number 234. However, the user does not mention what particular
input has caused the fault.
So you know that your C software is not free of bugs, since at
least one input has caused it to trip. It would definitely been
of help to know that troublesome input, but all that you know is
the nature of the problem (segmentaton fault) and its location in
the code (line number 234).
How do you proceed your C software based on this? There are
basically two techniques:
- Try out some random inputs hoping to reproduce the error.
This is a simple method that has a rather low chance of success
if the software is complex.
- Look carefully at line number 234, and mentally visualise
scenarios that might cause the code at that line to break. This
requires some debugging experience, but is more effective than
the naive method. Indeed, this is how most progrmmers debug their
programs.
Well, the situtation is very similar for debugging a grammar. We
shall first try the naive approach, which will be quite effective
here since our grammar is very simple.
But first we shall construct a table from
amb.output , just as we did in the last page.
| N | $ | list |
0 | s1 | | 2 |
1 | r1 | r1 | r1 |
2 | s1 | 4 | 3 |
3 | s1/r2 | r2 | 3 |
4 | | 5 | |
5 | acc | acc | |
Notice how we have represented the shift-reduce conflict.
| Example:
Consider the program
N N N
We shall use the table to hand parse this program.
0
0 N 1
0 list 2
0 list 2 N 1
0 list 2 list 3
Well, we are now at state 3, and the next symbol is an N,
so we have hit the shift-reduce conflict. So now we have two ways
to proceed, each should take us to a valid but different parse
tree. Let us try shifting first.
0 list 2 list 3 N 1
0 list 2 list 3 list 3
0 list 2 list 3
0 list 2
0 list 4
0 list 5
and finally we accept the program.
It should not be difficult to see that this parsing corresponds
to the parse tree (a) shown earlier.
|
| Exercise 4.1:
Check that resolving
the shift-reduce conflict in favour of reducing
produces parse tree (b).
|
|