|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Handling conflicts
Now that we have seen how a shift-reduce conflict corresponds to
multiple parse trees, it is time to know how to handle it. There
are three ways:
- Bison already has a default way to handle shift-reduce
conflicts. It simply resolves a conflict in favour of
shifting. If this suits our purpose then we may just leave the
grammar as it is, and not bother about the shift-reduce conflicts. This is not a good practice.
- Sometimes it is possible to explicitly advise bison to
resolve the conflict in favour of reducing or shifting. This is
achieved
using the
%left
and %right directives. While these may appear to be
a nifty way out, these are like patches to hide a problem, rather
than fix it. Except in absolutely trivial arithmetic examples these
patches make the grammar difficult to read or maintain.
- The best method is to analyse the grammar to understand what
is causing the ambiguity, and then to rewrite the grammar to avoid
the problem. This rewriting may involve making minor changes to
the language itself, or just recasting the grammar rules in a
different way.
In order to follow the last method we need to know why a grammar
fails
to be unambiguous. It is not just enough to get an example where
it fails. It is important to know why it fails. This is
where one
needs experience, and some educated guessing. We are going to
learn this now.
| Example:
Why is the following grammar ambiguous?
list : N
list : list list
Because, when we have already found two lists and the
next symbol is an N then two possibilities open up: either
we can combine the two lists already found, or read the
new N. The ambiguity stems from this dilemma. We need to
explicitly tell bison which option we want. If we want to
first combine the two lists before moving on
(i.e., want to resolve the conflict in favour of reducing)
then we should write the grammar as
list : N
list : list N
If we want to read the N first then we should use
list : N
list : N list
Both these grammars are umambiguous and describe the same
language as our original grammar.
|
In this example it was
enough to recast the grammar without changing the underlying
language. Sometimes, however, the language may need minor
modification. This is when the language is intrinsically
ambiguous (meaning that it has no unambiguous grammar
describing it). The next example ilustrates such a case. This
example often crops up in various forms.
| Example:
The language that we want to design here is for a list of
items, where at item may be optional. One situation that calls
for such a language is where we make a list of choices, where
some choice(s) may be left out (say, there are defaults). Here is our first attempt
at grammatising this language:
list : optChoice
list : list optChoice
optChoice : /*empty*/
| CHOICE
If we feed this grmmar into bison we shall get 3 shift-reduce
conflicts (1 in state 0, and 2 in state 2). The number of
shift-reduce conflicts does not reflect the difficulty of the
problem. A single problem often triggers a large number of
conflicts.
Here are the
troublesome states:
The troublesome states
state 0
|
CHOICE shift, and go to state 1
CHOICE [reduce using rule 3 (optChoice)]
| 1 |
$default reduce using rule 3 (optChoice)
list go to state 2
optChoice go to state 3
state 2
list -> list . optChoice (rule 2)
|
$ go to state 5
CHOICE shift, and go to state 1
$ [reduce using rule 3 (optChoice)]
CHOICE [reduce using rule 3 (optChoice)]
| 2 |
$default reduce using rule 3 (optChoice)
optChoice go to state 4
|
|
Explanation:
Here we are in state 0 (i.e., parsing has just
started) and we have come accross a CHOICE. The dilemma is
whether we treat this as the first item in our list, or do we
imagine that there is one or more empty choices preceding it?
Well, this is a real problem in the language itself.
The next two conflicts are also about the same
thing. Every time we meet the next item (either new CHOICE
or $) we cannot understand if there are empty items before it.
So our list is like a collection of ghosts in a room some of whom
may be invisible, and we are trying to count how many ghosts
there are. We can never be sure if we are leaving out some
invisible memebers from that ethereal lot. Thus, there is no way of
distinguishing
CHOICE empty CHOICE empty CHOICE
from
CHOICE CHOICE empty empty CHOICE empty
A simple solution is to use a dilimited list, say using a comma
as a separator. Then the two lists above will have distinct
representations:
CHOICE , , CHOICE , , CHOICE
and
CHOICE , CHOICE , , , CHOICE ,
A grammar for this is
list : optChoice
list : list COMMA optChoice
optChoice : /*empty*/
| CHOICE
Now all the conflicts will go away!
|
The next example is motivated by an adventure language called TADS.
| Example:
In TADS we allow interesting objects (jewel-encrusted eggs, towels covering
holes, and the like) to be strewn around the adventure land. To
add to the sensation these objects can be made to appear at
specific moments and specific places. So we want commands like
OBJECT egg IN kitchen AT 5
Here 5 denotes a time point (say after the player has made 5
moves). We also want to allow the following to mean the same thing.
OBJECT egg AT 5 IN kitchen
Now some objects are not placed anywhere specifically. Similarly,
specification of the time point is also optional. So
we also want to allow commands like
OBJECT egg
OBJECT egg IN kitchen
OBJECT egg AT 5
Our first attempt to grammatise the language is
obj : OBJECT ID optPlace optTime
| OBJECT ID optTime optPlace
optTime: /*empty*/
| AT NUM
optPlace: /*empty*/
| AT ID
Unfortunately, bison does not quite approve of this brilliant
effort: it complains of 2 shift-reduce and 1 reduce-reduce
conflicts!. We may easily guess what a reduce-reduce
conflict is: when it is possible to arrive at valid parse trees
by reducing using any one of a number of rules.
As usual we look at the .output file to learn that
all the conflicts occur in
state 2:
The troublesome state (after minor editing)
state 2
obj -> OBJECT ID . optPlace optTime (rule 1)
obj -> OBJECT ID . optTime optPlace (rule 2)
|
AT shift, and go to state 3
AT [reduce using rule 5 (optPlace)]
| 1 |
IN shift, and go to state 4
IN [reduce using rule 3 (optTime)]
| 2 |
$ reduce using rule 3 (optTime)
$ [reduce using rule 5 (optPlace)]
| 3 |
$default reduce using rule 3 (optTime)
optTime go to state 5
optPlace go to state 6
|
|
Explanation:
We come to this state after reading OBJECT followed by
ID. Now we are expecting the optional time and place
specification (in some order). The first shift-reduce conflict
arises if the next symbol is an AT. So we
are going into the time specification. But what about the place
specification? Do we assume that it was supposed to precede the
time specification and was omitted? Or should we expect the place
specification after the time specification? In the first case we
should reduce, in the second case shift.
This is analogous to the above problem when we get a
place specification first.
If we meet the end-of-file immediately after
OBJECT and ID then both the time and place
specifications have been omited. But our grammar gets confused
about a silly point: is it an omitted time spec followed by an
omitted place spec, or the other way round?
Here the trouble lies in the grammar, and not in the language
itself. The bad thing about the grammar is that it makes the
compiler worry about the place spec when it meets a time spec
(and vice versa). A simple way out is to list all the 4
possiblities separately (we still have 6 rules in all!):
obj : OBJECT ID placeSpec timeSpec
| OBJECT ID timeSpec placeSpec
| OBJECT ID placeSpec
| OBJECT ID timeSpec
timeSpec: AT NUM
placeSpec : AT ID
|
The above trick worked because we had only 2 types of
specifications. But if we had 10 different types, each being
optional and presentable in any order, then we cannot make each
possibility a separate rule (we shall need 210=1024
rules for that)! In such a case we solve the problem at the
semantic level. In fact, we can
handle even more complicated situations at the semantic level as
shown in the next example.
| Example:
We want to make a list of musical pieces where the following
details will be provided: title, composer, duration, instrument,
origin, key, comments. Of these title and key are compulsory, all
the rest being optional. There may be any number of comments, but
no other fields may be repeated. Title must be the first in the
list, the others may occur in any order.
The plethora of fields and conditions immediately suggest that we
should keep the grammar simple and handle the details at the
semantic level.
|
Many ambiguity problems arise because of allowing arbitrary
ordering and optional items as the above examples show. Next we
shall learn about a different source of ambiguity. Sometimes the
same terminal is used for different purposes. For example
in C the colon used in both a switch-case statement as well as in
a ternary expression. Re-using a terminal carelessly may cause
different parts of a language to "clash".
The next example shows such a case.
| Example:
Suppose that we want to design a C-like language. It will have
assignments, functions and 1-diml arrays. Our aim is to allow
statements
like:
a = 2
myArray(29) = 45
v = fn()
w = doSomething(2,3,a)
Here is the fitst version of our grammar.
stmt : lhs EQ rhs
lhs : var
| arrayElt
var : ID
arrayElt : arrayName LPAR index RPAR
arrayName : ID
index : expr
expr : NUM
| var
| arrayElt
| func
rhs : var
| arrayElt
| func
func : funcName LPAR argList RPAR
funcName : ID
argList : /*empty*/
| argList COMMA arg
arg : expr
Bison reports 1 reduce-reduce conflict in state 8:
Part of the .output file
state 8
var -> ID . (rule 4)
arrayName -> ID . (rule 6)
funcName -> ID . (rule 16)
|
LPAR reduce using rule 6 (arrayName)
LPAR [reduce using rule 16 (funcName)]
| 1 |
$default reduce using rule 4 (var)
|
|
Explanation:
The trouble is when the compiler sees LPAR just
after an ID it cannot understand if this is an array
expression or a function call. This is because we are using the
LPAR terminal for both the pursposes.
This language is intrincically ambiguous since an expression like
abc(2)
may either mean calling a function abc with argument
2, or the element at postion 2 in an array called
abc .
A remedy is to use different kinds of brackets for functions
calls and arrays. This is what is done in C.
|
|