Arnab Chakraborty's advanced compiler tutorial: page 5 of 3

www.angelfire.com/dragon/letstry `cwave04 at yahoo dot com`
Home > ... tutorials > ... ... Compiler (advanced)

Handling conflicts

Now that we have seen how a shift-reduce conflict corresponds to multiple parse trees, it is time to know how to handle it. There are three ways:

Bison already has a default way to handle shift-reduce conflicts. It simply resolves a conflict in favour of shifting. If this suits our purpose then we may just leave the grammar as it is, and not bother about the shift-reduce conflicts. This is not a good practice.
Sometimes it is possible to explicitly advise bison to resolve the conflict in favour of reducing or shifting. This is achieved using the %left and %right directives. While these may appear to be a nifty way out, these are like patches to hide a problem, rather than fix it. Except in absolutely trivial arithmetic examples these patches make the grammar difficult to read or maintain.
The best method is to analyse the grammar to understand what is causing the ambiguity, and then to rewrite the grammar to avoid the problem. This rewriting may involve making minor changes to the language itself, or just recasting the grammar rules in a different way.

In order to follow the last method we need to know why a grammar fails to be unambiguous. It is not just enough to get an example where it fails. It is important to know why it fails. This is where one needs experience, and some educated guessing. We are going to learn this now.

Example: Why is the following grammar ambiguous?
list : N list : list list
Because, when we have already found two lists and the next symbol is an N then two possibilities open up: either we can combine the two lists already found, or read the new N. The ambiguity stems from this dilemma. We need to explicitly tell bison which option we want. If we want to first combine the two lists before moving on (i.e., want to resolve the conflict in favour of reducing) then we should write the grammar as
list : N list : list N
If we want to read the N first then we should use
list : N list : N list
Both these grammars are umambiguous and describe the same language as our original grammar.

In this example it was enough to recast the grammar without changing the underlying language. Sometimes, however, the language may need minor modification. This is when the language is intrinsically ambiguous (meaning that it has no unambiguous grammar describing it). The next example ilustrates such a case. This example often crops up in various forms.

Example: The language that we want to design here is for a list of items, where at item may be optional. One situation that calls for such a language is where we make a list of choices, where some choice(s) may be left out (say, there are defaults). Here is our first attempt at grammatising this language:
list : optChoice list : list optChoice optChoice : /*empty*/ | CHOICE
If we feed this grmmar into bison we shall get 3 shift-reduce conflicts (1 in state 0, and 2 in state 2). The number of shift-reduce conflicts does not reflect the difficulty of the problem. A single problem often triggers a large number of conflicts.
Here are the troublesome states:

The troublesome states

state 0

CHOICE shift, and go to state 1 CHOICE [reduce using rule 3 (optChoice)]
1

$default reduce using rule 3 (optChoice) list go to state 2 optChoice go to state 3 state 2 list -> list . optChoice (rule 2)

$ go to state 5 CHOICE shift, and go to state 1 $ [reduce using rule 3 (optChoice)] CHOICE [reduce using rule 3 (optChoice)]
2

$default reduce using rule 3 (optChoice) optChoice go to state 4

Explanation:

1:

Here we are in state 0 (i.e., parsing has just started) and we have come accross a CHOICE. The dilemma is whether we treat this as the first item in our list, or do we imagine that there is one or more empty choices preceding it? Well, this is a real problem in the language itself.

2:

The next two conflicts are also about the same thing. Every time we meet the next item (either new CHOICE or $) we cannot understand if there are empty items before it.
So our list is like a collection of ghosts in a room some of whom may be invisible, and we are trying to count how many ghosts there are. We can never be sure if we are leaving out some invisible memebers from that ethereal lot. Thus, there is no way of distinguishing
CHOICE empty CHOICE empty CHOICE
from
CHOICE CHOICE empty empty CHOICE empty

A simple solution is to use a dilimited list, say using a comma as a separator. Then the two lists above will have distinct representations:
CHOICE , , CHOICE , , CHOICE
and
CHOICE , CHOICE , , , CHOICE ,
A grammar for this is
list : optChoice list : list COMMA optChoice optChoice : /*empty*/ | CHOICE
Now all the conflicts will go away!

The next example is motivated by an adventure language called TADS.

Example: In TADS we allow interesting objects (jewel-encrusted eggs, towels covering holes, and the like) to be strewn around the adventure land. To add to the sensation these objects can be made to appear at specific moments and specific places. So we want commands like
OBJECT egg IN kitchen AT 5
Here 5 denotes a time point (say after the player has made 5 moves). We also want to allow the following to mean the same thing.
OBJECT egg AT 5 IN kitchen
Now some objects are not placed anywhere specifically. Similarly, specification of the time point is also optional. So we also want to allow commands like
OBJECT egg OBJECT egg IN kitchen OBJECT egg AT 5
Our first attempt to grammatise the language is
obj : OBJECT ID optPlace optTime | OBJECT ID optTime optPlace optTime: /*empty*/ | AT NUM optPlace: /*empty*/ | AT ID
Unfortunately, bison does not quite approve of this brilliant effort: it complains of 2 shift-reduce and 1 reduce-reduce conflicts!. We may easily guess what a reduce-reduce conflict is: when it is possible to arrive at valid parse trees by reducing using any one of a number of rules.
As usual we look at the .output file to learn that all the conflicts occur in state 2:

The troublesome state (after minor editing)

state 2 obj -> OBJECT ID . optPlace optTime (rule 1) obj -> OBJECT ID . optTime optPlace (rule 2)

AT shift, and go to state 3 AT [reduce using rule 5 (optPlace)]
1

IN shift, and go to state 4 IN [reduce using rule 3 (optTime)]
2

$ reduce using rule 3 (optTime) $ [reduce using rule 5 (optPlace)]
3

$default reduce using rule 3 (optTime) optTime go to state 5 optPlace go to state 6

Explanation:

1:

We come to this state after reading OBJECT followed by ID. Now we are expecting the optional time and place specification (in some order). The first shift-reduce conflict arises if the next symbol is an AT. So we are going into the time specification. But what about the place specification? Do we assume that it was supposed to precede the time specification and was omitted? Or should we expect the place specification after the time specification? In the first case we should reduce, in the second case shift.

2:

This is analogous to the above problem when we get a place specification first.

3:

If we meet the end-of-file immediately after OBJECT and ID then both the time and place specifications have been omited. But our grammar gets confused about a silly point: is it an omitted time spec followed by an omitted place spec, or the other way round?
Here the trouble lies in the grammar, and not in the language itself. The bad thing about the grammar is that it makes the compiler worry about the place spec when it meets a time spec (and vice versa). A simple way out is to list all the 4 possiblities separately (we still have 6 rules in all!):
obj : OBJECT ID placeSpec timeSpec | OBJECT ID timeSpec placeSpec | OBJECT ID placeSpec | OBJECT ID timeSpec timeSpec: AT NUM placeSpec : AT ID

The above trick worked because we had only 2 types of specifications. But if we had 10 different types, each being optional and presentable in any order, then we cannot make each possibility a separate rule (we shall need 2¹⁰=1024 rules for that)! In such a case we solve the problem at the semantic level. In fact, we can handle even more complicated situations at the semantic level as shown in the next example.

Example: We want to make a list of musical pieces where the following details will be provided: title, composer, duration, instrument, origin, key, comments. Of these title and key are compulsory, all the rest being optional. There may be any number of comments, but no other fields may be repeated. Title must be the first in the list, the others may occur in any order.
The plethora of fields and conditions immediately suggest that we should keep the grammar simple and handle the details at the semantic level.

Many ambiguity problems arise because of allowing arbitrary ordering and optional items as the above examples show. Next we shall learn about a different source of ambiguity. Sometimes the same terminal is used for different purposes. For example in C the colon used in both a switch-case statement as well as in a ternary expression. Re-using a terminal carelessly may cause different parts of a language to "clash". The next example shows such a case.

Example: Suppose that we want to design a C-like language. It will have assignments, functions and 1-diml arrays. Our aim is to allow statements like:
a = 2 myArray(29) = 45 v = fn() w = doSomething(2,3,a)
Here is the fitst version of our grammar.
stmt : lhs EQ rhs lhs : var | arrayElt var : ID arrayElt : arrayName LPAR index RPAR arrayName : ID index : expr expr : NUM | var | arrayElt | func rhs : var | arrayElt | func func : funcName LPAR argList RPAR funcName : ID argList : /*empty*/ | argList COMMA arg arg : expr
Bison reports 1 reduce-reduce conflict in state 8:

Part of the .output file

state 8 var -> ID . (rule 4) arrayName -> ID . (rule 6) funcName -> ID . (rule 16)

LPAR reduce using rule 6 (arrayName) LPAR [reduce using rule 16 (funcName)]
1

$default reduce using rule 4 (var)

Explanation:

1:

The trouble is when the compiler sees LPAR just after an ID it cannot understand if this is an array expression or a function call. This is because we are using the LPAR terminal for both the pursposes.
This language is intrincically ambiguous since an expression like
abc(2)
may either mean calling a function abc with argument 2, or the element at postion 2 in an array called abc.
A remedy is to use different kinds of brackets for functions calls and arrays. This is what is done in C.