|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Is ambiguity the only enemy?
All our conflicts so far originated from ambiguities (either just
in the grammar, or present intrinsically in the language
itself). However, it is possible to have conflicts even when the
grammar is free of ambiguity! Before we see an example of this,
a little bit of theory may help.
Bison is a tool that takes a grammar as input and produces a
compiler as output. If the grammar is ambiguous then it produces
an error message. So we can easily imagine that bison has some
inbuilt check against ambiguity. Now it is a remarkable fact of
mathematical logic that it is impossible to write a
program that will always sucessfully detect ambiguity in a
grammar! I am not kidding! Nobody can ever write such a program even
if allowed any amount of RAM and hard drive! If you are new to
such "impossibilities" then this might very well strike you as
odd. But there is a celebrated theorem called Goedel's
Incompleteness Theorem that may be used to prove such
"impossibility" claims.
Let us clearly understand what this impossibility means for
bison. Whatever check bison may keep against ambiguity cannot be
fool-proof. So one of two things must happen:
either
the check is a bit too weak, in which case some ambiguous
grammars will sneak in.
or
the check is is a bit too stringent, in which case some
unambiguous grammars will be left out (along with all the
ambiguous ones).
The first case will have terrible consequence (you may have one
parse tree in mind, while bison ends up producing another). The
second case is not too bad hoping that not too many useful
unambiguous grammars are left out. It is somewhat like choosing
the security level of a firewall: better be too stringent than
too lenient. The situation is is depited in the diagram below.
The criterion that bison uses is not very complicated. It always
looks at the next terminal before making its decision about
shifting or reducing. If it is not clear which one to do even
after looking at the next terminal, then bison complains about a
conflict. Bison will never look ahead more than one
terminal. So grammars where the conflict could be resolved by
loking at, say, the next two terminals are flagged by
bison as bad.
The next example illustrates this situation.
| Example:
This is actually a continuation of the last example where our
attempt to create a C-like language fell flat because functions
"clashed" with arrays. We suggested use of different delimiters
for functions and arrays (say, square brackets for arrays, and
round brackets for functions, just as in C).
But some smart guy comes up with a different suggestion: what
about if we do not allow functions to have parameters? Then a
function will look like
abc()
while an array with the same name will look like
abc(2)
Accordingly we change the grammar to the following.
stmt : lhs EQ rhs
lhs : var
| arrayElt
var : ID
arrayElt : arrayName LPAR index RPAR
arrayName : ID
index : expr
expr : NUM
| var
| arrayElt
| func
rhs : var
| arrayElt
| func
func : funcName LPAR RPAR
funcName : ID
Unfortunately, bison still makes the same complaint, though the
grammar is no more ambiguous. To see why, consider the program
x = abc()
When the compiler reads the abc (which is an
ID) it cannot decide if it is a arrayName or a
funcName just by knowing that
next terminal is a LPAR. The conflict could be resolved if
only bison were less lazy, and had looked ahead at the terminal
after the
LPAR (the RPAR there would have then clearly showed
that we are in a
function call).
But alas, bisons have too short necks to
look ahead beyond the very next terminal, and are too obstinate to
proceed further before making up their minds!
|
This die-hard habit of bison's of never looking ahead beyond more
than the next 1 terminal has earned its favourite grammars a
special name: LALR(1). Here the first LA stands for
Look Ahead, and the 1 inside the parantheses
tells us the range of looking ahead. If you are curious about the
LR part it has a less exciting genesis: the L
means bison-generated compilers read a program
Left-to-right (just like any ordinary mortal). The
R means bison generates the Rightmost derivation
(i.e., the newly generated non-terminal after each reduce
step is immediately to the left of the dot).
"Clashing" of different parts of a grammar sometimes happens
because of using two different non-terminals for the same
purspose.
| Example:
Some books have authors, while some have editors (like
anthologies). Is this idea properly captured in the grammar below?
authorOrEditor : author
| editor
author : NAME
editor : NAME
The answer is "No", because when the compiler meets a NAME
it has no way of knowing if it is an author or an
editor. This will give rise to a reduce-reduce conflict.
We should use a grammar like the following:
authorOrEditor : AUTH author
| EDTR editor
author : NAME
editor : NAME
or,better still,
authorOrEditor : AUTH NAME
| EDTR NAME
|
Using different non-terminals for the same purpose often occurs
when more than one people work on the same grammar. Suppose that
you have written a long grammar with lots of non-terminals. I
want to add some extra features that needs a list of floating
point numbers somewhere. So I create a non-terminal called
fpList. Now you already had a non-terminal called
intList (list of integers) in your gammar. Also an integer
may be
considered as a special case of a floating point number. Then my
fpList may clash with your intList,
because a list of integers can be construed as either an
intList or a fpList. I said "may"
because the clash will be averted if the two lists
occur in completely different scenarios (e.g., an
intList must always be inside round brackets, while a
fpList can never have round brackets around itself).
| Exercise 6.2:
We want a language with statements like
john : author
mary : editor
Discuss the following grammars to achieve this:
-
authorOrEditor : authorName COLON AUTHOR
| editorName COLON EDITOR
authorName : NAME
editorName : NAME
-
authorOrEditor : NAME COLON AUTHOR
| NAME COLON EDITOR
-
creator : NAME COLON role
role : AUTHOR
| EDITOR
|
| Exercise 6.3:
Solve the same exercise once again, but this time without the colons.
|
| Exercise 6.4:
We know how to make a comma-separated list of items. Which of the
following variant(s) is(are) intrinsically ambiguous? Why? Give an
unambiguous grammar for each of the rest.
- A comma-separated list where an item is optional
- A comma-separated list where a comma is optional
- A comma-separated list where both an item and a comma is
optional
|
|