www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com
Some alternatives
Consider the AdvLang language once again. When one writes a
program in the AdvLang language one essentially expresses one's
idea about an adventure game. The AdvLang compiler turns this
idea into a full-fledged implementation. Similarly, the C
program
for(i=0;i<10;i++)
printf("Hello");
expresses the programmer's intention to print "Hello" 10
times. It is the job of the compiler to produce suitable machine
language instructions to fulfil this intention.
This is true of any language.
A language is a way to let a user express his/her ideas in a
simple way, and a compiler is a piece of software that turns
those ideas into a concrete implementation. But designing a
language and its compiler is not the only means to this end. In
this page we shall discuss three alternatives to compiler
construction. All three techniques will produce a "language"
(some way to express ideas) and a "compiler" (some automated way
to implement those ideas) without going through the hassles of
writing an elaborate flex-bison-based compiler. The alternatives
are all less powerful than what we have already learned, but
their simplicity may more than offset their lack of power for
small-scale applications.
Writing C functions and macros
This is a very powerful alternative to building a compiler.
In any compiler the final output
is actually produced by the actions attached to the
grammar rules. The grammar simply coordinates
when these actions are triggered. So why not write these actions
as C functions/macros and simply call them directly? For
instance the above AdvLang program may be written as:
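#include <game.h>

int main(void)
{
    /* flag and forest are created with newLocation just like house (not shown) */
    Location house, flag, forest;

    house = newLocation("Your house",
        "You are standing\nin front of your house.\n"
        "Paths lead towards east and west.");
    addExit(house, EAST, flag);
    addExit(house, WEST, forest);
    startAt(house);
    return 0;
}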
Here Location is a pointer to some suitable
structure, and
newLocation, addExit and startAt are functions that
you write once and for all and ship via game.h. The
end-user need not know
how these work, and may not even know much C.
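As a rough illustration, game.h might contain something like the following. This is only a sketch: the field names, the Direction enum, the startLocation variable and the exact signatures are all assumptions made for this example, not the actual AdvLang support code.

```c
#include <stdlib.h>

/* Hypothetical contents of game.h; every name here is an assumption. */
enum Direction { NORTH, EAST, SOUTH, WEST, NUM_DIRECTIONS };

struct LocationData {
    const char *name;
    const char *description;
    struct LocationData *exits[NUM_DIRECTIONS];  /* NULL means "no exit" */
};

/* The "pointer to some suitable structure" mentioned in the text. */
typedef struct LocationData *Location;

static Location startLocation = NULL;

/* Allocate a location with no exits; returns NULL if memory runs out. */
Location newLocation(const char *name, const char *description)
{
    Location loc = malloc(sizeof *loc);
    if (loc != NULL) {
        loc->name = name;
        loc->description = description;
        for (int d = 0; d < NUM_DIRECTIONS; d++)
            loc->exits[d] = NULL;
    }
    return loc;
}

/* Record that leaving `from` in direction `dir` leads to `to`. */
void addExit(Location from, enum Direction dir, Location to)
{
    from->exits[dir] = to;
}

/* Remember where the player begins. */
void startAt(Location loc)
{
    startLocation = loc;
}
```

With these in place, a call such as addExit(house, WEST, forest) does exactly what the action attached to the corresponding grammar rule would have done in a bison file.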
The advantage is obvious: you do not need to write any lexical
analyser or parser (you are just borrowing those from the user's
C compiler). You do have to write the structure for a location
and the functions addExit etc, but you would have had to
write them anyway as actions in a bison file. There is a further
advantage for a user who knows C. Suppose that we want to make an
adventure game that is a maze of 100 locations arranged in a 10 by 10
grid. Such a thing can now be created easily using a
for-loop in C.
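The grid idea can be sketched as follows. To keep the example self-contained it uses a hypothetical Room structure stored in a static array instead of the Location type from game.h, and it links every room to all of its neighbours (a real maze would leave some of these exits out).

```c
#include <stdio.h>

#define GRID 10

enum Direction { NORTH, EAST, SOUTH, WEST, NUM_DIRECTIONS };

/* Hypothetical stand-in for a game location. */
struct Room {
    char name[32];
    struct Room *exits[NUM_DIRECTIONS];  /* NULL means "wall" */
};

static struct Room maze[GRID][GRID];

/* Name all 100 rooms and connect each one to its grid neighbours. */
void buildMaze(void)
{
    for (int r = 0; r < GRID; r++) {
        for (int c = 0; c < GRID; c++) {
            struct Room *room = &maze[r][c];
            snprintf(room->name, sizeof room->name, "Room (%d,%d)", r, c);
            room->exits[NORTH] = (r > 0)        ? &maze[r - 1][c] : NULL;
            room->exits[SOUTH] = (r < GRID - 1) ? &maze[r + 1][c] : NULL;
            room->exits[WEST]  = (c > 0)        ? &maze[r][c - 1] : NULL;
            room->exits[EAST]  = (c < GRID - 1) ? &maze[r][c + 1] : NULL;
        }
    }
}
```

Writing the same 100 locations by hand in a dedicated game language would take several hundred lines; here the host language's loops come for free.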
The disadvantage is that such a "language" is very difficult to
debug or maintain. Imagine someone
forgetting to create a location that is mentioned somewhere in
the game (say, a Location variable that is declared but never set
by newLocation). The C compiler will typically not generate any
error message for this omission; at best it may warn about an
uninitialized variable. The trouble shows up only at run time, as
anything from wrong messages to a segmentation fault (access
violation) or a system crash.
In short, such a "language" is not structured. And without
structure it is impossible to have automated error checking.
LaTeX (which may easily win a competition for being the world's
worst-designed language) is just one such "unstructured" beast
hacked out of macros written in TeX (which by itself is a brilliant
low-level language).
Having said all this, I must hasten to add that for a small-scale
language this approach is indeed pretty cool. In fact, I
would strongly suggest that you always start out like this when
you want to design a new language. Once you are happy with the
"unstructured" language you can later put it in a flex-bison
wrapper (using your "unstructured" functions as actions).
OpenGL is a popular example of an "unstructured language". It is
basically a bunch of functions.
XML
The main job of a parser is to build the parse tree of a
program. In a sense, the parse tree contains all the information
needed by the compiler. The design of the grammar, the efforts to
avoid shift-reduce conflicts, all revolve around the central
theme: the parse tree.
But for many small-scale problems the end-user may be ready to
provide the parse tree directly, obviating the need for the
compiler to construct it. This is precisely the reason why XML is
so popular, though many users may not be thinking of parse trees
when they write an XML file. Consider the AdvLang program
LOCATION house
NAME "Your house"
DESCRIPTION "You are standing\nin front of your house.\nPaths lead towards east and west."
east flag
west forest
START_AT house
Isn't the same information conveyed in the following XML snippet?
<ADVENTURE start="house">
  <LOCATION id="house">
    <NAME>Your house</NAME>
    <DESCRIPTION>You are standing\nin front of your house.\nPaths lead towards east and west.</DESCRIPTION>
    <EXIT dir="east">flag</EXIT>
    <EXIT dir="west">forest</EXIT>
  </LOCATION>
</ADVENTURE>
Well, you may not quite like the appearance of those tags. They
do not look clean. But the advantage is that now you do not need
to design a compiler to parse it. Just use a standard XML
processing tool (say XSL and a standard browser) to work with the
tree directly. Any XML processor already checks the nesting of
the tags, so that provides a primitive syntax check. If you want
some additional restrictions then an extra DTD may help.
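To make "working with the tree directly" concrete, here is a small C sketch. It assumes some XML parser has already turned the file into linked nodes; the Node structure and the countTag function are hypothetical names invented for this example and do not belong to any particular XML library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element node, roughly what an XML parser hands back. */
struct Node {
    const char *tag;           /* element name, e.g. "EXIT" */
    const char *text;          /* text content, or NULL */
    const struct Node *child;  /* first child element, or NULL */
    const struct Node *next;   /* next sibling element, or NULL */
};

/*
 * Count the elements named `tag` among `n`, its following siblings,
 * and all of their descendants.
 */
int countTag(const struct Node *n, const char *tag)
{
    int total = 0;
    for (; n != NULL; n = n->next) {
        if (strcmp(n->tag, tag) == 0)
            total++;
        total += countTag(n->child, tag);
    }
    return total;
}
```

Called on the root of the tree for the snippet above, countTag(root, "EXIT") visits each element once. No grammar and no parser of our own is involved, only a walk over a tree that the user effectively typed in by hand.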
The advantage of the approach is that such a language may be
created very easily on the fly. One can add extra features to the
language equally easily.
The disadvantage is that XSL provides only limited programming
functionality. In particular, its cross-referencing scheme is
notoriously awkward. So this approach is not a good choice for
languages where some variable is used many times throughout a
program. But a language based only on "local information" is an
ideal candidate for XMLifying!
X3D (the successor to VRML) is a "language" that is built like this using XML.
Embedding one language inside another
This is a nice technique that is gaining currency rapidly. The
idea originates from the need to add extra features to some
existing language. For example, Perl, Python and Ruby allow
multi-line strings. Thus in Python you can write
print '''Usage notes for my program:
Please type
superprog [options] filename
where [options] is one or more of
a : abort
b : burst
c : cancel
d : die
e : explode (or E : expire)'''
Unfortunately C does not allow such a thing. Can we design a
language that will be C plus this feature? Clearly, rewriting the
C compiler is out of the question. It is in such a scenario that
the embedding trick is useful.
All that we need to do is this: write a program that turns a
multiline string into a legal C string literal, by replacing each
''' with a double quote and writing \n followed by a backslash
just before every newline in it. (A backslash alone would make
the C compiler join the lines, but it would also drop the line
breaks from the string.) Now make your program read a C program,
copying everything unchanged until it comes to the first
occurrence of ''' . From then on it keeps on rewriting the
newlines in this way until it meets the closing ''' . Now
compile the resulting C program. You can consider this simple
program as the front end of your compiler, the resulting C
program as the intermediate representation and the actual C
compiler as the back end. You may even pack the front and back
ends in a single shell script (that will delete the intermediate C
file after the compilation is over). The front end may be written
in any language. I find Perl or Ruby a very good choice. You can
also do it using flex.
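Here is a minimal sketch of such a front end written in C itself. It works on strings in memory to stay short; the function name translate and its buffer interface are assumptions for this example. Note that it writes \n plus a backslash before each newline, since a bare backslash-newline is removed by the C compiler and the string would lose its line breaks. It also escapes double quotes inside the block. It makes no attempt to recognise ''' inside ordinary C strings or comments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `src` to `dst`, turning each '''...''' block into an ordinary
 * C string literal. Returns the output length, or (size_t)-1 if `dst`
 * (capacity `cap`, counting the terminating '\0') is too small.
 */
size_t translate(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    int inside = 0;  /* non-zero while between an opening and closing ''' */

    while (*src != '\0') {
        char single[2] = { *src, '\0' };
        const char *piece = single;  /* default: copy the character as is */

        if (strncmp(src, "'''", 3) == 0) {
            inside = !inside;
            piece = "\"";            /* ''' becomes a double quote */
            src += 3;
        } else {
            if (inside && *src == '\n')
                piece = "\\n\\\n";   /* writes \n\ and then a real newline */
            else if (inside && *src == '"')
                piece = "\\\"";      /* escape embedded double quotes */
            src++;
        }

        size_t len = strlen(piece);
        if (n + len + 1 > cap)
            return (size_t)-1;
        memcpy(dst + n, piece, len);
        n += len;
    }
    if (n >= cap)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

A real front end would read the input file into a buffer, call a function like this, write the result to a temporary .c file and then invoke the C compiler on it.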
Many standard "languages" use this technique already:
-
C preprocessor: Though we may be used to considering the C
preprocessing directives like
#define ,
#if , #elif etc as part of C, in fact
these constitute a language of their own embedded inside C.
The C preprocessor (which is the compiler for that embedded
language) merely scans through the C program to operate on the lines
starting with # , treating everything else basically as
meaningless characters (except for substituting the macros).
-
JSP: There is much noise about the Java Server Pages
(JSP) technology, which is little more than our simple technique
applied to embed ordinary multiline text inside Java
servlets. Indeed, I once wrote a little 10-line Perl script to
create my own JSP.
-
Bison: Yes, even bison itself uses the same trick in
its grammar file, which is just a grammar embedded in a C
program. The
%{ , %} and
%% 's all keep the embedded grammar separate from the
C part.
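To make the first item of this list concrete, here is a tiny C fragment in which the two languages are interleaved. The macro and function names are made up for this illustration. The lines starting with # are handled entirely by the preprocessor; the C compiler proper only ever sees the two function definitions, with the macros already substituted.

```c
/* Preprocessor language: define two macros. */
#define GREETING "Hello"
#define REPEAT 10

/* Preprocessor language: choose a definition before C is even parsed. */
#if REPEAT > 5
#define MODE "many"
#else
#define MODE "few"
#endif

/* C language: by now GREETING and MODE are plain string literals. */
const char *greeting(void) { return GREETING; }
const char *mode(void) { return MODE; }
```

Since REPEAT is 10, the #else branch is discarded by the preprocessor and never reaches the compiler at all.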
In all the examples so far the technique is used
sequentially. First the embedded compiler runs to process
the embedded parts, and then the main compiler runs. Sometimes
the two compilers run independently or in
parallel. Then the original file is like two interlaced
programs. Each compiler knows how to isolate its own part from
the rest. Here are some real examples:
-
Java and Javadoc: The inline documentation facility of Java is
basically a very simple scripting language (much like our
AdvLang) embedded inside Java. The embedding is marked by a
special commenting scheme to keep the regular
javac from
getting confused. By running the javac compiler we
can produce the .class file as usual. By running the
javadoc compiler we can produce the HTML documentation
files.
- PostScript and DSC:
PostScript is a language for
producing printable documents containing text and
graphics. However, in its original incarnation it was meant for
the printer, and did not allow
easy navigation in an onscreen viewer. The navigation features
were added to it by embedding a comment-based mini-language called
DSC (Document Structuring Conventions) in a regular PostScript
file. A PostScript file without this embedded part will print all
right, but cannot be navigated using a viewer (e.g., one cannot
jump to a page, go to the previous page etc).