|
www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com |
|
|
Let's build a computer language
Why should we build a computer language? The main reason many people
dislike the subject of compiler contruction is not knowing an answer to
this question. We shall seek answer to this question first.
Let's start with a simpler question. What is a computer language? Most
people will answer, "It is like C or ForTran or Java, a system to write
instructions for the computer." That's almost correct. However, a computer
language need not at all be like C or ForTran or Java. Some of the most
exciting languages that you will be able to design after reading this
tutorial will be drastically different from C like languages.
The following examples are chosen with two criteria in mind. First, to
give an idea
of the wide range of languages in use. Second, all the languages (and
their compilers) are freely available on the web. So you can experiment
with them. In each example
we give a simple program and its output. The aim is to just acquaint you
with the variety of languages, not to teach you their details.
In particular, each of
the languages have many more features than are covered in the examples.
Notation-like languages
| Example:(ABC)
There is a music language called ABC. A program in this language is
basically a text file containing a piece of music. The following ABC
program is a sample.
X: 1
T: Hindu Spiritual Hymn
C: Trad.
M: 2/4
K: Cmaj
L: 1/4
GD|EF/F/|ED/C/|B,2|G,B,|C/D/E/F/|EC/B,/|C2|
CD/D/-|D/D/D/E/|FG/F/|G2|DG|FF/F/|E/D/E/D/|C2|
FG/G/|B/B/B/B/|cd/B/|c2|de/f/|e/e/d/d/|cd/B/|c2|
dc|BA/G/|AG/F/|ED/D/|DG|FF|E/D/E/D/|C2|
DD|DD/E/|FG/G/|G2|DG|FF/F/|E/D/E/D/|C2|
This is part of a Hindu scriptural hymn. There is a free software called
abc2midi that converts (or compiles) this into an audio midi file:
ram1.mid. To understand the ABC
language you
need to know a few simple facts about music. Any musical piece is
basically a sequence of notes. In Indian music, these are called
Sa, Re, Ga, Ma etc. The corresponding Western names are Do, Re, Mi
etc. Another Western notation is to use the letters of the alphabet: a,b,c
etc. The notes correspond to the (white) keys of the keyboard
instruments.
|
The names of the white keys in a keyboard |
Now that we know this much about music, let's take a look at the
ABC program. The first 6 lines specify various things about the
piece. The first line, for instance, says that it is piece 1 in the file.
The second line gives the name. The third line gives the name of the
composer (here it is Trad. meaning that it is a traditional tune of
unknown origin.) The next three lines are more technical: they specify the
scale and tempo of the piece. We do not need these details
here. The actual music starts from the 7-th line. It is basically a list
of key names separated by vertical bars. (The vertical bars mark a
musical time block.) The first block is GD, which means ``Press
the G key, hold it down for a second, release it, then press the D key,
hold it down for a second, and then release it." The next block is
EF/F/, which has similar interpretation. The slashes after the
F's mean you have to hold the F key down for only half a
second. There are many other features of the ABC language that are
given at here.
|
| Example:(TeX)
Typesetting a complicated mathematical formula elegantly is not always
easy. Consider the formula
|
A complicated math formula |
The main difficulty here is that a mathematical formula is not exactly
like a left-to-right text. A popular language to specify these is
TeX. Here is a TeX program to produce the above formula:
$$\Phi(u;\mu,\sigma) =
\int_{-\infty}^{u} {1 \over \sigma\sqrt{2\pi}}
e^{-{(x-\mu)^2\over 2\sigma^2}} dx.$$
\bye
At a first glance this program may appear too complicated. But like most
programs this is actually made of simpler building blocks. The greek
letters are written as \Phi, \mu etc. The integral sign
is produced by \int_{...}^{...}. The lower limit is written
inside the first set of curly brackets. The upper limit goes inside the second set
of curly brackets. Fractions are written as {... \over ...}, the
numerator and denominator are written before and after the
\over. Notice one little point: the caret (^) is used
whenever some text needs to be raised, e.g., the exponent of a
power, the upper limit of an integral. The amounts of raising are
different in these two cases. TeX figures out the correct amount from the
context.
TeX also lets you abbreviate frequently used parts of a program as
templates. For
instance, if we need the above formula many times, but each time with
different upper and lower limits for the integral, then we can define the
following template.
\def\myint#1#2{
\int_{#1}^{#2} {1 \over \sigma\sqrt{2\pi}}
e^{-{(x-\mu)^2\over 2\sigma^2}} dx.
}
Now you can write
$$\myint{-2}{3} = \myint{-2}{0} + \myint{0}{3}.$$
to produce
|
Script-like languages
| Example:(POV-Ray) Consider the following image.
|
Computer-generated photo-realistic image |
This was created in a wonderful language called POV-Ray. Here is the
program that created it. POV-Ray is freely available in the web and this
program comes with the POV-Ray software.
// Persistence Of Vision raytracer version 3.5 sample file.
// File by Alexander Enzmann (modified by Dieter Bayer)
//
// -w320 -h240
// -w800 -h600 +a0.3
global_settings { assumed_gamma 2.2 }
camera {
location <0, 8, -15>
look_at <0, 0, 0>
angle 46
}
light_source { <10, 30, -20> color red 1 green 1 blue 1 }
blob {
threshold 0.5
cylinder { <-7, 0, 0>, <7, 0, 0>, 4, 2 }
cylinder { <0, 0, -7>, <0, 0, 7>, 4, 2 }
sphere { <0, 3, 0>, 2.5, -4 }
pigment { color red 1 green 0 blue 0 }
finish { ambient 0.2 diffuse 0.8 phong 1 }
rotate <-30, 0, 0>
}
POV-Ray takes the above program as input, and produces the image shown
earlier. The meaning of the above program is pretty
self-evident. The lines starting with // are comments. The
global_settings command sets some parameter values. Then we
specify the position of the camera and the light source. Finally, we
describe the object itself as a blob, which is POV-Ray's term for
an object of non-standard geometry. It is described in terms of standard
geometric shapes like cylinders and spheres. In fact, the object is made
by intersecting two cylinders, and then denting out a spherical cap. After
describing the shape of the object, the program specifies the colour and
texture of the material. Finally, it rotates the object for better
viewing.
In the language of compilers we would say that the POV-Ray software is a
compiler for the POV-Ray language. It converts the description of
a scene to an image of the scene. If you want to know more about POV-Ray,
click here.
|
| Example:
(Virtual Reality Modelling Language) This is another script-like
language to describe 3-dimensional scenes. However, unlike POV-Ray, here
the generated scene is not just a static image. It is an interactive world
of virtual reality that you can manipulate with your mouse. For instance,
you can walk around an object and look at it from different directions.
It is somewhat difficult to understand what I mean here unless you
experience virtual reality yourself. For this you may download and install
a VRML viewer like Cosmoplayer. Here is a sample program
#VRML V2.0 utf8
# A Cylinder
Shape {
appearance Appearance {
material Material { }
}
geometry Cylinder {
height 2.0
radius 1.5
}
}
The program is not difficult to understand. It defines a cylinder of
some unspecified material. If you have a VRML-aware browser (e.g.,
Internet Explorer after installing Cosmoplayer) then click
here to see the above code in action. If you
do not have a VRML-aware browser, then you have to be satisfied with the
following screenshot.
|
Screenshot |
Rather complicated virtual worlds may be made using VRML, including
animation and interactivity.
|
| Example:
(Prolog) A completely different type of language is Prolog. This is
a script-like language to describe facts and to pose questions. Here is an
example. Consider the two facts:
If one studies math then one will pass the math exam.
Rita studies all subjects.
Based on these two facts, we want to answer the question: Will Rita pass
in math? We represent the facts as the following Prolog program:
passes(Student,math) :-
studies(Student,math).
studies(rita, _).
To make sense of this read ":-" as "if." The
"_" stands for "everything." We feed this program into a Prolog
system. The system processes these and asks for queries answerable based
on these facts. We wanted to know if Rita will pass in the math. So we
type
?- passes(rita, math).
Prolog consults the facts, and answers "Yes."
|
C-like languages
Examples of C-like languages include C itself, C++, Java, Fortran,
Basic etc. Many people think about only these when they hear about computer
languages. These are general purpose languages. This tutorial assumes that
you already know C, so we shall not give any example.
What's common in all these?
The languages described above are of widely differing types. Still there
are some fundamental aspects common to all of them:
Each language has its own application area. The application domain of
the ABC
language is music. The person who wrote the compiler for this language
obviously had to know about music and the midi format. Similarly, the domain of
POV-Ray consists of techniques to render 3-dimensional
photo-realistic images. For general purpose C-like languages, the domain
consists
of the details of the machine language into which the
program is translated by the compiler.
In each of the examples above, we have the concept of a low level
implementation. For ABC it is the midi format. The abc2midi
software is
just a compiler that translates an ABC program to midi format. Similarly,
POV-Ray relies on standard computer graphics tools at the low level to
draw lines, curves and points. Prolog turns everything into trees, and
manipulates the trees using low level tools possibly written as a set of
functions in C.
In all the examples, the domain knowlege is built into the
compiler. A program itself is just a description of
a problem in that domain. The job of the compiler is to apply the domain
knowledge to solve the problem, and to produce a low level implementation
of the solution. For instance, suppose that a music book contains a piece
of music written in tems of notations. You know the notations
theretically, but not how to play the piece based on them. Thus, the
notations for the music is just a description of the music. In order to
listen to the actual music, you may take the book to an expert pianist,
who can play the piece for you on the piano, record it, and hand you the
CD. The abc2midi compiler plays the role of this expert.
What will this tutorial teach?
Suppose that we have a domain of problems, where each problem is easier to
describe than to solve. However, we have enough domain
knowledge to solve each
problem (using a some suitable computer system). People might come to
us with description of their problems (from that domain) and ask us to
solve them. Then we can write a compiler to automate our work. First we
shall design a language in which the description can be written
easily. We shall expect our clients to supply their descriptions in this
language. Thus each client comes with a program in this language.
Next, we shall write a compiler to compile a program in that
language to automatically produce a solution.
Apparently, this depends on the domain in question. However, it is a
remarkable fact that compiler construction techniques are extremely
general and hardly depend on the specific domain. Thus, most
of the techniques that go into designing the ABC language and its
compiler are basically the same as what can be used for POV-Ray.
This tutorial tries to teach you these domain-independent, common
techniques. We shall of course use one particular domain for illustration
purposes. But the techniques themselves are prefectly general, and may be
applied to any other domain of your choice.
|