Arnab Chakraborty's compiler tutorial: page 1 of 11

www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

My Guestbook

Home >

... tutorials >

... ... Compiler

Let's build a computer language

Why should we build a computer language? The main reason many people dislike the subject of compiler contruction is not knowing an answer to this question. We shall seek answer to this question first.

Let's start with a simpler question. What is a computer language? Most people will answer, "It is like C or ForTran or Java, a system to write instructions for the computer." That's almost correct. However, a computer language need not at all be like C or ForTran or Java. Some of the most exciting languages that you will be able to design after reading this tutorial will be drastically different from C like languages.

The following examples are chosen with two criteria in mind. First, to give an idea of the wide range of languages in use. Second, all the languages (and their compilers) are freely available on the web. So you can experiment with them. In each example we give a simple program and its output. The aim is to just acquaint you with the variety of languages, not to teach you their details. In particular, each of the languages have many more features than are covered in the examples.

Notation-like languages

Example:(ABC) There is a music language called ABC. A program in this language is basically a text file containing a piece of music. The following ABC program is a sample.
X: 1 T: Hindu Spiritual Hymn C: Trad. M: 2/4 K: Cmaj L: 1/4 GD|EF/F/|ED/C/|B,2|G,B,|C/D/E/F/|EC/B,/|C2| CD/D/-|D/D/D/E/|FG/F/|G2|DG|FF/F/|E/D/E/D/|C2| FG/G/|B/B/B/B/|cd/B/|c2|de/f/|e/e/d/d/|cd/B/|c2| dc|BA/G/|AG/F/|ED/D/|DG|FF|E/D/E/D/|C2| DD|DD/E/|FG/G/|G2|DG|FF/F/|E/D/E/D/|C2|
This is part of a Hindu scriptural hymn. There is a free software called abc2midi that converts (or compiles) this into an audio midi file: ram1.mid. To understand the ABC language you need to know a few simple facts about music. Any musical piece is basically a sequence of notes. In Indian music, these are called Sa, Re, Ga, Ma etc. The corresponding Western names are Do, Re, Mi etc. Another Western notation is to use the letters of the alphabet: a,b,c etc. The notes correspond to the (white) keys of the keyboard instruments.

The names of the white keys in a keyboard

Now that we know this much about music, let's take a look at the ABC program. The first 6 lines specify various things about the piece. The first line, for instance, says that it is piece 1 in the file. The second line gives the name. The third line gives the name of the composer (here it is Trad. meaning that it is a traditional tune of unknown origin.) The next three lines are more technical: they specify the scale and tempo of the piece. We do not need these details here. The actual music starts from the 7-th line. It is basically a list of key names separated by vertical bars. (The vertical bars mark a musical time block.) The first block is GD, which means ``Press the G key, hold it down for a second, release it, then press the D key, hold it down for a second, and then release it." The next block is EF/F/, which has similar interpretation. The slashes after the F's mean you have to hold the F key down for only half a second. There are many other features of the ABC language that are given at here.

Example:(TeX) Typesetting a complicated mathematical formula elegantly is not always easy. Consider the formula

A complicated math formula

The main difficulty here is that a mathematical formula is not exactly like a left-to-right text. A popular language to specify these is TeX. Here is a TeX program to produce the above formula:
$$\Phi(u;\mu,\sigma) = \int_{-\infty}^{u} {1 \over \sigma\sqrt{2\pi}} e^{-{(x-\mu)^2\over 2\sigma^2}} dx.$$ \bye
At a first glance this program may appear too complicated. But like most programs this is actually made of simpler building blocks. The greek letters are written as \Phi, \mu etc. The integral sign is produced by \int_{...}^{...}. The lower limit is written inside the first set of curly brackets. The upper limit goes inside the second set of curly brackets. Fractions are written as {... \over ...}, the numerator and denominator are written before and after the \over. Notice one little point: the caret (^) is used whenever some text needs to be raised, e.g., the exponent of a power, the upper limit of an integral. The amounts of raising are different in these two cases. TeX figures out the correct amount from the context.
TeX also lets you abbreviate frequently used parts of a program as templates. For instance, if we need the above formula many times, but each time with different upper and lower limits for the integral, then we can define the following template.
\def\myint#1#2{ \int_{#1}^{#2} {1 \over \sigma\sqrt{2\pi}} e^{-{(x-\mu)^2\over 2\sigma^2}} dx. }
Now you can write
$$\myint{-2}{3} = \myint{-2}{0} + \myint{0}{3}.$$
to produce

Script-like languages

Example:(POV-Ray) Consider the following image.

Computer-generated photo-realistic image

This was created in a wonderful language called POV-Ray. Here is the program that created it. POV-Ray is freely available in the web and this program comes with the POV-Ray software.
// Persistence Of Vision raytracer version 3.5 sample file. // File by Alexander Enzmann (modified by Dieter Bayer) // // -w320 -h240 // -w800 -h600 +a0.3 global_settings { assumed_gamma 2.2 } camera { location <0, 8, -15> look_at <0, 0, 0> angle 46 } light_source { <10, 30, -20> color red 1 green 1 blue 1 } blob { threshold 0.5 cylinder { <-7, 0, 0>, <7, 0, 0>, 4, 2 } cylinder { <0, 0, -7>, <0, 0, 7>, 4, 2 } sphere { <0, 3, 0>, 2.5, -4 } pigment { color red 1 green 0 blue 0 } finish { ambient 0.2 diffuse 0.8 phong 1 } rotate <-30, 0, 0> }

POV-Ray takes the above program as input, and produces the image shown earlier. The meaning of the above program is pretty self-evident. The lines starting with // are comments. The global_settings command sets some parameter values. Then we specify the position of the camera and the light source. Finally, we describe the object itself as a blob, which is POV-Ray's term for an object of non-standard geometry. It is described in terms of standard geometric shapes like cylinders and spheres. In fact, the object is made by intersecting two cylinders, and then denting out a spherical cap. After describing the shape of the object, the program specifies the colour and texture of the material. Finally, it rotates the object for better viewing.
In the language of compilers we would say that the POV-Ray software is a compiler for the POV-Ray language. It converts the description of a scene to an image of the scene. If you want to know more about POV-Ray, click here.

Example: (Virtual Reality Modelling Language) This is another script-like language to describe 3-dimensional scenes. However, unlike POV-Ray, here the generated scene is not just a static image. It is an interactive world of virtual reality that you can manipulate with your mouse. For instance, you can walk around an object and look at it from different directions.
It is somewhat difficult to understand what I mean here unless you experience virtual reality yourself. For this you may download and install a VRML viewer like Cosmoplayer. Here is a sample program
#VRML V2.0 utf8 # A Cylinder Shape { appearance Appearance { material Material { } } geometry Cylinder { height 2.0 radius 1.5 } }
The program is not difficult to understand. It defines a cylinder of some unspecified material. If you have a VRML-aware browser (e.g., Internet Explorer after installing Cosmoplayer) then click here to see the above code in action. If you do not have a VRML-aware browser, then you have to be satisfied with the following screenshot.

Screenshot

Rather complicated virtual worlds may be made using VRML, including animation and interactivity.

Example: (Prolog) A completely different type of language is Prolog. This is a script-like language to describe facts and to pose questions. Here is an example. Consider the two facts:

If one studies math then one will pass the math exam.
Rita studies all subjects.
Based on these two facts, we want to answer the question: Will Rita pass in math? We represent the facts as the following Prolog program:
passes(Student,math) :- studies(Student,math). studies(rita, _).
To make sense of this read ":-" as "if." The "_" stands for "everything." We feed this program into a Prolog system. The system processes these and asks for queries answerable based on these facts. We wanted to know if Rita will pass in the math. So we type
?- passes(rita, math).
Prolog consults the facts, and answers "Yes."

C-like languages

Examples of C-like languages include C itself, C++, Java, Fortran, Basic etc. Many people think about only these when they hear about computer languages. These are general purpose languages. This tutorial assumes that you already know C, so we shall not give any example.

What's common in all these?

The languages described above are of widely differing types. Still there are some fundamental aspects common to all of them:

ABC

POV-Ray

low level implementation.

abc2midi

built into

description

What will this tutorial teach?

Suppose that we have a domain of problems, where each problem is easier to describe than to solve. However, we have enough domain knowledge to solve each problem (using a some suitable computer system). People might come to us with description of their problems (from that domain) and ask us to solve them. Then we can write a compiler to automate our work. First we shall design a language in which the description can be written easily. We shall expect our clients to supply their descriptions in this language. Thus each client comes with a program in this language. Next, we shall write a compiler to compile a program in that language to automatically produce a solution.

Apparently, this depends on the domain in question. However, it is a remarkable fact that compiler construction techniques are extremely general and hardly depend on the specific domain. Thus, most of the techniques that go into designing the ABC language and its compiler are basically the same as what can be used for POV-Ray.

This tutorial tries to teach you these domain-independent, common techniques. We shall of course use one particular domain for illustration purposes. But the techniques themselves are prefectly general, and may be applied to any other domain of your choice.

hit counter