Computational Chemistry: As I have Seen
this Creature
Computational chemistry (let’s here call it CC) must be meaning different things to different persons. I remember listening to a participant's lecture during one refresher course in 1999, delivered by an college lecturer pursuing active research in CC only, and I was astonished to have found that I could hardly relate to his research work, even though I had already by then gathered the basics of my CC-field! Likewise, I found that one of my old friend's post-doctoral works in computational heterogeneous catalysis also feels quite alien to me. So I now feel forced to believe that computational chemistry is like a huge elephant while we, the students of its diverse fields, are like a set of blind people proceeding to experience this huge animal by studying one of its different parts! Nevertheless I am daring to pen (keyboard) this introductory piece, just because I am emboldened with my awareness of this intrinsically one-eyed tendency within myself.
Computational chemistry must be making models
(simulations) of the chemical environment (and chemical processes as well) in
the computer - this is an obvious truism. As all natural and social sciences
do, it must also use these models to interpret the naturally occurring chemical
phenomena, and must try to predict yet undiscovered chemical phenomena as well.
But beyond these truisms, how the in-computer modelling is done, and how the
chemical phenomena are interpreted, differs amongst different fields of
computational chemistry; say between the two major classes of the
classical-mechanical and the quantum-mechanical approaches to computational
chemistry.
The ways of modelling chemical phenomena
invariably have to depend also on the materials under study, apart from the
choice (or whims) of the researcher. Thus while studying solids (say, metallic
solids) pure or ab initio (i.e., driven from fundamental principles) quantum
mechanics (QM) can never be applied, as the limited life-span of the human researcher
would prevent him from calculating about even one small piece of solid in his
entire lifetime. This is in spite of the fact that for all things big and
small, quantum mechanics provides the most accurate description (or
predictions), even though it may take much longer time to complete the
calculations. Similarly, for large bio-molecules such as DNA and proteins, ab
initio QM modelling can't be done for the whole macromolecule.
All of us have probably learnt in high school
that all matter is composed of too tiny (invisible) molecules, but at the
college level in chemistry we come to realise that this is just a plain lie:
there are metallic, ionic, macromolecular and covalent solids for which simply
no individual tiny molecules could be found! For all such non-molecular/
macromolecular solids and liquids, we have no option but to go for one of the
many different non-QM modelling techniques, or at least to mix non-QM empirical
ideas with QM analysis, thereby arriving at what are called semi-empirical QM
techniques.
Whatever be the different approaches of
modelling, computational chemistry most generally involves elaborate,
cumbersome and extremely lengthy calculations unimaginable to people unfamiliar
with such fields. One may rightly think that the term 'computational' in CC
mainly refers to the use of computers herein, but it is the absurdly lengthy
nature of the calculations that necessitate the use of computers. (Thus, use of
3-D visual molecular computer-models in stereochemistry teaching is not exactly
a CC activity, even though it is a very beneficial by-product of CC.) As an
example, we have an M.Sc. experiment about obtaining electronic energy of H2+
ion that requires using a computer for 2 seconds; had the student needed to
calculate this computer-performed part also herself, she would have needed
around 200 hours instead of the two seconds. On the other hand, researchers in
various CC-fields quite commonly performs calculations each requiring from
several hours to a few days in a similar computer! An exception to the enormity
of calculations, however, exist in situations where instead of a full-fledged
modelling of the whole chemical environment, modelling of a limited aspect of
chemical behaviour (say, of chemical equilibrium) is attempted, where the
tailor-made program for that aspect may run only for a few milliseconds (some
books such as Computers in Chemistry by K V Raman are full of such
special programs written in FORTRAN etc.).
As an example of modelling, researchers in the currently emerging
field of computational heterogeneous catalysis need to model the catalyst
surface with its pores & microstructures and the adsorbate substrates, as
well as the interactions between these two. Some sort of classical-mechanical
modelling is naturally favoured, yet the computations usually run at a stretch
for days. Similarly in case of macromolecular modelling, an approach known as
the force-field treatment remains useful, where the macromolecule's energy is
attempted to be considered as a function of the various bond distances and bond
angles etc. The structure and the physico-chemical properties of the
macromolecule is attempted to be interpreted thereby. On the other hand, for
reactions between small and medium-sized molecules particularly in the gas
phase, the most accurate ab initio QM methods can now be very successfully
used, thanks to the phenomenal growth in the computer-speed during the last one
or two decades. Thus recently we may come across CC studies such as for the
reactivity between CO & H2O2.
Let us now discuss the ab initio QM modelling of
small and medium-sized molecular systems in some details, keeping in view the
immense popularity of that in recent times. In this field, the mathematical
(numerical) model of the molecule is always associated with a visual model to
be viewed by the user: given the numerical model the corresponding visual model
may be obtained using a graphics software (e.g., PCModel, Ortep-3 etc.), while
every visual model drawn or generated anyhow can be saved as (transformed into)
the corresponding numerical model. The visual model helps in easy understanding
of the molecular structure and stereochemistry; it attempts to represent the
actual molecule as closely as possible in its shape and stereochemistry. There are
provisions to view the models in different styles such as stick, ball and
stick, electron dot surface etc. Besides, generally the visual model may be
viewed from different angles (orientations) and in different enlargements: the
structural figure drawn on a paper is thus a rather poor visual model, compared
to those made with chemistry-specific packages.
As a starting point, in this field we start with
molecules lying in the vacuum (gas phase); it is possible to introduce
corrections to this gas-phase model for the surrounding solvent medium etc.
From quantum mechanics, we know that there is no question of the electrons in
the molecule to be specified of their positions: we need to specify only the
number of electrons in the molecule, while the electron probability density
will be obtained from solution of the electronic Schrodinger equation (ESE). On
the other hand, there’s the necessity of specifying the positions of all the
nuclei (in addition to specifying the types and numbers of the nuclei) – as from
the Born-Oppenheimer approximation we know that to construct the ESE the
nuclear framework must be specified. Besides, with the same set of nuclei and
the same number of electrons different isomers may arise, if the relative
nuclear positions are allowed to vary. Thus, the molecular model must include
the nuclear coordinates, in addition to the types and numbers of nuclei and the
total number of electrons. Using such a molecular model, the molecular
electronic wavefunction and the electron probability density can be directly
found (it’s just a matter of time) by solving the ESE, thereby arriving at a
complete description of the molecule.
However, the total number of electrons may not be
explicitly mentioned in the molecular model (say, for Mopac), in which case it
is understood that the molecule is electro-neutral i.e., there are just
sufficient number of electrons (say 26 in ethanol) to keep the molecule
uncharged. In other cases (say, for Gaussian 94) the charge of the molecular
system needs to be specified, meaning that the number of electrons thus gets
understood. Coming to the question of nuclear-framework specification, we see
that the nuclear framework part of the molecular model is specified in mainly
two different formats. In one format, the type (say atomic symbol, such as H or
N) of each of the nuclei along with its Cartesian (x, y, z) coordinates
(separated by space) are specified one by one for all the nuclei. Generally,
the molecular model is in the form of a text file (a computer-file that looks
like an school essay on the cow), with one line each dedicated to the
description of each of the nuclear type & position. The unit of the
coordinates is, practically universally, Angstrom (not atomic unit). Thus, the
nuclear framework specification for a water molecule may be as follows:
O 0.000000 0.127174 0.000000
H 0.758132 -0.508697 0.000000
H -0.758132 -0.508696 0.000000
In the other format called the z-matrix
specification, the position of the first nucleus is kept unspecified. The
position of the 2nd is expressed in terms of the distance from the 1st. The
position of the 3rd is expressed in terms of the distance from the 2nd or the
1st, and in terms of the bond angle amidst the 3rd, the 2nd/ 1st, and the 1st/
2nd nucleus. All the rest of the nuclear positions require specification of one
bond distance, one bond angle and one dihedral angle (i.e., angle between two
planes, say between the 1-2-3 and the 2-3-4 nuclei-connecting planes). The
justification of such a (initially incomplete-looking) z-matrix specification
is most probably obvious to any student of chemistry: he should be knowing that
translating the whole nuclear framework to another position or rotating it
doesn't lead to a different molecule! It is the Cartesian coordinate
specification format where there is rather too much of coordinates
specification (over-specified by six degrees of freedom), but there also it is
understood that translation or rotation of the whole framework creates no
different molecule. This second format of nuclear framework specification also
has each line describing one nucleus, with unit of distance and angle being
Angstrom and degrees by convention, as exemplified for H2O:
O
H 1 0.989493
H 1 0.989492
2 100.024728
In this field of CC, naturally one starts with
preparations of the molecular models. Using model-drawing software packages
such as PCModel, ChemDraw, ISIS-Draw etc. such models may either be drawn from
scratch OR be modified from pre-existing models such as of aliphatic/ aromatic
rings, amino-acids, mono-saccharides, nucleic-acid bases etc. Such drawing or
modification generally involves quite user-friendly steps, as may be
exemplified in case of PCModel. In PCModel, there are onscreen buttons such as
Draw, Select, Del, Update, Show/Hide-H etc. To do an operation such as adding a
bond, one needs to click at Draw, then at the starting nucleus and then at the
new nuclear position. (To delete an existing nucleus, one needs to click at
Del, then at the nucleus.) A new nucleus drawn is assumed initially as a
C-nucleus, after which it may be altered by invoking a periodic-table entry.
The H-atoms needn't be drawn; they're just understood and may be shown/ hidden
at will. Gross structural mistake(s) made in drawing or modification (e.g.,
creation of absurd bond-lengths etc.) may now be corrected by invoking an
inbuilt, raw energy-minimisation procedure. After drawing or modification of
the visual model in this way, the mathematical model may be immediately
obtained in different CC formats such as Chem-3D (one of the former class, as
above) or Mopac (one belonging to the latter class) etc.
The mathematical models thus formed are then fed
into computational software packages such as Mopac or Gaussian etc., the
desired level of computational theory (say, Huckel/ Hartree-Fock/
Moller-Plasset 2nd-Order etc.) is specified or kept understood, and the lengthy
computational process is allowed to go on. Through such extensive computations,
modern computational packages such as Gaussian 94 (or say, Gaussian 03) can
predict many properties and reactions of molecules such as: molecular energy
& structures, transition-state energy & structures, vibrational
frequencies, reaction energies, potential energy surface (PES) & reaction
pathways etc. It may be noted here that such a computational package may
perform computations in two distinctly different ways: (i) the
nuclear-framework may be considered as exactly fixed and so no modification is
attempted into it (called a single-point calculation) OR (ii) the
nuclear-framework is considered to be modifiable, and so the optimum molecular
structure is sought for through its modification (called a
structure-optimisation calculation). It is the second way through which we can
look for theoretical prediction of chemical reactions, because we as chemistry
students know that it is the change in the relative nuclear coordinates that
imply a chemical reaction (e.g., H2 (g) + I2 (g) = 2HI
(g) implies that the H–H & I–I
distances have increased much and the H–I distances have decreased much).
At this point we should go into some more details
of how exactly the reaction between two or three molecules is modelled. To
proceed with such modelling, at first the model of a relevant supermolecule,
which includes all the reacting molecules, has to be constructed by joining the
individual molecular models. For example in case of the H2-I2
reaction, the supermolecule must include an H2 molecule and an I2
molecule. The work is best (most rigorously) done if both (or all three, as is
the case) individual molecular models are at first completely
structure-optimised, then the supermolecular model is constructed therefrom,
and then that is also completely structure-optimised. The tendency of the
reactants to form the products and this reaction’s pathway (i.e., mechanism or
intermediate steps) may be trivially judged during the structure-optimisation
path of the reactant supermolecule (say, H2+I2) in
situations where it gets auto-modified to the product supermolecule (say,
HI+HI). But the better QM practice would, however, be to scan the PES of the
supermolecule by freely varying its relative nuclear coordinates, and to
thereby judge its complete theoretical reactivity profile. The stabilisation
energy (DE) of the product, assessed as DE = EAB –
(EA + EB),
estimates in a rough way the product stability in contrast to the reactants
(where EAB, EA and EB are respectively the
structure-optimised energies of the product AB, the reactant A and the reactant
B).
To join the already constructed (and may be
structure-optimised) two or more molecular species into a supermolecule, and
also to form models of complicated molecules starting from the simpler building
blocks, there's available a model-joiner and model-manipulator software package developed by this author. Without such a package,
it's very difficult to construct models of molecular stacking, of
supermolecules and of complicated 3-dimensional molecules such as of ferrocene.
For example, to draw ferrocene in PCModel, one can easily draw one
cyclopentadiene, but to place one cyclopentadiene just above another, and to
place an iron atom just in between, would be very difficult. The reader might
think that the mistakes in drawing may get corrected during
structure-optimisation, but actually a grossly wrong structural model is hardly
accepted by a computational package even as a first starting model! So the best
strategy for ferrocene would be to get a perfect cyclopentadiene model with
PCModel etc., and to join two such models and a Fe-atom using such a
model-joiner. It may be noted here that many CC packages such as ChemDraw,
Ortep, Pluton RasMol, Protein Explorer etc. (as well as the structural
databases for proteins/ DNA/ RNA etc.) are freely available from the Internet
(just need to search for the name via google.com etc.), and this joiner
package is planned to be no exception. PCModel and Gaussian are, however,
priced packages (now at USD 400 and USD 750, respectively).
Finally, a volley of questions would naturally
come to our minds: is CC really successful in modelling chemical phenomena? Can
it predict the real-life reactions without doing the experiments? Can it really
be used to eliminate many useless drug-candidates without going for clinical
trials about each of them? Is the computer-time required for meaningful
predictions realistic? To get some first-hand answer about this last-discussed
(ab initio QM) field of CC, I have played with two optimised models of NH3
and BF3, made a supermolecule with the two molecules kept far apart,
and structure-optimised the supermolecule. In a Pentium-II PC using Gaussian
94, it took only around five minutes to find that the two molecules will form
between them a (coordinate-covalent) bond of around 2 A0, with DE around 65 kJ mol–1. Thus, for a very small
supermolecule-system, the prediction is quite satisfactory and fast. But can
this ab initio QM field of CC be used to predict whether a 50-atom drug molecule
will properly act with a macromolecular (thousands-atom) DNA or protein?
Obviously not, but some other fields of CC must be predicting similar questions
with more success, otherwise why would the pharmaceutical industry be spending
so much in CC research?
26 Jan, 2005