Computational Modeling for DNA Oligonucleotides
Obtainable from the PDB Resource

Rituraj Kalita 

In studies of the biological action of DNA-binding drugs and potential drug candidates, computational modeling of drug-DNA interaction has lately become an important part of the investigative effort[1-3]. An obvious prerequisite for such modeling is the acquisition of a proper computational model of the DNA. In practice, however, an actual polynucleotide DNA model is rarely used; we mostly remain satisfied with a model of a short DNA sequence[1,3], i.e., a DNA oligonucleotide (typically a dodecanucleotide duplex).

A computational model of a molecule, whether a small molecule or a macromolecule, is essentially its structural specification (as found within an XMol XYZ, a PDB, a Gaussian job or a GAMESS input file): a specification of its atom types and their nuclear space-coordinates, generally along with its bond connectivity pattern, and including the net molecular charge, which decides the number of electrons in the molecule. Once this set of data is specified, the remaining significant properties of the molecule may be computed from it[4]. Since the publicity earned by the human genome project, it has become common knowledge that such structural specifications for DNA and RNA sequences are freely available from the Protein Data Bank (PDB) online resource[5], a misnomer for a magnificent facility whose data are not limited to proteins. Many of the so-called DNA structures found there are actually oligonucleotide ones[6-9].
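
As a small illustration of how the atom types and the net charge together fix the number of electrons, the following Python sketch reads XMol XYZ text and counts the electrons. The function names and the toy hydroxide input are hypothetical, chosen only for illustration, and the element table covers only the elements of a bare DNA oligonucleotide.

```python
# Atomic numbers of the elements found in a bare DNA oligonucleotide.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "P": 15}

def parse_xyz(text):
    """Parse XMol XYZ text into a list of (element, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])              # first line: number of atoms
    atoms = []
    for line in lines[2:2 + n_atoms]:    # second line is a free-form comment
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("header atom count does not match the coordinate lines")
    return atoms

def electron_count(atoms, net_charge):
    """Electrons = sum of nuclear charges minus the net molecular charge."""
    return sum(ATOMIC_NUMBER[symbol] for symbol, *_ in atoms) - net_charge

# Toy input (not a DNA fragment): the hydroxide ion, net charge -1.
HYDROXIDE = """2
hydroxide ion
O 0.000 0.000 0.000
H 0.000 0.000 0.960
"""

print(electron_count(parse_xyz(HYDROXIDE), net_charge=-1))  # 8 + 1 + 1 = 10
```

An even electron count is what a closed-shell singlet requires, so the same count offers a quick consistency check on the charge assigned to an oligonucleotide model.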

For computational modeling purposes, however, the macromolecular structures stored online are unfortunately not directly usable. Most of them are of crystallographic (i.e., XRD) origin, and so they lack information about the numbers and positions of the hydrogen atoms within the actual molecules. Without the appropriate number of hydrogen atoms at their appropriate locations, a meaningful computational study is generally unthinkable, say in the case of an ab initio quantum mechanical investigation. In addition, the experimental oligonucleotide samples contain extraneous moieties such as bound ligands (small molecules), bound metal cations and bound water molecules, all of which are reflected in the reported (PDB file) structures. Theoretical modeling, however, frequently requires purely oligonucleotide models, and so such extraneous moieties may need to be removed from the structural models. The net molecular charge, generally required for computational investigations, is also not immediately obvious from a reported PDB file, and so needs to be counted cautiously.
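
Both shortcomings can be checked quickly before any editing. The sketch below is a minimal one, assuming a standard fixed-column PDB file in which the element symbol is present in columns 77-78. It counts the hydrogen atoms and lists the bound non-polymer groups found in HETATM records. The function name pdb_inventory is hypothetical.

```python
from collections import Counter

def pdb_inventory(pdb_text):
    """Return (hydrogen atom count, Counter of HETATM groups by residue name)."""
    hydrogens = 0
    hetero_groups = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        # Assumption: the element symbol is filled in (columns 77-78).
        if line[76:78].strip().upper() in ("H", "D"):
            hydrogens += 1
        if record == "HETATM":
            # (chain, residue number, residue name) identifies one bound moiety.
            hetero_groups.add((line[21], line[22:26].strip(), line[17:20].strip()))
    return hydrogens, Counter(name for _, _, name in hetero_groups)
```

Called on the text of a downloaded crystallographic file, a hydrogen count of zero together with entries such as HOH in the returned counter would indicate the two problems described above.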

A very useful Windows-based software tool for removing extraneous moieties from an oligonucleotide model and adding the missing hydrogen atoms to it is ArgusLab. Developed by the theoretical chemist Mark Thompson and distributed as user-friendly freeware (from http://www.arguslab.com), it is primarily meant for making three-dimensional nuclear-framework models of small and medium-sized molecules, whether by drawing them from scratch or by modifying related existing structures. It can, however, open existing molecular structures saved in the PDB and XMol XYZ formats, and in the case of proper PDB-format files it can distinguish among the individual chains and the various extraneous bound moieties. On a modern computer (say, one with a Pentium IV processor of 1 GHz or higher speed), dealing with a large oligomer in ArgusLab is no longer inconveniently slow.

We start with the downloaded model (obtained from http://www.rcsb.org/pdb) of the composite, unrefined oligonucleotide, say the dodecameric d(CGCGAATTCGCG)2 (containing extraneous moieties) coded 1FMS. The downloaded file (say, 1FMS.pdb) is opened with ArgusLab, and the Molecule Tree View is activated (by clicking its icon, found on the second icon-bar). Within the left-side tree view, we click the plus (+) sign to the left of the molecule name, and then the plus sign to the left of the Residues item, yielding an overall view as given below:

To remove the extraneous ligands, metal ions etc., we now click the plus sign before the Misc. item, expanding the Miscellaneous residues. Within the resulting tree view we may now identify the various nucleotide moieties coded as 5DA, 6DA, 1DC, 3DC etc., as well as the non-nucleotide moiety coded 25D35 (a ligand, here the DB249 drug molecule) and the metal ion coded 26MG (here a Mg2+ ion). We select both these non-nucleotide moieties with mouse-clicks (keeping the Ctrl key pressed to make such a multiple selection), getting the appearance shown in the view below, and then press the Del key to delete the two from the molecular model.

To delete the extraneous water molecules as well, we next expand the Water item in the tree view in the same way. All the water molecules may be selected by keeping the Shift key pressed while clicking first the top water molecule (here, e.g., 27HOH) and then the bottom one (as in the figure below, where the bound water molecules also appear selected in the model shown on the right). Pressing the Del key now deletes all these water molecules.
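
The same clean-up can be scripted outside ArgusLab. The following Python sketch is not the ArgusLab procedure itself. It is a plain text filter, with a hypothetical function name, that assumes the ligand, the metal ion and the waters are all stored as HETATM records in the PDB file. A modified nucleotide stored as HETATM would be removed as well, so the result should be inspected.

```python
def strip_hetero(pdb_text):
    """Drop HETATM records (ligands, metal ions, waters) from PDB-format text."""
    removed_serials = set()
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM":
            removed_serials.add(line[6:11].strip())   # atom serial, columns 7-11
            continue
        # An ANISOU line follows its atom and repeats the same serial number.
        if record == "ANISOU" and line[6:11].strip() in removed_serials:
            continue
        # CONECT lines mostly describe hetero groups; drop them to avoid
        # dangling references to the deleted atoms (a choice made here).
        if record == "CONECT":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Reading 1FMS.pdb into a string, passing it through strip_hetero and writing the result to a new file would give a model containing only the two nucleotide chains.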

The resulting molecular model (i.e., of d(CGCGAATTCGCG)2, the cleaned dodecanucleotide duplex) is still rather big for QM (quantum mechanical) investigations (though acceptable for molecular mechanics ones), whereas the drug to be investigated here is known to bind only to its middle portion. So the three base-pairs at each end may be deleted to get the truncated hexanucleotide duplex d(GAATTC)2. To do that conveniently in ArgusLab, we again expand the Misc. item in the Molecule Tree View, locate (by toggling mouse-clicks) the twelve end-lying nucleotide groups (here 1DC, 2DG, 3DC etc.), select them together, thus getting the following view, and then delete them (selecting group by group in this way is obviously much easier than selecting atom by atom within the modelling screen on the right).
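
The truncation can also be sketched as a filter on residue numbers. The code below assumes that the two chains are labelled A and B and numbered 1-12 and 13-24 respectively, so that the central GAATTC stretches are residues 4-9 and 16-21. This numbering is an assumption to be verified against the actual file, and the names keep_residues and CENTRAL_HEXAMER are hypothetical.

```python
# Assumed numbering: chain A = residues 1-12, chain B = residues 13-24,
# so the central d(GAATTC)2 part is A 4-9 and B 16-21. Verify in the file.
CENTRAL_HEXAMER = ({("A", n) for n in range(4, 10)}
                   | {("B", n) for n in range(16, 22)})

def keep_residues(pdb_text, keep):
    """Retain only ATOM/ANISOU records whose (chain, residue number) is in keep."""
    kept = []
    for line in pdb_text.splitlines():
        if line[:6].strip() in ("ATOM", "ANISOU"):
            key = (line[21], int(line[22:26]))   # chain ID, residue number
            if key not in keep:
                continue
        kept.append(line)                        # other record types pass through
    return "\n".join(kept) + "\n"
```

A call such as keep_residues(cleaned_text, CENTRAL_HEXAMER) then drops the twelve end-lying nucleotides in one step.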

We now need to add the missing hydrogen atoms, and may ask ArgusLab to add H-atoms to the carbon atoms and heteroatoms that clearly lack them. It should be noted, however, that ArgusLab is not an expert system that knows the positions of double bonds within the nucleotide moieties; it can only blindly detect atomic positions where the quadrivalency of carbon (or trivalency of nitrogen) remains unsatisfied, and fill in H-atoms to satisfy them. So, during H-atom addition, it ignores the double bonds within the nucleobases and indiscriminately fills in too many H-atoms. To correct this, the H-added model needs to be carefully inspected manually (viewing it from several angles), with knowledge of the exact positions of double bonds within the four possible nucleotide fragments, taking help from standard representations including those on some authoritative web pages[10-11], and the excess pairs of H-atoms are to be deleted manually wherever the structure so demands.
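
A rough automated check may complement this manual inspection. Every ring carbon of the four nucleobases is sp2-hybridised and so can carry at most one hydrogen atom; the sketch below flags base ring carbons with more than one hydrogen within bonding distance. It assumes that the H-added model is available in PDB format with the standard atom names (C2, C4, C5, C6, C8 for base ring carbons) and the element column intact, which may not hold for every program's output. The 1.2 Å cutoff and the function names are likewise assumptions.

```python
import math

RING_CARBONS = {"C2", "C4", "C5", "C6", "C8"}   # base ring carbons; sugar atoms carry a prime
MAX_CH_BOND = 1.2                               # assumed C-H distance cutoff, in angstrom

def parse_atoms(pdb_text):
    """Extract name, residue, element and coordinates from ATOM/HETATM lines."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6].strip() in ("ATOM", "HETATM"):
            atoms.append({
                "name": line[12:16].strip(),
                "residue": (line[21], int(line[22:26]), line[17:20].strip()),
                "element": line[76:78].strip().upper(),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def overfilled_ring_carbons(atoms):
    """List (residue, atom name, H count) for base ring carbons with more than one H."""
    hydrogens = [a for a in atoms if a["element"] == "H"]
    flagged = []
    for atom in atoms:
        if atom["element"] != "C" or atom["name"] not in RING_CARBONS:
            continue
        n_h = sum(1 for h in hydrogens
                  if math.dist(atom["xyz"], h["xyz"]) <= MAX_CH_BOND)
        if n_h > 1:
            flagged.append((atom["residue"], atom["name"], n_h))
    return flagged
```

The check covers only ring carbons; the nitrogen atoms and the final decision on which H-atoms to delete still require the manual comparison with the reference structures.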

Even after this manual rectification, the positions of the added H-atoms still need further correction, for two reasons. First, ArgusLab adds the missing H-atoms with bond angles and distances determined by very rudimentary and generalised rules of thumb, so the new H-atom positions are not, at this stage, tailored to the specific molecule or macromolecule in hand. Secondly, after the manual deletion of an excess pair of H-atoms, the remaining H-atoms connected to the same pair of (C or N) atoms point in wholly wrong directions, i.e., in directions consistent with the sp3 geometry of the C (or N) atom instead of its sp2 geometry! To make the H-atom positions consistent with the actual whole-molecular structure in hand, we therefore run a structure-optimisation calculation at the (faster) MM (molecular mechanics) level, using the Update (only) Hydrogen atom positions option in ArgusLab.

To perform this, after opening the H-corrected molecule in ArgusLab, we choose the Optimize Geometry... option from the Calculation menu, and within the resulting dialogue box (shown in the following figure) choose UFF (universal force field) from the MM options at the top-left and tick the Update Hydrogens option at the bottom-right. We should also change the Maximum Steps Taken option (at the top) to 999 (from the original 100) to allow for lengthier calculations. Also, for better corrections, we set the (Molecular) Net Charge option (at the bottom-left) to its appropriate value (see the next paragraph), which is –10 in this case. We then click the Start button at the top-right to start the optimisation, and wait till the calculations are over. Finally, saving the optimised molecule as an XMol XYZ file, we get the H-atom refined model of the clean and thoroughly rectified hexanucleotide duplex, ready for QM investigations!

To proceed to a QM investigation (such as one using PC GAMESS) we need to further ascertain the exact net molecular charge and the electron-spin multiplicity (i.e., ICHARG and MULT respectively in PC GAMESS). For a usual oligonucleotide the ground state is a singlet, so the (electron spin) multiplicity is 1. Every phosphate group carries a negative charge on an oxygen atom[11], so for the hexanucleotide duplex the charge is –10, i.e., –1 for each of the ten phosphate groups (five groups on each chain) within the oligomer (for a dodecanucleotide duplex, the charge will correspondingly be –(12–1)*2 = –22, one negative charge for each of the eleven phosphate groups on either chain).
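
The charge count above can be written as a one-line rule. The Python sketch below assumes, as the count in the text does, that each chain of n nucleotides carries n-1 phosphate groups (i.e., no terminal phosphate); if the truncated model retains a terminal phosphate, the count must be adjusted. The function names are hypothetical, and the SCFTYP and RUNTYP values in the generated $CONTRL line are illustrative choices only; ICHARG and MULT are the two settings discussed here.

```python
def duplex_net_charge(nucleotides_per_chain, chains=2):
    """Net charge assuming -1 per phosphate and n-1 phosphates per chain."""
    return -(nucleotides_per_chain - 1) * chains

def gamess_contrl(charge, multiplicity=1):
    """Format a $CONTRL group carrying the charge and spin multiplicity."""
    # SCFTYP and RUNTYP are illustrative; adapt them to the intended calculation.
    return (f" $CONTRL SCFTYP=RHF RUNTYP=ENERGY "
            f"ICHARG={charge} MULT={multiplicity} $END")

print(duplex_net_charge(6))    # hexanucleotide duplex: -10
print(duplex_net_charge(12))   # dodecanucleotide duplex: -22
print(gamess_contrl(duplex_net_charge(6)))
```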

Note: For removing extraneous moieties and adding the missing hydrogen atoms, there are also two web-servers available (namely, MolProbity and WHAT IF) that work, in a fast and automated way, on PDB-format molecular model files (e.g., on the 1FMS.pdb mentioned above). To use them, we state the known PDB ID or upload the PDB-format model on the opening web-page of MolProbity or on the Hydrogens (bonds) web-page of WHAT IF. We then choose the desired operation (with mouse-clicks) and wait a few seconds for the resulting, modified PDB file to appear on the (same) web-page (this file may then be downloaded).

 

References:

01. Shaikh S.A., Ahmed S.R., and Jayaram B. A molecular thermodynamic view of DNA-drug interactions: a case study of 25 minor-groove binders. Archives of Biochemistry and Biophysics, 2004, 429, 81-99. 

02. Greenidge P.A., Jenkins T.C., and Neidle S. DNA minor groove recognition properties of pentamidine and its analogs: a molecular modeling study. Molecular Pharmacology, 1993, 43, 982-988.

03. Chang D.-K., Cheng S.-F., and Chien T.-L. Molecular mechanics calculations on the complexes between analogues of Hoechst 33258 and d(CGCGAATTCGCG)2: influence of bulky group substitution on base pair preference of DNA minor groove binders. Canadian Journal of Chemistry, 1995, 73, 878-884. 

04. Kalita R. Computational chemistry via educational tools: Modelling molecules and reactions. Proceedings of the National Workshop on Computers in Chemistry, Cotton College, Guwahati (India), 2006, pp. 164-167. <http://www.geocities.com/riturajkalita/ccworkshop.htm>. 

05. RCSB PDB Protein Data Bank: An Information Portal to Biological Macromolecular Structures. <http://www.rcsb.org/pdb>

06. Trent J.O., Clark G.R., Kumar A., Wilson W.D., Boykin D.W., Hall J.E., Tidwell R.R., Blagburn B.L., and Neidle S. Targeting the minor groove of DNA: crystal structures of two complexes between furan derivatives of berenil and the DNA dodecamer d(CGCGAATTCGCG)2. Journal of Medicinal Chemistry, 1996, 39, 4554-4562. 

07. Mazur S., Tanious F.A., Ding D., Kumar A., Boykin D.W., Simpson I.J., Neidle S., and Wilson W.D. A thermodynamic and structural analysis of DNA minor-groove complex formation. Journal of Molecular Biology, 2000, 300, 321-337.

08. Guerri A., Simpson I.J., and Neidle S. Visualisation of extensive water ribbons and networks in a DNA minor-groove drug complex. Nucleic Acids Research, 1998, 26, 2873-2878.

09. Simpson I.J., Lee M., Kumar A., Boykin D.W., and Neidle S. DNA minor groove interactions and the biological activity of 2,5-bis-[4-(N-alkylamidino)phenyl] furans. Bioorganic & Medicinal Chemistry Letters, 2000, 10, 2593-2597.

10. Calladine C.R., Drew H., Luisi B., and Travers A. Understanding DNA: The Molecule & How it Works. Academic Press, 2004;
      Wikipedia contributors. DNA. Wikipedia, The Free Encyclopedia. 18 September 2008, 06:34 UTC, <http://en.wikipedia.org/w/index.php?title=DNA&oldid=239209067>.

11. Farabee M.J. DNA and molecular genetics. 2007. <http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html>.