Atom labels and other chemically-significant text

Introduction

Need an introduction.

Elemental atom labels

Atom labels consisting of a single non-hydrogen element are the most universally understood type of chemical information after bonds themselves. Elements should be indicated by their approved element symbols, using proper case (the first letter of a symbol should be capitalized, and subsequent ones lowercase).

RECOMMENDED AVOID

If hydrogens are present on a labeled atom, they should be indicated in the atom label. A single hydrogen should be indicated by the letter "H" immediately after the other element symbol. Multiple hydrogens are further indicated by a subscripted number following the "H", indicating the total number of hydrogens present. A labeled atom without hydrogens should be interpreted as having zero hydrogens. Such an atom might represent a free radical; however, it would be better to indicate the radical explicitly in that case. Wholly unlabeled atoms represent carbon atoms with the proper number of hydrogens to satisfy a valence of four.

RECOMMENDED RECOMMENDED AVOID AVOID AVOID
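The hydrogen rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and Unicode subscript digits stand in for real subscript formatting, which a drawing program would normally handle itself.

```python
# Unicode subscript digits stand in for true subscript formatting (an assumption).
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def atom_label(symbol: str, hydrogens: int = 0) -> str:
    """Build a label such as 'OH' or 'NH₂' from an element symbol and an H count."""
    if hydrogens < 0:
        raise ValueError("hydrogen count cannot be negative")
    if hydrogens == 0:
        return symbol            # a labeled atom without H means zero hydrogens
    if hydrogens == 1:
        return symbol + "H"      # a single hydrogen takes no number
    return symbol + "H" + str(hydrogens).translate(SUBSCRIPT_DIGITS)


def hydrogens_on_unlabeled_carbon(bond_order_sum: int) -> int:
    """Hydrogens implied on an unlabeled carbon so that its valence totals four."""
    if not 0 <= bond_order_sum <= 4:
        raise ValueError("bond orders on carbon must total between 0 and 4")
    return 4 - bond_order_sum
```

For example, `atom_label("N", 2)` gives `NH₂`, and an unlabeled carbon with one double bond and one single bond carries `hydrogens_on_unlabeled_carbon(3)`, that is, one hydrogen.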

Charges

A positive charge is indicated by a plus sign. The plus sign should follow the most positively charged atom in an atom label, and should also follow any hydrogens associated with that atom. The plus sign should be superscripted.

RECOMMENDED RECOMMENDED AVOID AVOID AVOID

If an atom is multiply charged, a number indicating the magnitude of the charge should be included directly before the plus sign.

RECOMMENDED AVOID AVOID

Negative charges should be indicated similarly to positive charges, although a minus sign should be used instead of a plus sign. A proper en-dash is preferred to a hyphen when possible.

RECOMMENDED ACCEPTABLE AVOID
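As a rough sketch of the charge conventions above (magnitude before the sign, no number for a single charge, and an en-dash rather than a hyphen for negative charges), one might write the following. The function name is hypothetical, and superscripting is left to whatever renders the label.

```python
EN_DASH = "\u2013"  # preferred over the ASCII hyphen for negative charges


def charge_suffix(charge: int) -> str:
    """Return the text of a charge, e.g. '+', '2+', '–', or '3–'.

    The caller is expected to superscript the result and to place it after
    the charged atom and any hydrogens associated with that atom.
    """
    if charge == 0:
        return ""
    sign = "+" if charge > 0 else EN_DASH
    magnitude = abs(charge)
    # A single charge carries no number; larger charges put the number
    # directly before the sign.
    return sign if magnitude == 1 else f"{magnitude}{sign}"
```

An ammonium label would then be assembled as `"NH4" + charge_suffix(1)`, with the suffix superscripted by the renderer.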

[Do we want to comment on charges-in-labels versus floating charges?]

Radicals

Similar to charges. Do we want to prefer a center-dot or a bullet? Should we specify a way to distinguish a singlet from a triplet?

RSC says: "Free radicals are expressed by a raised bold dot, e.g. HO•. In linear formulae, the dot can be entered above the relevant atom. In charged radicals, the charge and radical dot can appear in either order (e.g. R-• or R•-)."

Isotopes

Isotopic substitution is indicated by a superscripted mass number appearing directly to the left of an element symbol. The mass number should indicate the isotope's total mass, and should not indicate its deviation from the element's nominal mass at natural abundance. The hydrogen isotope of mass 2 may also be indicated by the symbol "D". The hydrogen isotope of mass 3 may also be indicated by the symbol "T". [The ACS Style Guide says that if other isotopes are present in the structure, the explicit 2H and 3H forms are preferred. Is that a preference that we want to echo? I thought so at first, but on further thought, I think I like seeing D and T in those cases as well...]

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED

The creation of atom labels containing multiple isotopes of one element should be avoided, but if such a label cannot be avoided, atoms in natural abundance should be listed first, followed by other atoms in increasing isotopic mass number.

RECOMMENDED RECOMMENDED AVOID AVOID

Isotopic labeling (partial rather than complete replacement of the atom by its isotope) is indicated similarly, but the isotopically-labeled atom should additionally be enclosed in square brackets. Note that only the single element symbol should be so enclosed; if there are other elements (including hydrogens) described in the atom label, they should be located outside the brackets.

RECOMMENDED

Oxidation numbers

When required, oxidation numbers should be indicated by a superscripted roman numeral (or Arabic zero) immediately following the atomic symbol. Oxidation numbers should be omitted when the oxidation state is clearly indicated by the remainder of the structure, which is usually the case in structures fully specified with explicit bonds.

RECOMMENDED RECOMMENDED
(when emphasizing oxidation state)
RECOMMENDED AVOID
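A minimal sketch of the numeral conversion follows, assuming non-negative oxidation states only. The text above does not say how negative states should be written, so this example rejects them rather than guessing. The function name is hypothetical, and superscripting is left to the renderer.

```python
ROMAN_VALUES = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def oxidation_numeral(state: int) -> str:
    """Return the roman numeral for an oxidation state, or '0' for zero."""
    if state == 0:
        return "0"  # Arabic zero, since roman numerals have no zero
    if state < 0:
        raise ValueError("negative oxidation states are not covered by this sketch")
    numerals = []
    remaining = state
    for value, numeral in ROMAN_VALUES:
        while remaining >= value:
            numerals.append(numeral)
            remaining -= value
    return "".join(numerals)
```

An iron(III) label would then be assembled as `"Fe" + oxidation_numeral(3)`, with the numeral superscripted.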

Labeling of carbon atoms

Carbon atoms are traditionally left unlabeled, especially in ring systems, with the presence of the carbon atom implied by the "bend" in the bonds. Accordingly carbon atoms must always be labeled explicitly in cases where they cannot reliably be implied, such as when connected to two double bonds or when connected to no bonds at all.

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED

[Do we want to say something about the labeling of terminal carbons? I'm pretty sure I prefer to leave them unlabeled, but I go back and forth. It certainly doesn't look bad to label them, but it can add a decent amount of clutter to complex structures...]

Positioning of atom labels

Atom labels should be positioned so that all bonds connecting to the atom label point directly at the element symbol of the atom to which they are bonded. The bonds should approach the label closely but should not impinge on the actual characters.

For single-character atom labels, the bonds should point to the center of the character.

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED AVOID AVOID AVOID AVOID

For multiple-character atom labels, the bonds should usually point to the center of the first character, but should instead point to the center of the entire symbol when the bonding pattern is highly symmetric.

RECOMMENDED RECOMMENDED AVOID AVOID AVOID

Atom labels should be oriented so that they avoid congestion with other portions of the structure. When the atom label consists of only a single character, this is not an issue. When the atom label contains several characters and there are no bonds to the right of the label, the atom label should start with the primary element symbol located at the end of a bond and the rest of the label extending to the right as shown in several of the examples above. However, when the atom label contains several characters and there are bonds to the right, positioning becomes more difficult. It is quite possible for an atom label to be longer than its associated bonds. Atom labels should never be positioned so that they obscure a bond completely.

If a multi-character atom label has bonds to its right but does not have any bonds to its left, the atom label may be reversed. If the atom label contains more than one non-hydrogen atom (for example, NO2 or COOH), the label must be reversed; it is recommended that the label be reversed in all cases. If the label is reversed, the main element symbol must still be positioned so that the bond points to its center; however, any subsequent symbols proceed to the left of the first element, rather than to its right. Repeat counts remain to the right of the element to which they apply. Charges and radicals remain at the end of the label, although in this case the end is the extreme left instead of the extreme right.

RECOMMENDED RECOMMENDED RECOMMENDED AVOID AVOID
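The reversal rule can be sketched as follows: the label is split into symbols with their repeat counts, the order of the symbols is reversed, and each count stays to the right of its own symbol. This is a simplified illustration with a hypothetical function name. It handles only one- and two-letter symbols or abbreviations with optional counts, and it ignores charges, radicals, isotopes, and parentheses.

```python
import re

# One capital letter, an optional lowercase letter, and an optional repeat count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def reverse_label(label: str) -> str:
    """Reverse a simple atom label, keeping counts to the right of their symbols."""
    tokens = TOKEN.findall(label)
    # Reject anything the simple tokenizer could not consume completely.
    if not tokens or "".join(symbol + count for symbol, count in tokens) != label:
        raise ValueError(f"label {label!r} is outside the scope of this sketch")
    return "".join(symbol + count for symbol, count in reversed(tokens))
```

With this sketch, `NO2` becomes `O2N` and `COOH` becomes `HOOC`, so the bonded atom ends up at the right-hand end of the label, next to the bond.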

For structures drawn in the linear drawing style, a bond on the right side of an atom label will always point to the rightmost atom in that label, even if it is attached to some earlier atom. This is acceptable when using that drawing style, although use of that style should be avoided in general as discussed elsewhere.

RECOMMENDED (for linear drawings)

Atom labels with bonds both to the right and to the left may stack vertically. As above, the primary element symbol must remain positioned so that the bonds point to its center. The additional characters in the label may then be positioned on a second line either above or below the first symbol, according to whichever location has more room. Alternatively, atom labels in these positions may remain on a single left-to-right-reading line as long as the atom label does not completely obscure any bonds.

RECOMMENDED RECOMMENDED ACCEPTABLE

Structural abbreviations

In addition to element symbols, atom labels may contain substituent abbreviations. The abbreviations shown in this table may be used without further definition. Other abbreviations may be used if they are accompanied by a clear explanation of what structural fragment they are intended to represent.

Several structural abbreviations, as shown below, are identical to an element symbol. The use of these abbreviations should be restricted to situations where they are unlikely to be mistaken for the element symbols. Since these abbreviations conflict only with relatively uncommon metals, they are fairly safe to use in strictly organic contexts.

Abbreviation Element name
Ac Actinium
Cm Curium
Nb Niobium
Np Neptunium
Pr Praseodymium

Single-letter abbreviations

There are many other sets of abbreviations that are used with specific compound classes. In particular, the use of single-character abbreviations can be extremely confusing. Surely nobody would interpret BENZENE as anything other than C6H6, but it could be Asx-Glu-Asn-Glx-Glu-Asn-Glu. The use of single-character abbreviations should be limited to contexts where their intended meaning is clear.
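
The BENZENE example can be reproduced by expanding each letter with the standard one-letter amino acid codes. In the sketch below, the function name is hypothetical, and the table covers only the twenty common amino acids plus the ambiguity codes B, Z, and X.

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}


def expand_one_letter(sequence: str) -> str:
    """Expand a one-letter sequence into hyphen-separated three-letter codes."""
    try:
        return "-".join(ONE_TO_THREE[letter] for letter in sequence)
    except KeyError as error:
        raise ValueError(f"{error.args[0]!r} is not a one-letter code in this table") from None
```

Calling `expand_one_letter("BENZENE")` yields the peptide reading given above, which is exactly why a bare string of capitals needs context before it can be interpreted.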

The use of single-character abbreviations in conjunction with other structural features (atoms and bonds) should be avoided; abbreviations of this type are best restricted to running text. If structures of this type are necessary (for example, to highlight small structural modifications of large proteins), care should be taken to differentiate single-letters-representing-amino-acids from single-letters-representing-elements. In structures of this type, adjacent amino acids should always be aligned vertically or horizontally, with or without explicit single bonds between them. Disulfide bridges should also always be aligned vertically or horizontally, and should be positioned so that no more than four characters (the two sulfur atoms and the amino acids on either end) are collinear, lest the sulfur atoms be mistaken for serine residues. Bonds from the amino acids to other structural features should primarily be offset from horizontal or vertical by 30 or 60 degrees. Additional clarification, including explanatory text and/or color coding might be appropriate for complex cases. [See example from Chemical & Engineering News, mirrored here in case that page is removed]

Three-letter amino acid abbreviations

Three-letter amino acid abbreviations -- and other abbreviations with two or more distinct points of attachment -- should also be used with care, because the nickname itself gives no indication of the intended attachment order:

 

Although abbreviations of this sort will generally have a preferred orientation -- in the case of amino acids, the NH2 end is usually on the left, and the acid end is usually on the right -- this is far from universal. Cyclic peptides are a common case that violates this general rule:

Should this peptide be read "clockwise" or "counterclockwise"? Different people will make different assumptions. Ambiguous structures should be avoided when possible, or be accompanied by sufficient explanatory text to remove the ambiguity.

As with single-character abbreviations, adjacent amino acids represented by three-letter abbreviations should always be aligned vertically or horizontally, with or without explicit single bonds between them. Disulfide bridges should also always be aligned vertically or horizontally, and should be connected to the central character of the three-letter abbreviation. Bonds from the amino acids to other structural features should primarily be offset from horizontal or vertical by 30 or 60 degrees. Additional clarification, including explanatory text and/or color coding might be appropriate for complex cases.

Atom labels representing more than one non-hydrogen atom

When clarity is critical and space is not a concern, fully expanded structures (showing an explicit bond between every pair of non-hydrogen atoms) are always preferable to structures showing more complex atom labels. However, space often is a concern, particularly when preparing structures for publication. The following recommendations should provide some guidelines for producing complex atom labels that are likely to be understood correctly in most circumstances.

General guidelines

Atom labels representing more than one non-hydrogen atom -- also sometimes known as "contracted" labels -- rely on the fact that many elements have very consistent and well-understood bonding patterns. The most common description of these patterns is the popular "octet rule", which states that atoms tend to prefer having eight electrons in their valence shell, and hence tend to exist with the orders of all attached bonds totalling to four. The elements shown in green below obey the octet rule most of the time, and are fairly safe to use in contracted labels, with a few exceptions as discussed below. The elements shown in orange are increasingly less safe, as they all have common forms that violate the octet rule in addition to other forms that obey it. The remaining uncolored elements have highly variable bonding patterns and should not be used in contracted labels, but always drawn with explicit bonds.

Orientation of symbols within contracted labels

Contracted atom labels attached to only one bond should be read outwards from that bond, usually from left to right if the bond is on the left of the label. If the bond is instead attached to the right of the label, the label will normally be read from right to left, but ambiguities can result. Accordingly, contracted labels with a bond on the right should be avoided except for simple cases, usually limited to relatively small labels containing four or fewer combined element symbols and abbreviations. Contracted labels with a single bond attached to an interior atom or with multiple connecting bonds should always be read from left to right, but these also are prone to ambiguity and should similarly be avoided except for simple cases.

Relatively long labels with bonds to the rightmost character should be avoided, since their interpretation can be extremely difficult. The following two cases demonstrate this problem. Although the labels appear superficially very similar, the first must be interpreted from left to right, and would represent an acetoxymethyl substituent, while the second must be interpreted from right to left, and would represent a methyl ester of a carboxylic acid.

AVOID AVOID

In extreme cases, a single label could represent different structural fragments depending on whether it was interpreted from right to left or from left to right.

AVOID

Interpretation of contracted labels

In general, contracted labels are interpreted to fill as many valences as possible, as quickly as possible. Consider a simple case such as the one shown. A carbon atom is the first atom in the label, and it is connected to a single bond, so it has three remaining open valences. The next atom in the label is bromine, and is repeated three times. Bromine has one open valence, so together the three bromine atoms fill the three open valences on the carbon atom. There are no more atoms, and no open valences, so this label has been interpreted completely. This is a very simple case.

The common carboxylic acid group provides a more complex example. Again, the first carbon has three remaining valences. In this case, the next atom is oxygen, which has two available valences, and both of those are used to form a double bond with the carbon, leaving the carbon with one remaining valence. The third element is another oxygen, but this time only one of its valences is used to create a single bond to the carbon. That fills all of the available valences for the carbon, but leaves one remaining valence on the oxygen, which is in turn filled by the fourth atom, a hydrogen.

In the similar case of a peroxide, two valences on the first carbon are filled immediately by two hydrogens. With only one valence remaining on the carbon, the first oxygen has no option but to chain with the second, forming a very different bonding pattern from that of the carboxylic acid.
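
The three walkthroughs above (tribromomethyl, carboxylic acid, and peroxide) can be reproduced with a small greedy interpreter. This is a sketch of one reading of the "fill as many valences as possible, as quickly as possible" rule, not a complete parser. The `VALENCE` table is an assumption limited to a few well-behaved elements, the label is read left to right from a single bond attached to its first atom, and parentheses, charges, and abbreviations are not handled.

```python
import re

# Assumed valences for a few elements with predictable bonding patterns.
VALENCE = {"H": 1, "F": 1, "Cl": 1, "Br": 1, "I": 1, "O": 2, "N": 3, "C": 4}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def interpret(label: str) -> tuple[list[str], list[tuple[int, int, int]]]:
    """Return (atoms, bonds) for a simple contracted label.

    Each bond is (index_of_atom_a, index_of_atom_b, bond_order).
    """
    tokens = TOKEN.findall(label)
    if not tokens or "".join(symbol + count for symbol, count in tokens) != label:
        raise ValueError(f"cannot tokenize {label!r}")
    atoms: list[str] = []
    for symbol, count in tokens:
        if symbol not in VALENCE:
            raise ValueError(f"no valence assumed for {symbol!r}")
        atoms.extend([symbol] * int(count or 1))

    open_valences = [VALENCE[symbol] for symbol in atoms]
    open_valences[0] -= 1  # the first atom spends one valence on the external bond
    bonds: list[tuple[int, int, int]] = []
    current = 0  # the atom currently accepting new neighbors
    for new in range(1, len(atoms)):
        # Use as many valences as both atoms allow in a single bond.
        order = min(open_valences[current], open_valences[new])
        if order == 0:
            raise ValueError(f"no open valence left for atom {new} ({atoms[new]})")
        bonds.append((current, new, order))
        open_valences[current] -= order
        open_valences[new] -= order
        # Once the current atom is full, the chain continues from the new atom.
        if open_valences[current] == 0 and open_valences[new] > 0:
            current = new
    if any(open_valences):
        raise ValueError(f"label {label!r} leaves open valences")
    return atoms, bonds
```

For `COOH` the sketch yields a double bond from the carbon to the first oxygen, a single bond to the second oxygen, and the hydrogen on that second oxygen. For `CH2OOH` the two oxygens chain instead, as described above for the peroxide.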

Divalent structural fragments may be enclosed in parentheses and followed by a repeat count to represent repeating fragments concisely. [Hm. The latest Red Book draft says that square brackets should be used here, and it implies that parentheses may not be used. I don't know if I like that...]

As discussed above, a valence-based interpretation of atom labels will be successful only for elements with predictable bonding patterns. Some elements, including sulfur, commonly exist in a variety of valences. Contracted labels containing these elements should be avoided, particularly when those element symbols are immediately followed by multiple chalcogens or halogens.

AVOID

When used as part of a larger label, the textual fragments SO2, SeO2, and TeO2 should be used only to represent sulfones, selenones, and tellurones, respectively, and should never be used to represent the linear allomorphs or any other branching form.

Do we want to talk about oxyacids in general as a special case of acceptable contracted labels? They may be represented in several ways in accord with the inorganic nomenclature chapter on formulas (need to flesh this out if we want to discuss further).

---SO3H   ---SO2OH   ---AsO3H2   ---AsO(OH)2   ---OClO3, etc.

Even in the presence of other atoms with variable valence, the CH2 fragment should always be interpreted as a chaining moiety, even when followed by a repeat count. Structures containing branching methylidene fragments should be drawn with explicit atoms and bonds.

Some very common contracted labels cannot be interpreted with a simple application of valence rules, but also need some implicit charges to be added. These labels are shown in the list of common contracted atom labels.

Branching

Simple branching patterns may be implied by the basic valence rules described above, and do not require special notation. More complex branching may be clarified by placing parentheses around all elements within a branch. One valence for the first element within the parentheses is used for connecting the previous atom outside the parentheses; subsequent atoms within the parenthesized section are then bound to the first or subsequent atoms, even if an atom outside the parentheses has remaining open valences.

Branching chains where the branch is connected to the main chain by a double bond should be indicated by placing an equals sign directly within the opening parenthesis. However, such labels can often be rewritten to avoid the necessity for the equals sign by swapping the text within the parentheses with the text after. [Do we want to state a preference?]

AVOID RECOMMENDED

Parentheses may be nested, but highly complex labels of this type can be extremely difficult to understand, and should be avoided. [Hm. The latest Red Book draft seems to say that this nesting order should be used: (), {()}, ({()}), {({()})}, etc. I don't think I've ever seen that used in atom labels, and I'm pretty sure I don't like it...]

AVOID

Explicit single bonds

Explicit single bonds are generally not necessary within atom labels and should be avoided. They should strongly be avoided in contexts where they might be mistaken for negative charges. In cases where explicit single bonds are desired for clarity, it would likely be even more clear to draw the structure out fully, rather than trying to denote the single bonds inline using text.

If explicit single bonds are desired anyway, they should be represented using the en-dash character, not the hyphen.

RECOMMENDED RECOMMENDED AVOID AVOID

Structural abbreviations

Abbreviations that contain a single attachment point may be freely used in contracted atom labels. To avoid any possible misinterpretation about whether such abbreviations truly do contain only a single attachment point, they should always appear last within a label or last within a parenthesized portion of a label.

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED AVOID

Ordering of multiple symbols attached to the same atom

If a single atom is bound to both hydrogens and non-hydrogens, any hydrogens should be presented adjacent to the first atom, followed by the others, reading outward from the bond.

RECOMMENDED RECOMMENDED AVOID AVOID

If more than one single-element symbol or abbreviation is attached to the same non-hydrogen atom, the element symbols should be presented first in alphabetical order (after any hydrogens), followed by the abbreviations in alphabetical order, reading outward from the bond.

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED
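
The ordering rules above (hydrogens first, then element symbols alphabetically, then abbreviations alphabetically) can be sketched as a small label builder. The names are hypothetical, and `ELEMENT_SYMBOLS` is an assumed, deliberately short list used only to tell element symbols apart from abbreviations.

```python
from collections import Counter

# Assumed subset of element symbols; anything else is treated as an abbreviation.
ELEMENT_SYMBOLS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}


def build_label(center: str, substituents: list[str]) -> str:
    """Assemble a label reading outward from the bond to the central atom."""
    counts = Counter(substituents)
    hydrogens = ["H"] if "H" in counts else []
    elements = sorted(s for s in counts if s in ELEMENT_SYMBOLS and s != "H")
    abbreviations = sorted(s for s in counts if s not in ELEMENT_SYMBOLS)
    parts = [center]
    for symbol in hydrogens + elements + abbreviations:
        count = counts[symbol]
        # Repeat counts follow the symbol; a count of one is left implicit.
        parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)
```

For instance, `build_label("C", ["Cl", "H", "H"])` gives `CH2Cl`, and `build_label("C", ["Me", "Br", "H", "Et"])` gives `CHBrEtMe`.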

Fragments containing multiple symbols should appear only after all hydrogens, single-element symbols, and abbreviations, since that arrangement will often avoid the use of parentheses.

RECOMMENDED AVOID

Charges, radicals, and isotopes

Charges, radicals, and isotopes, if present, should be placed adjacent to the element symbol to which they apply. Isotopic masses are always superscripted to the left of the element symbol, without exception. Charges and radicals are usually positioned to the right of the element symbol, but may appear to the left instead, if the atom label is reversed. Because placement of a charge or radical may result in ambiguity within reversed labels, it is best to include charges and radicals within reversed labels only at the first or last character of the label.

RECOMMENDED RECOMMENDED RECOMMENDED RECOMMENDED AVOID

Charges may also be indicated outside the text of the atom label; in that case, the charge symbol should be positioned directly above the symbol of the element that is charged. This style can be used without ambiguity even for reversed labels.

RECOMMENDED RECOMMENDED

Formulas

Formulas may be considered as atom labels not connected to any bonds. They may be preferred to structural diagrams for simple compounds such as NaCl and MeOH. Formulas should always be interpreted from left to right, but otherwise observe restrictions similar to other kinds of atom labels.

[There is an entire chapter devoted to formulas in the new Red Book draft. We should probably incorporate it by reference, or repeat portions, or something]