
Variable attachment points and ligands


Introduction

Because of the nature of chemical research and reporting, it is often necessary to discuss a collection of closely related compounds. Where storage space is not a concern, it is generally preferable to depict each substance individually, for example as a series of discrete records in a chemical database. In many cases, however, space is at a premium; this is common in journal articles, for example. It is also sometimes necessary to refer to a set of closely related compounds as a single unit; this is common in patents. These guidelines suggest some ways to refer to collections of chemical structures.

In general, variable structures are depicted by providing a parent or "core" structure with marked locations of variability, accompanied by lists of atoms, substructures, or classes of substructures that might be present at each of those marked locations. When the core structure has only one variable location, the total number of structures equals the number of replacements listed for that location. A core structure with more than one variable location may represent the combinatorial product of the replacements available at each location. That is to say, a core structure with two variable locations, having three possibilities for the first location and seven for the second, would represent 3 × 7 = 21 structures (the final number of distinct structures may be smaller because of symmetry). Collections of this type are commonly found in journal articles and are an excellent space-saver when discussing closely related compounds, such as those produced in a series of similar reactions.
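As a rough illustration, the count can be computed directly. The Python sketch below checks the 3 × 7 = 21 example from the text, then uses a hypothetical symmetric core (a para-disubstituted benzene whose two sites draw from the same replacement list, an assumption made only for this example) to show how symmetry can reduce the number of distinct compounds. It handles only this simple swap symmetry; general symmetry perception would require canonicalizing every enumerated structure, which is not attempted here.

```python
from itertools import combinations_with_replacement, product
from math import prod

# Example from the text: 3 choices at one site, 7 at the other.
print(prod([3, 7]))  # 21

# Hypothetical symmetric core: para-disubstituted benzene where R1 and R2
# share one replacement list. Swapping R1 and R2 gives the same compound,
# so R1 = H, R2 = F and R1 = F, R2 = H both describe fluorobenzene.
options = ["H", "F", "Cl"]
ordered = list(product(options, repeat=2))                  # every drawing
distinct = list(combinations_with_replacement(options, 2))  # unordered pairs
print(len(ordered), len(distinct))  # 9 6
```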

In addition to listing specific replacements, it is also possible to describe entire classes of ligands, for example saying that a certain location may contain "an aryl group, or a heteroaryl group of no more than 7 atoms". Diagrams that define broad classes of structures such as these are known as Markush structures, after Eugene Markush, the first inventor to include them successfully in a U.S. patent (US 1,506,316, issued in 1924). In structures of this type, it is generally impossible to list all possible members of the collection. Markush structures continue to be common in patents, and are increasingly being used to generate combinatorial libraries for electronic analysis.

Small substituents

When all variable substituents are small, they can be enumerated as a simple list. The list may be provided as a separate caption near the parent structure, or it may be included as a complex atom label directly in the parent structure itself.

When the variable list is provided separately from the parent structure, the point(s) of variability in the parent should be indicated by an unambiguous label. This label should be chosen so that it cannot easily be confused with valid element symbols or fragment abbreviations. Traditionally, the letter "R" is widely used, followed by a superscripted number (R1, R2, etc.) or prime(s) (R', R'', etc.) if necessary to distinguish different groups. Other common symbols include X, Z, and G. A single structure with multiple sites is assumed to have no restrictions between those sites (that is, the representation includes all combinatorial products).
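A minimal sketch of enumerating a core with labeled sites, assuming the core is written as a SMILES string containing hypothetical {R1} and {R2} placeholders (this placeholder syntax is an assumption for the example and is not part of SMILES). Because no restrictions between sites are assumed, every combination of replacements is generated.

```python
from itertools import product

# Hypothetical core: a SMILES string with named placeholders, one per
# labeled site of variability.
core = "{R1}C(=O)O{R2}"

# Replacement list for each label. These alkyl fragments were chosen so that
# plain string substitution yields valid SMILES; arbitrary fragments would not.
sites = {
    "R1": ["C", "CC"],  # methyl, ethyl
    "R2": ["C", "CC"],  # methyl, ethyl
}

labels = list(sites)
structures = [
    core.format(**dict(zip(labels, choice)))
    for choice in product(*(sites[label] for label in labels))
]
print(structures)
# ['CC(=O)OC', 'CC(=O)OCC', 'CCC(=O)OC', 'CCC(=O)OCC']
```

The four strings are methyl acetate, ethyl acetate, methyl propanoate, and ethyl propanoate. A real implementation would join fragments on a molecular graph rather than by string substitution.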

A label may also be used without an explicit list of allowed substituents. Such labels may be used to indicate that substitution is permissible or likely, without specifying anything about the substituents themselves. If more than one substituent is indicated on the structure and those substituents may be different, unique labels (R1, R2, etc.) should be used to emphasize the possibility of that difference.

Most commonly, a variable substituent is connected to the parent structure by a single bond. Variable substituents connected by more than one bond should be used only when every fragment in the variable list is monatomic or otherwise symmetric with respect to substitution.

In the absence of an external connection point drawn explicitly on the members of the variable list, each item in that list is assumed to connect to the parent structure according to the standard rules of valence, or by the first atom in the item if more than one atom has free valences.
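The default attachment rule can be expressed as a short function. This is a sketch under stated assumptions: the Atom class and attachment_index function are hypothetical, and each atom's free valence is taken as its standard valence minus the bonds it already uses (to hydrogens and to other atoms in the fragment).

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical minimal atom record for a fragment in a variable list."""
    symbol: str
    valence: int  # standard valence of the element
    used: int     # bonds already made to H and to other fragment atoms

    @property
    def free(self) -> int:
        return self.valence - self.used


def attachment_index(fragment: list[Atom]) -> int:
    """Index of the fragment atom that bonds to the parent structure.

    Applies the default rule when no connection point is drawn: the atom
    with a free valence, or the first such atom if several have one.
    """
    candidates = [i for i, atom in enumerate(fragment) if atom.free > 0]
    if not candidates:
        raise ValueError("fragment has no free valence")
    return candidates[0]


# Methoxy written as "CH3O": only the oxygen has a free valence.
methoxy = [Atom("C", 4, 4), Atom("O", 2, 1)]
# "CH2CH2": both carbons have a free valence, so the first one is used.
ethanediyl = [Atom("C", 4, 3), Atom("C", 4, 3)]
print(attachment_index(methoxy), attachment_index(ethanediyl))  # 1 0
```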

[Figure not reproduced: three examples, each marked RECOMMENDED]

Harry: we should discuss doubly-bonded variable lists, which may (or may not?) become two separate substituents on expansion:

[Figure not reproduced: one example, marked AVOID]

When the variable list is provided directly within the parent structure as an atom label, each list should be enclosed in brackets to make it clear that the terms are associated. This should only be considered for small lists that can be provided without impinging on other portions of the parent structure.
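Brackets also make the extent of an inline list unambiguous to software. The following is a hypothetical parser, assuming the bracketed, comma-separated form "[Cl, Br, I]"; the function name and the exact syntax accepted are illustrative only.

```python
import re

# Hypothetical syntax: an inline variable list is a bracketed,
# comma-separated list of atom or group labels, e.g. "[Cl, Br, I]".
BRACKETED = re.compile(r"^\[(.+)\]$")


def parse_atom_label(label: str) -> list[str]:
    """Return the possible replacements encoded in an atom label."""
    match = BRACKETED.match(label.strip())
    if match is None:
        return [label.strip()]  # ordinary, non-variable label
    items = [item.strip() for item in match.group(1).split(",")]
    if any(not item for item in items):
        raise ValueError(f"empty entry in variable list: {label!r}")
    return items


print(parse_atom_label("[Cl, Br, I]"))  # ['Cl', 'Br', 'I']
print(parse_atom_label("OH"))           # ['OH']
```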

[Figure not reproduced: three examples, marked RECOMMENDED, RECOMMENDED, and AVOID]

Unadorned lists are simplest to understand and should always be preferred to textual descriptions of the variable substituents.

[Figure not reproduced: two examples, marked RECOMMENDED and AVOID]

A ligand that may be attached at any one of several ring atoms can be depicted by drawing it on a bond that extends into the center of the ring. In printed works, the ligand is then assumed to be attachable to any of the ring atoms allowed by normal bonding rules. In electronic formats, depending on the capabilities of the software being used, it may be possible to specify that the ligand can be attached only to certain ring atoms and not to others. If more than one ligand may be independently connected to various ring atoms, the bond to each ligand should extend toward the center of the ring, but these bonds should not be connected to each other.

Harry: Problem for OCR software; should deprecate.

Predefined substituent classes

Sometimes a substituent can be defined only by describing the general class to which it belongs. Some classes are common enough that specific labels have traditionally been used for them. When used without accompanying text to further describe the substituents, each of the following labels should be used only to indicate the substituent class listed beside it.

Ar: aryl
Aryl: aryl
E: electrophile
EWG: electron-withdrawing group
M: metal
Nu: nucleophile
Q: heteroatom (not hydrogen, not carbon)

Use of those labels in conjunction with accompanying descriptive text should be avoided. However, if descriptive text is used with one of these labels, it should be used only to further restrict one of the listed classes ("Ar = halogen-substituted aryl") and should not be used to describe an entirely unrelated set of substituents ("Ar = methyl, ethyl, or propyl").

If more than one of these labels is required in a given structure, they may be differentiated by the use of superscripted numbers or primes as described above.
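A sketch of how software might resolve these labels, including the numbered or primed variants just described. The dictionary mirrors the list of predefined labels; the regular expression, the function name, and the assumption that superscripts have been flattened to plain text (so R1 arrives as "R1") are hypothetical.

```python
import re

# Predefined substituent-class labels, as listed in this section.
CLASSES = {
    "Ar": "aryl",
    "Aryl": "aryl",
    "E": "electrophile",
    "EWG": "electron-withdrawing group",
    "M": "metal",
    "Nu": "nucleophile",
    "Q": "heteroatom (not hydrogen, not carbon)",
}

# A base label optionally followed by a number or by one or more primes,
# e.g. "Ar1", "Nu'", "E''" (superscripts assumed flattened to plain text).
LABEL = re.compile(r"^([A-Za-z]+)(\d+|'+)?$")


def substituent_class(label: str) -> str:
    match = LABEL.match(label)
    if match is None or match.group(1) not in CLASSES:
        raise KeyError(f"not a predefined class label: {label!r}")
    return CLASSES[match.group(1)]


print(substituent_class("Ar1"))  # aryl
print(substituent_class("Nu'"))  # nucleophile
```

Generic labels such as R are deliberately absent from the table, so substituent_class("R1") raises KeyError; their meaning has to come from accompanying text.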

The "Ar" label should be avoided in any context where it could be interpreted as representing an argon atom. Because argon forms very few compounds, such contexts are exceedingly uncommon.

The "R" label has in some cases been used to indicate alkyl substituents, and in other cases to indicate non-hydrogen substituents. However, it is more frequently used in the fully generic sense to indicate "some unknown substituent" without further restriction. Similarly, "X" has been used to represent halogens and, sometimes, halogen-like substituents such as tosyl. We recommend that these labels be used either in conjunction with descriptive text ("R = alkyl") or in the fully generic sense.

Variable chain length and ring size

It is often convenient to specify that a chain or ring must be present but that its length or size may vary. This may be accomplished by enclosing the repeating chain or ring atom, generally CH2, in brackets and following the brackets with a subscripted range giving the minimum and maximum number of those atoms that may be present. Only homogeneous chains and rings should be represented in this way. If multiple element types or unsaturated bonds may be present, they should be represented outside the variable section, or another type of notation should be used entirely.
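The expansion implied by this notation is mechanical, which is also why only homogeneous units fit it: the repeating unit is simply copied n times. Plain text cannot carry a subscript, so the sketch below assumes a hypothetical plain-text form "[CH2]2-4" for a CH2 unit repeated two to four times; it handles one variable section per condensed formula.

```python
import re

# Hypothetical plain-text form of the bracket notation: the repeating unit
# in square brackets, followed by the subscript range written as "min-max".
VARIABLE_CHAIN = re.compile(r"\[([^\]]+)\](\d+)-(\d+)")


def expand_chain(condensed: str) -> list[str]:
    """Expand one variable-length section into explicit condensed formulas."""
    match = VARIABLE_CHAIN.search(condensed)
    if match is None:
        return [condensed]  # nothing variable to expand
    unit, low, high = match.group(1), int(match.group(2)), int(match.group(3))
    if low > high:
        raise ValueError("minimum exceeds maximum")
    head, tail = condensed[: match.start()], condensed[match.end():]
    # The unit is repeated verbatim, so it must be homogeneous.
    return [head + unit * n + tail for n in range(low, high + 1)]


print(expand_chain("HO[CH2]2-4OH"))
# ['HOCH2CH2OH', 'HOCH2CH2CH2OH', 'HOCH2CH2CH2CH2OH']
```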

[Figure not reproduced: three examples, each marked RECOMMENDED]

The use of curved bonds should be avoided. Harry: Why?

[Figure not reproduced: one example, marked AVOID]

Variable attachment location

In addition to varying the type of attachment, it may also be convenient to indicate that the attachment's location is variable. This type of notation should be restricted to ligands that are known to be bound to a specific ring, but at an unspecified or unknown atom of that ring. The ligand always replaces a hydrogen atom on one of the ring atoms, and cannot be bound to any ring atom that lacks an attached hydrogen.
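The hydrogen-replacement rule can be sketched as follows. The ring description (element plus attached-hydrogen count for each ring atom) and both functions are hypothetical. The second function also covers several independent, distinguishable ligands on bonds drawn into the ring center, allowing a ring atom to take more than one ligand only if it carries that many hydrogens. Positions are not reduced for ring symmetry.

```python
from collections import Counter
from itertools import product

# Hypothetical ring description: (element, attached H count) per ring atom.
# Pyridine: the nitrogen carries no hydrogen; each carbon carries one.
pyridine = [("N", 0), ("C", 1), ("C", 1), ("C", 1), ("C", 1), ("C", 1)]


def allowed_positions(ring):
    """Ring positions (0-based) where a ligand may replace a hydrogen."""
    return [i for i, (_, h_count) in enumerate(ring) if h_count > 0]


def placements(ring, n_ligands):
    """All placements of n distinguishable ligands, each replacing one H."""
    positions = allowed_positions(ring)
    result = []
    for choice in product(positions, repeat=n_ligands):
        used = Counter(choice)
        # A ring atom cannot lose more hydrogens than it has.
        if all(used[i] <= ring[i][1] for i in used):
            result.append(choice)
    return result


print(allowed_positions(pyridine))   # [1, 2, 3, 4, 5]
print(len(placements(pyridine, 2)))  # 20
```

For two different ligands on pyridine, the 25 raw pairs of positions drop to 20 because no carbon can lose two hydrogens.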

[Figure not reproduced: two examples, each marked RECOMMENDED]

Harry: do we want to say something about this case:

CAUTION: Contentious recommendation

From discussion with Alan McNaught, it sounds like IUPAC explicitly would like us to address at least some of the differences between drawings-for-print and drawings-for-electronic-use. Andrey points out that the next paragraph "seems too software oriented and is close to some drawing program guide or help file". He's right, but how can we present this issue (and many other similar ones) in a better way?

CAUTION: When working electronically, it is extremely important to specify the variable attachment correctly, according to the capabilities of the software program you are using. If the variable attachment is specified incorrectly, the structure may be interpreted as two disjoint fragments, with the variable bond interpreted as an ordinary bond to a carbon atom. In addition to losing the intended variability, this misinterpretation adds CH4 to the structure's perceived formula: CH3 for the "methyl group" at the center of the ring, and one more H at the ring position that was not substituted as intended.
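The formula arithmetic in this caution can be checked with a small sketch. The choice of a chlorine substituent on a benzene ring is a hypothetical example; the formulas are tallied with Python's collections.Counter, whose subtraction keeps only positive counts.

```python
from collections import Counter

# Intended structure: a benzene ring with a variably attached Cl (C6H5Cl).
intended = Counter({"C": 6, "H": 5, "Cl": 1})

# Misread drawing: an unsubstituted ring (C6H6) plus a disjoint "methyl"
# carbon at the ring center that carries the Cl (CH3Cl).
misread = Counter({"C": 6, "H": 6}) + Counter({"C": 1, "H": 3, "Cl": 1})

extra = misread - intended
print(dict(extra))  # {'C': 1, 'H': 4}
```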

End of contentious recommendation

 

Large substituents

Physically large substituents and large collections of substituents are most conveniently described in tabular form. Unfortunately, at the time of this writing there is no software that is able to interpret this sort of tabular data. Nonetheless, we present this as an acceptable way of depicting large substituents and large collections of substituents in printed form, and we hope that chemical information software will be enhanced over time to be able to interpret this data as well.

RECOMMENDED

From Journal of Medicinal Chemistry, 1998,
Vol. 41, No. 26, page 5204, Table 3:

[A table shows the meanings of R.
The relevant columns of this table are shown below:]

compd R
6 Ph
9a
9g

 

 

RECOMMENDED

Harry: I hate this example. Relative stereo? Absolute? What's up with the two methyl substituents in the piperidinyl ring? The nitrogen in that ring is fluxional, not guaranteed to be equatorial. The bond to R1 could have variable stereochemistry. The X1-X2 moiety should be simplified, since it is almost always CH2-CH2...

From Journal of Medicinal Chemistry, 1998,
Vol. 41, No. 26, page 5190

[Values of R1, R2, X1, X2, S1, S2, and S3 are given in a table.
This table is reproduced below:]

compd  R1       R2    X1      X2      S1    S2     S3     % inhibition at 100 nM
8      i-Pr     H     CH2     CH2     H     H      OH     71
9      i-Pr(a)  H     CH2     CH2     H     H      OH     11
10     i-Pr     H     CH2     CH2     H     H      H      28
11     i-Pr     H     CH2     CH2     H     OH     H      20
12     i-Pr     H     CH2     CH2     OH    H      H      25
13     i-Pr     H     CH2             H     H      OH     6
14     i-Pr     H     CH(b)   CH(b)   H     H      OH     15
15     i-Pr     H     CH2     CH2     H     H      F      26
16     i-Pr     H     CH2     CH2     H     H      OH     31
17     i-Pr     H     CH2     CH2     H     OCH3   OH     42
18     i-Pr     H     CH2     CH2     H     H      OCH3   16
19     H        H     CH2     CH2     H     H      OH     11
20     CH3      H     CH2     CH2     H     H      OH     20
21     H        CH3   CH2     CH2     H     H      OH     0
22     CH3      CH3   CH2     CH2     H     H      OH     1
23     C6H5     CH3   CH2     CH2     H     H      OH     7
DMSO                                                      4

(a) The carbon to which the i-Pr group is attached has the opposite stereochemistry from that in 8.
(b) Trans double bond.
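The text above notes that no software could interpret tables like this one. As a hedged illustration of how little is needed once such data are machine-readable, the sketch below re-keys a few rows of the preceding table as comma-separated values (the CSV layout, the inhibition_pct column name, and the substitution_maps function are assumptions for this example) and recovers, for each compound, the replacement at every variable site. The empty X2 cell for compound 13 is kept as an empty string, exactly as it appears in the table.

```python
import csv
import io

# A few rows of the table above, re-keyed as CSV so that an empty cell
# (X2 for compound 13) survives parsing.
TABLE = """compd,R1,R2,X1,X2,S1,S2,S3,inhibition_pct
8,i-Pr,H,CH2,CH2,H,H,OH,71
10,i-Pr,H,CH2,CH2,H,H,H,28
13,i-Pr,H,CH2,,H,H,OH,6
21,H,CH3,CH2,CH2,H,H,OH,0
"""

VARIABLE_SITES = ("R1", "R2", "X1", "X2", "S1", "S2", "S3")


def substitution_maps(text):
    """Map each compound ID to the replacement listed at every variable site."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["compd"]: {site: row[site] for site in VARIABLE_SITES}
        for row in reader
    }


maps = substitution_maps(TABLE)
print(maps["21"]["R2"])  # CH3
```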

 

RECOMMENDED

From Journal of Medicinal Chemistry, 1998,
Vol. 41, No. 26, page 5232, Table 9:

[A table shows the meanings of X and Y.
The relevant columns of this table are shown below:]

compd -X-Y-
6e -NH-CO-
51 -CO-NH-
52 -NH-SO2-

In cases where the external connection points must be indicated for the variable ligands, such indication must be unambiguous. Some attachment types are shown:

Note that the asterisk-without-arrow type is more commonly used for polymer notation. Do we want to deprecate it for external connection points outside of polymers?

Philosophically, an external connection point represents a "null" atom. That is, the associated bond is connected only to one atom. When the full structure is enumerated, the site represented by the external connection point is filled by an atom in the parent structure.

When a structural fragment is drawn with explicit bonds, the location of its attachment point should be shown explicitly. The attachment point should not be implied simply by the absence of a hydrogen on a drawn structure.
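The null-atom view suggests a simple way to join a fragment to its parent. In this sketch, the graph representation (a list of atom symbols plus a list of bonded index pairs) and the attach function are hypothetical, bond orders and hydrogens are ignored, and "*" is assumed as the marker for the external connection point. Requiring exactly one explicit marker also enforces the rule that the attachment point must not be implied merely by a missing hydrogen.

```python
def attach(parent_atoms, parent_bonds, site, frag_atoms, frag_bonds):
    """Join a fragment to the parent atom at index `site`.

    The fragment must contain exactly one "*" (null) atom. It is removed,
    and the atom it was bonded to is bonded to the parent site instead.
    """
    stars = [i for i, symbol in enumerate(frag_atoms) if symbol == "*"]
    if len(stars) != 1:
        raise ValueError("fragment needs exactly one explicit connection point")
    star = stars[0]
    neighbours = [b if a == star else a for a, b in frag_bonds if star in (a, b)]
    if len(neighbours) != 1:
        raise ValueError("the null atom must carry exactly one bond")
    anchor = neighbours[0]

    # Renumber the real fragment atoms after the parent's atoms.
    offset = len(parent_atoms)
    new_index = {}
    for i in range(len(frag_atoms)):
        if i != star:
            new_index[i] = offset + len(new_index)

    atoms = parent_atoms + [s for i, s in enumerate(frag_atoms) if i != star]
    bonds = list(parent_bonds)
    bonds += [(new_index[a], new_index[b]) for a, b in frag_bonds if star not in (a, b)]
    bonds.append((site, new_index[anchor]))  # the null atom's place is filled
    return atoms, bonds


# Ethane skeleton (H implicit) plus a hydroxyl fragment drawn as "*-O".
atoms, bonds = attach(["C", "C"], [(0, 1)], 1, ["*", "O"], [(0, 1)])
print(atoms, bonds)  # ['C', 'C', 'O'] [(0, 1), (1, 2)]
```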

AVOID

From Journal of Medicinal Chemistry, 1998,
Vol. 41, No. 26, page 5231:

[A table shows the meanings of R, W, R1, and R2.
The relevant columns of this table are shown below:]

Compd R1 R2 R3 W
5i Cl H COOH
5o Cl OEt COOH
39 H Cl H COOH
5m H H COOH
40 Cl H H

The point of attachment should not be indicated in a subtle manner such as by varying the length of a bond.

AVOID

From Journal of Medicinal Chemistry, 1998,
Vol. 41, No. 26, page 5228:

[The many values of R and R4 are defined in a large table. Possible values of R listed in the table are: H, OEt, OH, or OMe. Values of R4 are listed as: Me, Et, n-Pr, i-Pr, n-Bu, i-Bu, n-Pent, Cyclohexyl, Phenyl, Benzyl, 2-Phenethyl, 3-Buten-1-yl, n-Hex, or one of the following structures (Note the reduced size of the bond at the attachment point, which may easily be mistaken for a negative charge):]