Klotho: a Tutorial and Manual
William B. Wise, Jason Holcomb, and Toni Kazic
September 14, 2000 and most recently revised June 8, 2002

where CompoundName and CompoundSpecifications are variables denoting the name of the compound and its Klotho specifications. The period at the end is absolutely essential. CompoundSpecifications is a list of terms, where the syntax of the list is
config(ethanol,CompoundSpecifications). |
We now discuss the arguments that constitute the
CompoundSpecifications.
Once the CompoundName has been entered, the next item to be specified is the form or architecture of the molecule. Here there are three main options to consider: chain, ring, and ring system.
Chain is used for all noncyclic molecules and substituents, such as aldehydo-sugars and fatty acids. (There is an older way of describing chains which is useful for very small molecules; we illustrate it briefly in Section 2.8.4.) Ring is used for molecules containing a single cyclic structure such as benzene, cyclohexane, imidazole etc. Ring system is chosen when one wishes to code a molecule containing adjacent multiple rings, such as steroids, polynuclear aromatics, and purines. Molecules containing more than one type of architecture, such as ATP, are built from substituents (see Section 2.8.3). Obviously for ethanol, chain is the molecular form to use. So now our config rule is
config(ethanol,[chain([ ... ])]). |
Notice how each parenthesis ( ``('' , ``)'' ) and each bracket ( ``['' , ``]'' ) balances its mate, starting from the innermost pair and going outward to both the left and right ends of the config rule. Every time you use one of these grouping delimiters, you must put its mate in the right place --- otherwise your config rule will be rejected! Thus we have
([([ ... ])])
and not
([([ ... )]])
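The balancing requirement is easy to check mechanically before handing a rule to Klotho. The following is a minimal sketch in Python, not part of Klotho itself; the function name `delimiters_balance` is our own invention for illustration. It walks through the rule and confirms that each closing delimiter matches the most recently opened one.

```python
# Hypothetical helper, not part of Klotho: checks that every "(" and "["
# in a config rule is closed by the right mate, innermost pair first.
PAIRS = {")": "(", "]": "["}

def delimiters_balance(rule: str) -> bool:
    stack = []
    for ch in rule:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # nothing may be left open

print(delimiters_balance("config(ethanol,[chain([ ... ])])."))  # True
print(delimiters_balance("config(ethanol,[chain([ ... )]])."))  # False
```

A check like this only catches mismatched delimiters; it says nothing about whether the terminals inside them are known to Klotho.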
Organic chemists look at the longest path through non-hydrogen atoms to help determine how to name a molecule. In the same way, entering a molecule's description into Klotho requires that one recognize a long (not necessarily the longest, though that can help) path through the molecule of interest. For example, ethanol is a chain of two carbon atoms, each of which is bonded to three other groups (hydrogens and hydroxyls) and the other carbon. So ethanol's CompoundSpecifications is a list of words, where each word designates a moiety in the order in which it occurs as one moves through the molecule along the path. For rings and ring systems, one picks an atom on the ring itself and then ``walks around'' the ring, describing the atoms and substituent groups one encounters on the walk.
For any particular molecule, the number of words in the
CompoundSpecifications list
will depend on one's choice of words: lists with words denoting large
substituent groups will be shorter than those using words for small
substituent groups or atoms. This means there is no one ``right'' way
to describe a molecule: only ways which are more concise than others!
A terminal can take none, one or two arguments, depending on whether one needs to include a specific number for the atoms and whether one needs to add substituent groups that are bound to that particular atom and are not represented elsewhere in the molecule's path. For example one could write
| without atom number | with atom number |
| chain([...car... | chain([...car(1)... |
| | chain([car(1,hyd&&hyd&&hydroxyl)... |
| chain([hydroxy,methandiyl,... | chain([hydroxy,methandiyl(1),... |
| chain([hydroxymethyl,... | chain([hydroxymethyl(1),... |
As a rule of thumb, we pick terminals that describe the largest, most inclusive moiety. Thus we prefer the last pair of descriptions using hydroxymethyl --- a methyl group with an hydroxyl substituted for one of methyl's hydrogens --- to listing the hydroxyl explicitly in the molecule's path (any of the first three sets; the first set would have to have that hydroxyl somewhere in the molecule's path). But all of these forms are correct.
Whether to add an argument depends on whether we want to ensure that this particular carbon is numbered ``1'' in the rule and in Klotho's output term form. In each of the four sets of examples given above, the form that carries the argument 1 ensures that this particular carbon will be numbered ``1''.
If the atom has substituents which are not in the molecule's path, one adds a second argument describing those substituents and their orientation around the numbered atom (see Section 2.4 for details on specifying chirality). There is no form which has a substituent argument but not an atom number argument --- that's why the second set of examples has only one example! Thus the following form is not allowed:
chain([car(hyd&&hyd&&hydroxyl),... |
car(2,hyd&&amino) |
| symbol | chemical bond |
| ~ | one sigma bond (single) |
| ~~ | one sigma & one pi bond (double) |
| ~~~ | one sigma & two pi bonds (triple) |
| & | aromatic bonds |
| # | C-N amide bonds |
| ? | bonds exhibiting P-O, C-O resonance |
| ?? | single-bond:triple-bond resonance |
Klotho assumes the bonds between different terminals are of the
sigma type unless otherwise specified; thus "~" need not
be explicitly included. Aromatic type bonds describe
situations involving multiple p orbital interactions (pi
systems) which are not amides or alternating double and single bonds.
Examples include compounds like benzene, pyrrole, adenine, etc.
The carbon-nitrogen bonds in amides are denoted by ``#''; we use a
special symbol to denote their resonance over a smaller
pi system than in aromatic rings. Alternating single and
double bonds in alkenes (e.g., -C=C-C=C-) are simply indicated with
"~~" and "~" bonds: no special symbol is needed.
The "?" bond type represents the dative bonding that occurs in ions like
phosphate and sulfate, or groups like carboxyl and phosphoryl.
The resonant bonding that occurs in carboxylate is coded with this bond type,
while the bonding in ketones and aldehydes is indicated by the ordinary
double bond.
Bond terms are always placed after the second group involved in that particular bond. Thus for ethylene one has
config(ethylene,[
chain([
car(1,hyd&&hyd),
car(2,hyd&&hyd)~~])]).
|
At last we have discussed the topics needed to write a config rule for ethanol, but as we stated at the outset, there is no one right way to specify a structure!
Perhaps the easiest imaginable rule would involve writing down the path through the molecule starting with the oxygen in the hydroxyl group. In this instance, the molecule's path would be oxygen, carbon, carbon, and the hydrogens attached to each of the path's atoms. One rule using this path is:
config(ethanol,[
chain([
oxy(1,hyd),
car(1,hyd&&hyd),
car(2,hyd&&hyd&&hyd)])]).
|
In this case the carbons and the oxygen are numbered in two series (1, 2 and 1). This is Klotho's default procedure. The carbons' numbering corresponds to the way an organic chemist would most likely number them, but other numberings can be forced by specifying them (see Section 2.3 for a more complete discussion of forcing atom numbering). For example:
config(ethanol,[
chain([
oxy(3,hyd),
car(2,hyd&&hyd),
car(1,hyd&&hyd&&hyd)])]).
|
The rules so far are quite verbose, though, and shorter rules are easy to write. For example,
config(ethanol,[
chain([
hydroxyl,
car(1,hyd&&hyd),
methyl])]).
|
config(ethanol,[
chain([
hydroxymethyl(1),
methyl])]).
|
config(ethanol,[
chain([
hydroxymethyl,
methyl])]).
|
Once you have created a config rule you will want to determine if Klotho can process it and if all of the output files are correctly generated. First we'll show the output of our ethanol example, and then describe the mechanics of running Klotho on a config rule.
So what are the outputs of our ethanol rules? The outputs are the same (except for the example where we forced the numbering backwards). First look at the config rule Klotho ran:
config(ethanol,[
chain([
oxy(1,hyd),
car(1,hyd&&hyd),
car(2,hyd&&hyd&&hyd)])]).
|
and now the term form:
% ethanol
c(1,12,(0,nonchiral))-[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~],
c(2,12,(0,nonchiral))-[h(4,left)~,c(1,right)~,h(5,up)~,h(6,down)~],
h(1,1,(0,nonchiral))-[o(1,nil)~],
h(2,1,(0,nonchiral))-[c(1,left)~],
h(3,1,(0,nonchiral))-[c(1,up)~],
h(4,1,(0,nonchiral))-[c(2,right)~],
h(5,1,(0,nonchiral))-[c(2,down)~],
h(6,1,(0,nonchiral))-[c(2,up)~],
o(1,16,(0,nonchiral))-[h(1,nil)~,c(1,down)~]
We describe how to read a term form in more detail
below, but here is a quick synopsis.
Every atom is represented at least twice --- once (and only once!) to
the left of the dash,
e.g.
c(1,12,(0,nonchiral))
(we call these atoms ``keyatoms'' in
Klotho jargon)
and one or more times in the list of atoms bound to that atom:
e.g.
[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~]
(a ``keylist'' in
Klotho jargon).
Thus
h(4)
appears twice, first as bonded to
c(2)
and in its own right as
h(4,1,(0,nonchiral)).
So immediately, you can see ethanol has two carbons, six hydrogens,
and one oxygen; that carbon one
(c(1,12,(0,nonchiral))) is bonded to
carbon two to its left, hydrogen two to its right, oxygen one above,
hydrogen three below, etc.
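Reading the atom counts off a term form can also be done by a short script. The sketch below is our own illustration in Python, not a Klotho tool; it assumes each keyatom starts its own line, as in the ethanol output above, and simply counts keyatoms by element.

```python
import re
from collections import Counter

# Term form for ethanol, copied from the output shown above.
TERM_FORM = """
c(1,12,(0,nonchiral))-[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~],
c(2,12,(0,nonchiral))-[h(4,left)~,c(1,right)~,h(5,up)~,h(6,down)~],
h(1,1,(0,nonchiral))-[o(1,nil)~],
h(2,1,(0,nonchiral))-[c(1,left)~],
h(3,1,(0,nonchiral))-[c(1,up)~],
h(4,1,(0,nonchiral))-[c(2,right)~],
h(5,1,(0,nonchiral))-[c(2,down)~],
h(6,1,(0,nonchiral))-[c(2,up)~],
o(1,16,(0,nonchiral))-[h(1,nil)~,c(1,down)~]
"""

# A keyatom such as c(1,12,(0,nonchiral)) sits at the start of a line.
KEYATOM = re.compile(r"^([a-z]+)\((\d+),\d+,\(")

def count_keyatoms(term_form: str) -> Counter:
    """Count keyatoms per element symbol (hypothetical helper)."""
    counts = Counter()
    for line in term_form.strip().splitlines():
        match = KEYATOM.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Two carbons, six hydrogens, and one oxygen.
print(dict(count_keyatoms(TERM_FORM)))
```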

When checking config rules, one should always rely most on the term form. The term form contains
information not found in other output --- such as the biochemically correct numbering of the
atoms and reasonably accurate partial charges. This information is lost for outputs that rely on
SMILES strings, such as the PDB files generated by CONCORD, which arbitrarily numbers atoms.
Our simple example of ethanol illustrated several important topics, but
we still need to explain how to specify atom numbers and orientations of
groups about atoms.
By default, Klotho numbers atoms beginning with the first term in the config rule and numbers each element in its own series. If the molecular path you pick corresponds to the standard convention for numbering that type of molecule, odds are you won't have to force the numbering at all.
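To make the default concrete, here is a toy model of ``each element in its own series''. It is a sketch of the idea only, written in Python with a function name of our own choosing; Klotho's actual numbering code is not shown in this manual and surely handles many more cases.

```python
from collections import defaultdict

def number_by_element(path):
    """Number atoms in path order, one counter per element (toy model)."""
    next_number = defaultdict(int)
    numbered = []
    for element in path:
        next_number[element] += 1
        numbered.append((element, next_number[element]))
    return numbered

# Ethanol's path atoms in the order of the rule oxy, car, car.
print(number_by_element(["o", "c", "c"]))
# [('o', 1), ('c', 1), ('c', 2)]
```

The result matches the two series (1, 2 and 1) seen for the carbons and the oxygen in the first ethanol rule.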
However, sometimes atoms must be numbered specifically. This occurs most often when the standard scheme numbers two elements in the same series (such as the heterocyclic carbons and nitrogens in purines); or when the molecule is a branched chain; or when you want to use the molecule as a substituent of a larger molecule (and all substituents are eventually so used!). Klotho's grammar allows the user to specifically number particular atoms simply by putting the atom number in parentheses after the moiety's terminal. Drawing from our ethanol example, we wrote:
car(1,hyd&&hyd) |
hydroxymethyl(1) |
Each example uses a pair of parentheses to enclose the argument giving the number of the atom. For hydroxymethyl, the carbon included in that group is assigned the number 1; in general, numbering a moiety's terminal will number that moiety's carbon atom or, failing that, its atom of highest atomic mass.
Numbering can be forced for any atom or moiety in the config rule, including a nested substituent. For example, given
car(2,hyd&&hydroxymethyl)
the carbon of the hydroxymethyl group can be numbered 3 by writing
car(2,hyd&&hydroxymethyl(3))
In the purine case,
config(purine,[
ring_system([
ring([
car(6,hyd)&,
car(5)&,
car(4)&,
nit(3)&,
car(2,hyd)&,
nit(1)&]),
ring([
nit(7)&,
car(8,hyd)&,
nit(9,hyd)&,
car(4)&,
car(5)&])],
[conjugate(1,pseudopos([car(4),car(5)]),
2,pseudopos([car(4),car(5)]))])]).
|
(We'll discuss the [conjugate(1,pseudopos([car(4),car(5)]),2,pseudopos([car(4),car(5)]))] in Section 2.7 below: essentially it tells Klotho to paste the two rings together at carbons four and five, which are shared between them.)
In many cases you need only force the numbering at a few atoms, and Klotho will be able to guess how you wanted the intervening atoms numbered. Consider the isomer of pentane:
config('2-methylbutane',[
chain([
methyl(1),
car(2,hyd&&methyl),
ethyl(3)])]).
|
config('2-methylbutane',[
chain([
methyl,
car(2,hyd&&methyl(5)),
ethyl])]).
|
In the terms
car(1,hyd&&hyd)
and
car(2,hyd&&hydroxymethyl) each
car
has two arguments. The first is obviously that carbon's number, but what
is the second?
The second argument specifies substituent groups bound to the carbon atom
which are not otherwise indicated in the config rule. Recall that the config
rule describes the ``maximal'' path through the molecule. Most atoms
in the path (``path atoms''), however,
have valences greater than two --- for example, tetrahedral carbons! So to
indicate the groups not on the path, we insert them as the second argument of
the path atom's terminal, separating them by the
&& symbol. The resulting term gives the
orientation of groups local to the path atom. If the groups are
different, the atom is chiral; so this is the fundamental mechanism
used to specify an atom's chirality.
The orientation of the groups around the path atom is given by the order of the groups around the && symbol (left_group&&right_group), and is interpreted by Klotho's grammar depending on the overall architecture of the molecule. If the molecule is a chain viewed running vertically in a Fischer projection, the left_group is to the left of the chain's axis and the right_group to the right. Equivalently, if the chain runs horizontally across the page, the left_group is above the chain in the Fischer projection and the right_group is below it.
If the molecule is a ring or ring system, the left_group is above the mean plane of the ring, the right_group is below it.
If three groups are shown
(e.g. hyd&&hyd&&hyd),
Klotho understands these to be rotationally equivalent. These
relationships are summarized in the following table, showing as an
example just the non-path groups around a tetrahedral carbon.
| structural diagram | order of terminals | resulting structure |
| chain running vertically | left&&right | Left -- C -- Right |
| chain running horizontally | left&&right | [figure: Left above C, Right below C] |
| ring or ring system | left&&right | [figure: Left above the ring plane, Right below it] |
For example, consider L-alanine and D-alanine. If we imagine the molecular path running vertically from the carboxyl to the methyl, a rule for L-alanine is
config('L-alanine',[
chain([
carboxyl,
car(1,amino&&hyd),
methyl])]).
|
config('D-alanine',[
chain([
carboxyl,
car(1,hyd&&amino),
methyl])]).
|
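The only difference between the two alanine rules is the order of the groups around &&. As a small illustration in Python (our own sketch, with a made-up function name, and deliberately limited to terminals carrying exactly two simple off-path groups), exchanging the groups turns the L-alanine terminal into the D-alanine one.

```python
import re

# Matches a path-atom terminal with exactly two off-path groups, e.g.
# car(1,amino&&hyd). Nested or three-group terms are deliberately not handled.
TWO_GROUPS = re.compile(r"(\w+)\((\d+),(\w+)&&(\w+)\)")

def swap_groups(terminal: str) -> str:
    """Return the terminal with its left and right groups exchanged."""
    match = TWO_GROUPS.fullmatch(terminal)
    if match is None:
        raise ValueError(f"unsupported terminal: {terminal}")
    name, number, left, right = match.groups()
    return f"{name}({number},{right}&&{left})"

print(swap_groups("car(1,amino&&hyd)"))  # car(1,hyd&&amino)
```

Swapping twice restores the original terminal, which is one way to see that the two rules describe mirror-image arrangements at that atom.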
If one imagines the molecular path running horizontally from the amino to the alpha hydrogen, one could alternatively code L-alanine as:
config('L-alanine2',[
chain([
amino,
car(1,carboxyl&&methyl),
hyd])]).
|
Local chirality can be very complex, particularly in fused aliphatic
ring systems. For an illustrative example and thorough discussion, see
5-beta-perhydrocyclopentanophenanthrene below.
Branched chain compounds or substituents can be defined by nesting one moiety inside another. Carbon 4 of isobutanol is defined by the following config rule as being bound to Carbon 2 (which also binds a hydrogen atom).
config(isobutanol,[
chain([
hydroxymethyl(1),
car(2,hyd&&car(4,hyd&&hyd&&hyd)),
methyl(3)])]).
|
car(2,hyd&&car(4,hyd&&hyd&&hyd)) |
config(isobutanol,[
chain([
hydroxymethyl(1),
car(2,hyd&&methyl(4)),
methyl(3)])]).
|
(oxy(3,____ )~) to car(5,____;
(car(8,___)~) to oxy(3, ____; and
(methylene~~) to (car(8, ____.
config('5-enolpyruvylshikimate-3-phosphate',[
ring([
car(1,carboxyl(7)),
car(6,hyd&&hyd),
car(5,(oxy(3,(car(8,carboxyl&&(methylene~~)))~)~)&&hyd),
car(4,hyd&&hydroxyl),
car(3,hyd&&phosphate),
car(2,hyd)~~])]).
|
config('pyridoxal-phosphate',[
ring([
nit(1)&,
car(2,methyl(7))&,
car(3,hydroxyl)&,
car(4,aldehyde(8))&,
car(5,car(9,hyd&&hyd&&phosphate))&,
car(6)&])]).
|
For some other examples of nesting, see
shikimate
and
prephenate.
Klotho provides a very easy means to specify the isomerism that can occur in molecules which contain double bonds. Fumarate and maleate, because they are geometric isomers, illustrate how easily one config rule can be converted to another through one simple change.
Consider fumarate. The rule given below indicates a double bond between car2 and car3 with the two carboxyl groups being trans to one another. Note that the chain is declared as usual, but before closing the config statement a comma is inserted and then the trans declaration is made.
config(fumarate,[
chain([
carboxyl(1),
car(2,hyd)~~,
car(3,hyd),
carboxyl(4)]),
trans(carboxyl(1),carboxyl(4),bond(car(2),car(3)))]).
|
config(maleate,[
chain([
carboxyl(1),
car(2,hyd)~~,
car(3,hyd),
carboxyl(4)]),
cis(carboxyl(1),carboxyl(4),bond(car(2),car(3)))]).
|
[Figures: structures of fumarate (left) and maleate (right)]
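The ``one simple change'' can be shown as a text substitution. This Python sketch is ours, not a Klotho facility; it renames the compound and turns the trans declaration into cis, which yields the maleate rule from the fumarate rule given above.

```python
import re

# The fumarate rule, copied from the text above.
FUMARATE = """config(fumarate,[
chain([
carboxyl(1),
car(2,hyd)~~,
car(3,hyd),
carboxyl(4)]),
trans(carboxyl(1),carboxyl(4),bond(car(2),car(3)))])."""

def to_cis(rule: str, new_name: str) -> str:
    """Rename the compound and flip its trans declaration to cis."""
    rule = re.sub(r"^config\([^,]+,", f"config({new_name},", rule, count=1)
    return re.sub(r"\btrans\(", "cis(", rule, count=1)

print(to_cis(FUMARATE, "maleate"))
```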
So far all of our examples have involved linear paths of atoms, singly or joined together in a branched structure. Essentially the same thinking is used to build rings and ring systems: define a path through the molecule and describe the groups encountered along it. The technique is the same whether the molecule is aliphatic or aromatic: only the bond types and the number and orientation of substituent groups change. By convention Klotho assumes one walks clockwise around a ring; if atom numbers do not increase in a clockwise direction they must be forced.
For aliphatic rings, the description assumes the ring is visualized in a Haworth projection. Consider beta-D-ribofuranose:

config('beta-D-ribofuranose',[
ring([
oxy,
car(1,hydroxyl&&hyd),
car(2,hyd&&hydroxyl),
car(3,hyd&&hydroxyl),
car(4,hydroxymethyl&&hyd)])]).
|
In this version we've used the usual terminals. However for cyclic sugars, we usually use the term anomeric for the anomeric carbon. Though this isn't necessary for specifying the structure, for writing rules for the other anomers we've found it helpful to pinpoint which carbon's groups will vary depending on the orientation of hemiacetal formation (i. e., whether the sugar is the alpha or beta anomer). Local chirality at each atom is indicated as described in Section 2.4.
config('beta-D-ribofuranose',[
ring([
oxy,
anomeric(1,hydroxyl&&hyd),
car(2,hyd&&hydroxyl),
car(3,hyd&&hydroxyl),
car(4,hydroxymethyl&&hyd)])]).
|
For aromatic rings, the description assumes the ring is in planar projection and that all ring members participate in the pi electron system. Therefore all bonds in the ring proper are '&', not alternating double and single bonds. This is true whether or not the ring is heteroaromatic; Klotho's grammar recognizes pi-deficient and pi-excessive rings in calculating the approximate charge at each atom. As an example consider pyrimidine.

config(pyrimidine,[
ring([
nit(1)&,
car(2,hyd)&,
nit(3)&,
car(4,hyd)&,
car(5,hyd)&,
car(6,hyd)&])]).
|
For ring systems, either aliphatic or aromatic, the system is built by enumerating each contributing ring as if it was an intact ring, and then specifying how the contributing rings are to be conjugated together. For example, consider purine:

config(purine,[
ring_system([
ring([
car(6,hyd)&,
car(5)&,
car(4)&,
nit(3)&,
car(2,hyd)&,
nit(1)&]),
ring([
nit(7)&,
car(8,hyd)&,
nit(9,hyd)&,
car(4)&,
car(5)&])],
[conjugate(1,pseudopos([car(4),car(5)]),
2,pseudopos([car(4),car(5)]))])]).
|
The rule shows how substituents are joined when more than two atoms must be ``simultaneously'' connected --- that is, when rings must be conjugated together into a ring system. In the case of purine, the pyrimidinyl and imidazolyl rings are conjugated together. The graph-theoretic (not the chemical!) version of this operation can be imagined as pasting together two separate rings, each numbered as they will be in the final ring system. Since carbons four and five are shared between the pyrimidinyl and imidazolyl moieties, these numbers are shared by both rings. Notice that we number each contributing ring as it will be numbered in the final ring system, forcing the numbering to produce a single series of numbers for nitrogen and carbon according to the biochemical convention for numbering purines. The conjugation is specified by a list of conjugate terms like the one in the rule above,
conjugate(1,pseudopos([car(4),car(5)]),2,pseudopos([car(4),car(5)]))
where each integer names a ring and the pseudopos term that follows it lists that ring's shared atoms.
The rings are numbered in the order in which they appear in the ring system list.
Despite the name, conjugation is not limited to aromatic ring systems: see Section 4.2.4 for an example of an aliphatic ring system (the steroid nucleus). The form is exactly the same as this, however.
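The ``pasting'' picture can be tried out with ordinary sets. The sketch below is a graph-theoretic toy in Python, under our own naming, and is not how Klotho itself is implemented: each ring is turned into a set of bonds, and because both rings use the final purine numbering, a plain set union merges the shared atoms and the shared bond.

```python
def ring_edges(atoms):
    """Bonds of one ring: each atom to the next, the last back to the first."""
    n = len(atoms)
    return {frozenset((atoms[i], atoms[(i + 1) % n])) for i in range(n)}

# Ring members in rule order, using the final purine numbering.
pyrimidine_ring = ["c6", "c5", "c4", "n3", "c2", "n1"]
imidazole_ring = ["n7", "c8", "n9", "c4", "c5"]

# Pasting is a set union: the shared atoms c4 and c5, and the bond between
# them, appear in both rings but only once in the result.
bonds = ring_edges(pyrimidine_ring) | ring_edges(imidazole_ring)
atoms = {atom for bond in bonds for atom in bond}

print(len(atoms), len(bonds))  # 9 10
```

Six plus five ring bonds give ten rather than eleven because the c4-c5 bond is counted once, which is exactly what sharing carbons four and five between the rings means.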
Notice it's perfectly ok to form a ring system which has both aromatic and aliphatic rings. Consider 1,3,9-trimethylxanthine and for the sake of illustration assume that the six-membered ring is aliphatic. Here's the rule:
% '1,3,9-trimethylxanthine'
config('1,3,9-trimethylxanthine',[
ring_system([
ring([
car(4)&,
nit(9,methyl(12))&,
car(8,hyd)&,
nit(7)&,
car(5)&]),
ring([
car(4)~,
nit(3,methyl(11))~,
car(2,oxy~~),
nit(1,methyl(10))~,
car(6,oxy~~)~,
car(5)&])],
[conjugate(1,pseudopos([car(4),car(5)]),2,pseudopos([car(4),car(5)]))])]).
and here's the term form:
% '1,3,9-trimethylxanthine'
c(2,12,(0,nonchiral))-[n(1,left)~,n(3,right)~,o(1,nil)~~],
c(4,12,(0,nonchiral))-[n(3,left)~,c(5,flat)&,n(9,flat)&],
c(5,12,(0,nonchiral))-[c(6,right)~,c(4,flat)&,n(7,flat)&],
c(6,12,(0,nonchiral))-[c(5,left)~,n(1,right)~,o(2,nil)~~],
c(8,12,(0,nonchiral))-[h(4,nil)~,n(9,flat)&,n(7,flat)&],
c(10,12,(0,nonchiral))-[h(5,left)~,h(6,right)~,h(10,up)~,n(1,down)~],
c(11,12,(0,nonchiral))-[h(7,left)~,h(8,right)~,h(9,up)~,n(3,down)~],
c(12,12,(0,nonchiral))-[h(1,left)~,h(2,right)~,h(3,up)~,n(9,down)~],
h(1,1,(0,nonchiral))-[c(12,right)~],
h(2,1,(0,nonchiral))-[c(12,left)~],
h(3,1,(0,nonchiral))-[c(12,down)~],
h(4,1,(0,nonchiral))-[c(8,nil)~],
h(5,1,(0,nonchiral))-[c(10,right)~],
h(6,1,(0,nonchiral))-[c(10,left)~],
h(7,1,(0,nonchiral))-[c(11,right)~],
h(8,1,(0,nonchiral))-[c(11,left)~],
h(9,1,(0,nonchiral))-[c(11,down)~],
h(10,1,(0,nonchiral))-[c(10,down)~],
n(1,14,(0,nonchiral))-[c(6,left)~,c(2,right)~,c(10,up)~],
n(3,14,(0,nonchiral))-[c(2,left)~,c(4,right)~,c(11,up)~],
n(7,14,(0,nonchiral))-[c(5,flat)&,c(8,flat)&],
n(9,14,(0,nonchiral))-[c(12,up)~,c(4,flat)&,c(8,flat)&],
o(1,16,(0,nonchiral))-[c(2,nil)~~],
o(2,16,(0,nonchiral))-[c(6,nil)~~]
Klotho provides three basic means of compound coding: direct
enumeration of the compound's atoms, bonds, and stereochemistry; model/diff;
and linkage of substituents. Direct enumeration implies
that the entire compound is coded for in one config rule: designating
the structure as a chain, ring, or ring_system. This is what we've done in
the examples so far. Alternatively, one can code a compound using
molecules or substituents that are already known to Klotho
through model/diff or substituent/linkage rules. We call each of these three
forms of expression --- direct enumeration, model/diff, and substituent/linkage
---
locutions.
With model/diff one uses a molecule that is already in Klotho, and that is similar to the molecule one wishes to code. Linking substituents is one of the easiest means to create a config rule and may be appropriate for coding a molecule that can be formed by linking two or more substituents that are already in Klotho. In either case, if Klotho does not contain the substituents or the model that one requires, they must be entered before the model/diff or substituent/linkage rules.
Once a structure is built it can be reused indefinitely, and any
of the above methods can be combined. We presently have a fairly
large library of substituents, covering most of the major biochemical
building blocks (see Section
4.4
for a list), and one
can build new ones at will. It's usually most efficient to build a
molecule using the largest possible substituents, pre-defined or one's own.
Naming substituents or molecules correctly is critical to
making molecules easy to find and build. For example, biochemists often refer to a
substituent by the name of the complete molecule from which it's
derived. That's fine for ordinary conversation, but terrible for
databases: how would the database distinguish between a molecule and
a substituent related to it if both had identical names? Similarly,
each substituent built from a molecule needs to have a distinct name
--- there are many substituents that can be derived by single changes
from beta-D-ribofuranose! This can create even more problems, because
often there is no commonly recognized systematic name for the various
substituents.
For Klotho we have adopted the following conventions for naming.
Molecules are given the most descriptive biochemical name. Thus you will not find a molecule named "glucose" in Klotho, but rather:
| D-glucose |
| L-glucose |
| alpha-D-glucopyranose |
| beta-D-glucopyranose |
| alpha-L-glucopyranose |
| beta-L-glucopyranose |
| etc. |
If a biochemical name for that particular substituent exists in the literature (e. g. ``adenyl''), that name is used. If such a commonly understood name does not exist, one is created from the most descriptive biochemical name. For atoms whose valence is incomplete because they will be joined to another substituent (in effect, the atom has a ``dangling bond''), the atom is indicated by its atom number followed by ``-yl''. For example, the ribosyl moiety of the nucleosides and nucleotides is D1-dehyroxy-5-oxy-ribofuranosyl:
config('D1-dehyroxy-5-oxy-ribofuranosyl',[
ring([
oxy,
anomeric(1,hyd),
car(2,hyd&&hydroxyl),
car(3,hyd&&hydroxyl),
car(4,oxymethyl&&hyd)])]).
|
config(adenine,[
model('purine',[
diff(car(6,hyd),car(6,amine(10)))])]).
|
In a similar manner a rule for guanine could be created from the purine model
by replacing the hydrogen atoms on carbon atoms number two and six with keto and
amine groups. Model/diffs (and all other locutions) can be used iteratively (see
Section
2.8.3
for an example building on adenine).
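The effect of a diff term can be pictured as a find-and-replace on the model's terms. The following Python sketch is only an analogy under our own assumptions (the function name is hypothetical, bond symbols are omitted, and Klotho's real processing works on its own internal representation), but it shows how the single diff in the adenine rule changes the six-membered ring of purine.

```python
def apply_diffs(model_terms, diffs):
    """Replace each 'old' term of the model by its 'new' term (toy version)."""
    result = list(model_terms)
    for old, new in diffs:
        if old not in result:
            raise ValueError(f"{old} does not occur in the model")
        result[result.index(old)] = new
    return result

# Terms of purine's six-membered ring as in its config rule,
# with the bond symbols left off for simplicity.
purine_ring = ["car(6,hyd)", "car(5)", "car(4)", "nit(3)", "car(2,hyd)", "nit(1)"]

# The diff from the adenine rule: diff(car(6,hyd),car(6,amine(10))).
adenine_ring = apply_diffs(purine_ring, [("car(6,hyd)", "car(6,amine(10))")])
print(adenine_ring[0])  # car(6,amine(10))
```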
In biochemistry, many molecules are no more than combinations of two or more smaller moieties, and the most common use of substituents is in an explicit substituent/linkage rule that names the component moieties and describes how they are put together. The components need not have the same architecture: chains, rings, and ring systems can be freely mixed together. As an example, here is how to take adenine and D1-dehyroxy-5-oxy-ribofuranosyl, which were coded above, and link these together with the triphosphoryl terminal to produce ATP.
config('ATP',[
substituent(adenyl),
substituent('D1-dehyroxy-5-oxy-ribofuranosyl'),
substituent(triphosphoryl),
linkage(from(triphosphoryl,pho(1)),
to('D1-dehyroxy-5-oxy-ribofuranosyl',
attach_to([oxy,car(5)])),
nil,single),
linkage(from('D1-dehyroxy-5-oxy-ribofuranosyl',car(1)),
to(adenyl,nit(9)),
up,single)]).
config(adenyl,[
model(adenine,[
diff(nit(9,hyd),nit(9))])]).
|
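The FromSubstituent and ToSubstituent have the form shown in the ATP rule, for example from('D1-dehyroxy-5-oxy-ribofuranosyl',car(1)) and to(adenyl,nit(9)): the SubstituentName followed by a term identifying the linked atom.
The SubstituentName must be identical to that used in the substituent term, for each substituent in the rule.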
Klotho assumes a linkage is always between two specific atoms. There are two ways these atoms are indicated in the linked substituents. The first way, used for the phosphorus, is simply to give the atom's element and its number in the rule defining that substituent (not the numbering Klotho will eventually produce for the final molecule). Since we happen to know the terminal for triphosphoryl,
triphosphoryl -->
{ list_expansion([diphosphoryl,oxy,oxy],
[DiPhosphoryl,Oxy1,Oxy2]) },
[p(_P,31,0)-[(Oxy1,nil)?,
(Oxy2,nil)?,
(o(_O1,16,0)-[(DiPhosphoryl,nil)~,_Other~])~,
_Other~]].
|
Often, though, a config rule doesn't explicitly number every atom. One can either run the substituent's rule in Klotho to determine the relevant atom's number in the substituent, or use the second way of indicating an atom's number: by the number of the atom to which it is attached. Thus we have:
to('D1-dehyroxy-5-oxy-ribofuranosyl',attach_to([oxy,car(5)])),
|
That finishes up the from and to terms. The BondDirection is given from the FromSubstituent to the ToSubstituent, and uses the regular Klotho bond direction indicators (left, right, up, down, isomeric(up), isomeric(down)). The BondType is given by a term, rather than the usual symbols (yes, it's an historical remnant ;-)). The table below lists the bond type terms in the first column, together with their symbols in the config rules (shown already in Section 2.2.5):
| term in linkage rule | symbol in config rule | chemical bond |
| single | ~ | one sigma bond (single) |
| double | ~~ | one sigma & one pi bond (double) |
| triple | ~~~ | one sigma & two pi bonds (triple) |
| aromatic | & | aromatic bonds |
| cn_resonant | # | C-N amide bonds |
| resonant | ? | bonds exhibiting P-O, C-O resonance |
| double_resonant | ?? | single-bond:triple-bond resonance |
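If you find yourself translating between the two notations often, the table fits in a small lookup. This Python snippet is just a convenience sketch of ours; the dictionary names are made up, and the entries are copied from the table above.

```python
# Linkage-rule terms and their config-rule symbols, from the table above.
BOND_SYMBOL = {
    "single": "~",
    "double": "~~",
    "triple": "~~~",
    "aromatic": "&",
    "cn_resonant": "#",
    "resonant": "?",
    "double_resonant": "??",
}

# Reverse lookup: config-rule symbol -> linkage-rule term.
BOND_TERM = {symbol: term for term, symbol in BOND_SYMBOL.items()}

print(BOND_SYMBOL["double"], BOND_TERM["&"])  # ~~ aromatic
```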
The adenyl rule uses a model/diff locution on the rule for adenine (which is itself a model/diff on purine). The rule illustrates iterative modification of a molecule: adenyl is modeled on adenine which is modeled on purine.
Unlike some procedural languages which must have
a subroutine defined before it is called later in that file, config rules need
not be loaded into Klotho in any particular order. They just
all need to be present in the database at the same time.
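The point that presence matters and load order does not can be sketched as follows. This is a hypothetical Python illustration, not Klotho's loader: the dictionary records which rule each model/diff rule is modeled on, and the set of loaded rules has no order at all.

```python
def model_chain(name, modeled_on, loaded):
    """Follow model references from name down to a directly enumerated rule."""
    chain = [name]
    while chain[-1] in modeled_on:
        parent = modeled_on[chain[-1]]
        if parent in chain:
            raise ValueError(f"circular model reference at {parent}")
        chain.append(parent)
    # Order of loading is irrelevant; every rule in the chain must be present.
    missing = [rule for rule in chain if rule not in loaded]
    if missing:
        raise LookupError(f"rules not loaded: {missing}")
    return chain

# adenyl is modeled on adenine, which is modeled on purine.
MODELED_ON = {"adenyl": "adenine", "adenine": "purine"}
print(model_chain("adenyl", MODELED_ON, {"purine", "adenyl", "adenine"}))
# ['adenyl', 'adenine', 'purine']
```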
We've described the chain locution, and that's the one we most commonly use. However, there is an older locution for small chains built on an explicit or implicit central atom --- the sides/center locution. For example, here is a version of L-alanine, which has an explicit center atom:
config('L-alanine',[
left(amino),
right(hyd),
top(carboxyl),
bottom(methyl),
center(car(1))]).
|
This locution can also be used for chains with only two path atoms. In this case the central atom is implicit --- it's as if a dummy atom was inserted into the bond separating the two path atoms. For example, another rule for hydroxyethyl is
config(hydroxyethyl,[
top(car(1,hyd&&hydroxyl)),
bottom(methyl)]).
|
Left/right direction indicators could also be used:
config(hydroxyethyl,[
left(car(1,hyd&&hydroxyl)),
right(methyl)]).
|
We gave a quick synopsis of the semantics of the term form in Section
2.2.7
above. It's time now to explain the details.
% ethanol
c(1,12,(0,nonchiral))-[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~],
c(2,12,(0,nonchiral))-[h(4,left)~,c(1,right)~,h(5,up)~,h(6,down)~],
h(1,1,(0,nonchiral))-[o(1,nil)~],
h(2,1,(0,nonchiral))-[c(1,left)~],
h(3,1,(0,nonchiral))-[c(1,up)~],
h(4,1,(0,nonchiral))-[c(2,right)~],
h(5,1,(0,nonchiral))-[c(2,down)~],
h(6,1,(0,nonchiral))-[c(2,up)~],
o(1,16,(0,nonchiral))-[h(1,nil)~,c(1,down)~]
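Each term in the (implicit) list is of the form
KeyAtom-ListOfBondedAtoms
where the KeyAtom is of the form illustrated by
c(1,12,(0,nonchiral))
and the ListOfBondedAtoms is of the form illustrated by:
[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~]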
Each atom is represented at least twice in the term form: once as a KeyAtom and at least once as a member of a ListOfBondedAtoms, the number of times being determined by the number of substituents attached to that atom. Thus hydrogen (and halogens) should be represented only once in all the ListOfBondedAtoms lists of the term form, and other atoms more often. Atoms in the ListOfBondedAtoms are of the form Element(AtomNumber,BondDirection)BondType, as in c(2,left)~.
Naturally, an atom will have the same Element and AtomNumber in the KeyAtom and ListOfBondedAtoms. The BondTypes and BondDirections are exactly the same ones used in the high-level config rules: sigma bonds are explicitly indicated. Each atom in the ListOfBondedAtoms is bonded to that list's KeyAtom. Thus the meaning of the bond terms in
c(1,12,(0,nonchiral))-[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~],
is simply (if a bit ungrammatically):
| ``Carbon 1 is bonded on its left to carbon 2 by a sigma bond; |
| carbon 1 is bonded on its right to hydrogen 2 by a sigma bond; |
| carbon 1 is bonded on its `up' to oxygen 1 by a sigma bond; and |
| carbon 1 is bonded on its `down' to hydrogen 3 by a sigma bond.'' |
Thus the BondDirection is always described from the KeyAtom's ``perspective'', as if one were sitting on the KeyAtom and looking out.
When one looks at the KeyAtom for each atom in a ListOfBondedAtoms, the types of the bonds remain the same but their directions are reversed, since the perspective has shifted to a different KeyAtom. For example, for each of the atoms bonded to carbon 1:
c(2,12,(0,nonchiral))-[ . . . ,c(1,right)~, . . . ],
h(2,1,(0,nonchiral))-[c(1,left)~],
o(1,16,(0,nonchiral))-[ . . . ,c(1,down)~]
h(3,1,(0,nonchiral))-[c(1,up)~],
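This reciprocity makes a handy sanity check on a term form. The Python sketch below is our own and covers only the tetrahedral directions seen in ethanol (left, right, up, down, and nil); the helper names and the regular expressions are assumptions about the layout shown above, not part of Klotho.

```python
import re

# Term form for ethanol, copied from the output shown above.
TERM_FORM = """
c(1,12,(0,nonchiral))-[c(2,left)~,h(2,right)~,o(1,up)~,h(3,down)~],
c(2,12,(0,nonchiral))-[h(4,left)~,c(1,right)~,h(5,up)~,h(6,down)~],
h(1,1,(0,nonchiral))-[o(1,nil)~],
h(2,1,(0,nonchiral))-[c(1,left)~],
h(3,1,(0,nonchiral))-[c(1,up)~],
h(4,1,(0,nonchiral))-[c(2,right)~],
h(5,1,(0,nonchiral))-[c(2,down)~],
h(6,1,(0,nonchiral))-[c(2,up)~],
o(1,16,(0,nonchiral))-[h(1,nil)~,c(1,down)~]
"""

OPPOSITE = {"left": "right", "right": "left",
            "up": "down", "down": "up", "nil": "nil"}
KEY = re.compile(r"^([a-z]+)\((\d+),")
BONDED = re.compile(r"([a-z]+)\((\d+),([a-z]+)\)(~+|&|#|\?+)")

def directions(term_form):
    """Map (keyatom, bonded atom) -> direction as seen from the keyatom."""
    seen = {}
    for line in term_form.strip().splitlines():
        head, _, keylist = line.partition("-[")
        match = KEY.match(head.strip())
        if match is None:
            continue
        key = match.group(1) + match.group(2)
        for element, number, direction, _bond in BONDED.findall(keylist):
            seen[(key, element + number)] = direction
    return seen

def is_reciprocal(term_form):
    """True if every bond is seen from both ends with opposite directions."""
    seen = directions(term_form)
    return all(seen.get((b, a)) == OPPOSITE[d] for (a, b), d in seen.items())

print(is_reciprocal(TERM_FORM))  # True
```

Term forms containing trigonal atoms use cis, trans, and tuple directions, which this sketch does not attempt to handle.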
What if the KeyAtom is trigonal (sp2)? Consider this extract from the term form for sphingosine (shown at left; see also its config rule and PDB file):
c(3,12,(0,chiral))-[
h(8,left)~,o(2,right)~,c(2,up)~,
c(4,(down,isomeric(down)))~],
c(4,12,(0,nonchiral))-[
h(10,(nil,isomeric(down)))~,
c(3,(up,isomeric(up)))~,c(5,trans)~~],
c(5,12,(0,nonchiral))-[
c(6,(right,isomeric(down)))~,
h(11,(nil,isomeric(up)))~,c(4,trans)~~],
c(6,12,(0,nonchiral))-[
c(5,(left,isomeric(up)))~,
c(7,right)~,h(13,up)~,h(12,down)~],
h(10,1,(0,nonchiral))-[c(4,(nil,isomeric(up)))~],
h(11,1,(0,nonchiral))-[c(5,(nil,isomeric(down)))~],
Trigonal atoms, and the atoms bonded to them, have two types of BondDirection arguments. The first type is just like the ones you have seen before, but the directions are now cis and trans.
The second type is a tuple of two arguments. The first argument, the ReciprocalTetrahedralBondDirection, is simply the BondDirection reciprocal to the adjacent tetrahedral atom. For example, since C3 considers C4 to be ``down'' relative to itself, C4 considers C3 to be ``up'' relative to itself:
c(3,12,(0,chiral))-[ . . ., c(4,(down, . . . ))~],
c(4,12,(0,nonchiral))-[ . . ., c(3,(up, . . . ))~, . . . ],
The second argument is the LocalBondDirection. (In the case of trigonal atoms, one can think of it as the TrigonalBondDirection; for hexagonally coordinated atoms, it would be the HexagonalBondDirection; etc.) This gives the direction of the bonds from the trigonal (or hexagonal) atom looking outward, and is used to indicate the geometric isomerism around double bonds. For trigonal atoms, the only directions that appear in the extracts here are isomeric(up) and isomeric(down).
In the figure at left, we've shown the relationships among the ReciprocalTetrahedralBondDirection and LocalBondDirections for the double bond of sphingosine, abbreviating ``up'' as ``u'', ``down'' as ``d'', ``nil'' as ``n'', and ``isomeric'' as ``i''. The directions in red are from the viewpoint of carbons 4 and 5 respectively, while those in blue are the pertinent reciprocal directions from the atoms bonded to C4 and C5. The LocalBondDirections give the geometric isomerism around the double bond. In the case of sphingosine, C3 and C6 are trans to each other, so those atoms are isomeric(up) from C4 and isomeric(down) from C5, respectively. Both C4 and C5 show the same ``direction'' to each other (trans), so Klotho knows how to place the atoms. Once two of a trigonal atom's bond directions are determined, the other is moot; so in the case of the hydrogens, Klotho assigns a nil ReciprocalTetrahedralBondDirection to simplify its (ok, her!) bookkeeping.
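One way to check by hand-coded data that the two-argument directions really are reciprocal is sketched below in Python. It is a sketch under stated assumptions, not Klotho's own code: the tables TETRAHEDRAL and LOCAL, the function reciprocal, and the bonds dictionary are all hypothetical names of ours, and the bond data are transcribed from the sphingosine extract above. We assume that cis and trans are their own reciprocals, as the extract suggests.

```python
# Reciprocals of the first (tetrahedral) argument; "nil" maps to itself.
TETRAHEDRAL = {"left": "right", "right": "left",
               "up": "down", "down": "up", "nil": "nil"}
# Reciprocals of the second (local) argument.
LOCAL = {"isomeric(up)": "isomeric(down)", "isomeric(down)": "isomeric(up)"}


def reciprocal(direction):
    """Reciprocal of a plain direction, a cis/trans label, or a two-part tuple."""
    if isinstance(direction, tuple):
        tetrahedral, local = direction
        return (TETRAHEDRAL[tetrahedral], LOCAL[local])
    if direction in ("cis", "trans"):
        return direction  # assumed to be the same from both ends
    return TETRAHEDRAL[direction]


# (KeyAtom, bonded atom) -> direction, copied from the sphingosine extract.
bonds = {
    ("c3", "c4"): ("down", "isomeric(down)"),
    ("c4", "c3"): ("up", "isomeric(up)"),
    ("c4", "h10"): ("nil", "isomeric(down)"),
    ("h10", "c4"): ("nil", "isomeric(up)"),
    ("c4", "c5"): "trans",
    ("c5", "c4"): "trans",
    ("c5", "c6"): ("right", "isomeric(down)"),
    ("c6", "c5"): ("left", "isomeric(up)"),
    ("c5", "h11"): ("nil", "isomeric(up)"),
    ("h11", "c5"): ("nil", "isomeric(down)"),
}

# Any bond whose reverse entry is not the reciprocal would be listed here.
mismatches = [
    (key, bonded)
    for (key, bonded), direction in bonds.items()
    if bonds[(bonded, key)] != reciprocal(direction)
]
print(mismatches)
```

An empty list means every bond around the double bond is described consistently from both of its ends.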
As a second example, consider this part of the term form for arachidonate (below left, and config rule and PDB file), or just look at the bond direction diagram.
|
c(4,12,(0,nonchiral))-[
c(3,left)~,h(5,right)~,
h(6,up)~,
c(5,(down,isomeric(down)))~],
c(5,12,(0,nonchiral))-[
h(7,(nil,isomeric(down)))~,
c(4,(up,isomeric(up)))~,
c(6,cis)~~],
c(6,12,(0,nonchiral))-[
c(7,(right,isomeric(up)))~,
h(8,(nil,isomeric(down)))~,
c(5,cis)~~],
c(7,12,(0,nonchiral))-[
c(6,(left,isomeric(down)))~,
c(8,(right,isomeric(down)))~,
h(10,up)~,h(9,down)~],
c(8,12,(0,nonchiral))-[
c(7,(left,isomeric(up)))~,
h(11,(nil,isomeric(down)))~,
c(9,cis)~~],
c(9,12,(0,nonchiral))-[
c(10,(right,isomeric(up)))~,
h(12,(nil,isomeric(down)))~,
c(8,cis)~~],
c(10,12,(0,nonchiral))-[
c(9,(left,isomeric(down)))~,
c(11,(right,isomeric(down)))~,
h(14,up)~,h(13,down)~],
c(11,12,(0,nonchiral))-[
c(10,(left,isomeric(up)))~,
h(15,(nil,isomeric(down)))~,
c(12,cis)~~],
c(12,12,(0,nonchiral))-[
c(13,(right,isomeric(up)))~,
h(16,(nil,isomeric(down)))~,
c(11,cis)~~],
c(13,12,(0,nonchiral))-[
c(12,(left,isomeric(down)))~,
c(14,(right,isomeric(down)))~,
h(18,up)~,h(17,down)~],
c(14,12,(0,nonchiral))-[
c(13,(left,isomeric(up)))~,
h(19,(nil,isomeric(down)))~,
c(15,cis)~~],
c(15,12,(0,nonchiral))-[
c(16,(right,isomeric(up)))~,
h(20,(nil,isomeric(down)))~,
c(14,cis)~~],
c(16,12,(0,nonchiral))-[
c(15,(left,isomeric(down)))~,
c(17,right)~,h(22,up)~,
h(21,down)~],
h(7,1,(0,nonchiral))-[
c(5,(nil,isomeric(up)))~],
h(8,1,(0,nonchiral))-[
c(6,(nil,isomeric(up)))~],
h(11,1,(0,nonchiral))-[
c(8,(nil,isomeric(up)))~],
h(12,1,(0,nonchiral))-[
c(9,(nil,isomeric(up)))~],
h(15,1,(0,nonchiral))-[
c(11,(nil,isomeric(up)))~],
h(16,1,(0,nonchiral))-[
c(12,(nil,isomeric(up)))~],
h(19,1,(0,nonchiral))-[
c(14,(nil,isomeric(up)))~],
h(20,1,(0,nonchiral))-[
c(15,(nil,isomeric(up)))~],
|
As before, the ReciprocalTetrahedralBondDirection and LocalBondDirections are reciprocal (obviously the reciprocal of ``nil'' is ``nil''). Since all the double bonds are cis, the bond directions for all the hydrogens bonded to C5, C6, C8, C9, C11, C12, C14, and C15 are all the same: isomeric(down).
Klotho believes that all tetrahedral atoms should have all four directions assigned, even if the atom is not chiral and all four directions were originally ``nil''. Consider methane:
% methane
c(1,12,(0,nonchiral))-[h(1,left)~,h(2,right)~,h(4,up)~,h(3,down)~],
h(1,1,(0,nonchiral))-[c(1,right)~],
h(2,1,(0,nonchiral))-[c(1,left)~],
h(3,1,(0,nonchiral))-[c(1,up)~],
h(4,1,(0,nonchiral))-[c(1,down)~]
You might be tempted to think that C1 is chiral if you relied only on the bond directions of the hydrogens.
In the case of aliphatic atoms this arbitrary assignment of directions doesn't cause any confusion. But it can be confusing when we consider carbon atoms that are bound to atoms in an aromatic ring. For example, consider theophylline.
Here's the config rule:
% theophylline
config(theophylline,[
model(purine,[
diff(nit(1),nit(1,methyl(10))),
diff(car(2,hyd),car(2,oxy?)),
diff(nit(3),nit(3,methyl(11))),
diff(car(6,hyd),car(6,oxy?)),
diff(nit(7),nit(7,hyd)),
diff(nit(9,hyd),nit(9))])]).
config(purine,[
ring_system([
ring([
car(6,hyd)&,
car(5)&,
car(4)&,
nit(3)&,
car(2,hyd)&,
nit(1)&]),
ring([
nit(7)&,
car(8,hyd)&,
nit(9,hyd)&,
car(4)&,
car(5)&])],
[conjugate(1,pseudopos([car(4),car(5)]),2,pseudopos([car(4),car(5)]))])]).
Now consider the term form:
% theophylline
c(2,12,(0,nonchiral))-[o(2,nil)?,n(3,flat)&,n(1,flat)&],
c(4,12,(0,nonchiral))-[n(3,flat)&,c(5,flat)&,n(9,flat)&],
c(5,12,(0,nonchiral))-[c(4,flat)&,c(6,flat)&,n(7,flat)&],
c(6,12,(0,nonchiral))-[o(1,nil)?,n(1,flat)&,c(5,flat)&],
c(8,12,(0,nonchiral))-[h(8,nil)~,n(7,flat)&,n(9,flat)&],
c(10,12,(0,nonchiral))-[h(4,left)~,h(5,right)~,h(6,up)~,n(1,down)~],
c(11,12,(0,nonchiral))-[h(1,left)~,h(2,right)~,h(3,up)~,n(3,down)~],
h(1,1,(0,nonchiral))-[c(11,right)~],
h(2,1,(0,nonchiral))-[c(11,left)~],
h(3,1,(0,nonchiral))-[c(11,down)~],
h(4,1,(0,nonchiral))-[c(10,right)~],
h(5,1,(0,nonchiral))-[c(10,left)~],
h(6,1,(0,nonchiral))-[c(10,down)~],
h(7,1,(0,nonchiral))-[n(7,nil)~],
h(8,1,(0,nonchiral))-[c(8,nil)~],
n(1,14,(0,nonchiral))-[c(10,up)~,c(2,flat)&,c(6,flat)&],
n(3,14,(0,nonchiral))-[c(11,up)~,c(4,flat)&,c(2,flat)&],
n(7,14,(0,nonchiral))-[h(7,nil)~,c(5,flat)&,c(8,flat)&],
n(9,14,(0,nonchiral))-[c(4,flat)&,c(8,flat)&],
o(1,16,(0,nonchiral))-[c(6,nil)?],
o(2,16,(0,nonchiral))-[c(2,nil)?]
Each of the N1-C10 and N3-C11 bonds has a direction, even though the nitrogens are trigonal and the carbons should lie in the plane of the six-membered ring.
So does Klotho think that N1 and N3 are chiral? Not at all! If you look at the third argument of each KeyAtom, you see that each atom is really nonchiral, e.g.
n(1,14,(0,nonchiral))-[c(10,up)~,c(2,flat)&,c(6,flat)&],
The lesson: Klotho marks chirality by the chiral/nonchiral argument, not just by the directions of the bonds around the KeyAtom.
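To apply this lesson when scanning a long term form, one could pull the chirality flag out of each KeyAtom mechanically. The following Python sketch does so with a regular expression. It is our own illustration, not a Klotho facility: the pattern KEY_ATOM and the function key_atom_chirality are hypothetical names, and we assume every KeyAtom line has the shape Element(AtomNumber,Number,(Charge,Chirality))-[ seen in the extracts above.

```python
import re

# Assumed shape of a KeyAtom: element(number,number,(charge,chirality))-[
# (inferred from the extracts in this section; not an official grammar).
KEY_ATOM = re.compile(
    r"^\s*([a-z]+)\((\d+),(\d+),\(([^,]+),(chiral|nonchiral)\)\)-\["
)


def key_atom_chirality(line):
    """Return (atom label, chirality flag) for one line of a term form."""
    match = KEY_ATOM.match(line)
    if match is None:
        raise ValueError(f"not a KeyAtom line: {line!r}")
    element, number, _second_arg, _charge, chirality = match.groups()
    return f"{element}{number}", chirality


# N1 of theophylline has directed bonds but is flagged nonchiral.
print(key_atom_chirality("n(1,14,(0,nonchiral))-[c(10,up)~,c(2,flat)&,c(6,flat)&],"))
# C3 of sphingosine is flagged chiral.
print(key_atom_chirality("c(3,12,(0,chiral))-["))
```

The function deliberately ignores the ListOfBondedAtoms altogether, since the bond directions alone say nothing reliable about chirality.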
| Rule doesn't compile into Klotho? | Check for syntax errors --- improperly balanced brackets and parentheses and names that should be quoted are the two likeliest offenders. |
| Rule doesn't generate a term form? | Check the error messages generated. In our experience so far, Klotho reliably rejects chemically dumb ideas. |
| | If you think the chemistry is ok, make sure you have the right groups bonded to each other using the right bonds. |
| | Still no luck? Make sure all moieties are defined, either through a config rule or a terminal. |
| | Still? Check your atom numbering: you may be forcing numbering unnecessarily or improperly; or you may need to force it because the rule is ambiguous about what is bonded to what. |
| | Make sure you check the output of the term form for bonding, charges, chirality, and numbering! Just because you get a structure doesn't mean it's the right one ;-). |
| Rule doesn't generate a Fischer projection? | No Fischers are generated for aromatic systems; extremely constrained or complicated molecules may crash the layout code (though normally, the picture will just be ugly). |
| Fischer generated, but looks funny. | Check your config rule. |
| Rule doesn't produce a PDB file? | Check to be sure a SMILES is produced. They usually are, but sometimes the SMILES-string generation code will fail. |
| | Got a string? Check the error messages generated by CONCORD. It may complain about the string or complain about the structure itself --- for example if it is an octolose. |
| PDB generated, but looks funny. | Check to be sure you understand what the molecule looks like in three dimensions. For example, that vertical backbone you see in Fischer projections zigzags in and out of the plane in 3 space, so that groups on zagging atoms are reversed in orientation relative to the Fischer projection. |
| | Still looks funny? Well, CONCORD's minimization algorithm doesn't always do the right thing. We've found nonplanar amides and amines bound to aromatic heterocycles; fatty acids flying away from each other in lipids; and even stranger things. |
| | Still looks funny? Check your config rule. |
| All the output is generated, but atom numbering and/or charges are wrong. | Check the config rule. If you've used the model/diff or substituent/linkage locutions, remember that atom numbers and charges are inherited through all contributing groups, no matter how many steps back they go. Be sure you haven't duplicated a substituent or molecule that's already present in Klotho. And finally, if you wrote it all from scratch, make sure you know how the molecule is numbered in biochemical nomenclature. |
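Since unbalanced delimiters are one of the likeliest reasons a rule fails to compile, it can save time to test a rule's text before loading it. The Python sketch below is a minimal, stand-alone check of our own devising (the function name delimiters_balanced is hypothetical and is not a Klotho or Agora command). It assumes that no quoted name in the rule contains a parenthesis or bracket, which it does not try to detect.

```python
# Closing delimiter -> the opening delimiter it must match.
PAIRS = {")": "(", "]": "["}


def delimiters_balanced(rule_text):
    """True if every ( and [ in rule_text is closed by its own mate, in order."""
    stack = []
    for character in rule_text:
        if character in "([":
            stack.append(character)
        elif character in PAIRS:
            # A closer with nothing open, or with the wrong opener, is an error.
            if not stack or stack.pop() != PAIRS[character]:
                return False
    # Anything still open at the end was never closed.
    return not stack


print(delimiters_balanced("config(ethanol,[chain([ ... ])])."))
print(delimiters_balanced("([([ ... )]])"))
```

The first call examines a correctly nested rule skeleton and the second a skeleton in which a parenthesis and a bracket have been swapped, so the check accepts the first and rejects the second.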
To check a rule we use the compound checking and tracking capabilities built into The Agora, which freely communicates with all the components of Moirai including Klotho.
These directions presume you are securely logged on to the server, that you have cd'ed to a directory that is visible to the server, that the file of your config rules is in that directory, and that the machine at which you are sitting has X-services and secure login. Directions for using the Web interface, which doesn't require all this, will eventually be found here. Every command should be followed by a carriage return.
The second way lets you control exactly which compounds you wish to check, and lets you check each one at a time. To use this method, just type chk_cpds('/full_path_to_your_file/your_file.pl'). and answer the questions. Then for each compound you wish to check, type dqcpd(compoundname). and answer the questions.
When you hit return, Klotho will compute everything it can and send the results to the screen (with the exception of spooling any pdb files) as well as to output files in the ``results'' subdirectory of your directory. Thus you can check your results easily by scrolling back up to the ====== Computing properties of X ====== message and reading down. Error messages will also appear here, and you should read them carefully.
Every time you run dqcpd/1,2 with the same compound name (or chk_new_cpds/1 runs it for you), the files previously produced will be overwritten. You will want to vary the compound name or move the files to another name if you want to keep the earlier results. Novices often write slightly different rules for a compound, appending a number to the compound name, until they figure out how to write the rule.
The files will be named using the compound's name (the first argument of the config rule) as a prefix, followed by a period and a suffix (one of {config,term,eps,smiles,pdb}). You should always have a .config and a .term file. The .config file just reprints the config rule you used and any other config rules Klotho needed to calculate the structure (but not the terminals). If you see rules there you didn't expect, this may be because you used a name for the compound that's already in use (in which case you can compare your rule to Klotho's), because you didn't realize you were using substituents, or because something is wrong.
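A quick way to see which outputs were actually written is to look for each expected file name. The Python sketch below is a hypothetical helper of ours, not part of Klotho or The Agora: the function names and the default directory name "results" are assumptions based on the description above, and the naming pattern is simply prefix, period, suffix as stated (the real Fischer projection file may be named differently).

```python
from pathlib import Path

# Suffixes listed in the text; the first two should always be produced.
SUFFIXES = ("config", "term", "eps", "smiles", "pdb")
ALWAYS_EXPECTED = ("config", "term")


def output_report(compound_name, results_dir="results"):
    """Map each expected output file name to whether it exists on disk."""
    base = Path(results_dir)
    return {
        f"{compound_name}.{suffix}": (base / f"{compound_name}.{suffix}").exists()
        for suffix in SUFFIXES
    }


def missing_required(compound_name, results_dir="results"):
    """List the files that should always exist but were not found."""
    report = output_report(compound_name, results_dir)
    return [
        f"{compound_name}.{suffix}"
        for suffix in ALWAYS_EXPECTED
        if not report[f"{compound_name}.{suffix}"]
    ]


if __name__ == "__main__":
    for file_name, present in output_report("ethanol").items():
        print(file_name, "found" if present else "missing")
```

If missing_required returns anything, the rule did not get as far as a term form, and the error messages on the screen are the place to look next.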
The .term file gives the atom-by-atom, bond-by-bond description of the structure. This is generated entirely within Klotho and will be correct provided your rule is correct! We describe how to read this output in Section 2.9. You should check to be sure the right atoms are connected to each other using the right bonds; that the orientation of atoms around each atom is correct; that the atom numbering follows biochemical convention; and that the charges on each atom are (approximately) correct.
If the molecule is not aromatic, there's a good chance Klotho will generate a Fischer projection and write it to an encapsulated postscript file (compoundname_fischer.eps). These pictures certainly aren't publication quality, but they're adequate for checking the results. Atom numbering and charge will follow that of the term form, but you must remember the Fischer projection rules in order to check chirality reliably.
Klotho will attempt to generate an isomeric SMILES string (the .smiles file). See here for a brief tutorial on SMILES strings. The dialect we use gives stereochemistry unambiguously (the ``@'' signs in the string) and gives all hydrogens explicitly (since one cannot rely on structure computation programs other than Klotho to recognize when a structure is a substituent!). Atoms are not numbered in SMILES strings, so you won't be able to compare it easily to the term form (apart from making sure each atom is represented in both).
If a SMILES string is successfully generated, Klotho will send it to CONCORD, which will attempt to compute a quasi-energy-minimized three-dimensional structure. Dump the .pdb file into your favorite molecule viewer and have a look. Again, the atom numbering will not be the same as in the term form due to SMILES string syntax.
CONCORD often doesn't succeed in generating a structure for a variety of reasons. It will always fail to generate output if your compound is a substituent --- that is, it has unfilled valences. Molecules can fail for other reasons, too; you can check the compound's .log and .exc files to learn more about what causes CONCORD to hiccup. Moreover, the resulting structure may have errors due to poor minimization (we've seen nonplanar amides, unflat amines in adenine, and lipids with fatty acids decidedly askew). So use the CONCORD results with caution. We're working on ways to correct these problems by substituting something else for CONCORD. In the interim, remember that if the .term form is correct, then so is the rule.
Suppose you're in the official checking program; you check a rule; and then later decide you misread the output and the rule isn't correct? Then just uncheck the rule: type unchk_cpd(CpdName). while you are still in The Agora and answer the questions.
It's under construction ;-).
In writing the tutorial we intentionally selected examples which clearly illustrate the principal features of the grammar but which do not contain unnecessarily complicated structural features. Most biological molecules are more complex than those in the tutorial. Consequently we have included in the section below examples of rules for more complex molecules.
Sample chain:
config('D-ribose',[
chain([
aldehyde,
car(2,hyd&&hydroxyl),
car(3,hyd&&hydroxyl),
car(4,hyd&&hydroxyl),
hydroxymethyl])]).
|
Sample ring:
config('alpha-L-xylopyranose',[
ring([
car(1,hydroxyl&&hyd),
car(2,hydroxyl&&hyd),
car(3,hyd&&hydroxyl),
car(4,hydroxyl&&hyd),
car(5,hyd&&hyd),oxy])]).
|
A ring_system type config rule is useful for defining compounds with multiple rings such as 21-deoxycortisol. Simply put, a ring_system is declared by first defining the rings in the system, and then pseudoposing each ring to another.
Sample ring system:
config(perhydrocyclopentanophenanthrene,[
ring_system([
ring([
car(10,methyl(19)),
car(5,hyd),
methandiyl(4),
methandiyl(3),
methandiyl(2),
methandiyl(1)]),
ring([
(car(10),left)~,
(car(5),up)~,
(methandiyl(6),right)~,
(methandiyl(7),right)~,
(car(8,hyd),right)~,
(car(9,hyd),up)~]),
ring([
(car(8),right)~,
(car(9),down)~,
(methandiyl(11),left)~,
(methandiyl(12),left)~,
(car(13,methyl(18)),left)~,
(car(14,hyd),up)~]),
ring([
(car(13),left)~,
(car(14),down)~,
(methandiyl(15),right)~,
(methandiyl(16),right)~,
(methandiyl(17),up)~])],
[conjugate(1,pseudopos([car(10),car(5)]),2,
pseudopos([car(10),car(5)])),
conjugate(2,pseudopos([car(9),car(8)]),3,
pseudopos([car(9),car(8)])),
conjugate(3,pseudopos([car(13),car(14)]),4,
pseudopos([car(13),car(14)]))])]).
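A simple sanity check on a ring system is that every atom named in a pseudopos list really occurs in both of the rings being joined. The Python sketch below illustrates that check on the rule above. The data layout (ring number mapped to a list of atom numbers, and four-part conjugate tuples) and the function name unshared_atoms are our own hypothetical choices; Klotho's actual expansion of the rule is not shown here.

```python
# Atom numbers of each ring, in rule order, transcribed from the config rule.
rings = {
    1: [10, 5, 4, 3, 2, 1],
    2: [10, 5, 6, 7, 8, 9],
    3: [8, 9, 11, 12, 13, 14],
    4: [13, 14, 15, 16, 17],
}

# (first ring, its pseudoposed atoms, second ring, its pseudoposed atoms)
conjugates = [
    (1, [10, 5], 2, [10, 5]),
    (2, [9, 8], 3, [9, 8]),
    (3, [13, 14], 4, [13, 14]),
]


def unshared_atoms(rings, conjugates):
    """Return (ring, atom) pairs where a pseudoposed atom is absent from its ring."""
    problems = []
    for ring_a, atoms_a, ring_b, atoms_b in conjugates:
        for ring, atoms in ((ring_a, atoms_a), (ring_b, atoms_b)):
            for atom in atoms:
                if atom not in rings[ring]:
                    problems.append((ring, atom))
    return problems


print(unshared_atoms(rings, conjugates))
```

An empty result means each fusion names only atoms that both rings contain; a non-empty one points at the ring and atom number to recheck in the rule.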
|
|
Sample linkage (assumes sphingosine-N-yl and stearyl have already been coded):
config(ceramide,[
substituent('sphingosine-N-yl'),
substituent(stearyl),
linkage(from('sphingosine-N-yl',nit(1)),
to(stearyl,car(1)),
nil,single)]).
|
For coenzymeA each substituent is coded individually, then linked in a separate rule. Note that the substituents are not themselves complete molecules, but lack bonds where the linkage is to be formed.
config('beta-mercaptoethylamino-N-yl',[
chain([
sul(1,hyd),
car(1,hyd&&hyd),
car(2,hyd&&hyd),
nit(1,hyd)])]).
config('R-pantothenyl',[
chain([
car(1,(oxy)?),
car(2,hyd&&hyd),
car(3,hyd&&hyd),
nit(1,hyd)#,
car(4,(oxy)?),
car(5,hydroxyl&&hyd),
car(6,methyl&&methyl),
oxymethyl]),
trans(car(3),car(5),bond(nit(1),car(4)))]).
config('3''-phospho-ADP-beta-yl',[
substituent('D-1-dehydroxy-3-phospho-5-oxy-ribofuranosyl'),
substituent(adenyl),substituent(diphosphopentaoxygen),
linkage(from('D-1-dehydroxy-3-phospho-5-oxy-ribofuranosyl',
car(1)),
to(adenyl,nit(9)),up,single),
linkage(from('D-1-dehydroxy-3-phospho-5-oxy-ribofuranosyl',
attach_to([oxy,car(5)])),
to(diphosphopentaoxygen,pho(1)),nil,single)]).
config('coenzymeA',[
substituent('R-pantothenyl'),
substituent('3''-phospho-ADP-beta-yl'),
substituent('beta-mercaptoethylamine-N-S-diyl'),
linkage(from('beta-mercaptoethylamine-N-S-diyl',nit(1)),
to('R-pantothenyl',car(1)),
trans(car(2),car(2)),cn_resonant),
linkage(from('R-pantothenyl',oxy(4)),
to('3''-phospho-ADP-beta-yl',pho(3)),
right,single)]).
|
config(cyanuric_acid,[
ring([
nit(1)&,
car(2,hydroxyl)&,
nit(3)&,
car(4,hydroxyl)&,
nit(5)&,
car(6,hydroxyl)&])]).
config('2-chloro-4-hydroxy-6-amino-1,3,5-triazine',[
model('cyanuric_acid',
[diff(car(2,hydroxyl),car(2,cl)),
diff(car(6,hydroxyl),car(6,amino(7)))])]).
|
Sample model/diff:
config(testosterone,[
model(perhydrocyclopentanophenanthrene,
[diff(methandiyl(3),keto(3)),
diff(methandiyl(4),car(4,hyd)),
diff(car(5,hyd),car(5)~~),
diff(methandiyl(17),car(17,hyd&&hydroxyl))])]).
|
| Example 1 |
car(1,hyd)~~,car(2,hyd)
|
| Example 2 |
car(1,hyd)#,nit(2,hyd)
|
Carbamyl-phosphate is a simple compound that illustrates a peptide bond. The '#' specifies that there is C=N resonance, while the '?' specifies resonance between the oxygen and carbon. The oxy bound to car2 is an example of a nested atom: the bond is between oxy and car2 rather than between oxy and the atom to the right (i.e. phosphate).
config('carbamyl-phosphate',[
chain([
amine(1)#,
car(2,(oxy)?),
phosphate(3)])]).
|
config(benzene,[
ring([
car(1,hyd)&,
car(2,hyd)&,
car(3,hyd)&,
car(4,hyd)&,
car(5,hyd)&,
car(6,hyd)&])]).
|
Stereospecificity is important to define when coding diastereoisomers.
Consider cis-aconitate. The rule given below indicates a double bond between car3 and car4 with the two carboxyl groups being cis to one another. Note that the chain is declared as usual, but before closing the config statement a comma is inserted and then the cis declaration is made (alternatively, a trans declaration may be used when appropriate).
config('cis-aconitate',[
chain([
carboxyl(1),
methandiyl(2),
car(3,carboxyl(6))~~,
car(4,hyd&&carboxyl(5))]),
cis(carboxyl(6),carboxyl(5),bond(car(3),car(4)))]).
|
config(muconate,[
chain([
carboxyl(1),
car(2,hyd)~~,
car(3,hyd),
car(4,hyd)~~,
car(5,hyd),
carboxyl(6)]),
cis(carboxyl(1),car(4),bond(car(2),car(3))),
cis(car(3),carboxyl(6),bond(car(4),car(5)))]).
|
config('5-beta-perhydrocyclopentanophenanthrene',[
ring_system([
ring([
car(10,methyl(19)),
car(5,hyd),
methandiyl(4),
methandiyl(3),
methandiyl(2),
methandiyl(1)]),
ring([
(car(10),left)~,
(car(5),down)~,
(methandiyl(6),right)~,
(methandiyl(7),right)~,
(car(8,hyd),right)~,
(car(9,hyd),up)~]),
ring([
(car(8),right)~,
(car(9),down)~,
(methandiyl(11),left)~,
(methandiyl(12),left)~,
(car(13,methyl(18)),left)~,
(car(14,hyd),up)~]),
ring([
(car(13),left)~,
(car(14),down)~,
(methandiyl(15),right)~,
(methandiyl(16),right)~,
(methandiyl(17),up)~])],
[conjugate(1,pseudopos([car(10),car(5)]),
2,pseudopos([car(10),car(5)])),
conjugate(2,pseudopos([car(9),car(8)]),
3,pseudopos([car(9),car(8)])),
conjugate(3,pseudopos([car(13),car(14)]),
4,pseudopos([car(13),car(14)]))])]).
|
| to the left of car10 is car5 |
| to the down of car5 is car6 |
| to the right of car6 is car7 |
| to the right of car7 is car8 |
| to the right of car8 is car9 |
| to the up of car9 is car10 |
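The reading of a ring written with chiral center direction indicators can be spelled out mechanically. The Python sketch below does this for the second ring of the rule above. It rests on an assumption of ours, namely that each direction says where the next atom in the walk lies relative to the atom carrying the direction, with the last atom closing back onto the first; the function name read_ring and the data layout are hypothetical and not part of Klotho.

```python
# (atom number, direction) for each member of the second ring, in rule order.
second_ring = [
    (10, "left"),
    (5, "down"),
    (6, "right"),
    (7, "right"),
    (8, "right"),
    (9, "up"),
]


def read_ring(entries):
    """Turn a ring's (atom, direction) entries into one sentence per bond."""
    sentences = []
    for index, (atom, direction) in enumerate(entries):
        # The ring closes on itself, so the last atom points back to the first.
        next_atom = entries[(index + 1) % len(entries)][0]
        sentences.append(f"to the {direction} of car{atom} is car{next_atom}")
    return sentences


for sentence in read_ring(second_ring):
    print(sentence)
```

Under that assumption a six-membered ring yields six sentences, one for each bond encountered on the walk around the ring.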
Notice that each ring is specified independently: there is no need to specify the directions of connections among rings. These latter will be appropriately assigned during expansion of the rule.
When we look at the term form output, we find the directions are properly filled in. To read this, think: ``AtomX in the list is to the Direction of the KeyAtom to which it is bonded.'' So the first line of the term form ---
c(1,12,(0,nonchiral))-[c(10,left)~,c(2,right)~,h(11,up)~,h(12,down)~],
says ``C10 is to the left of C1, C2 is to the right of C1, H11 is up from C1 and H12 is down from C1''. The easiest way is to imagine oneself sitting at the KeyAtom and looking around at the adjoining groups.
Why does the first ring work? The starting point (which is most conveniently chosen to be one of the atoms ``joining'' the two rings) is specified in the normal way by walking clockwise around the ring. There are only two chiral atoms in that ring (C10 and C5), and their directions are specified in the second ring.
Notice that if direction is specified in this manner (called the chiral center direction indicators), the Bond must be explicitly given. This is not true for other ways of specifying direction ---
config(methandiylcarboxyl,[
top(car(1,hyd&&hyd)),
bottom(carboxyl)]).
|
config('alpha-D-erythrofuranose',[
ring([
car(1,hyd&&hydroxyl),
car(2,hyd&&hydroxyl),
car(3,hyd&&hydroxyl),
car(4,hyd&&hyd),oxy])]).
|
The Group1&&Group2 above-and-below-the-ring direction indicators can be combined with the chiral center direction indicators:
config('L-4-hydroxyproline',[
ring([
(nit(1,hyd&&hyd),down)~,
(car(1,carboxyl(2)&&hyd),left)~,
methandiyl(3),
car(4,hydroxyl&&hyd),
(methandiyl(5),left)~])]).
|
Bugs
We therefore recommend that a substituent not be joined to another at a chiral atom until this is fixed.
Desiderata
| grammar rule for car |
| grammar rule for hyd |
| grammar rule for nit |
| grammar rule for nitp |
| grammar rule for oxy |
| grammar rule for oxy |
| grammar rule for sul |
| grammar rule for pho |
| grammar rule for mg |
| grammar rule for mn |
| grammar rule for rb |
| grammar rule for k |
| grammar rule for me |
| grammar rule for zn |
| grammar rule for fe |
| grammar rule for cu |
| grammar rule for na |
| grammar rule for ca |
| grammar rule for cl |
| grammar rule for br |
| grammar rule for i |
| grammar rule for f |
| grammar rule for b |
| grammar rule for si |
| grammar rule for cs |
| grammar rule for tl |
| grammar rule for hg |
| grammar rule for H+ |
| grammar rule for Hg++ |
| grammar rule for Mg++ |
| grammar rule for Mn++ |
| grammar rule for Zn++ |
| grammar rule for Fe++ |
| grammar rule for Fe+++ |
| grammar rule for Cu++ |
| grammar rule for Cu+ |
| grammar rule for Rb+ |
| grammar rule for Ca++ |
| grammar rule for K+ |
| grammar rule for Me++ |
| grammar rule for Na+ |
| grammar rule for Cs+ |
| grammar rule for oxy- |
| grammar rule for oxy-- |
| grammar rule for Cl- |
| grammar rule for Br- |
| grammar rule for I- |
| grammar rule for F- |
| grammar rule for r |
| grammar rule for left_r |
| grammar rule for right_r |
| grammar rule for carbon_r |
| grammar rule for x |
| grammar rule for proton |
| grammar rule for carbanion |
| grammar rule for carbonium |
| grammar rule for e- |
| grammar rule for amino |
| grammar rule for amine |
| grammar rule for amine_plus |
| grammar rule for amine_double |
| grammar rule for amine_resonant |
| grammar rule for nitro |
| grammar rule for imine |
| grammar rule for iminyl |
| grammar rule for guanidyl |
| grammar rule for guanidiyl |
| grammar rule for carboxyl |
| grammar rule for prot_carboxyl |
| grammar rule for aldehyde |
| grammar rule for keto |
| grammar rule for prepeptidal_carboxyl |
| grammar rule for ketyl |
| grammar rule for ketenyl |
| grammar rule for polyhydroxymethylene |
| grammar rule for hydroxyl |
| grammar rule for hydroxymethyl |
| grammar rule for oxymethyl |
| grammar rule for methyl |
| grammar rule for methandiyl |
| grammar rule for methantriyl |
| grammar rule for methylene |
| grammar rule for methine |
| grammar rule for ethyl |
| grammar rule for phosphoryl |
| grammar rule for link_phosphoryl |
| grammar rule for phosphodiyl |
| grammar rule for diphosphopentaoxygen |
| grammar rule for phosphate_class |
| grammar rule for phosphate |
| grammar rule for diphosphoryl |
| grammar rule for diphosphate |
| grammar rule for triphosphoryl |
| grammar rule for triphosphate |
| grammar rule for inorganic_phosphate |
| grammar rule for carbonylphosphate |
| grammar rule for methylphosphate |
| grammar rule for anomeric |
| grammar rule for empty |
| grammar rule for secondary_amide_nucleus |
| grammar rule for carboxamide |
| grammar rule for isopropyl |
| grammar rule for propyl |
| grammar rule for methoxyl |
| grammar rule for ether_oxy |
| grammar rule for acetyl |
| grammar rule for carboxylmethandiyl |
| grammar rule for butyl |
| grammar rule for sec_butyl |
| grammar rule for tert_butyl |
| grammar rule for ethoxyl |
| grammar rule for acylate |