THE ORIGIN OF THE GENETIC CODE*
MASAHIRO ISHIGAMI and KEI NAGANO
Jichi Medical School, Minamikawachi-machi, Tochigi-ken, Japan 329-04
(Received 29 April, 1975)

Abstract. A new approach to the origin of the genetic code is proposed based on some regularities in the nucleotide distribution pattern of the code. The relative amounts of various amino acids in primitive proteins were possibly different from those in organisms living today. The primordial ratio was supposed to shift to the modern one guided by the action of primitive nucleotides. Each primitive tRNA had a discriminator site and, distinguished from it, an anticodon site. It is also postulated that primordially each amino acid could correspond to a wide variety of codons. During the course of the evolutionary change, a selective mechanism worked among the protobionts so that less frequent nucleotides became associated with more abundant amino acids in the primordial conditions, thus finally leading to the present codon catalogue.
The organization of informed polypeptide synthesis and its transmission using nucleic acids as templates was an important step from chemical evolution towards biochemical and biological evolution. Many facets of the origin of these genetic mechanisms still remain unsettled. An especially vexing question is what factor or factors determined the precise amino acid-codon correspondence seen in the present coding list. Various models have been proposed to account for the origin of several regularities seen in the codon catalogue. Some of the proposals were reviewed and named by Woese (1969), e.g. the 'vocabulary expansion model' (Crick, 1968) and the 'translation error-ambiguity model' (Woese, 1965). None of these models successfully explained the universality of the correspondence between codons and amino acids. The 'codon-amino acid pairing model' was the only one which tried to explain the universality (Lacey and Pruitt, 1969). Experimental results, however, did not support the idea implied there (Saxinger and Ponnamperuma, 1971, 1974; Weber and Fox, 1973). An interpretation of the universality is that the correspondence between amino acids and codons was decided and established forever, at least in principle, by chance (Monod, 1970). Another rather bold proposal is that an ancestral organism arrived from outer space and its unique coding was spread throughout the primordial earth (Crick and Orgel, 1973). The basic idea common to the two theories is that the coding pattern was determined, so to speak, at a single stroke. We feel that gradual elaboration during the course of evolution may be a realistic alternative. Here we will discuss the problem based on this evolutionary conception and try to outline a mechanism for the gradual shift to, and establishment of, the present coding catalogue.
Sulston et al. (1968a, b) and Schneider-Bernloehr et al. (1968) reported that polynucleotides enhanced the rate of nonenzymatic polymerization of the complementary nucleotides by forming base pairings between them. For example, poly C enhanced the rate of polymerization of G, although it had no effect on polymerization of A, U or C. These results suggest that polynucleotides might be reproduced nonenzymatically in the prebiological soup. On the other hand, Paecht-Horowitz and Katchalsky (1973) found that amino acyl-AMP and amino acyl-ADP were formed nonenzymatically from amino acids and ATP on the surface of zeolite, which worked as a catalyst. Furthermore, when montmorillonite coexisted in the system, polypeptides were formed from the activated amino acids. This process of polymerization of amino acids might have been enhanced by the presence of polynucleotides which had already formed as suggested above. If so, it may not be unreasonable to suppose that polypeptide synthesis became more and more dependent on the presence, and ultimately on the base composition, of the polynucleotides. One of the problems here was what determined the specific correspondence between a given amino acid and the corresponding set of codons as seen in present-day organisms.

* Presented at The International Seminar: The Origin of Life held in Moscow, August 2-7, 1974.
Origins of Life 6 (1975) 551-560. All Rights Reserved. Copyright © 1975 by D. Reidel Publishing Company, Dordrecht-Holland

1. Data of Relative Amino Acid Content
To get a clue to this problem, the relative amounts of amino acids found in some simulation experiments of prebiological synthesis by Yoshino et al. (1971), Harada and Fox (1964), and Miller (1959) are summarized in Table I (a, b, c). Amino acids are arranged here according to the amount formed in the experiment of Yoshino et al. (1971). Two examples of amino acid contents of living organisms (d, e) as well as their codons are also included in the table. Only the first two letters of the codons are shown, because their third letters are often interchangeable and seem to be less significant in the present context. Amino acids containing sulfur were excluded from consideration since a sulfur source was not supplied in the simulation experiments listed. From inspection of the table, a tendency is suggested concerning the distribution of the codon letters: G and C appear more frequently in the upper half of the table, while in the lower half A and U are dominant. The pattern of the distribution is visualized in Figures 1 and 2, where the area of each circle represents the relative amount of the amino acid specified by the codon. Two examples of the amino acid composition of living bacteria (Sueoka, 1961) are chosen here for the low and high GC contents of their DNA, respectively. Sueoka (1961) pointed out a significant correlation between DNA base composition of bacteria and the amino acid composition of their bulk proteins: Ala, Arg, Gly and Pro were positively correlated with the GC content of DNA; while Ile, Lys, Asp plus Asn, Glu plus Gln, Tyr and Phe, negatively correlated.

2. Shift of Relative Amino Acid Content
The relative amount of each amino acid in the living organisms was compared with its relative yield in the prebiological synthesis experiment, and the calculated ratios
TABLE I
Relative amounts (%) of amino acids found in some simulation experiments and those in living bacteria. Amino acid composition is given in moles/moles, %.

Amino      Prebiological synthesis    Living organisms    Ratio                 Genetic code
acid       (a)     (b)     (c)        (d)     (e)         (d)/(a)   (e)/(a)     (1st and 2nd positions)
Gly        57.9    51.6    77.7       11.2     9.6         0.19      0.17       GG
Arg         9.8    -       -           5.5     4.3         0.56      0.44       CG, AG
Ser         8.3     4.7    -           4.0     4.2         0.48      0.51       UC, AG
Ala         7.8    18.5    21.1       15.4    10.1         1.97      1.29       GC
Glx         4.2     6.1     0.8       12.7    12.3         3.02      2.93       GA, CA
Asx         3.5     7.0     0.4        9.1     9.7         2.60      2.77       GA, AA
Lys         2.0    -       -           4.5     6.1         2.25      3.05       AA
His         1.6    -       -           1.7     2.0         1.06      1.25       CA
Leu         1.2     2.9    -           8.5     8.7         7.08      7.25       UU, CU
Thr         1.0     1.5    -           5.9     5.6         5.90      5.60       AC
Pro         0.9     2.1    -           4.7     3.6         5.22      4.00       CC
Val         0.7     1.8    -           8.3     7.6        11.86     10.86       GU
Ile         0.6     1.4    -           4.0     6.8         6.67     11.33       AU
Tyr         0.1     1.1    -           1.4     3.1        14.0      31.00       UA
Phe         0.1     1.2    -           2.4     3.8        24.0      38.0        UU
Met, Cys    -       -      -           0.8     2.5         -         -          -
Total      99.7    99.9   100.0      100.1   100.0

(a) Relative amounts of amino acids produced in the thermal synthesis experiment (Yoshino et al., 1971). CO, H2 and NH3 were allowed to react at 200-700 °C for several hours in the presence of alumina, baked meteorites or some other metal compounds as catalysts. The data in this column are the average of 56 runs.
(b) Relative amounts of amino acids produced in the thermal synthesis experiment by Harada and Fox (1964). CH4, NH3 and H2O were allowed to flow through silica powder in the temperature range of 950-1050 °C. Products were trapped in an aqueous solution of NH3. The data are the average of 3 runs.
(c) Relative amounts of amino acids formed in the simulation experiment by Miller (1959). A mixture of gaseous CH4, NH3, H2O and H2 was subjected to spark and silent electric discharges for several days. The value is the average of 2 runs.
(d) Relative molar content of amino acids from bulk protein of the bacterium Micrococcus lysodeikticus (GC content = 72%) after Sueoka (1961).
(e) Relative molar content of amino acids from bulk protein of another bacterium, Bacillus cereus (GC content = 35%) after Sueoka (1961).
are shown in columns (d)/(a) and (e)/(a) in Table I. If the values are smaller than unity, the amino acid in question may have tended to be excluded from the primitive polypeptides in the course of primordial evolution. If the value is larger than unity, the corresponding amino acid might have been incorporated more often than was expected from mere chance. Both the preferential incorporation and exclusion are postulated here as having arisen from a general tendency of the primitive proteins to incorporate more varied and/or functionally active amino acids, thus making themselves more diverse in their
[Figure 1: codon grid with the second codon position (G, C, A, U) across; circles labelled with amino acids]
Fig. 1. The relation between the amount of amino acids formed in the prebiological synthesis and their codons. Values are recalculated from the data of Yoshino et al. (1971). The area of each circle represents the relative amount of the amino acid specified by the codon. The amounts of Asx and Glx are divided equally into Asp and Asn, and Glu and Gln, respectively.
functions as well as conformations. In short, proteins became more versatile in the course of molecular selection. In the primitive polynucleotides, there was supposedly no sharp distinction between DNA and RNA (Orgel, 1973). Amino acids were present and utilized as the activated form, i.e., aminoacyl AMP or aminoacyl ADP (Paecht-Horowitz and Katchalsky, 1973), as discussed above. They would combine and condense with each other nonenzymatically to become polypeptides aided by the presence of primitive polynucleotides. Among such polynucleotides, longer ones played the role of primitive mRNAs. Shorter ones with many intramolecular hydrogen bondings and with a terminal nucleotide which was prone to accept the activated amino acid were used as primitive tRNAs (see Figure 3). The relative frequency of various amino acids incorporated into the primitive polypeptides would be identical with the relative frequency of amino acids in the surrounding soup if there were no selective or discriminating principles working. However, some principles must have been working since we presently have proteins whose amino acid compositions are different from those of the prebiological soup (Table I). What then were the principles resulting in such a shift?

[Figure 2: codon grid with the second codon position (G, C, A, U) across; circles labelled with amino acids]
Fig. 2. The relation between the amount of amino acids formed in the prebiological synthesis and their codons. Values are recalculated from the data of Harada and Fox (1964).

3. Proposed Mechanism of the Shift

In order to make a hypothesis about the shift, let us make two assumptions concerning
[Figure 3 labels: amino acid; discriminator site; primitive tRNA; anticodon; primitive mRNA codon]
Fig. 3. A schematic model of primitive tRNA with a discriminator site and a triplet anticodon. Revised from Orgel (1968).
the relative nucleotide distribution in the prebiological systems as well as in the primitive RNAs (the plausibility of these assumptions is discussed later):
(1) The order of relative frequency of nucleotides in the primitive mRNA was (U, A) >> (C, G).
(2) A primitive tRNA had both a discriminator region for an amino acid and, distinguished from it, an anticodon region (Figure 3). Thus, possibly several different kinds of anticodons were combined with a given discriminator.
These assumptions, combined with the idea introduced above, that proteins tended toward greater versatility, make the following scheme a possible course in the shift of the coding pattern during the initial phase of biochemical evolution: Suppose a primitive tRNA which had a discriminator site for Phe and an anticodon XAA, for example. This tRNA would be very likely to find a codon UUX' on a primitive mRNA since U was one of the abundant base species (assumption 1). This primitive tRNA had the effect of increasing the frequency of Phe in the primitive polypeptides. On the other hand, a primitive tRNA with a discriminator site for Phe and an anticodon XCC (codon: GGX') was likely to abate the relative content of Phe. Judged from the versatility tendency indicated above, UUX' would be one of the more advantageous codons for Phe, an amino acid not abundant in the primitive environment.
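As a rough numerical illustration of the argument above, the following Python sketch compares how often a codon beginning with UU or with GG would be met on a random primitive mRNA. The frequencies in `FREQ` are hypothetical values chosen only so that assumption (1) holds; they are not estimates from the paper, and the function name is likewise illustrative.

```python
# Hypothetical nucleotide frequencies satisfying assumption (1), (U, A) >> (C, G).
# These numbers are illustrative only and are not taken from the paper.
FREQ = {"U": 0.35, "A": 0.35, "G": 0.15, "C": 0.15}


def doublet_probability(first, second, freq=FREQ):
    """Probability that a random triplet starts with the two given bases.

    The third base X' is unconstrained, so its probabilities sum to 1 and
    only the first two positions contribute.
    """
    return freq[first] * freq[second]


# Codon UUX' is read by the anticodon XAA; codon GGX' by the anticodon XCC.
p_uu = doublet_probability("U", "U")
p_gg = doublet_probability("G", "G")

print(f"P(UUX') = {p_uu:.4f}")
print(f"P(GGX') = {p_gg:.4f}")
print(f"ratio   = {p_uu / p_gg:.2f}")
```

With these assumed frequencies a tRNA carrying the anticodon XAA meets its codon about 5.4 times as often as one carrying XCC (0.1225 against 0.0225), which is the sense in which the choice of anticodon raises or lowers the share of an amino acid in the primitive polypeptides.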
Contrariwise, combination of the anticodon XAA with a discriminator site for an abundant or 'less valuable' amino acid, e.g., Gly, was probably not profitable since it would result in more monotonous proteins. On the other hand, combination of the anticodon XCC with Gly was preferential, since, C being a less frequent nucleotide, incorporation of Gly would be suppressed by being coded by XCC. Protobionts which had a set of tRNAs convenient for the versatilization of polypeptides might have had a selective advantage. Through these selections, the correspondence between amino acids and their codons is believed to have gradually shifted and ultimately been established as seen today.

4. Base Ratios in Primitive RNA
We tried to estimate the relative amount of each nucleotide ($p_A$, $p_C$, $p_U$ and $p_G$; the sum = 1.0) in the primitive mRNAs using the model proposed above. The probability of appearance of each codon was obtained by multiplying those for the three nucleotides constituting the codon ($p_{AUG} = p_A \times p_U \times p_G$). Summation of these 'probabilities for triplets' for the several codons corresponding to an amino acid ($q_i$) was taken to indicate the tendency of this amino acid to be incorporated into primitive uninformed proteins. For glycine, e.g.:

$$q_{Gly} = p_{GGU} + p_{GGC} + p_{GGA} + p_{GGG} = p_G \times p_G \times p_U + p_G \times p_G \times p_C + p_G \times p_G \times p_A + p_G \times p_G \times p_G. \tag{1}$$

Amino acid composition of the supposed primitive proteins (columns a, b, and c in Table I) differs from that of the modern proteins (columns d or e in Table I). If we postulate the ratio of these two values (columns (d)/(a) or (e)/(a) in Table I; represented in the following as, e.g. for glycine, $r_{Gly}$) to be parallel to the tendency of selective incorporation for various amino acids, we have:

$$k \times r_i = q_i, \tag{2}$$
where $i$ varies from 1 to 15 for each of the amino acids to be considered (cf. Table I). The set of fifteen equations, which are obtained when $i$ varies from 1 to 15 in Equation (2), does not give us an unequivocal set of solutions for $p_A$, $p_G$, $p_C$, and $p_U$; however, we can find the most plausible set of values ($p'_A$, $p'_G$, $p'_C$, and $p'_U$) based on the least squares method:

$$q'_i = \sum_{l,m,n}^{(A,G,C,U)} p'_l \times p'_m \times p'_n, \tag{3}$$

$$\sum_i (q'_i - q_i)^2 = \text{minimum}, \tag{4}$$

$$\sum_l p'_l = 1. \tag{5}$$

In Equation (3) the right term implies an appropriate combination of the four $p'$s for A, G, C, and U according to the code catalogue (cf. Equation (1)).
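The procedure of Equations (1)-(5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: the ratios are column (e)/(a) of Table I, the constant k of Equation (2) is chosen here by least squares for each trial composition, and the search is a coarse grid restricted to non-negative frequencies. The helper names (`codons`, `q_values`, `residual`, `grid_fit`) are hypothetical, and the result need not reproduce the values of Table II.

```python
from itertools import product

BASES = "UCAG"


def codons(doublet, thirds=BASES):
    """Expand a two-letter prefix into full triplets."""
    return [doublet + t for t in thirds]


# Standard codon assignments for the 15 amino acid groups of Table I.
CODONS = {
    "Gly": codons("GG"),
    "Arg": codons("CG") + codons("AG", "AG"),
    "Ser": codons("UC") + codons("AG", "UC"),
    "Ala": codons("GC"),
    "Glx": codons("GA", "AG") + codons("CA", "AG"),
    "Asx": codons("GA", "UC") + codons("AA", "UC"),
    "Lys": codons("AA", "AG"),
    "His": codons("CA", "UC"),
    "Leu": codons("UU", "AG") + codons("CU"),
    "Thr": codons("AC"),
    "Pro": codons("CC"),
    "Val": codons("GU"),
    "Ile": codons("AU", "UCA"),
    "Tyr": codons("UA", "UC"),
    "Phe": codons("UU", "UC"),
}

# r_i: column (e)/(a) of Table I (Bacillus cereus against Yoshino et al.).
RATIOS = {
    "Gly": 0.17, "Arg": 0.44, "Ser": 0.51, "Ala": 1.29, "Glx": 2.93,
    "Asx": 2.77, "Lys": 3.05, "His": 1.25, "Leu": 7.25, "Thr": 5.60,
    "Pro": 4.00, "Val": 10.86, "Ile": 11.33, "Tyr": 31.00, "Phe": 38.0,
}


def q_values(p):
    """Equations (1) and (3): sum of triplet probabilities per amino acid."""
    return {aa: sum(p[c[0]] * p[c[1]] * p[c[2]] for c in cs)
            for aa, cs in CODONS.items()}


def residual(p):
    """Equation (4), with k of Equation (2) fitted by least squares (assumption)."""
    q = q_values(p)
    k = sum(q[a] * RATIOS[a] for a in q) / sum(r * r for r in RATIOS.values())
    return sum((q[a] - k * RATIOS[a]) ** 2 for a in q)


def grid_fit(steps=20):
    """Scan compositions with p_U + p_C + p_A + p_G = 1 (Equation 5)."""
    best_p, best_s = None, float("inf")
    for u, c, a in product(range(steps + 1), repeat=3):
        g = steps - u - c - a
        if g < 0:
            continue  # outside the simplex
        p = {"U": u / steps, "C": c / steps, "A": a / steps, "G": g / steps}
        s = residual(p)
        if s < best_s:
            best_p, best_s = p, s
    return best_p, best_s


best_p, best_s = grid_fit()
print(best_p, best_s)
```

Because the grid excludes negative frequencies, it cannot return a slightly negative estimate of the kind an unconstrained least-squares solution may give; a finer grid or a gradient-based optimizer could replace the scan without changing the structure of the calculation.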
$p'_A$, $p'_G$, $p'_C$, and $p'_U$ were obtained by a computer program (Table II). These values ($p'$) have the same physical implication as the initial $p$ values, that is, they imply the relative probability of occurrence. In this context, the small negative value for $p'_C$ must be considered as a deviation from a very small positive value. Similar calculated values based on the results by Harada and Fox (1964) are also shown in Table II. We find here a clear tendency: ($p'_A$, $p'_U$) > ($p'_G$, $p'_C$).

TABLE II
Relative amount of each nucleotide in primitive mRNA, estimated from the data in Table I

Prebiological synthesis    Living organisms                        p'G      p'C        p'A      p'U
Yoshino et al. (1971)      Bacillus cereus (GC = 35%)              0.078    (-0.039)   0.235    0.726
                           Micrococcus lysodeikticus (GC = 72%)    0.160    (-0.030)   0.190    0.679
Harada and Fox (1964)      Bacillus cereus (GC = 35%)              0.160    0.112      0.334    0.393
                           Micrococcus lysodeikticus (GC = 72%)    0.190    0.208      0.302    0.300
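The estimates in Table II can be checked against Equation (5) and against the stated tendency with a few lines of Python. The sketch below is only a consistency check; the assignment of the printed values to the columns G, C, A and U is an assumption made when reading the table, and the dictionary layout and function name are illustrative.

```python
# Values as read from Table II. The column assignment (G, C, A, U) is an
# assumption made when reading the printed table.
TABLE_II = {
    ("Yoshino et al. (1971)", "Bacillus cereus"):
        {"G": 0.078, "C": -0.039, "A": 0.235, "U": 0.726},
    ("Yoshino et al. (1971)", "Micrococcus lysodeikticus"):
        {"G": 0.160, "C": -0.030, "A": 0.190, "U": 0.679},
    ("Harada and Fox (1964)", "Bacillus cereus"):
        {"G": 0.160, "C": 0.112, "A": 0.334, "U": 0.393},
    ("Harada and Fox (1964)", "Micrococcus lysodeikticus"):
        {"G": 0.190, "C": 0.208, "A": 0.302, "U": 0.300},
}


def au_excess(p):
    """Difference between the (A + U) and (G + C) shares of one estimate."""
    return (p["A"] + p["U"]) - (p["G"] + p["C"])


for (source, organism), p in TABLE_II.items():
    total = sum(p.values())
    print(f"{source:22s} {organism:26s} sum = {total:.3f}  AU - GC = {au_excess(p):+.3f}")
```

Each row sums to unity within rounding, as Equation (5) requires, and (A + U) exceeds (G + C) in all four cases, most strongly for the estimates based on the data of Yoshino et al. (1971).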
In the above discussion, the shift of amino acid composition from that of primitive proteins to modern proteins was supposed to be completed in a single stroke. The real course of protein evolution, however, must have been a very gradual one. In that case the difference between $p'_A$, $p'_U$ and $p'_C$, $p'_G$ might be much smaller than is seen in Table II.

The plausibility, or at least possibility, of assumption (1) is now briefly considered. Primitive RNAs would contain many other kinds of nucleotides than G, C, A, and U. Thymidine and some other minor components, e.g., inosinic acid, might not be infrequent in primitive mRNAs and tRNAs. However, adenine was always the predominant base in the simulating synthesis experiments (Oró and Kimball, 1961). Guanine was formed in far less quantity (Ponnamperuma, 1965). Although pyrimidines were synthesized in lesser amount than purines, the process of complementary duplication might increase their content in the polynucleotides up to the same, or comparable, level as that of the corresponding purines. In that case more U than C would be expected, since we have more A than G as the counterpart. So we can suppose (U, A) >> (G, C) in the primitive RNAs.

5. Discriminator Sites and Codons
Concerning the second assumption above, one relevant point will be discussed. In contemporary organisms, each tRNA incorporates its specific amino acid aided by an enzyme, amino acyl-tRNA synthetase. In our proposed scheme, however, primitive tRNAs are imagined to have had discriminator sites not for synthetases but for amino acids themselves, since specific enzymes had not yet been evolved.
In connection with this, Crothers et al. (1972) found that the fourth nucleotide from the 3'-end of tRNAs corresponding to the same amino acid was identical among different organisms and, furthermore, that the tRNAs for several amino acids which were chemically similar in some respects had the same base species in common at the fourth nucleotide position from the 3'-end (e.g., A for Ala, Ile, Leu and Val; G for Glu, Asp, Gln and Asn). The fourth position was supposed to be a sort of 'vestigial' molecular feature which had been used for direct physical interactions between tRNAs and amino acids. Crothers et al. (1972) supposed also that more than one nucleotide was involved in the primitive discrimination. As the evolution of synthetases as well as tRNAs provided a more precise and intricate recognition mechanism, restrictions might have decreased for nucleotide positions other than the fourth.

Epstein (1966) and Crick (1968) pointed out the resemblance of codons for related amino acids. For instance, all codons with U in the second position code for hydrophobic amino acids. This regularity might also be interpreted as a 'vestige', suggesting the gradual establishment of the discriminator-codon relationship. Proponents of the 'vocabulary expansion model' (Crick, 1968), 'translation error-ambiguity model' (Woese, 1965) and 'lethal mutation model' (Sonneborn, 1965) have noticed this regularity. Our model also interprets it as a remaining feature of the broad assortment of amino acids by inchoate discriminators. Studies of physicochemical interactions between amino acids and polynucleotides (Saxinger and Ponnamperuma, 1971, 1974; Raszka and Mandel, 1972; Weber and Fox, 1973) are expected to develop further and to offer useful information on this point.

Finally, the problem of codon universality will be briefly considered in our context.
The course of establishment of amino acid-codon correspondence as considered here was based on the rule of probability. If this was the case, it would seem quite likely that several different sets of coding patterns would have been established and would have survived among different groups of living organisms, in sharp contrast to the real situation in the present living world. A special set of codons might have been selected out from several coexisting coding lists during the course of evolution. However, the nature of the selective force is as yet uncertain.

In summary, we have proposed a mechanism for the establishment of a genetic code based on a comparison between the relative amounts of amino acids formed in the prebiological synthesis and the nucleotide composition of the present, as well as the supposedly primordial, nucleic acids. The modern codon catalogue was thought to be determined through selection among primitive tRNAs. The correspondence of a group of chemically related amino acids to a group of related codons was also explained in this model by the evolution of discriminator sites.
Acknowledgements

We wish to thank Dr O. Aono for his help with the estimation of the relative amounts of nucleotides and Prof. H. Ishikura for his suggestions concerning tRNA. We also wish to thank the members of our laboratory for their interest in our problem and for useful discussions.
References

Crick, F. H. C.: 1968, J. Mol. Biol. 38, 367.
Crick, F. H. C. and Orgel, L. E.: 1973, Icarus 19, 341.
Crothers, D. M., Seno, T., and Söll, D. G.: 1972, Proc. Nat. Acad. Sci. 69, 3063.
Epstein, C. J.: 1966, Nature 210, 25.
Gottikh, B. P., Krayevsky, A. A., Tarussova, N. B., Purygin, P. P., and Tsilevich, T. L.: 1970, Tetrahedron 26, 4419.
Harada, K. and Fox, S. W.: 1964, Nature 201, 335.
Lacey, J. C. and Pruitt, K. M.: 1969, Nature 223, 799.
Miller, S. L.: 1959, in A. I. Oparin (ed.), The Origin of Life on Earth, I.U.B. Symp. Series, Pergamon Press, New York, pp. 123.
Monod, J.: 1970, Le hasard et la nécessité, Translated into Japanese by Watanabe and Murakami.
Orgel, L. E.: 1968, J. Mol. Biol. 38, 381.
Orgel, L. E.: 1973, The Origin of Life: Molecules and Natural Selection, John Wiley & Sons, pp. 156.
Oró, J. and Kimball, A. P.: 1961, Arch. Biochem. Biophys. 94, 217.
Paecht-Horowitz, M. and Katchalsky, A.: 1973, J. Mol. Evol. 2, 91.
Ponnamperuma, C.: 1965, in S. W. Fox (ed.), The Origin of Prebiological Systems and of their Molecular Matrices, Academic Press, New York, pp. 221.
Raszka, M. and Mandel, M.: 1972, J. Mol. Evol. 2, 38.
Saxinger, C. and Ponnamperuma, C.: 1971, J. Mol. Evol. 1, 63.
Saxinger, C. and Ponnamperuma, C.: 1974, Origins of Life 5, 189.
Schneider-Bernloehr, H., Lohrmann, R., Sulston, J., Weiman, B. J., Orgel, L. E., and Miles, H. T.: 1968, J. Mol. Biol. 37, 151.
Sonneborn, T. M.: 1965, in V. Bryson and H. Vogel (eds.), Evolving Genes and Proteins, Academic Press Inc., New York, pp. 377.
Sueoka, N.: 1961, Proc. Nat. Acad. Sci. 47, 1141.
Sulston, J., Lohrmann, R., Orgel, L. E., and Miles, H. T.: 1968a, Proc. Nat. Acad. Sci. 59, 726.
Sulston, J., Lohrmann, R., Orgel, L. E., and Miles, H. T.: 1968b, Proc. Nat. Acad. Sci. 60, 409.
Weber, A. L. and Fox, S. W.: 1973, Biochim. Biophys. Acta 319, 174.
Woese, C.: 1965, Proc. Nat. Acad. Sci. 54, 1546.
Woese, C.: 1969, J. Mol. Biol. 43, 235.
Yoshino, D., Hayatsu, R., and Anders, E.: 1971, Geochim. Cosmochim. Acta 35, 927.