J. Molec. Evolution 1, 84-96 (1971) © by Springer-Verlag 1971
Rate of Change of Concomitantly Variable Codons
W. M. FITCH
Department of Physiological Chemistry, University of Wisconsin, Madison, Wisconsin 53706
Received May 3, 1971
Summary. It was previously shown that about 10% of the codons in cytochrome c are variable in any one mammalian species and at any one point in time and that the positions of these concomitantly variable codons (covarions) must change as mutations are fixed. Variability implies the existence of an alternative, non-deleterious amino acid that differs by only one nucleotide replacement from the one presently encoded. This work, in addition to obtaining an independent estimate of the number of covarions, investigates the question: What is the likelihood that a cytochrome c covarion will lose its variable status as a result of the fixation of a mutation in another covarion? The results show: 1, the number of covarions is in the range of 4 to 10, in agreement with the earlier result of 10 but suggesting the variability may be even more circumscribed than originally thought; and 2, the likelihood of a covarion losing its variable status as a result of fixations elsewhere in the gene may be greater than 0.75, suggesting a high turnover rate among the covarions.
Key-Words: Evolutionary Rates, Molecular Genetics, Cytochrome c, Mutations, Codon Variability.
It is abundantly clear in the gene for cytochrome c that selective forces narrowly restrict the number of codons that are capable of surviving and fixing a mutation that alters the amino acid encoded. It has been shown that the number of covarions in cytochrome c in mammalian species (i.e. concomitantly variable codons that can fix mutations at any one point in time and in a given species) is only about ten (Fitch and Markowitz, 1970). Since more than ten codons have fixed mutations, it follows that an at least partially different set of codons must be concomitantly variable in different species. This too has been demonstrated by showing that the variable codons in the fungi are largely independent of the variable codons in insects and fish (Fitch, 1971a). But if the concomitantly variable codons (covarions) in one species may not be identical to those in another, it follows that the nature of the new amino acid resulting from a fixation must affect which codons may be capable of fixing the next mutation, a biologically reasonable conclusion. Moreover, if each mutation fixed may alter the codons that belong to the group of covarions, we can immediately understand why there are so few covarions in any one species' gene for cytochrome c and yet observe that more than 70 codons have fixed observable mutations upon examining the cytochromes c in species ranging from fungi to man. This raises, however, an interesting question: What is the rate of turnover of codons in the covarions? Expressed another way, what is the likelihood that a fixation in one covarion will cause other covarions to lose their ability to fix the next mutation?

Fig. 1. Phylogeny from cytochrome c. The phylogeny is identical to that previously determined for the twenty species shown utilizing only the amino acid sequence data for its construction (Fitch and Margoliash, 1968). The height of any given node is the weighted average of mutations fixed in the descent from that node. For segments for which two or more mutations were fixed, the lines contain values in the form m/D where m is the total number of mutations fixed and D is the number of double mutations, i.e. the number of codons that contain two of the fixations. (Axis: Minimum number of mutations, weighted average.)

To answer this question, we first define a double mutation as two observable fixations occurring in a single codon between two successive branch points (nodes) on a phylogenetic tree (Fitch and Margoliash, 1968). For example, residue 19, which is next to the histidine that binds to the heme iron in all known cytochromes c, is threonine (encoded ACX) in all known cytochromes c except that of Neurospora, which has a glycine (encoded GGX). Given the phylogeny of these species, both the first and second nucleotide must have mutated since the most recent ancestor of Neurospora that is depicted on that tree. Indeed, it is one of the 5 double mutations on the branch labelled 30/5 in Fig. 1. The "must" depends upon assuming that no mutations will be postulated other than those that are forced upon us in order to account for any observed divergence. Excluding the possibility that the change AC → GG occurred as a single mutational event (which is
reasonable in view of the non-existence of such double changes in in vitro studies of point mutations), it then becomes clear that the ability of a codon to fix the second mutation must have depended upon its being among the covarions after fixation of the first mutation.

Now the likelihood that a second mutation will be fixed in the same codon as an earlier mutation depends upon two characteristics: (a) the number of covarions, and (b) the persistence of a codon among the covarions. Obviously, the fewer the number of covarions, the greater the probability that the next fixation will occur in any given covarion and, in particular, in a previously mutated covarion. On the other hand, the probability that a covarion remains in the group drops as mutations are fixed in other covarions. The number of covarions, c, and the persistence of variability, v, are the two independent variables that are used to derive the equations given in the appendix. These equations enable one to estimate, depending on the values of c and v assumed, how many double mutations would be expected for any particular number of mutations m occurring along some observed interval. These expected values may then be compared to values actually observed and the goodness of fit determined by the chi-square statistic.

Table. Frequencies of double fixations

   m     n    Observed D    Expected D
                            c = 4.5    c = 7     c = 10.0
   2     6    0.167         0.147      0.091     0.063
   3     2    0             0.403      0.418     0.439
   4     4    1.25          0.533      0.510     0.511
   5     2    1.0           0.667      0.611     0.594
   6     1    1.0           0.804      0.720     0.690
   7     1    1.0           0.946      0.838     0.798
   9     1    1.0           1.243      1.100     1.050
  10     2    0             1.397      1.244     1.193
  15     1    2.0           2.231      2.085     2.076
  16     1    2.0           2.409      2.278     2.284
  17     2    2.5           2.592      2.478     2.503
  19     1    3.0           2.969      2.901     2.972
  23     1    4.0           3.769      3.837     4.028
  30     1    5.0           5.314      5.749     6.232
   v (best fit)             0.04       0.12      0.23
   χ²                       3.16       3.34      3.62

The values of m are the number of mutations observed to have been fixed in a given internodal interval and n is the number of internodal intervals containing m fixations. The column marked "observed D" contains the total number of codons observed to have fixed two mutations in an interval of size m (see Fig. 1) divided by the number of such intervals to give the observed frequency of double fixations per interval of size m. There were no triple fixations in a single interval. The observed frequency is to be compared to the expected number of double fixations per interval (D) as calculated by the procedure given in the methods section. Expected values of D are calculated for three values of the number of covarions c, and are given for that persistence of variability v that optimizes the fit to the observed data. For 12 degrees of freedom, χ² ≤ 3.57 implies that the observed values would, by chance, differ from the expected values to the extent shown ≥ 99% of the time.

Fig. 1 shows the 20 species whose cytochromes c were used for this analysis. The topology, the mutations and the double mutations are identical to those given in Fig. 1 of Fitch and Margoliash (1968). Legs containing only one or no mutations cannot have double mutations and therefore no numbers are shown for those legs. The remaining legs have two numbers: the first is the total number of mutations observed to have been fixed in that portion of the phylogeny. The second number (following the slash) is the number of double mutations observed in that same portion. In the Table we see the distribution of double mutations found as a function of the number of mutations observed. It is our purpose to find that number of covarions, c, and persistence of variability, v, that would most closely approximate that distribution. A value of v near 1 would mean that a covarion's ability to fix a mutation is largely independent of fixations in the other covarions while a value of v << (1/c) would mean that a codon's variability is very unlikely to survive more than one fixation in other covarions. The value of c has been estimated as 10 by an independent method (Fitch and Markowitz, 1970) and the closeness of the agreement between 10 and the number found here will be one indication of the adequacy of the procedures.
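The goodness-of-fit comparison described above can be illustrated with a short sketch. It assumes, since the text does not spell out the formula, that chi-square is the plain sum of (observed − expected)²/expected over the 14 rows of the Table, using the per-interval frequencies as printed; under that assumption the quoted values 3.16, 3.34 and 3.62 are reproduced to two decimals. The function names are hypothetical, and the tail probability uses the standard closed form for an even number of degrees of freedom.

```python
import math

# Rows of the Table: (m, observed D per interval, expected D for c = 4.5, 7, 10).
TABLE = [
    (2, 0.167, 0.147, 0.091, 0.063),
    (3, 0.0, 0.403, 0.418, 0.439),
    (4, 1.25, 0.533, 0.510, 0.511),
    (5, 1.0, 0.667, 0.611, 0.594),
    (6, 1.0, 0.804, 0.720, 0.690),
    (7, 1.0, 0.946, 0.838, 0.798),
    (9, 1.0, 1.243, 1.100, 1.050),
    (10, 0.0, 1.397, 1.244, 1.193),
    (15, 2.0, 2.231, 2.085, 2.076),
    (16, 2.0, 2.409, 2.278, 2.284),
    (17, 2.5, 2.592, 2.478, 2.503),
    (19, 3.0, 2.969, 2.901, 2.972),
    (23, 4.0, 3.769, 3.837, 4.028),
    (30, 5.0, 5.314, 5.749, 6.232),
]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E; assumed form of the statistic (not stated in the text)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x, dof):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form used here needs a positive even dof")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))


observed = [row[1] for row in TABLE]
for label, column in (("4.5", 2), ("7", 3), ("10", 4)):
    expected = [row[column] for row in TABLE]
    print(f"c = {label}: chi-square = {chi_square(observed, expected):.2f}")

# 14 rows minus the two fitted parameters (c and v) leaves 12 degrees of freedom.
print(f"P(chi-square >= 3.57 | 12 d.f.) = {chi_square_upper_tail(3.57, 12):.3f}")
```

The last line gives a probability of about 0.99, which is the sense in which a chi-square near 3.57 corresponds to a fit that chance alone would worsen about 99% of the time.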
Results
The formal procedures are given in the appendix. In this section are presented results intended to provide a feeling for how the variables interact as well as results on cytochrome c in particular. In asking how many double mutations would be expected after 30 mutations have been fixed, one recognizes that the answer depends upon the number of covarions into which they may be fixed and how long a covarion lasts. Fig. 2 shows the curve of expectation for five covarions as a function of the persistence of variability. If the persistence is 1, then we expect the same five covarions to be always present and by the time 30 mutations were fixed, we expect that all five covarions would have fixed at least two mutations and hence the left-most value is essentially five. As the persistence of variability drops, the curve rises to a maximum, then falls. This is simply a reflection of the fact that if the persistence value is too large, the codons remain variable too long after they have fixed a second mutation, whereas if the value is too small, they do not remain variable long enough to get a second mutation. In practice we would choose a persistence of variability that would yield the number of double mutations actually observed in the line of descent containing 30 fixations. However, there are other line segments containing other numbers of fixations and we are constrained to choose the same persistence of variability throughout.

Fig. 2. Number of double mutations as a function of the persistence of variability. The curve gives the expected number of double mutations (D) for various values of the persistence of variability (v) when 30 mutations have been fixed and the number of covarions is five. (Horizontal axis: Persistence of variability, from 1.0 down to 0.0001.)

Fig. 3 shows how the expected number of double mutations varies as a function of the total number of fixations. Each line is for a different persistence with the number of covarions being held at five. The points on the graph show the observational data to be fitted. Some curves fit better than others and chi-square is a measure of the goodness of fit, with the lowest value of chi-square being the best fit. The goodness of fit as a function of the persistence of variability is presented, for five covarions, in Fig. 4. The best fit occurs at v = 0.06, which is one of the lines in Fig. 3. But there is nothing to say that five covarions is the correct number of covarions and similar curves may be drawn for other numbers of covarions; Fig. 4 also shows such curves for c = 2 and 10. The best fit will be that number of covarions that gives the lowest minimum in such a plot. Portions of the curves in the region of their minimum for c = 3 to 10 are shown in Fig. 5 and are quite regular in shape. A plot of the minimum as a function of the number of covarions is given in Fig. 6. This minimum occurs for a value of c = 4.5. For c = 4.5, the minimum comes at a persistence of variability = 0.04, which is plotted as the * in Fig. 5. That the optimum value of c is not a whole number is all right in view of our recognition that it is only an average value and must be reasonably considered to vary somewhat from species to species and from time to time.

Fig. 3. Double mutations expected for five covarions as a function of the number of fixations. Each line shows the expected number of double mutations (as a function of total fixations) for a different value of v. The points are the actual data (see the Table) that must be fit. (Horizontal axis: Number of fixations.)
Fig. 4. Goodness of fit as a function of the persistence of variability for several numbers of covarions. The goodness of fit, measured by the chi-square test, is between the actual values of D observed and those expected under the various values of c and v assumed. For all cases, there are 12 degrees of freedom. Values calculated only for the points shown. Curves were fit by eye. Numbers of covarions are 2 (×), 5 (○), and 10 (△). (Horizontal axis: Persistence of variability, from 1.0 down to 0.0001.)
Fig. 5. Goodness of fit as a function of the persistence of variability for several numbers of covarions. Same as Fig. 4 except for restriction of the data to the portion of each curve in the region of its minimum. The number of covarions for each curve is shown at the top of the graph. For c = 3, the minimum occurs at v = 0 for which χ² = 3.47. (Horizontal axis: Persistence of variability.)

Fig. 6. Chi-square for best fit for various numbers of covarions. (Horizontal axis: Number of covarions.)

Fig. 7. Persistence of variability for best fit for various numbers of covarions. (Horizontal axis: Persistence of variability.)
Finally, one may ask about the relationship between the number of covarions and the persistence of variability that minimizes chi-square for that value of c. This is shown in Fig. 7. There are two straight lines intersecting at a value of c = 4.5.
Discussion
Previous work (Fitch and Markowitz, 1970) suggested that the number of covarions for mammalian cytochromes c was about 10. This study suggests a value of 4.5. Since the methods are different, they verify the conclusion that selective forces greatly restrict the permissible variability in cytochrome c. The actual discrepancy between the two values turns out not to be significant. Note that the minimum chi-square for c = 10 is only 3.62. A chi-square value of 3.57 (for 12 degrees of freedom) means that if our assumptions are true (including c = 10) then we would expect real data to give by chance a worse fit than these do 99% of the time. We are suffering from the fact that the probability surface is more of a valley than a well. The steep rise of the curves in Fig. 5 shows, in effect, the cross sections of the valley; the minima show the gentle rise as one travels up the valley. Given the number of covarions, we can reasonably estimate the persistence of variability and vice versa, but these data do not readily permit the simultaneous estimate of both with great accuracy.

The generally low values for the persistence of variability (optimum values of v are < 0.25 for all values of c checked) are particularly interesting. Most importantly, they mean that the variable positions are largely interdependent and that a mutation at any one of them affects most of the others to a considerable degree. This reemphasizes the very limited tolerance that cytochrome c has for variation in its structure. On the other hand, it should also be recognized that variability may be lost as a result of fixations in genes whose products interact with cytochrome c. We know nothing of these rates, but the longer the interval between fixations in the cytochrome c gene (and it is longer than most), the greater the likelihood that the loss of variability is the result of fixations elsewhere. This does not affect the computations but it must temper their interpretation.

The problem of explaining a double mutation in the terms of a strict selectionist theory in which every mutation fixed confers an advantage is interesting in terms of these results. Consider the replacement of proline (CCX) by valine (GUX) which occurs at position 44 of the rabbit. The intermediate possibilities are alanine (GCX) or leucine (CUX). The strict selectionist asserts that either alanine or leucine must have been superior to proline and replaced it and that subsequently valine was superior to the replacing amino acid and in turn replaced it. The valine superiority assures that the once mutated codon is among the covarions. On the other hand, this analysis suggests that covarions do not persist long so that in effect the two mutations must have generally followed each other successively or nearly so. But if that is true, then one of the intervening amino acids must have provided an intermediate degree of evolutionary fitness. This leaves us with an apparent contradiction. On the one hand, the genetic code seems to be fashioned so that single nucleotide replacements minimize deleterious changes and maximize the possibilities of advantageous changes. On the other hand, the optimum amino acid is very frequently two nucleotide replacements away and one of the two intervening amino acids has an intermediate fitness. In the present data there are 223 mutations of which 64 (32 doubles) are involved in this particular form of change.
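The counts just quoted can be cross-checked against the Table. The sketch below assumes that the 223 mutations are those in the internodal intervals listed in the Table (intervals with two or more fixations) and that each observed frequency D times its number of intervals n gives a whole number of double fixations; the variable names are illustrative only.

```python
# (m, n, observed double fixations per interval) taken from the Table.
ROWS = [
    (2, 6, 0.167), (3, 2, 0.0), (4, 4, 1.25), (5, 2, 1.0), (6, 1, 1.0),
    (7, 1, 1.0), (9, 1, 1.0), (10, 2, 0.0), (15, 1, 2.0), (16, 1, 2.0),
    (17, 2, 2.5), (19, 1, 3.0), (23, 1, 4.0), (30, 1, 5.0),
]

# Total fixations: m fixations in each of the n intervals of that size.
total_mutations = sum(m * n for m, n, _ in ROWS)

# Total doubly mutated codons: frequency per interval times number of intervals,
# rounded because 0.167 is a rounded form of 1/6.
total_doubles = sum(round(n * d) for _, n, d in ROWS)

# Each double accounts for two of the fixations.
fraction_in_doubles = 2 * total_doubles / total_mutations

print(total_mutations, total_doubles, f"{100 * fraction_in_doubles:.1f}%")
```

Under these assumptions the totals come to 223 fixations and 32 doubles, so 64 of 223 fixations, a little under 29%, belong to double fixations.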
Thus 29% of the mutations fixed were to get to a more fit amino acid two nucleotide replacements away from that originally encoded. Does this really square with the idea that cytochrome c is highly evolved and tolerates little change? And what we see can only be those potential improvements for which an intervening amino acid has intermediate fitness. Surely there must be others for which the intervening amino acids are deleterious, in which case "you just can't get there from here". But if these latter cases that we can't observe number anywhere near as many as those that we have been able to observe in the former, then the fraction of amino acid substitutions that confer an advantage but require two nucleotide replacements becomes unreasonable.

From this study I would conclude: I, the number of covarions in the gene for cytochrome c averages between 4 and 10; II, the turnover averages 75% or more among the covarions not fixing the last mutation; and III, the frequency with which codons incorporate replacements in two of their nucleotide positions in relatively close succession argues more for the flexibility permitted at that site than for successive selective improvements. This last conclusion is consistent with the observation that the genes for alpha hemoglobin (Fitch, 1971b) and cytochrome c and fibrinopeptide A (Fitch and Markowitz, 1970) are evolving at the same rate per covarion.

Acknowledgements. This project received support from National Science Foundation grant GB-7486. The University of Wisconsin Computing Center, whose facilities were used, also receives support from NSF and other government agencies.
Appendix
It was previously shown that only about ten of the codons in the gene for mammalian cytochromes c could accept (fix) a mutation at any one point in time (Fitch and Markowitz, 1970). These were called the concomitantly variable codons (covarions). But this number is considerably less than the 70 or more codons that are known to have had fixations in them. These two facts are not inconsistent under the hypothesis that, as mutations were fixed, the codons that could fix subsequent mutations changed so that while few codons in any one species could change at any one time, many could change during the evolution of many species. It is the purpose of this appendix to examine the rate at which previously concomitantly variable codons are replaced by new ones and to estimate the number of such covarions by a procedure quite different from that given originally (Fitch and Markowitz, 1970). This is done by examining the rate at which second mutations are fixed in codons that have already fixed a mutation as a function of the total number of mutations fixed and showing that multiple fixations in a single codon have the property of a Poisson function showing contagion.
Method
The basic question is, after m observable mutations have been fixed in a given gene, how many codons in that gene will have fixed two or more mutations? To calculate the number of different codons fixing two or more mutations in the course of a total of m mutations being fixed we need to know: (A), how many covarions c there are in which the mutations can be fixed; (B), the probability v that, following a fixation in another codon, a variable codon will remain variable, that is, that it will retain its ability to fix a mutation; (C), the expected number $e_i$ of covarions that are still variable and have fixed at least one mutation after i fixations; (D), the probability $f_i$ that the mutation following the ith is fixed in any given one of the $e_i$ previously mutated covarions [this, in conjunction with (B), permits us to determine the probability $r_{i+1}$ that the mutation following the ith is fixed in a previously unmutated codon]; (E), the probability $P_k$ that, following a codon's first fixation, it has not fixed a second in k subsequent fixations; and finally (F), the probability $s_{ih}$ that at least one of h subsequent fixations occurs in the same codon as fixed the ith mutation.

Our problem is to calculate the expected number of doubly mutated codons D that will have fixed two (or possibly three) observable mutations after a total of m fixations in the gene. If we know the probability $r_i$ that the ith fixation is in a previously unmutated codon and the probability $s_{ij}$ that at least one of the ensuing m − i = j observable fixations will be in that same codon, the product $r_i s_{ij}$ is the expectation that the ith fixation is the first of two (or three) observable fixations in that codon. The expected number of observable double fixations is simply the sum of such products for all values of i from 1 to m − 1. This may then be compared with observed frequencies of multiple fixations to determine the adequacy of this model. The problem now is to determine r and s. Throughout the ensuing discussion, the only mutations being considered are those that are fixed and are observable.

We assume that the particular codons classified as concomitantly variable are highly dependent upon the nature of the particular amino acids in various parts of the protein so that we would not necessarily expect the same codons to be concomitantly variable in the gene for a fungal cytochrome c as in the gene for the dog cytochrome c. Neither do we expect that the number of covarions c is constant from species to species nor from time to time, but we do assume that treating as a constant the average value taken by c in many species over long periods of time will give a useful approximation. As each mutation is fixed in the population, there must be a probability v (0 ≤ v ≤ 1) that any given codon will remain among the concomitantly variable group, i.e., v is the persistence of variability. All other variables in this discussion are dependent solely upon c and v. These are our given parameters.

We shall assume that if a codon is removed from the group of covarions, it is replaced by another codon that has not recently¹ been among the concomitantly variable codons. We assume that v applies to each codon except the codon fixing the most recent mutation. The codon fixing the last mutation we assume remains variable. Such an assumption must be correct when the last fixation has no significant selective advantage. It need not, but could also be true if the last fixation was selected for. The probability that any one of the covarions will fix the first observable mutation is 1/c.
H o w e v e r , we are only i n t e r e s t e d in o b s e r v a b l e m u t a t i o n s a n d our ability t o o b s e r v e s u b s e q u e n t fixations d e p e n d s u p o n p r i o r fixations. If a c o d o n h a s n o t fixed a m u t a t i o n previously, all m u t a t i o n s w h i c h c h a n g e t h e a m i n o acid e n c o d e d are o b s e r v a b l e . I f a m u t a t i o n has a l r e a d y b e e n f i x e d in t h a t codon, t h e n t h e n e x t one will be o b s e r v able o n l y if a n u c l e o t i d e is a l t e r e d in a p o s i t i o n d i f f e r e n t f r o m t h a t c h a n g e d in t h e first fixation. This is b e c a u s e successive n u c l e o t i d e fixations s u c h as A - + C - + G bet w e e n a n a n c e s t o r a n d a d e s c e n d a n t a p p e a r to t h e o u t s i d e o b s e r v e r as A - + G t h e r e b e i n g no record of t h e i n t e r m e d i a t e C. T h u s t h e p r o b a b i l i t y t h a t t h e n e x t observable f i x a t i o n will occur in a once m u t a t e d c o d o n is o n l y 0.604 t h a t of a c o d o n w h i c h h a s n o t p r e v i o u s l y f i x e d a m u t a t i o n 2. If fi is t h e p r o b a b i l i t y t h a t t h e n e x t m u t a t i o n is f i x e d in a c o v a r i o n t h a t has a l r e a d y f i x e d a m u t a t i o n a n d f} is t h e p r o b a b i l i t y t h a t it is f i x e d in a p r e v i o u s l y u n m u t a t e d covarion, t h e n f i = 0.604 f~. Since e i is t h e exp e c t e d n u m b e r of p r e v i o u s l y m u t a t e d covarions, c - e i is t h e e x p e c t e d n u m b e r of covarions w i t h o u t p r i o r m u t a t i o n s b e i n g f i x e d in t h e m . A n d since t h e n e x t fixation occurs s o m e w h e r e w i t h p r o b a b i l i t y one, field- fi (c - - el) = 1. Solving, fi = 0.604/(c - - 0.396ei).
(,)
Thus $f_i$ is the probability that, after $i$ fixations, the next fixation occurs in a particular previously mutated codon. We shall let $g_i = 1 - f_i$. After the first fixation there is precisely 1 covarion that has fixed a mutation and therefore $e_1 = 1$, from which $f_1$ may be calculated. To find the subsequent values of $f_i$ requires the subsequent values of $e_i$, for which we now derive a recursion relation.

¹ Recently is here a relative term. The calculations are based upon mutations between an ancestral form (the node of some phylogenetic tree) and its next immediately known descendant (the next node or a present-day species). The model, as formally stated, does not permit a once-mutated codon to be removed from the variable group and subsequently returned to that group in the same internodal period. Nevertheless, even if removal and return occur in the same internodal interval, the model still works if $v$ is interpreted as a function that reflects the average amount of time a concomitantly variable codon is likely to be variable.

² The value 0.604 was arrived at as follows. Excluding mutations involving termination codons, there are precisely 166, 176, and 50 ways of altering the first, second and third nucleotides of codons, respectively, so as to change the coding from one amino acid to another. This is a total of 392 ways. The probability $P_i$ that the $i$th nucleotide is involved in an observable mutation is therefore assumed to be 166/392, 176/392 and 50/392, or 0.423, 0.449 and 0.128, for $i$ = 1, 2 and 3 respectively. The probability that a second mutation will be fixed in a nucleotide position other than the position of the first fixation is $\sum_{i=1}^{3} P_i (P_j + P_k)$, where $i \neq j \neq k \neq i$.
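The arithmetic of footnote 2 can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the only inputs are the counts 166, 176 and 50 quoted in the footnote. Exact fractions are used so that no rounding enters before the final step.

```python
from fractions import Fraction

# Counts quoted in footnote 2: ways of changing the encoded amino acid by
# altering the first, second and third nucleotide of a codon.
WAYS = (166, 176, 50)


def position_probabilities(ways=WAYS):
    """P_i: share of observable single-nucleotide changes at each position."""
    total = sum(ways)
    return [Fraction(w, total) for w in ways]


def prob_second_in_other_position(probs):
    """sum_i P_i * (P_j + P_k), where j and k are the two positions other than i."""
    whole = sum(probs)
    return sum(p * (whole - p) for p in probs)


probs = position_probabilities()
ratio = prob_second_in_other_position(probs)
print([round(float(p), 3) for p in probs])
print(round(float(ratio), 3))
```

Evaluated with these counts, the sum comes out at roughly 0.603, close to the 0.604 used in the text; the small difference presumably reflects rounding in the original hand calculation.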
To find $e_{i+1}$ we note that the fixation following the $i$th occurs among the previously mutated covarions with probability $e_i f_i$. The remaining $e_i - e_i f_i$ previously mutated covarions are subject to loss of their covarion status by virtue of the effect of this last mutation, so that only $e_i (1 - f_i) v$ of them remain variable. To this group must be added the covarion that fixed the last mutation, so that

$$e_{i+1} = e_i (1 - f_i) v + 1 = e_i g_i v + 1. \qquad (2)$$
Thus given any $e_i$, $f_i$ can be calculated from Eq. (1), and given any $e_i$ and $f_i$, $e_{i+1}$ can be calculated from Eq. (2). Since $e_1$ is 1, all $e_i$ and $f_i$ are obtainable.

Now, since the probability that the $(i+1)$th fixation occurs in a previously mutated covarion is $e_i f_i$, then

$$r_{i+1} = 1 - e_i f_i \qquad (3)$$

is the probability that the $(i+1)$th fixation occurs in a previously unmutated covarion. The first fixation must occur in an unmutated covarion, hence $r_1 = 1$. All other $r_i$ can be obtained from Eq. (3).

We now proceed to calculate the probability $s_{ih}$ of a second fixation in a covarion in the $h$ fixations following the $i$th fixation. Consider a specific codon which fixes its first mutation at the $i$th fixation. After this initial fixation, there follow $k$ subsequent fixations in the gene. We shall let $p_{ik}$ be the probability (a) that the $i$th fixation occurred in a previously unmutated codon AND (b) that the $i$th fixing codon was still variable before the $k$th subsequent fixation AND (c) that the $k$th subsequent fixation was not fixed in that same $i$th fixing codon. For brevity we will temporarily drop the subscript $i$.
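The chain of calculations in Eqs. (1) to (3) can be sketched in Python as follows. This is a minimal illustration, not the program used for the paper: the function name `covarion_sequences` is hypothetical, and the values c = 10 and v = 0.25 in the usage lines are chosen only for illustration, not as fitted estimates.

```python
def covarion_sequences(c, v, n, ratio=0.604):
    """Return lists e, f, g, r indexed 1..n (index 0 is unused padding).

    c     -- number of covarions
    v     -- probability that a covarion stays variable after a fixation elsewhere
    ratio -- the factor 0.604 relating f_i to f'_i in the text
    """
    e = [None, 1.0]   # e_1 = 1
    f = [None]
    g = [None]
    r = [None, 1.0]   # r_1 = 1
    for i in range(1, n + 1):
        f.append(ratio / (c - (1.0 - ratio) * e[i]))  # Eq. (1)
        g.append(1.0 - f[i])                          # g_i = 1 - f_i
        if i < n:
            e.append(e[i] * g[i] * v + 1.0)           # Eq. (2)
            r.append(1.0 - e[i] * f[i])               # Eq. (3)
    return e, f, g, r


# Illustrative values only (assumptions, not results from the paper).
e, f, g, r = covarion_sequences(c=10, v=0.25, n=50)
print(e[1], f[1], r[2])
print(e[50], 1.0 / (1.0 - g[50] * 0.25))
```

The last line compares $e_i$ at large $i$ with $1/(1 - g v)$; once the recursion has settled the two values should coincide, since a fixed point of Eq. (2) satisfies $e = e g v + 1$.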
We shall develop a recursion relation for $p_k$. Since the $k$th fixation occurred in a different codon, the probability that the $i$th fixing codon remains variable after the $k$th subsequent fixation but has not fixed a second mutation is $v p_k$, while $(1 - v) p_k$ is the probability that, as a result of the $k$th subsequent fixation, it will be removed from the variable group and therefore couldn't fix a future mutation. Furthermore, letting $j = i + k$, the probability that the next mutation will be fixed in that codon is then $f_j v p_k$, and that the next one won't be fixed there, even though it is possible, is $g_j v p_k$. To summarize regarding a codon fixing its second mutation with the next fixation: $p_k$ is the probability it could have but hasn't, $1 - p_k$ that it couldn't or has already; $v p_k$ is the probability it still can, $(1 - v) p_k$ that it can't; $f_j v p_k$ is the probability it will, $g_j v p_k$ that it won't by chance. The probability, $p_{k+1}$, that a second fixation will not have occurred in this location is the sum of the probabilities that it can't (because the codon is no longer variable) and that it won't (because of random processes).
Thus, $p_{k+1} = p_k (1 - v + g_j v)$ or, returning the subscript $i$,

$$p_{i,k+1} = p_{i,k} (1 - f_j v) \qquad (4)$$

is the probability that the codon fixing the $i$th mutation will not have received a second mutation after the $(k+1)$th subsequent fixation. Since we know that the mutation following the $i$th is not fixed in the $i$th fixing codon with probability $1 - f_i = g_i$, then we also know that $p_{i1} = g_i$. Thus, knowing $p_{i1}$, $v$ and all $f_i$ permits one to calculate all values of $p_{ik}$. This proves to be

$$p_{ik} = g_i \prod_{j=i+1}^{i+k-1} (1 - f_j v).$$

³ Note that for large values of $i$, $e_{i+1} \approx e_i$. Thus setting $e = e g v + 1$ yields the limiting value $e = 1/(1 - g v)$.

Since $f_i$ is the probability that the next fixation will be fixed in any particular one of the previously mutated covarions, it must then also be the probability that the first subsequent mutation fixed will be in the same codon as fixed the $i$th. We know further that $f_j v p_{ik}$ is the probability that the second fixation will occur in the $i$th fixing codon on the $(j+1)$th trial given that the $i$th fixing codon failed to fix a second mutation in the previous $k$ trials. Therefore,

$$s_{ih} = f_i + \sum_{k=1}^{h-1} f_j v p_{ik} \qquad (5)$$
is the probability that, in the course of $h$ subsequent fixations, a second fixation will occur in the same codon as fixed the $i$th. As before, $j = i + k$. Note that $s_{i1} = f_i$. The recursion relation for $s$ is $s_{i,h+1} = s_{ih} + f_j v p_{ih}$. Note also that the probability that a second fixation occurs does not preclude a third observable fixation in the same codon.

If a total of $m$ mutations has been fixed, how many codons will have had two or more fixations? Consider the $i$th fixation. The probability that the $i$th fixation is an initial fixation is $r_i$, and the probability that the $i$th fixation will have a second fixation in the same codon in the following $j = m - i$ fixations is $s_{ij}$. Thus, the expectation that the $i$th fixation is in a previously unmutated codon and will be followed by another fixation in the same codon is $r_i s_{ij}$, and the expected number of codons fixing two or more mutations, given $m$ total mutations fixed, is

$$D = \sum_{i=1}^{m-1} r_i s_{ij}.$$
A computer program has been prepared that calculates $D$ (double fixations) for all values of $m$ for any given set of values for $c$ and $v$. These $D$ are compared with the number of double mutations found in the descent of 29 species of cytochrome c to determine, by a chi-squared estimate, how well the assumed values of $c$ and $v$ fit the observed data. Trial and error techniques were utilized to obtain the best fitting values of $c$ and $v$.
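A calculation of the kind described above might look like the following Python sketch. It is not the original program: the function name `expected_double_fixations` is hypothetical, the values of c and v in the usage lines are illustrative assumptions, and the chi-squared comparison is omitted because the observed counts of double mutations are not given in this section. The code applies Eqs. (1) to (3), then the recursions for $p$ and $s$, and finally the sum defining $D$.

```python
def expected_double_fixations(c, v, m_max, ratio=0.604):
    """Return a list D where D[m] is the expected number of codons that have
    fixed two or more mutations after m total fixations (D[0] = D[1] = 0)."""
    # e[i], f[i], r[i] from Eqs. (1)-(3); index 0 is unused padding.
    e, f, r = [0.0, 1.0], [0.0], [0.0, 1.0]
    for i in range(1, m_max + 1):
        f.append(ratio / (c - (1.0 - ratio) * e[i]))  # Eq. (1)
        e.append(e[i] * (1.0 - f[i]) * v + 1.0)       # Eq. (2)
        r.append(1.0 - e[i] * f[i])                   # Eq. (3)

    # s[i][h]: probability of a second fixation in the i-th fixing codon
    # within the h fixations that follow the i-th one, Eq. (5).
    s = [[0.0] * (m_max + 1) for _ in range(m_max + 1)]
    for i in range(1, m_max):
        p = 1.0 - f[i]      # p_{i1} = g_i
        s[i][1] = f[i]      # s_{i1} = f_i
        for h in range(1, m_max - i):
            j = i + h
            s[i][h + 1] = s[i][h] + f[j] * v * p  # s_{i,h+1} = s_{ih} + f_j v p_{ih}
            p *= 1.0 - f[j] * v                   # Eq. (4): p_{i,h+1}

    # D(m) = sum over i = 1..m-1 of r_i * s_{i, m-i}
    return [sum(r[i] * s[i][m - i] for i in range(1, m))
            for m in range(m_max + 1)]


# Illustrative values only (assumptions, not the fitted values of the paper).
D = expected_double_fixations(c=10, v=0.25, m_max=20)
for m in (2, 5, 10, 20):
    print(m, round(D[m], 4))
```

A fit in the spirit of the text would repeat this calculation over a grid of trial values of $c$ and $v$ and keep the pair whose $D$ values agree best with the observed double mutations.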
References

Fitch, W. M.: Systematic Zoology. In press (1971a).
Haematologie und Bluttransfusion. Munich: J. F. Lehman 1971b (in press).
-- Margoliash, E.: Brookhaven Symp. in Biol. 21, 217-242 (1968).
-- Markowitz, E.: Biochem. Genet. 4, 579-593 (1970).

Walter M. Fitch
Department of Physiological Chemistry
University of Wisconsin
Madison, Wisconsin 53706