J. Mol. Evol. 4, 201-247 (1975)
© by Springer-Verlag 1975

Population Genetics of Unequal Crossing over

J. Krüger and F. Vogel
Institut für Anthropologie und Humangenetik der Universität Heidelberg

Received December 1, 1973; in revised form July 22, 1974
Summary. The population genetics of unequal crossing over was examined for an infinite population with random mating. The following cases were considered: 1. There is an initial portion of duplicated genes which offer the opportunity for unequal crossing over, but the primary event leading to the duplication does not occur any more (model 1a). 2. This primary event occurs with a certain (small) probability (model 1b). For both possibilities the long-term consequences for the distribution of "alleles" (i.e. the single gene, the duplicated gene, the triplicated gene etc.) were considered with the following additional assumptions: 1. No selection. 2. Selection with maximum fitness for an optimum "allele length" (i.e. number of gene repeats). 3. For model 1a, selection with general advantage of longer alleles over shorter ones was also examined.

The results are briefly the following: In model 1a under assumption 1 the distribution of allele length tends with increasing generation number to a stationary state which depends on the initial allele distribution (i.e. on the initial frequency of the duplicated gene) but not on the frequency, P, of unequal crossing over; the stationary frequencies of the alleles decrease with increasing allele length. Under assumption 2 there is likewise a stationary allele distribution, but this depends on P as well as on the strength of selection and not on the initial allele distribution; it is concentrated more or less tightly around the optimal allele length. Under assumption 3 no stationary state seems to be reached: the mean and the standard deviation of the allele distribution increase steadily with the generation number. In model 1b under assumption 1, with certainty no stationary distribution exists. Under assumption 2 the situation is the same as that in model 1a; the stationary distribution of allele length is identical with that in model 1a for the same P and same selection strength, quite independent of the probability of the primary event.

The results were discussed with respect to empirical examples in which unequal crossing over is expected to be important, for example human haptoglobins, immune globulin determining cistrons, and nucleolus organizer regions. The consequences of selection relaxation were considered.

Key words: Crossing over - Unequal Crossing over - Gene Duplication - Selection Relaxation.
1. The Problem
1.1. Unequal Crossing over
Sturtevant (1925) was, to the best of our knowledge, the first to describe unequal crossing over. The object of his investigation was the bar mutation of Drosophila melanogaster, a sex-linked dominant character, which had been discovered in one individual by Tice (1914). May (1917) reported that the bar gene occasionally reverts to normal, and Zeleny (1919, 1921, 1922), studying this mutation extensively, found that the frequency of the reversion is variable, but that, in many stocks, about 1 in 1,600 offspring from a pure bar stock receives a non-bar allele. Zeleny also concluded that the reversion probably occurs chiefly (or perhaps exclusively) in females. (In Drosophila, crossing over occurs exclusively in female, not in male germ cells.) He also found that homozygous bar gives rise to a new and more extreme allele, which he called "ultra-bar", and which was renamed by Sturtevant as "double bar".

Sturtevant & Morgan (1923) showed that the combination of double bar and bar also gives rise to reversions to the normal state. They reported six reversions which, at the same time, gave genetic evidence for crossing over in the same region. Therefore, Sturtevant planned an experiment to test whether this reversion was always connected with crossing over. This being confirmed in spite of the fact that both external mutations used for the examination of crossing over were located close to the bar locus, he concluded that the reversions as well as the double bar mutations were due to an unequal crossing over: In his opinion, this leads to one chromosome with two bar loci (double bar), together with another chromosome with no bar locus at all (reverted round). He estimated the reversion frequency in a homozygous bar stock as 1 in 2,920 germ cells.
An explanation of this unusually strong tendency to unequal crossing over was first given by Wright (1929) who formulated a little ambiguously: "The fact that no difference has been detected between the round eye, which arises by demonstrable loss of the bar gene, and ordinary round eye has suggested that the relation of bar to round is a real example of presence and absence, which probably implies that the original mutation from round to bar was a translocation. Such an origin might also be related to the so far unique behavior in crossing over".

This explanation was corroborated by Bridges (1936). Using the salivary gland giant chromosomes, he showed that the simple, dominant bar mutation is due to a duplication of some bands. The reversion corresponds to the normal state, whereas double bar is caused by a triplication. Both types can be produced by one single event of unequal crossing over. In this paper, Bridges did not formulate clearly the obvious reason of this event: The mispairing of "structure homologous", but not "position homologous" chromosome sites.
Smithies, Connell & Dixon (1962) seem to have been the first to invoke the process of unequal crossing over for a phenomenon in human genetics. First, they discovered that the haptoglobin cistron Hp 2 has almost twice the length of the alleles Hp 1S and Hp 1F, as evidenced by the length of the polypeptide chain. Secondly, they showed - and it has been confirmed later - that in the Hp 2 chain, the amino acid sequence of the Hp 1 alleles is repeated almost completely. They concluded that this allele might have been produced by unequal crossing over. Furthermore, they predicted that unequal crossing over might again occur with a relatively high probability between two Hp 2 alleles, rendering, on the one hand, an allele similar to Hp 1, and, on the other hand, an allele containing the genetic information almost in triplicate. Repeated occurrence of this event might lead to still longer alleles, and, hence, to a polymorphism of allele length in the population.

Smithies (1964) stressed clearly the essential difference between the first unique event producing the (almost) double cistron Hp 2 from the single cistron Hp 1, and the unequal but homologous crossing over which becomes possible as soon as the first duplicated cistron is present in the population. He concluded that this process would lead occasionally to almost triplicate cistrons, and explained the Johnson-type haptoglobins in this way. This prediction seems to have been confirmed by Dixon (1966). Nance (1963) and Smithies (1964) applied the concept to the hemoglobin cistrons, discussing especially the closely linked and very similar β and δ cistrons. In this connection, Smithies explained the Lepore-type hemoglobins as due to the pairing of β and δ cistrons and intracistronic unequal crossing over. These considerations were generally accepted (see for example Harris, 1970), and unequal crossing over is now accepted as a relatively frequent event between duplicated cistrons.

Black & Dixon (1968; see also Giblett, 1969) discussed the possibility that the first, unique event which led to the Hp 2 allele might also be caused by mispairing and subsequent unequal crossing over. They examined the base sequences corresponding to the amino acids in positions 9-17 of the N-terminal portion, and 67-75 of the C-terminal portion of the haptoglobin 1 chains. Considering only the 18 unambiguous positions, that is, the first two bases in each of the nine corresponding codons, these authors found that there are 11 identities. According to their opinion, one would expect from the presence of six guanines and three cytosines that strong hydrogen bonding could occur between the transcribed DNA strand for positions 9-17 and the non-transcribed (complementary) strand for positions 67 to 75 (or vice versa).
1.2. Duplications
In the meantime different lines of evidence had led many authors to the opinion that duplication of genetic material is an important genetic mechanism of evolution. Haldane (1932) seems to have been among the first to stress this point. Metz (1947) made it the topic of his presidential talk, Stephens (1951) gave a review, and Lewis (1951) discussed the problem in connection with pseudoallelism. Recently it has been elaborated by Ohno (1970). He envisaged two main mechanisms of gene duplication: First, tandem duplication involving part of one linkage group at a time; second, duplication of the entire genome (polyploidization). As one important mechanism of the first-mentioned kind, he considered unequal, but homologous crossing over.

In the meantime, evidence had accumulated that in higher organisms, including man, there are many duplicated DNA sites. This was shown directly by hybridization experiments (Britten & Kohne, 1969; Kohne, 1970) - and indirectly by comparative analysis of related polypeptide chains which are obviously determined by different cistrons (for ref. see Dayhoff, 1972). Now, the process of biochemical and functional differentiation of duplicated genes and genomes is being followed up with different enzymes and isoenzymes (ref. in Ohno, 1970).

One of the most conspicuous examples is the genetic determination of the immune globulins. There is ample evidence of gene duplication when the different stable parts of gammaglobulin heavy and light chains are compared with each other. Moreover, the leading hypothesis on the genetic determination of the labile parts of these chains (Hilschmann et al., 1969) assumes a succession of many closely linked cistrons which are structure-homologous up to a certain degree, and of which only one is active in each cell. From the similarities and differences in the labile parts of the myeloma κ, λ and H chains analysed so far, it can be anticipated that several hundreds or even thousands of different cistrons may exist in every individual, which can even be arranged into phylogenetic trees. Recent DNA-RNA hybridization experiments seem to confirm this hypothesis directly (Storb, 1972; Delovitch & Baglioni, 1973).

All this evidence together, only a part of which could be mentioned here, shows that gene duplication must have occurred on a large scale during evolution. Duplication, however, means chance for unequal crossing over. Therefore, reshuffling of genetic material by unequal but homologous crossing over must have been extensive in the past, and is still frequent. It is all the more surprising that so little work seems to have been done to elucidate the population genetics of this phenomenon.
1.3. Former Work in Population Genetics
The papers of Spofford (1969) and Mayo (1970) deal with the probability for incorporation of duplications into a population; a process which had been investigated first by Fisher (1922) for a single mutation, and by Nei et al. (1967) for inversions. Spofford showed that for the incorporation of a new duplication, an initial advantage is almost essential. As a possible mechanism for this advantage, she invoked single locus heterosis, i.e. the interaction of the two products of the duplicated cistrons, leading to the formation of heterodimers. Mayo showed that, even assuming duplications to be very rare events, their rates of incorporation are not negligible. He assumed either neutrality, or a reasonable selective advantage ($10^{-4}$). He also considered unequal crossing over, deriving a formula for the frequency of reversion towards a single cistron $b = B_1$ in an infinite population in dependence of the probability of unequal crossing over. Besides, he carried out some calculations on the formation of triplications etc. in a finite population (of size 500). They were assumed to be either lethal or not, in the latter case the genotypes having fitnesses symmetrical about the "peak" genotype $B_2B_2$, where $B_2$ is the duplication of b. The mean frequencies of the disadvantageous allele $B_1$ for 50 successive generations were calculated with values 0.01 and 0.001 for the probability r of unequal crossing over, each with four assumptions about selection. This frequency seems to be higher for the higher value of r and higher for lower values of the selection coefficient s. Due to the special nature of the assumptions and the exclusive consideration of the frequency of $B_1$ only, the results of these calculations are not very revealing.
Crow & Kimura (1970, l.c. pp. 294-296) considered the formation of larger numbers of cistron repeats by unequal crossing over and algebraically derived a continuous approximation for the (discontinuous) stationary distribution of repeat numbers in the population under the influence of a special type of selection: They assumed that the fitness of the genotypes (measured in Malthusian parameters) has a maximum for an intermediate repeat number and decreases with the square of the deviation from this optimum number. Thus they obtained a normal distribution. But their approach, intended only as a textbook example for stabilizing selection, is also not broad enough.
The problem seems to deserve a new approach. It will be presented in the following. In this paper, the stochastic aspect will deliberately be neglected. This does not mean that we consider it unimportant. However, this reduction is expected to give a clearer picture in this early stage of the discussion.
2. Our Own Examinations
2.1. Definition
Two cistrons on homologous chromosomes may be named position-homologous, if they are located in corresponding positions of these chromosomes. They may be named structure-homologous, if they have the same nucleotide sequence. Position-homologous cistrons are also more or less structure-homologous. On the other hand, when a chromosome site has been duplicated, there can be pairs of cistrons on homologous chromosomes, which are completely (or almost completely) structure-homologous without being position-homologous. In this case, it may occur that two cistrons are pairing during synapsis which are not position-homologous, but structure-homologous. If crossing over happens to occur within this "incorrectly" paired site, then we use the term (homologous) unequal crossing over (u.c.o.). The consequence is an exchange of chromosome portions of unequal length.
2.2. The First Event
At the onset, we assume a pair of homologous chromosomes. Both chromosomes are chains of different cistrons; the position-homologous cistrons - and only these - are also structure-homologous. Therefore, the two chromosomes are only able to pair at meiosis in the ordinary, classical way. u.c.o. is impossible. What we need, is an initial duplication of (at least) one cistron. Mechanisms for such a duplication are known in cytogenetics, the simplest one being two breaks at slightly different sites in adjacent homologous chromatids during meiosis, and subsequent crosswise reunion. Another mechanism, mispairing due to accidental homology of short base sequences within the same or different cistrons, has been discussed by Black & Dixon (1968). If the sites of breakage are separated just by the length of one cistron, this event results in two gametes which do not contain this cistron at all, together with two other gametes containing the cistron in duplicate. For the sake of simplicity, we disregard the first-mentioned gametes (with the deletion). They may be lethal. On the other hand, if a gamete with the duplication comes into a diploid individual by fertilization - and if this individual forms germ cells - there is for the first time a risk for mispairing of structure-homologous cistrons, and hence, for u.c.o. The consequences are seen in Fig. 1: As long as the duplication remains heterozygous, all gametes will contain either one or two copies of the cistron. When the duplication becomes homozygous, however, other types of gametes may be formed: u.c.o. may lead, on the one hand, to gametes with only one copy, and, on the other hand, to gametes containing three, and in subsequent generations, more than three copies (Fig. 2).

Fig. 1 a and b. Unequal crossing over, (a) if the primary duplication is heterozygous, (b) if the primary duplication is homozygous. Formation of longer alleles is only possible in case b. Diagrams on the left show the chromosomes before the breakage and reunion event

Fig. 2. Mispairing of homologous chromosomes with shifting of $B_n$ versus $B_m$ by k cistrons to the left, and subsequent unequal crossing over. The upper diagram shows the chromosomes before the breakage and reunion event
Gene amplification by u.c.o. has started. The formal aspects will be examined in the following. As usual in population genetics, the real processes will be approximated by simplified models, which are necessarily up to a certain degree arbitrary. Later on, the results will be discussed considering especially the limitations imposed by the models themselves and the special assumptions made for the calculations.
2.3. The Process of Codon Amplification by u.c.o.
In the following, the process of codon amplification by u.c.o. is described for an infinite population, i.e. in a deterministic way. With regard to the basic event described above, the first gene duplication, two different models are considered alternatively:

Model 1a. The duplication, which might be named bb, is already present in the population, either together with the single cistron b, or alone. It is not formed newly from b.

Model 1b. The duplication is being formed in individuals homozygous for b with a constant frequency which is, of course, independent of the frequency of u.c.o.
Let

$$B_m = b_1 b_2 \dots b_m \tag{1}$$

be a consecutive sequence of repetitions of cistron b on a chromosome. Here, the index i in $b_i$ ($i = 1,2,\dots,m$) means that $b_i$ is the copy of b on the i-th position of the sequence. Neglecting possible small structural differences between these copies which could be produced by point mutations, we regard these copies of b as identical (structure-homologous). This means that, for example, also the cistron sequence $b_1 b_2 \dots b_k b_1 b_2 \dots b_{m-k}$ which consists of the initial part of size k of the sequence $B_m = b_1 b_2 \dots b_m$ ($k \le m$) and the initial part of size $m-k$ of the sequence $B_n = b_1 b_2 \dots b_n$ ($m-k \le n$) is identical with $B_m$. Special cases are: $B_1 = b_1$, the simple cistron b itself; $B_2 = b_1 b_2$, a duplication of b; $B_3 = b_1 b_2 b_3$, a triplication of b, etc. In spite of the fact that the $B_m$ ($m = 1,2,\dots$) are, of course, not alleles sensu strictiori, they will be regarded in the following as the "alleles" of this polymorphism. Accordingly, the constellation of the alleles $B_m$ and $B_n$ on homologous chromosomes will be designated as genotype $B_mB_n$. The multiplicity m of cistron b in the allele $B_m$ is called the length of $B_m$.

During meiosis, the genotype $B_mB_n$ forms - apart from the "normal" gametes $B_m$ and $B_n$ - by u.c.o. the gametes $B_{m+k}$ and $B_{n-k}$ ($1 \le k < n$) or $B_{m-k}$ and $B_{n+k}$ ($1 \le k < m$). In the first case, cistron $b_1$ of $B_m$ is mispaired with cistron $b_{k+1}$ of $B_n$ and, consequently, $b_2$ of $B_m$ with $b_{k+2}$ of $B_n$, $b_3$ of $B_m$ with $b_{k+3}$ of $B_n$, etc. This means a shift of $B_n$ versus $B_m$ by k cistrons to the left (Fig. 2). In the second case, cistron $b_1$ of $B_n$ pairs with cistron $b_{k+1}$ of $B_m$ and, accordingly, $b_2$ of $B_n$ with $b_{k+2}$ of $B_m$ etc. This means a shift of $B_n$ versus $B_m$ by k cistrons to the right¹. u.c.o. need not occur at the point indicated in Fig. 2. It might take place anywhere within the mispaired section $b_1 b_2 \dots b_{n-k}$ / $b_{k+1} b_{k+2} \dots b_n$ of the homologous chromosomes. Obviously, the resulting chromosomes - and, therefore, the gametes - are independent of the exact point of u.c.o.

For a discussion of the frequency of u.c.o., it is conveniently subdivided into two separate events: 1. Mispairing of the alleles $B_m$ and $B_n$; 2. The event of unequal crossing over which follows. The probability of the first event (mispairing) is assumed to depend on the "size" k of the shift of $B_n$ versus $B_m$ (k cistron units to the right for $k > 0$, $|k|$ cistron units to the left for $k < 0$). It is reasonable to presume that this probability decreases with increasing $|k|$. On the other hand, the assumption is made that this probability is independent of the length of the mispaired section ($n-k$ cistron units in Fig. 2) - and, hence, of the allele lengths m and n. For the sake of simplicity the further assumption is made that the conditional probability of u.c.o. at a given k (shift of $B_n$ versus $B_m$) is independent not only of k (this seems to be plausible) but of the length of the mispaired section - and this means, of m and n - as well. The last-mentioned assumption is not plausible: One would presume that the frequency of the second event, u.c.o., increases with the length of the mispaired section. The assumptions may be formulated more exactly as follows:

¹ The condition $m \ge n$ in Fig. 1 is not essential. It is only meant to secure that the chromosome loop to the right occurs in the chromosome containing $B_m$. In the case $m < n$ this loop would occur either in the $B_m$ containing chromosome, or in the chromosome with $B_n$, depending on the value of k.
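The statement that the resulting chromosomes do not depend on the exact point of exchange can be checked with a small sketch. It is only an illustration under stated assumptions: the chromosomes are modelled as Python lists with hypothetical flanking markers `a` and `c`, the exchange is placed between two paired copies rather than inside a cistron, and the function names `unequal_crossover` and `allele_length` are ours, not the paper's.

```python
def unequal_crossover(m, n, k, t):
    """Recombinants for a shift of B_n versus B_m by k cistrons to the left.

    Copy b_1 of B_m is paired with copy b_(k+1) of B_n. The exchange is
    placed just after the t-th paired copy, 1 <= t <= min(m, n - k).
    The flanking labels 'a' and 'c' are illustrative assumptions.
    """
    if not (1 <= k < n):
        raise ValueError("a left shift needs 1 <= k < n")
    if not (1 <= t <= min(m, n - k)):
        raise ValueError("exchange point lies outside the mispaired section")
    chrom_m = ["a"] + [f"b{i}" for i in range(1, m + 1)] + ["c"]
    chrom_n = ["a"] + [f"b{i}" for i in range(1, n + 1)] + ["c"]
    cut_m = 1 + t        # list index just after copy b_t of B_m
    cut_n = 1 + k + t    # list index just after copy b_(k+t) of B_n
    longer = chrom_n[:cut_n] + chrom_m[cut_m:]    # expected length m + k
    shorter = chrom_m[:cut_m] + chrom_n[cut_n:]   # expected length n - k
    return longer, shorter


def allele_length(chromosome):
    """Number of copies of cistron b on a chromosome."""
    return sum(1 for label in chromosome if label.startswith("b"))


# Every admissible exchange point yields the same pair of allele lengths.
for t in (1, 2, 3):
    longer, shorter = unequal_crossover(m=3, n=4, k=1, t=t)
    print(t, allele_length(longer), allele_length(shorter))
```

For $m = 3$, $n = 4$, $k = 1$ each exchange point gives the lengths $m + k = 4$ and $n - k = 3$, in line with the gametes $B_{m+k}$ and $B_{n-k}$ named in the text.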
Model 1. The probability $w_k(m,n)$ for mispairing of the alleles $B_m$ and $B_n$ with a shift of $B_n$ versus $B_m$ of the size k and of a subsequent u.c.o. equals $p_k$, provided that a shift of this size is possible:

$$w_k(m,n) = \begin{cases} p_k & \text{for } -(n-1) \le k \le m-1 \ (k \ne 0), \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Here, the values $p_k$ ($k = \pm 1, \pm 2, \dots$) are nonnegative numbers, which are independent of the allele lengths m and n, and fulfil the conditions

$$p_{-k} = p_k \quad (k = 1,2,\dots) \tag{3}$$

$$\sum_{k=1}^{r} p_k \le \tfrac{1}{2} C \quad (r = 1,2,\dots) \tag{4}$$

with fixed $C < 1$. Condition (3) is necessary for reasons of symmetry (a shift of $B_n$ versus $B_m$ to the left is equivalent to a shift of $B_m$ versus $B_n$ by the same number of cistron units to the right). Eq. (4) is meant to secure that the total probability

$$w(m,n) = \sum_k w_k(m,n) \quad (\text{summation over all integers } k \ne 0)$$

of an u.c.o. between $B_m$ and $B_n$ is smaller than a constant less than 1 for all m and n. Therefore, the genotype $B_mB_n$ produces by u.c.o. with probability $p_k$ the gametes²

$$\tfrac{1}{2} B_{m-k} + \tfrac{1}{2} B_{n+k} \quad (k = 1,2,\dots,m-1)$$

if $m > 1$, - and with the same probability $p_k$ the gametes

$$\tfrac{1}{2} B_{m+k} + \tfrac{1}{2} B_{n-k} \quad (k = 1,2,\dots,n-1)$$

if $n > 1$. Besides, the "normal gametes"

$$\tfrac{1}{2} B_m + \tfrac{1}{2} B_n$$

are formed with probability $1 - w(m,n)$ (by "normal" crossing over or without any genetic recombination).

² The notation means that the two types of gametes are formed with the same frequency.

In the following, only one special case of model 1 will be considered: Let $p_k$ for $k > 1$ be so small in comparison with $p_1$ that it can be neglected. In this case, one may set:

$$p_k = \begin{cases} \tfrac{1}{2} P & \text{for } |k| = 1, \\ 0 & \text{for } |k| > 1. \end{cases} \tag{5}$$

This means that u.c.o. with a shift of $B_n$ versus $B_m$ of more than one cistron unit (to the left or to the right) does not occur. In this case, the total probability $w(m,n)$ of an u.c.o. between $B_m$ and $B_n$ is:

$$w(m,n) = \begin{cases} 0 & \text{if both } m \text{ and } n \text{ equal } 1, \\ \tfrac{1}{2} P & \text{if only one of the numbers } m \text{ and } n \text{ equals } 1, \\ P & \text{if both } m \text{ and } n \text{ are greater than } 1. \end{cases}$$

In addition to model 1, the following variant will be considered theoretically:

Model 2. The total probability $w(m,n)$ of an u.c.o. between $B_m$ and $B_n$ does not depend on the allele lengths m and n. It is always equal to P if u.c.o. is possible at all, i.e. if at least one of the values of m and n is greater than 1. In this case, the probability $w_k(m,n)$ of an u.c.o. with a shift k of $B_n$ versus $B_m$ is:

$$w_k(m,n) = \begin{cases} \dfrac{P \times p_k}{\sum_{k=1}^{m-1} p_k + \sum_{k=1}^{n-1} p_k} & \text{for } -(n-1) \le k \le m-1 \ (k \ne 0), \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

Here, the $p_k$ ($k = \pm 1, \pm 2, \dots$) are nonnegative numbers which are independent of m and n, and satisfy condition (3). For this model, too, only the special case will be considered that a shift of $B_n$ versus $B_m$ for more than 1 cistron unit does not occur, i.e. that $w_k(m,n) = 0$ for $|k| > 1$. Obviously, this holds true if $p_1 > 0$, but $p_k = 0$ for $k > 1$. In this case:

$$w_{-1}(m,n) = w_1(m,n) = 0 \quad \text{for } m = n = 1,$$
$$w_{-1}(m,n) = P, \ w_1(m,n) = 0 \quad \text{for } m = 1, \ n > 1,$$
$$w_{-1}(m,n) = w_1(m,n) = \tfrac{1}{2} P \quad \text{for } m > 1, \ n > 1.$$

The difference between model 1 and model 2 is that in the last-mentioned model, the over-all probability of u.c.o. has always the same value, P - whether a shift of the alleles $B_m$ and $B_n$ against each other is possible only in one direction (if $m = 1$, $n > 1$, or $m > 1$, $n = 1$) - or in both directions (if $m > 1$, $n > 1$). In model 1, the probability of u.c.o., if shifting is possible in one direction only, is half the same probability, if shifting is possible in both directions. Both models are considered, because both have a certain a priori appeal, and, to the best of our knowledge, there are no empirical reasons for or against a decision in one or the other direction.

As mentioned above, a very large (practically infinite) population is considered. The frequencies of the alleles $B_1, B_2, \dots$ in this population are named $x_1, x_2, \dots$, with

$$\sum_{m=1}^{\infty} x_m = 1. \tag{7}$$

(For purely formal reasons, it is practical to treat the sequence $\{x_m\}$ as infinite in spite of the fact that only a finite number of its terms are different from 0, i.e.: $x_m = 0$ for $m > M$ with sufficiently large M.) For a given initial state

$$x_m = x_m^{(1)} \quad (m = 1,2,\dots)$$

of the distribution of the alleles $B_1, B_2, \dots$ in generation 1, the following problems are interesting: a) the alterations of this distribution with increasing number of generations, and b) the question, whether stationary distributions exist.

Random mating for the $B_m$ locus is assumed. First, the case will be examined that genotype $B_1B_1$ is unable to form gametes of type $B_2$ (models 1a and 2a). In this case, establishment of a polymorphism is possible only, if in the initial distribution of the $B_m$ at least one of the allele frequencies $x_m^{(1)}$ with $m > 1$ is not 0. For example, $x_2^{(1)} = 1$, $x_m^{(1)} = 0$ ($m \ne 2$) is an initial distribution compatible with this model. "Alleles" $B_2, B_3, \dots$ must have come into this population in earlier times or from outside, or by a unique event which is not examined further. Later on, the case will be treated that genotype $B_1B_1$ can form - apart from the "normal" gametes $B_1$ - also $B_2$ gametes with a certain probability d (models 1b and 2b). Here, it is obligatory that gametes are formed in the same frequency which contain the deletion $B_0$ (i.e. "no b"). These gametes are assumed to be lethal. Models 1a and 2a are special cases (with $d = 0$) of models 1b (and 2b).
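The gamete output of a single genotype under the two mispairing models can be sketched as follows. This is a minimal illustration restricted to the special case of Eq. (5), i.e. shifts of one cistron unit only, and to models 1a and 2a ($d = 0$); the function names `shift_probabilities` and `gamete_distribution` and the dictionary layout are our own assumptions, not part of the paper.

```python
from fractions import Fraction


def shift_probabilities(m, n, P, model):
    """w_k(m, n) for k = +1 and k = -1, shifts of one cistron unit only."""
    # k = +1 shortens B_m (needs m > 1); k = -1 shortens B_n (needs n > 1).
    possible = {+1: m > 1, -1: n > 1}
    directions = sum(possible.values())
    w = {}
    for k, allowed in possible.items():
        if not allowed:
            w[k] = 0
        elif model == 1:
            w[k] = P / 2             # Eq. (5): p_k = P/2 for |k| = 1
        elif model == 2:
            w[k] = P / directions    # Eq. (6): the total is always P
        else:
            raise ValueError("model must be 1 or 2")
    return w


def gamete_distribution(m, n, P, model=1):
    """Map allele length -> probability among the gametes of genotype B_m B_n."""
    w = shift_probabilities(m, n, P, model)
    gametes = {}

    def add(length, probability):
        gametes[length] = gametes.get(length, 0) + probability

    normal = 1 - w[+1] - w[-1]       # no u.c.o.: gametes B_m and B_n
    add(m, normal / 2)
    add(n, normal / 2)
    add(m - 1, w[+1] / 2)            # k = +1: B_(m-1) and B_(n+1)
    add(n + 1, w[+1] / 2)
    add(m + 1, w[-1] / 2)            # k = -1: B_(m+1) and B_(n-1)
    add(n - 1, w[-1] / 2)
    return {length: p for length, p in gametes.items() if p != 0}


P = Fraction(1, 10)
print(gamete_distribution(2, 2, P, model=1))
print(gamete_distribution(1, 3, P, model=1))
print(gamete_distribution(1, 3, P, model=2))
```

Exact fractions are used so that the two models can be compared without rounding: for genotype $B_1B_3$ the total probability of u.c.o. is $\tfrac{1}{2}P$ in model 1 and $P$ in model 2, whereas both models agree for $B_2B_2$.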
2.3.2. Selection

For both models, the properties of the distribution $\{x_m\}$ will first be treated without selection (i.e. all genotypes $B_mB_n$ ($m,n = 1,2,\dots$), excepting $B_1B_1$ in models 1b and 2b, are assumed to have the same fitness). Later on, selection will be included, i.e. fitnesses of the different genotypes will be assumed to be different. Here, relative fitness of genotype $B_mB_n$ will be taken as $f_m f_n$ ($m,n = 1,2,\dots$). With random mating, this is formally equivalent with the assumption that selection acts exclusively on gametes (gametic selection; Li, 1955) and $f_m$ is the (relative) fitness of the allele $B_m$. Regarding the dependence of fitness $f_m$ on the allele length m, two models will be examined:

Selection Type 1. The $f_m$ ($m = 1,2,\dots$) form a monotonically increasing sequence, i.e. $B_m$ has a selective advantage over $B_n$, if $m > n$. This model is realized by the following formula:

$$f_m = 1 - s_m, \quad s_m = s_1 q^{m-1} \quad (m = 1,2,\dots) \tag{8}$$

with $0 < s_1 < 1$ and $0 < q < 1$ as given parameters of the model.

Selection Type 2. There is an "optimum" allele length $m = m_{opt}$, up to which the $f_m$ increase. For $m > m_{opt}$, they decrease with increasing m. Either, the decrease leads to $f_m = 0$ for all m greater than a certain maximum allele length $m_{max}$ (selection type 2a), or, with increasing m, $f_m$ approaches a positive limit (selection type 2b). Selection type 2a is realized as follows:

$$f_m = 1 - s_m \quad (m = 1,2,\dots)$$

with

$$s_m = \begin{cases} s_1 q_1^{\,m-1} & \text{for } m \le m_{opt}, \\ s_{m_{opt}} / q_2^{\,m-m_{opt}} & \text{for } m_{opt} < m \le m_{max}, \\ 1 & \text{for } m > m_{max}, \end{cases} \tag{9}$$

where $m_{opt}$, $m_{max}$, $0 < s_1 < 1$, and $0 < q_1 < 1$ are given parameters, whereas $q_2$ is determined by:

$$q_2^{\,m-m_{opt}} = s_{m_{opt}} = s_1 q_1^{\,m_{opt}-1} \quad \text{for } m = m_{max} + 1$$

(cf. Fig. 5). In selection type 2b, fitness could, for example, be defined as follows:

$$f_m = 1 - s_m \quad (m = 1,2,\dots),$$

with

$$s_m = \begin{cases} s_1 q^{m-1} & \text{for } m \le m_{opt}, \\ s_1 - (s_1 - s_{m_{opt}})\, q^{m-m_{opt}} & \text{for } m > m_{opt} \end{cases} \tag{10}$$

($0 < s_1 < 1$, $0 < q < 1$). The sequence $\{f_m\}$ as defined by Eq. (10) approaches for $m > m_{opt}$ asymptotically the value $1 - s_1$, i.e. the fitness $f_1$ of the allele $B_1$. It is largely arbitrary, which of the selection types 2a or 2b is used in the examinations. The authors decided in favour of type 2a, mainly for technical reasons (less numerical calculations).
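The three fitness schemes of Eqs. (8) to (10) can be written down directly. The following sketch is only meant to make the formulas concrete; the function names and the parameter values in the usage lines are illustrative assumptions and are not taken from the paper.

```python
def fitness_type1(m, s1, q):
    """Eq. (8): fitness increases monotonically with allele length m."""
    return 1 - s1 * q ** (m - 1)


def fitness_type2a(m, s1, q1, m_opt, m_max):
    """Eq. (9): optimum at m_opt, fitness 0 beyond m_max (needs m_max >= m_opt)."""
    s_opt = s1 * q1 ** (m_opt - 1)
    if m <= m_opt:
        s = s1 * q1 ** (m - 1)
    elif m <= m_max:
        # q2 is fixed by the condition q2**(m_max + 1 - m_opt) = s_opt.
        q2 = s_opt ** (1.0 / (m_max + 1 - m_opt))
        s = s_opt / q2 ** (m - m_opt)
    else:
        s = 1.0
    return 1 - s


def fitness_type2b(m, s1, q, m_opt):
    """Eq. (10): optimum at m_opt, fitness tends to 1 - s1 for large m."""
    s_opt = s1 * q ** (m_opt - 1)
    if m <= m_opt:
        s = s1 * q ** (m - 1)
    else:
        s = s1 - (s1 - s_opt) * q ** (m - m_opt)
    return 1 - s


# Illustrative parameter values (assumptions, not the authors' choices).
for m in range(1, 11):
    print(m,
          round(fitness_type1(m, 0.5, 0.5), 4),
          round(fitness_type2a(m, 0.5, 0.5, m_opt=4, m_max=8), 4),
          round(fitness_type2b(m, 0.5, 0.5, m_opt=4), 4))
```

In type 2a the selection coefficient grows geometrically from $s_{m_{opt}}$ at the optimum to exactly 1 at $m = m_{max} + 1$, which is why $q_2$ is not a free parameter.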
2.3.3. Allele Distributions Derived from the Different Models

2.3.3.1. Model 1a (Model 1 with the Additional Condition that Genotype $B_1B_1$ is Unable to Form Gametes of Type $B_2$)

Table 1 shows the frequencies of the genotypes $B_iB_j$ ($i,j = 1,2,\dots$) in the population. They were calculated under the assumption of random mating (Hardy-Weinberg-Law) from the allele frequencies $x_m$ ($m = 1,2,\dots$). The table contains also the probabilities with which the different genotypes form the different types of gametes. They were calculated taking into account (2) and (5). The frequency $x'_m$ of $B_m$ gametes in the population is calculated as the sum of the products of the frequencies of the different genotypes $B_iB_j$ with the probabilities of $B_m$ gametes being formed by these genotypes³. Calculation for $m = 1$, for example, gives the following result:

$$\begin{aligned} x'_1 ={}& x_1^2 \ (\text{from } B_1B_1) + x_1 x_2 \ (\text{from } B_1B_2) + \sum_{j=3}^{\infty} x_1 x_j \left(1 - \tfrac{1}{2}P\right) \ (\text{from } B_1B_3, B_1B_4, \dots) \\ & + \tfrac{1}{2} x_2^2 P \ (\text{from } B_2B_2) + \tfrac{1}{2} \sum_{j=3}^{\infty} x_2 x_j P \ (\text{from } B_2B_3, B_2B_4, \dots) \\ ={}& \sum_{j=1}^{\infty} x_1 x_j + \tfrac{1}{2} P \left[ \sum_{j=2}^{\infty} x_2 x_j - \sum_{j=3}^{\infty} x_1 x_j \right] \\ ={}& x_1 + \tfrac{1}{2} P \left[ x_2 - x_1 (1 - x_1) \right]. \end{aligned} \tag{11'}$$

Here, Eq. (7) was used. In the same way, it follows:

$$x'_m = x_m + \tfrac{1}{2} P \left[ (x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1}) \right] \quad \text{for } m > 1. \tag{11''}$$

³ This probability is 0, if the genotype does not form any $B_m$ gametes.

Table 1. Production of gametes in model 1a

Genotype                        Frequency in population   Gamete type   Probability
$B_1B_1$                        $x_1^2$                   $B_1$         $1$
$B_1B_j$ ($j \ge 2$)            $2x_1x_j$                 $B_1$         $\tfrac{1}{2}(1 - \tfrac{1}{2}P)$
                                                          $B_2$         $\tfrac{1}{4}P$
                                                          $B_{j-1}$     $\tfrac{1}{4}P$
                                                          $B_j$         $\tfrac{1}{2}(1 - \tfrac{1}{2}P)$
$B_iB_i$ ($i \ge 2$)            $x_i^2$                   $B_{i-1}$     $\tfrac{1}{2}P$
                                                          $B_i$         $1 - P$
                                                          $B_{i+1}$     $\tfrac{1}{2}P$
$B_iB_j$ ($2 \le i < j$)        $2x_ix_j$                 $B_{i-1}$     $\tfrac{1}{4}P$
                                                          $B_i$         $\tfrac{1}{2}(1 - P)$
                                                          $B_{i+1}$     $\tfrac{1}{4}P$
                                                          $B_{j-1}$     $\tfrac{1}{4}P$
                                                          $B_j$         $\tfrac{1}{2}(1 - P)$
                                                          $B_{j+1}$     $\tfrac{1}{4}P$

For the following, the distinction has to be made whether the alleles $B_m$ ($m = 1,2,\dots$) are subject to selection or not.
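The recurrence (11') and (11'') lends itself to a direct numerical check. The sketch below is a minimal illustration for model 1a without selection; the function names `next_generation` and `mean_length`, the list representation (`x[0]` stands for $x_1$), and the chosen values of P and of the initial distribution are assumptions made for the example only.

```python
def next_generation(x, P):
    """Apply Eqs. (11') and (11'') once; x[0] is x_1, x[1] is x_2, and so on.

    The result is one entry longer than the input, because an allele one
    cistron longer than the current maximum can be formed.
    """
    x = list(x) + [0.0, 0.0]          # pad with x_(M+1) = x_(M+2) = 0
    x1 = x[0]
    new = [x1 + 0.5 * P * (x[1] - x1 * (1 - x1))]            # Eq. (11')
    for i in range(1, len(x) - 1):                           # m = i + 1 >= 2
        new.append(x[i] + 0.5 * P * ((x[i + 1] - x[i])
                                     - (1 - x1) * (x[i] - x[i - 1])))  # Eq. (11'')
    return new


def mean_length(x):
    """Mean allele length, i.e. the sum of m * x_m."""
    return sum((i + 1) * freq for i, freq in enumerate(x))


# Start with a population consisting of the duplication B_2 only.
x = [0.0, 1.0]
for generation in range(50):
    x = next_generation(x, P=0.1)
print(len(x), sum(x), mean_length(x))
```

Since u.c.o. only redistributes cistrons between the two chromosomes involved, the frequencies should keep summing to 1 and the mean allele length should stay at its initial value (here 2) up to rounding, while the list grows by one entry per generation.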
2.3.3.1.1. No Selection. In this case, the gamete frequencies $x_m'$ ($m = 1,2,\dots$) are already the allele frequencies of the next generation. Therefore, the formulas (11) permit a calculation of the allele distribution in the daughter generation from that in the parent generation. Thus, they are recurrence formulas for the allele frequencies: Starting from the distribution $x_m = x_m^{(1)}$ ($m = 1,2,\dots$) of the alleles $B_m$ in generation 1, it is possible in principle to calculate the distribution in all subsequent generations by repeated application of these recurrence formulas. The sequence (11) is only formally an infinite number of equations, because in every allele distribution actually occurring all terms on the right-hand side of Eq.(11) vanish for sufficiently large m:

\[
\begin{aligned}
x_m' &= x_m + \tfrac12 P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right] \quad (2 \le m < M), \\
x_M' &= x_M + \tfrac12 P\left[-x_M - (1 - x_1)(x_M - x_{M-1})\right], \\
x_{M+1}' &= \tfrac12 P\,(1 - x_1)\,x_M, \\
x_m' &= 0 \quad \text{for } m > M + 1.
\end{aligned}
\]

The maximum allele length, M, increases each generation by one, provided $x_1 < 1$ in the first generation. The latter, however, is a condition of model 1a. It follows from Eq.(10) by some transformations:

\[
\sum_{m=1}^{\infty} m\,x_m' = \sum_{m=1}^{\infty} m\,x_m, \tag{12}
\]

i.e.: the mean of the allele length m in the population is invariant from generation to generation. This result is obvious even without calculation: In the absence of selection, u.c.o. only causes a rearrangement of b-cistrons. Therefore, this result can be regarded as control for the correct derivation of the recurrence formulas (11).

The next question is: Does a stationary distribution of the allele length exist? A stationary distribution is defined by

\[
x_m' = x_m \quad (m = 1,2,\dots). \tag{13}
\]

It follows from Eq.(13), and from the steady increase of the maximum allele length from generation to generation which has been shown above, that a stationary distribution cannot have a maximum allele length (an exception is the trivial distribution $\{x_1 = 1,\ x_m = 0 \text{ for } m > 1\}$). Therefore, it can never be reached exactly from any initial distribution: Only a limiting distribution could be stationary. It follows from Eqs.(11) and (13) that a distribution $\{x_m\}$ of the allele length is stationary if and only if the allele frequencies $x_m$ satisfy the equations

\[
\begin{aligned}
x_2 - x_1 + x_1^2 &= 0, \\
x_{m+1} - x_m - (1 - x_1)(x_m - x_{m-1}) &= 0 \quad (m = 2,3,\dots).
\end{aligned}
\tag{14}
\]

These equations can be transformed as follows:

\[
\begin{aligned}
x_2 &= x_1(1 - x_1), \\
x_{m+1} &= x_m(2 - x_1) - x_{m-1}(1 - x_1) \quad (m = 2,3,\dots).
\end{aligned}
\]

This means that in a stationary system of allele frequencies every frequency $x_m$ can be expressed by $x_1$ alone. It is verified easily by inference from m to m + 1 that the following relationship holds:

\[
x_m = x_1(1 - x_1)^{m-1} \quad (m = 1,2,\dots). \tag{15}
\]

Conversely, for a given $x_1$ with $0 < x_1 \le 1$, the numbers $x_m$ as given by Eq.(15) satisfy the Eqs.(14), as can be verified by substitution. On the other hand, they form a complete probability distribution because of

\[
\sum_{m=1}^{\infty} x_m = \sum_{m=1}^{\infty} x_1(1 - x_1)^{m-1} = \frac{x_1}{1 - (1 - x_1)} = 1.
\]

Hence, they constitute a stationary distribution. Therefore, the stationary distributions form a special class with one parameter, this parameter being the frequency $x_1$ of the allele $B_1$. All these distributions correspond exactly to all geometrical distributions. They have the common property that the frequencies $x_1, x_2, x_3, \dots$ form a monotonically decreasing sequence: In model 1a, and in absence of selection, $B_1$ is always the most frequent allele in the stationary state.

The Problem of Convergence. Of course, the existence of stationary distributions of allele length does not necessarily mean that the allele length distribution converges in the course of generations from a given initial distribution towards a stationary one. It is conceivable that this could depend, for example, on the form of the initial distribution, or on the probability P of u.c.o. Theoretically, it would even be possible that convergence does not occur at all. This problem will not be examined in a general way (such an examination would be complicated). The following, however, can be stated: If the allele length distribution $\{x_m\}$ converges from a given initial distribution $\{x_m^{(1)}\}$ to a stationary distribution $\{\bar{x}_m\}$, then this stationary distribution is determined uniquely by the initial distribution. It does not depend on P. This follows from the invariance of the mean allele length together with the fact that the parameter $\bar{x}_1$ of the stationary distribution (and, hence, this distribution itself) is determined by this mean allele length:

\[
E(m) = \sum_{m=1}^{\infty} m\,x_m^{(1)} = \sum_{m=1}^{\infty} m\,\bar{x}_m = \sum_{m=1}^{\infty} m\,\bar{x}_1(1 - \bar{x}_1)^{m-1} = \frac{1}{\bar{x}_1}
\]

(mean of the geometrical distribution). It follows for the special class of initial distributions

\[
x_2^{(1)} = 1 - x_1^{(1)}, \qquad x_m^{(1)} = 0 \quad (m \ge 3) \tag{16}
\]

(i.e. only the alleles $B_1$ and $B_2$ are present):

\[
\bar{x}_1 = \frac{1}{E(m)} = \frac{1}{x_1^{(1)} + 2\,(1 - x_1^{(1)})} = \frac{1}{2 - x_1^{(1)}}. \tag{17}
\]

For some of the initial distributions (16) and for several values of the probability P, the convergence of the allele length distribution $\{x_m\}$ has been examined by numerical calculation (iteration using the recurrence formulas (11') and (11'')).^4 The values of $x_1^{(1)}$ and P which were included in these calculations are shown in Fig.3. In all these cases, the distribution $\{x_m\}$ converges towards the stationary distribution determined by Eq.(17). Hence, it is reasonable to presume that $\{x_m\}$ converges for every combination of values $x_1^{(1)}$ and P.
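The iteration described above can be sketched in a few lines. The following is a minimal illustration, not the authors' original program: the function names `step_model_1a` and `mean_length`, the list representation (index k holds the frequency of $B_{k+1}$) and the parameter values $x_1^{(1)} = 0.8$, $P = 0.1$ and 500 generations are assumptions chosen for the example. It applies the recurrences (11') and (11''), checks the invariance of the mean allele length (12), and compares the result with the geometrical distribution (15) whose parameter follows from Eq.(17).

```python
def step_model_1a(x, P):
    """One generation of recurrences (11') and (11'') without selection.

    x[k] is the frequency of allele B_(k+1). The result is one entry longer,
    because the maximum allele length can grow by one per generation.
    """
    n = len(x)
    x1 = x[0]

    def freq(k):
        # Frequencies outside the stored range are zero.
        return x[k] if 0 <= k < n else 0.0

    nxt = [x1 + 0.5 * P * (freq(1) - x1 + x1 * x1)]        # Eq. (11')
    for k in range(1, n + 1):                               # Eq. (11'')
        gain = freq(k + 1) - freq(k)
        loss = (1.0 - x1) * (freq(k) - freq(k - 1))
        nxt.append(freq(k) + 0.5 * P * (gain - loss))
    return nxt


def mean_length(x):
    """Mean allele length E(m) of the distribution x."""
    return sum((k + 1) * xk for k, xk in enumerate(x))


x1_init = 0.8                      # assumed example value of x_1^(1)
P = 0.1                            # assumed (unrealistically high) example value
x = [x1_init, 1.0 - x1_init]       # initial distribution (16)
mean_start = mean_length(x)

for _ in range(500):
    x = step_model_1a(x, P)

x1_limit = 1.0 / (2.0 - x1_init)                                        # Eq. (17)
geometric = [x1_limit * (1.0 - x1_limit) ** k for k in range(len(x))]   # Eq. (15)
max_gap = max(abs(a - b) for a, b in zip(x, geometric))

print("mean at start / end:", mean_start, mean_length(x))
print("x_1 after iteration:", x[0], " limit from Eq.(17):", x1_limit)
print("largest frequency difference to Eq.(15):", max_gap)
```

The list grows by one entry per generation, which mirrors the statement that the maximum allele length M increases by one each generation.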
As mentioned above, the stationary limiting distribution $\{\bar{x}_m\}$ for fixed $x_1^{(1)}$ is independent of P. The speed of convergence, however, is strongly influenced by P. As a measure of this speed, the time (in number of generations) required for the distribution $\{x_m\}$ to approach the limiting distribution $\{\bar{x}_m\}$ up to a distance of 0.001 (= maximum of absolute values of the frequency differences $x_m - \bar{x}_m$) was determined. As shown in Fig.3, this time increases with decreasing P, if $x_1^{(1)}$ is fixed. This means: The higher P, the faster the convergence. More exactly: It looks as if this "convergence time" (with fixed $x_1^{(1)}$) is inversely proportional to P, as would be plausible. On the other hand, the convergence time increases with decreasing $x_1^{(1)}$, if P is fixed. I.e. it is the longer, the rarer $B_1$ is in the initial distribution. It can be concluded from Eq.(17) (see also Fig.3) that in the stationary state the allele $B_1$ is always more frequent, and accordingly, the other alleles
$B_2, B_3, \dots$ taken together are less frequent than in the initial distribution. It follows - and this is important for practical applications - that the model 1a cannot generate any noteworthy polymorphism of the alleles $B_m$ ($m = 1,2,\dots$), unless there is selection.

^4 The Siemens 2002 computer of the Astronomisches Recheninstitut, Heidelberg, was used, which is on loan from the Deutsche Forschungsgemeinschaft.

Fig. 3. Stationary distributions of allele length in model 1a without selection for 3 different initial states (16), each with the "approximation times" for several values of P. Some of these P values are unrealistically high. But this followed from the requirement to cover, on the one hand, a relatively broad range of P values and to avoid, on the other hand, too small P values for technical reasons (convergence too slow). [Panels: initial state and stationary state for each value of $x_1^{(1)}$, with the time (in generations) required to approach the stationary distribution up to a distance of 0.001 for each P; abscissa: allele length.]
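The "convergence time" defined above can be measured with a short routine. This is a hedged sketch under stated assumptions: `convergence_time`, its default tolerance and the generation cap are illustrative choices, generation 1 is taken to be the initial distribution (16), and no attempt is made to reproduce the exact values plotted in Fig.3.

```python
def step_model_1a(x, P):
    """One generation of recurrences (11') and (11''); x[k] is the frequency of B_(k+1)."""
    n, x1 = len(x), x[0]
    f = lambda k: x[k] if 0 <= k < n else 0.0
    nxt = [x1 + 0.5 * P * (f(1) - x1 + x1 * x1)]
    for k in range(1, n + 1):
        nxt.append(f(k) + 0.5 * P * ((f(k + 1) - f(k)) - (1.0 - x1) * (f(k) - f(k - 1))))
    return nxt


def convergence_time(x1_init, P, tol=1e-3, max_generations=3000):
    """First generation at which max |x_m - xbar_m| <= tol, or None if not reached.

    The limiting distribution is the geometrical distribution (15) with the
    parameter given by Eq.(17). The cap max_generations is an assumption that
    only bounds the run time of this sketch.
    """
    x = [x1_init, 1.0 - x1_init]                 # initial distribution (16)
    x1_bar = 1.0 / (2.0 - x1_init)               # Eq. (17)
    for generation in range(1, max_generations + 1):
        # Compare one term beyond the stored range as well (stored value is 0 there).
        gap = max(
            abs((x[k] if k < len(x) else 0.0) - x1_bar * (1.0 - x1_bar) ** k)
            for k in range(len(x) + 1)
        )
        if gap <= tol:
            return generation
        x = step_model_1a(x, P)
    return None


t_fast = convergence_time(0.8, 0.1)
t_slow = convergence_time(0.8, 0.05)
print("P = 0.1 :", t_fast, "generations")
print("P = 0.05:", t_slow, "generations")
```

Comparing the two printed times gives a quick check of the statement that the convergence time grows as P decreases.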
2.3.3.1.2. Selection. In this case, the frequency of the allele $B_m$ in the next generation becomes:

\[
x_m'' = \frac{f_m\,x_m'}{\bar{f}} \quad (m = 1,2,\dots). \tag{18}
\]

Here, $x_m'$ is the frequency of the $B_m$ gametes as calculated from Eqs.(11') and (11''), $f_m$ is their fitness, and

\[
\bar{f} = \sum_{n=1}^{\infty} f_n\,x_n'
\]

is the mean fitness of all gametes in the population. In distinction from the case of no selection, the following inequality holds true in most cases:

\[
\sum_{m=1}^{\infty} m\,x_m'' \ne \sum_{m=1}^{\infty} m\,x_m,
\]

i.e. the mean allele length is not invariant any more from generation to generation. The obvious explanation is that the gametes $B_{i-1}$, $B_{j+1}$ (or $B_{i+1}$, $B_{j-1}$) produced by the genotype $B_iB_j$ by u.c.o. are not subject to the same selection conditions as the parental gametes $B_i$ and $B_j$, in spite of the fact that they contain together the same number of cistrons b.
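The selection step (18) is a fitness-weighted renormalization of the gamete frequencies. The sketch below illustrates it on a small example; the function name `apply_selection` and the numerical gamete frequencies and fitness values are hypothetical and are not taken from the fitness models of the paper.

```python
def apply_selection(x_gametes, fitness):
    """Eq. (18): x''_m = f_m * x'_m / fbar, with fbar the mean fitness of all gametes."""
    mean_fitness = sum(f * x for f, x in zip(fitness, x_gametes))
    return [f * x / mean_fitness for f, x in zip(fitness, x_gametes)]


def mean_length(x):
    """Mean allele length; x[k] is the frequency of B_(k+1)."""
    return sum((k + 1) * xk for k, xk in enumerate(x))


x_gametes = [0.5, 0.3, 0.2]     # hypothetical gamete frequencies x'_1, x'_2, x'_3
fitness = [1.0, 1.1, 1.2]       # hypothetical fitness values f_1, f_2, f_3

x_next = apply_selection(x_gametes, fitness)
x_neutral = apply_selection(x_gametes, [1.0, 1.0, 1.0])

print("after selection:", x_next)
print("mean length before / after:", mean_length(x_gametes), mean_length(x_next))
```

With equal fitness values the step leaves the gamete frequencies unchanged, which is the case of no selection; with fitness increasing in m the mean allele length shifts upwards, so the invariance (12) no longer holds.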
Quite as in 2.3.3.1.1., the question arises whether stationary distributions of the allele length exist. It follows from Eq.(18) that a distribution $\{x_m\}$ of allele length is stationary if

\[
x_m = \frac{f_m\,x_m'}{\sum_{n=1}^{\infty} f_n\,x_n'} \quad (m = 1,2,\dots). \tag{19}
\]

If $x_m'$ is expressed, according to Eq.(11') and (11''), by $x_1$ and $x_2$ (for m = 1) or by $x_{m-1}$, $x_m$ and $x_{m+1}$ (for m > 1), then, for every $m = 1,2,\dots$, an equation is obtained in which all $x_1, x_2, \dots$ occur simultaneously. A general solution of these (infinitely many) equations cannot be achieved. Therefore, the restricting assumption was made that $f_m > 0$ for all $m = 1,2,\dots$, and only stationary distributions $\{x_m\}$ with $x_m > 0$ for all $m = 1,2,\dots$ were considered. It follows from Eq.(19) that these distributions satisfy the equations:

\[
\frac{x_{m+1}}{x_m} = \frac{f_{m+1}\,x_{m+1}'}{f_m\,x_m'} \quad (m = 1,2,\dots), \tag{20}
\]

and hence, by Eqs.(11), also the equations:

\[
\begin{aligned}
& f_2\,x_1\Big(x_2 + \tfrac12 P\big[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\big]\Big) = f_1\,x_2\Big(x_1 + \tfrac12 P\,(x_2 - x_1 + x_1^2)\Big), \\
& f_{m+1}\,x_m\Big(x_{m+1} + \tfrac12 P\big[(x_{m+2} - x_{m+1}) - (1 - x_1)(x_{m+1} - x_m)\big]\Big) \\
&\qquad = f_m\,x_{m+1}\Big(x_m + \tfrac12 P\big[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\big]\Big) \quad (m \ge 2).
\end{aligned}
\]

In principle, these equations make it possible to express $x_m$ for $m = 3,4,\dots$ successively as function of $x_1, x_2, P, f_1, f_2, \dots, f_{m-1}$:

\[
x_m = \varphi_m(x_1, x_2, P, f_1, f_2, \dots, f_{m-1}) \quad (m = 3,4,\dots). \tag{21}
\]

Conversely, to given frequencies $x_1$ and $x_2$, the numbers $x_3, x_4, \dots$ calculated by Eq.(21) together with $x_1$ and $x_2$ satisfy Eqs.(20) provided that:

\[
x_m > 0 \quad \text{for all } m = 1,2,\dots \tag{22}
\]

Hence, they also satisfy Eqs.(19), if additionally:

\[
\sum_{m=1}^{\infty} x_m = 1. \tag{23}
\]

Thus, the search for stationary distributions $\{x_m\}$ with $x_m > 0$ ($m = 1,2,\dots$) - if they exist at all - involves the finding of frequencies $x_1$ and $x_2$, for which the $x_m$ ($m = 3,4,\dots$) calculated according to Eq.(21) together with $x_1$ and $x_2$ satisfy the conditions (22) and (23). (Instead, it can also be presupposed that $f_m > 0$ for $1 \le m \le M$, $f_m = 0$ for $m > M$, and stationary distributions $\{x_m\}$ are looked for with $x_m > 0$ for $1 \le m \le M$, $x_m = 0$ for $m > M$. In this case, the Eqs.(20 ff.) exist only for $m < M$ or $m \le M$.) But for the solution of this problem, too, no practicable way seems to exist. Furthermore, even if the existence of stationary distributions of the allele length could be proven for a special system $\{f_m\}$ of fitness values and a special probability P, the question whether a stationary distribution would necessarily be reached from a given initial distribution would remain open. This is the same situation as in the case of no selection.

The last-mentioned problem, being alone of interest from a practical point of view, has been examined empirically for selection types 1 and 2a, applying formulas (11) and (18). Again, we started with the special initial distribution (16).
2.3.3.1.2.1. Selection Type 1 ($B_m$ Has a Selective Advantage over $B_n$ if m > n). For the parameters of the fitness model (8), the values $q = 0.9$, $s_1 = 0.2$ were chosen. With the fitness values $f_m$ ($m = 1,2,\dots$) determined in this way, the calculation was carried out for $x_1^{(1)} = 0.95$ and $P = 0.01$, and for $x_1^{(1)} = 0$ (only the allele $B_2$ is present at the beginning) and $P = 0.05$. Table 2 shows the mean allele length E(m) together with the standard deviation

\[
\sigma_m = \sqrt{\sum_{m=1}^{\infty} \big(m - E(m)\big)^2\,x_m} \tag{24}
\]

for both cases. Besides, the allele length with the highest frequency is
tabulated. In both cases, the tendency of the most frequent allele length as well as of the mean allele length shows that the bulk of the distribution shifts constantly towards higher allele length with increasing number of generations (cf. also Fig.4). The calculation was broken off when the most frequent allele length (and also E(m)) became about 30, but the observed tendency permits the conclusion that the distribution of allele length does not tend to a stationary distribution in either case. Not only E(m), but also the standard deviation $\sigma_m$ increases with the number of generations. This means that the distribution of allele length becomes more and more extended.

Table 2. Distribution of allele length in model 1a for selection type 1 with s_1 = 0.2, q = 0.9, starting from the initial distribution (16)

             x_1^(1) = 0.95, P = 0.01                  x_1^(1) = 0, P = 0.05
             Most frequent allele length               Most frequent allele length
Generation   m     Frequency   E(m)      sigma_m       m     Frequency   E(m)      sigma_m
    1         1    0.9500       1.050    0.2179         2    1.0000       2.000    0
   20         1    0.9227       1.078    0.2699         2    0.4872       2.176    0.8615
   40         1    0.8797       1.123    0.3361         3    0.2807       2.654    1.2306
   60         1    0.8168       1.192    0.4160         3    0.2435       3.438    1.5738
   80         1    0.7293       1.296    0.5121         4    0.2081       4.507    1.8678
  100         1    0.6155       1.451    0.6271         6    0.1886       5.751    2.0859
  140         2    0.4006       1.999    0.9184         8    0.1685       8.315    2.3805
  180         3    0.3164       2.992    1.2347        11    0.1515      10.709    2.6040
  220         4    0.2732       4.364    1.4654        13    0.1427      12.880    2.7929
  260         6    0.2440       5.867    1.6211        15    0.1345      14.844    2.9600
  300         7    0.2299       7.358    1.7406        16    0.1276      16.628    3.1124
  400        11    0.2010      10.758    1.9547        20    0.1158      20.469    3.4511
  500        13    0.1857      13.637    2.1113        23    0.1062      23.658    3.7501
  600        16    0.1791      16.093    2.2412        26    0.0995      26.387    4.0224
  700        18    0.1705      18.226    2.3557        28    0.0931      28.773    4.2748
  800        20    0.1632      20.111    2.4597
  900        22    0.1552      21.799    2.5562
 1000        23    0.1515      23.330    2.6468
 1100        25    0.1447      24.731    2.7326
 1200        26    0.1423      26.023    2.8145
 1300        27    0.1387      27.223    2.8931
 1400        28    0.1350      28.343    2.9687
 1500        29    0.1317      29.394    3.0418
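The two summary statistics of Table 2 follow directly from an allele length distribution. As a small worked check, the sketch below evaluates E(m) and the standard deviation (24) for the two initial distributions used here; the function name `mean_and_sd` and the list representation (index k holds the frequency of $B_{k+1}$) are assumptions made for this illustration.

```python
import math


def mean_and_sd(x):
    """Mean E(m) and standard deviation sigma_m (Eq. 24) of an allele length distribution."""
    mean = sum((k + 1) * xk for k, xk in enumerate(x))
    variance = sum(((k + 1) - mean) ** 2 * xk for k, xk in enumerate(x))
    return mean, math.sqrt(variance)


# Generation 1 of the two cases: 0.95 B_1 + 0.05 B_2, and B_2 only.
mean_a, sd_a = mean_and_sd([0.95, 0.05])
mean_b, sd_b = mean_and_sd([0.0, 1.0])

print(round(mean_a, 3), round(sd_a, 4))   # 1.05 0.2179
print(round(mean_b, 3), round(sd_b, 4))   # 2.0 0.0
```

Both results agree with the entries for generation 1 in Table 2.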
Fig. 4. Propagation of the distribution of allele length with time in model 1a with $P = 0.01$, $x_1^{(1)} = 0.95$ for selection type 1 with $s_1 = 0.2$, $q = 0.9$. [Panels: generations 100, 200, 300, 500 and 1000; abscissa: allele length.]

However, the increase of $\sigma_m$ is slower than that of E(m). Therefore, the lower $3\sigma$-limit of the distribution increases constantly. It is reasonable to presume that this tendency, which was found for special values of the parameters, holds true generally for selection type 1, independently of $x_1^{(1)}$ and the probability P of u.c.o. For a definite conclusion, however, further calculations with other parameter values would be needed. Besides, the results do not permit any conclusions as to the exact way in which the growth rate of E(m) and $\sigma_m$ depends on the parameter values.

Fig. 5. Relative fitness, $f_m$, of allele $B_m$ ($m = 1,2,\dots$) in dependence on allele length, m, for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$ and 3 values of $s_1$. [Abscissa: length of allele (m), with $m_{opt}$ and $m_{max}$ marked.]

Fig. 6. Stationary distribution of allele length in model 1a with $P = 0.01$ for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. [Panels: $s_1 = 0.1$, $0.2$, $0.4$; ordinate: frequency; abscissa: length of allele.]

Fig. 7. Stationary distribution of allele length in model 1a with $P = 10^{-4}$ for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. [Panels: $s_1 = 0.1$, $0.2$, $0.4$; ordinate: frequency; abscissa: length of allele.]

2.3.3.1.2.2. Selection Type 2a (= There is an Optimum Allele Length). A great number of combinations of model parameters is possible. They cannot be examined completely. Therefore, it was decided to work with fixed values of the parameters $m_{opt}$, $m_{max}$, and $q_1$ of the fitness model (19): $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. This leaves free the parameter $s_1$, for which the values 0.1, 0.2, and 0.4 were chosen. Fig. 5 shows the fitness values for these parameter values depending on the allele length m. The calculation of the allele length distribution $\{x_m\}$ was carried out for all combinations of one of the three values of $s_1$ with one of the five values each of P: $10^{-4}$, $5\times10^{-4}$, $10^{-3}$, $5\times10^{-3}$, and $10^{-2}$. For the parameter $x_1^{(1)}$ of the initial distribution (16), the two values 0.95 and 0.05 were chosen. Altogether, these were 30 different cases. In all of them, $\{x_m\}$ tended to a stationary distribution. The calculations were broken off when the allele frequencies $x_m$ did not change by more than $10^{-8}$ from one generation to the next. For all combinations of values of P and $s_1$, the stationary distribution is independent of the initial distribution (of $x_1^{(1)}$), but depends on $s_1$ and P (Tables 3-5, Figs. 6 and 7). Hence, there is a difference from case 2.3.3.1.1. (model 1a, no selection) in which the stationary distribution depends on the initial distribution, but not on P. In all cases, the most frequent allele length (the mode) is the same, and is
identical with the "optimum" allele length $m_{opt} = 10$. However, with fixed $s_1$, the frequency of this allele length increases, and the distribution concentrates more tightly around this allele length, with decreasing P. A similar concentration is seen when P is fixed, and $s_1$ is enhanced. Generally, the mean allele length E(m) in the stationary state is somewhat larger than the mode $m_{opt}$. This is a consequence of the skewness of the distribution, which is very obvious from Tables 3 - 5 (see also Figs. 6 and 7), and results from the asymmetry of fitness (see Fig.5).

Apart from the shape of the stationary distribution, the rate of convergence of $\{x_m\}$ towards this distribution, too, depends on $s_1$ and P. On the other hand, the shape of the initial distribution (i.e. the value of $x_1^{(1)}$) seems not to influence the convergence rate itself. Instead, if $s_1$ and P are fixed, the distribution $\{x_m\}$ starting from $0.05\,B_1 + 0.95\,B_2$ reaches every state between initial and final distribution by a nearly constant number of generations earlier than the distribution which starts from $0.95\,B_1 + 0.05\,B_2$.

Table 3. Stationary distribution of allele length in model 1a for selection type 2a with m_opt = 10, m_max = 100, s_1 = 0.1, q_1 = 0.9

          P
m         10^-4     5x10^-4   10^-3     5x10^-3   10^-2
4                                                 0.0000
5                                       0.0000    0.0002
6                             0.0000    0.0004    0.0018
7         0.0000    0.0000    0.0001    0.0037    0.0098
8         0.0001    0.0012    0.0037    0.0263    0.0421
9         0.0106    0.0445    0.0722    0.1270    0.131
10        0.9562    0.8082    0.6753    0.3496    0.2630
11        0.0325    0.1345    0.2114    0.2802    0.2467
12        0.0005    0.0111    0.0336    0.1408    0.1634
13        0.0000    0.0006    0.0035    0.0522    0.0853
14                  0.0000    0.0003    0.0153    0.0370
15                            0.0000    0.0037    0.0137
16                                      0.0008    0.0044
17                                      0.0001    0.0013
18                                      0.0000    0.0003
19                                                0.0001
20                                                0.0000
E(m)      10.023    10.112    10.210    10.612    10.832

Table 4. Stationary distribution of allele length in model 1a for selection type 2a with m_opt = 10, m_max = 100, s_1 = 0.2, q_1 = 0.9

          P
m         10^-4     5x10^-4   10^-3     5x10^-3   10^-2
4
5                                                 0.0000
6                                       0.0000    0.0003
7                   0.0000    0.0000    0.0008    0.0029
8         0.0000    0.0003    0.0010    0.0109    0.0218
9         0.0052    0.0233    0.0411    0.0964    0.1110
10        0.9743    0.8806    0.7827    0.4382    0.3223
11        0.0203    0.0910    0.1581    0.2982    0.2833
12        0.0002    0.0046    0.0159    0.1159    0.1598
13        0.0000    0.0002    0.0011    0.0317    0.0675
14                  0.0000    0.0001    0.0067    0.0229
15                            0.0000    0.0011    0.0065
16                                      0.0002    0.0016
17                                      0.0000    0.0003
18                                                0.0001
19                                                0.0000
20
E(m)      10.016    10.077    10.150    10.538    10.777
The extent of this "lead" depends on $s_1$ and P too. Fig.8 shows the time (in generations), after which the maximum deviation of the frequencies $x_m$ from the stationary frequencies has reduced to $10^{-3}$. With constant P, this "time of convergence" decreases with increasing $s_1$, and increases with increasing $x_1^{(1)}$. When $s_1$ and $x_1^{(1)}$ are held fixed, on the other hand, the dependence on P is not constant in its direction. First, the convergence time increases, as expected, with decreasing P. After having reached a maximum, however, it decreases again. This behaviour of convergence time is unexpected; its reason is not evident. Fig.8 represents only one of different possibilities to fit smooth curves to the points rendered by the paired values of P and convergence time. (Polynomials of degree 4 in log P were chosen.) However, the definite shift of the maxima towards higher P-values, which is observed with increasing value of $s_1$, seems to be real.
Table 5. Stationary distribution of allele length in model 1a for selection type 2a with m_opt = 10, m_max = 100, s_1 = 0.4, q_1 = 0.9

          P
m         10^-4     5x10^-4   10^-3     5x10^-3   10^-2
4
5
6                                       0.0000    0.0000
7                   0.0000    0.0000    0.0001    0.0005
8         0.0000    0.0001    0.0002    0.0033    0.0081
9         0.0024    0.0111    0.0206    0.0614    0.0794
10        0.9846    0.9262    0.8600    0.5450    0.3984
11        0.0129    0.0606    0.1116    0.2898    0.3109
12        0.0001    0.0020    0.0072    0.0820    0.1418
13        0.0000    0.0000    0.0003    0.0158    0.0463
14                            0.0000    0.0023    0.0117
15                                      0.0003    0.0024
16                                      0.0000    0.0004
17                                                0.0001
18                                                0.0000
19
20
E(m)      10.011    10.053    10.106    10.444    10.698
The results may be summarized as follows: Model 1a in connection with selection type 2a, and with additional, biologically meaningful assumptions about the probability P of u.c.o., leads to stationary distributions of the allele length, representing a polymorphism of a few alleles $B_m$ with moderate lengths.

Now, the (unrealistic) assumption will be abandoned that at the beginning of the process alleles $B_m$ with m > 1 are already present in the population. Instead, the following initial distribution will be used:

\[
x_1^{(1)} = 1, \qquad x_m^{(1)} = 0 \quad \text{for } m > 1. \tag{25}
\]

Besides, the existence of an event is assumed, by which genotype $B_1B_1$, apart from gametes $B_1$, occasionally produces gametes carrying the duplication $B_2$.
Fig. 8. Dependence of "convergence time" (= number of generations required to approach the stationary distribution up to a distance 0.001) on P for models 1a and 1b and selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$, and some paired values of $s_1$ and $x_1^{(1)}$ (model 1a) or $s_1$ and d (model 1b). [Ordinate: number of generations; abscissa: P, from $10^{-4}$ to $10^{-2}$.]
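This "initial event" has been discussed above; here, its special nature will not be considered. In this case, the distribution of gametes for genotype $B_1B_1$ would be:

\[
\tfrac{d}{2}\,B_0 + (1 - d)\,B_1 + \tfrac{d}{2}\,B_2. \tag{26}
\]

Here, d is the probability of duplication, $B_0$ is the deletion of cistron b. In order to simplify the model, $B_0$ gametes are assumed to be lethal. In this case, the frequency $x_m'$ of $B_m$ gametes in the population can be derived from Table 1 after modification according to Eq.(26) analogous to model 1a:

\[
\begin{aligned}
x_1' &= \frac{1}{1 - \tfrac12 d x_1^2}\Big\{x_1 + \tfrac12 P\,(x_2 - x_1 + x_1^2) - d x_1^2\Big\} \\
     &= x_1 + \frac{P\,(x_2 - x_1 + x_1^2) - d x_1^2\,(2 - x_1)}{2 - d x_1^2},
\end{aligned}
\tag{27'}
\]

\[
\begin{aligned}
x_2' &= \frac{1}{1 - \tfrac12 d x_1^2}\Big\{x_2 + \tfrac12 P\big[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\big] + \tfrac12 d x_1^2\Big\} \\
     &= x_2 + \frac{P\big[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\big] + d x_1^2\,(1 + x_2)}{2 - d x_1^2},
\end{aligned}
\tag{27''}
\]

\[
\begin{aligned}
x_m' &= \frac{1}{1 - \tfrac12 d x_1^2}\Big\{x_m + \tfrac12 P\big[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\big]\Big\} \\
     &= x_m + \frac{P\big[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\big] + d x_1^2\,x_m}{2 - d x_1^2} \quad (m \ge 3).
\end{aligned}
\tag{27'''}
\]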
Further analysis will be carried out separately for the cases without selection and with selection against the alleles $B_m$ ($m = 1,2,\dots$).

2.3.3.2.1. No Selection. Here, the frequencies of gametes $x_m'$ ($m = 1,2,\dots$) are already the allele frequencies in the next generation, i.e. formulas (27) are again recurrence formulas for the distribution of allele length, $\{x_m\}$. They permit in principle, starting from the initial distribution (25), to calculate $\{x_m\}$ successively for every subsequent generation. Unlike 2.3.3.1.1., the mean allele length E(m) is not invariant from one generation to the next: Eq.(12) is now replaced by

\[
E'(m) = \sum_{m=1}^{\infty} m\,x_m' = \frac{1}{1 - \tfrac12 d x_1^2} \sum_{m=1}^{\infty} m\,x_m = \frac{E(m)}{1 - \tfrac12 d x_1^2}, \tag{28}
\]

i.e. E(m) increases, unless $x_1$ equals 0 (meaning that no transitions $B_1 \to B_2$ occur).
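Relation (28) can be checked numerically against the recurrences (27). The sketch below is an illustration under stated assumptions: the function name `step_model_1b`, the list representation (index k holds the frequency of $B_{k+1}$) and the test distribution with its parameter values are hypothetical choices, not values from the calculations of the paper.

```python
def step_model_1b(x, P, d):
    """One generation of recurrences (27'), (27''), (27''') without selection.

    The u.c.o. terms are those of model 1a; genotype B_1B_1 additionally
    loses the fraction d of its B_1 gametes, half of which become B_2 while
    the other half (B_0) are lethal, so all frequencies are renormalized.
    """
    n, x1 = len(x), x[0]

    def freq(k):
        return x[k] if 0 <= k < n else 0.0

    raw = [x1 + 0.5 * P * (freq(1) - x1 + x1 * x1) - d * x1 * x1]
    for k in range(1, n + 1):
        raw.append(freq(k) + 0.5 * P * ((freq(k + 1) - freq(k))
                                         - (1.0 - x1) * (freq(k) - freq(k - 1))))
    raw[1] += 0.5 * d * x1 * x1                 # new duplications B_2 from B_1B_1
    norm = 1.0 - 0.5 * d * x1 * x1              # surviving gametes (B_0 removed)
    return [value / norm for value in raw]


def mean_length(x):
    return sum((k + 1) * xk for k, xk in enumerate(x))


x = [0.6, 0.3, 0.1]          # hypothetical distribution of B_1, B_2, B_3
P, d = 1e-3, 1e-4            # hypothetical example values
x_next = step_model_1b(x, P, d)

predicted_mean = mean_length(x) / (1.0 - 0.5 * d * x[0] ** 2)   # Eq. (28)
print("mean after one generation:", mean_length(x_next))
print("mean predicted by Eq.(28):", predicted_mean)

# Starting from the initial distribution (25), the first duplications appear:
print(step_model_1b([1.0], P, d))
```

The two printed means should coincide up to rounding, and for the initial distribution (25) the step produces $B_2$ with frequency $\tfrac12 d / (1 - \tfrac12 d)$.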
Again, the next problem to be examined is the question of stationary distributions. From Eq.(27) it can be seen that the allele frequencies $x_m$ ($m = 1,2,\dots$) form a stationary distribution, if and only if they satisfy the equations

\[
\begin{aligned}
x_2 &= \big(1 - (1 - 2\delta)\,x_1 - \delta x_1^2\big)\,x_1, \\
x_3 &= (2 - x_1 - \delta x_1^2)\,x_2 - \big(1 - (1 - \delta)\,x_1\big)\,x_1, \\
x_{m+1} &= (2 - x_1 - \delta x_1^2)\,x_m - (1 - x_1)\,x_{m-1} \quad (m \ge 3).
\end{aligned}
\tag{29}
\]

Here, $\delta = d/P$. On the other hand, $x_1$ must be 0 in the stationary state, otherwise E(m) would increase according to Eq.(28). With $x_1 = 0$, however, Eqs.(29) have only the trivial solution $x_1 = x_2 = x_3 = \dots = 0$, which, in turn, represents no distribution. In the case of no selection, therefore, a stationary distribution of the allele length does not exist. On the contrary, the allele length distribution is expanding still more towards higher allele lengths with increasing generation number.

2.3.3.2.2. Selection in Model 1b. Quite as shown above for model 1a, the allele frequencies in the next generation can be derived from the gamete frequencies $x_m'$ ($m = 1,2,\dots$) using formula (18). The general considerations in 2.3.3.1.2 about the existence of stationary distributions could be made analogously. However, they would lead still much less to any result of practical usefulness. Therefore, the problem will again be treated numerically. Furthermore, selection type 2a (one fitness optimum; presence of a maximum allele length) will exclusively be examined: It can be concluded from the results with selection type 1 (general advantage of longer alleles over shorter ones) in the case d = 0 (model 1a) that for d > 0, a fortiori, no convergence to a stationary distribution can be expected for this selection type.

Starting with the initial distribution (25), model 1b with selection type 2a was calculated for the same numerical values of the parameters $m_{opt}$, $m_{max}$, $q_1$, $s_1$, and P as had been used in the case d = 0 (model 1a). Each of the 15 paired values of $s_1$ and P was combined with the values $10^{-4}$, $10^{-5}$, and $10^{-6}$ of d. The results were as follows: In each of the 45 cases mentioned, the allele length distribution $\{x_m\}$ converges to a stationary distribution. This distribution is independent of d, and coincides exactly with the stationary distribution for the same
$(s_1, P)$-combination in the case d = 0. Only the speed of convergence depends, besides on $s_1$ and P, on d, too. In Fig.8, the "convergence time" (= time in generations, after which the maximum deviation of the allele frequencies $x_m$ from their stationary values has become $10^{-3}$) is also given for the 45 cases d > 0. With fixed values for $s_1$ and P, the convergence time increases with decreasing d. This result is plausible. Besides, the convergence time in each of the 3 cases d > 0 is longer than for the two comparable cases d = 0. At the first glance, however, this result seems to be a little surprising. It can be explained, if the different starting positions of the two distributions are taken into account: In the case $x_2^{(1)} > 0$, the distribution $\{x_m\}$ has a certain lead over the distribution in the case $x_2^{(1)} = 0$, with which the additional production of $B_2$ gametes from genotype $B_1B_1$ cannot pull up. Figs. 9 and 10 demonstrate this for the special case of $s_1 = 0.2$, $P = 10^{-3}$ more exactly. Here, the alteration of the distribution with time is shown side by side for the cases d = 0 with $x_1^{(1)} = 0.95$, $d = 10^{-5}$, and $d = 10^{-4}$. In Fig. 9 the three distributions are compared in the same generation (1, 100, 200, 500 and 1,000). The distribution for d = 0 has a certain lead as compared with d > 0. This lead diminishes gradually, but even at generation 1,000, the other distributions have not pulled up (furthermore, also the distribution for $d = 10^{-4}$ has a lead compared with $d = 10^{-5}$). Therefore, in Fig. 10 the same distributions are not compared at the same number of generations, but the generation numbers are shifted against each other. (The distributions for d = 0 are given a "handicap" in order to secure identical starting positions.) The shift $t_d$ (depending on d) has been chosen in such a way that the distribution for d > 0 at the time $t_d + 1$ (generations) agrees as exactly as possible with the distribution for d = 0 in the 1st generation. The shift $t_d$ is 225 generations for $d = 10^{-5}$ and 133 generations for $d = 10^{-4}$. With this "handicap" the situation is quite different: Now, the distributions for d > 0 have a certain lead as compared to d = 0. This lead, however, is small, and has almost disappeared after about 500 generations. Hence, the longer convergence time for d > 0 (cf. Fig.8), which has been mentioned, is, indeed, only a consequence of the unequal starting conditions.

Fig. 9. Convergence of the allele length distribution to the stationary state for model 1a/1b and selection type 2a with $P = 10^{-3}$, $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.2$, $q_1 = 0.9$, and 3 values of d. I. Comparison of the distribution at the same times. [Columns: $d = 0$, $d = 10^{-5}$, $d = 10^{-4}$; rows: successive generations and the stationary distribution; abscissa: length of allele.]

Fig. 10. Convergence of the allele length distribution to the stationary state for model 1a/1b and selection type 2a with $P = 10^{-3}$, $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.2$, $q_1 = 0.9$, and 3 values of d. II. Comparison of the distributions at times shifted to compensate for the different starting conditions (see text for details). [Columns: $d = 0$ with $x_1^{(1)} = 0.95$ (generations 100, 200, 500, 1000); $d = 10^{-5}$ with $x_1^{(1)} = 1$ (generations 325, 425, 725, 1225); $d = 10^{-4}$ with $x_1^{(1)} = 1$ (generations 233, 333, 633, 1133); bottom row: stationary distribution; abscissa: length of allele.]

2.3.3.3. Model 2 (Alternative Assumption about P). Now, the alternative model, which has been mentioned in 2.3.1., will be examined briefly. Only the case d = 0 (model 2a) will be considered. The probabilities of the different types of gametes, which are produced by the different genotypes, can again be taken from Table 1 excepting the probabilities for the genotypes $B_1B_j$ ($j = 2,3,\dots$). These genotypes have now the following gamete distribution:

\[
\tfrac12 (1 - P)\,B_1 + \tfrac12 (1 - P)\,B_j + \tfrac12 P\,B_2 + \tfrac12 P\,B_{j-1}. \tag{30}
\]

The frequency $x_m'$ of the $B_m$ gametes in the population can be taken from Table 1 after modification according to Eq.(30) analogously to 2.3.3.1 and 2.3.3.2:

\[
x_1' = x_1 + \tfrac12 P\big[(1 + x_1)\,x_2 - 2\,(1 - x_1)\,x_1\big], \tag{31'}
\]

\[
x_2' = x_2 + \tfrac12 P\big[(1 + x_1)(x_3 - x_2) - (1 - x_1)(x_2 - 2x_1)\big], \tag{31''}
\]

\[
x_m' = x_m + \tfrac12 P\big[(1 + x_1)(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\big] \quad (m \ge 3). \tag{31'''}
\]

The structure of formulas (31) is similar to the structure of the corresponding formulas (11) for model 1a, with the one difference that not only the case m = 1, but also the case m = 2 must be treated separately.
In the following, the case will be examined that the alleles B (m=],2,...) m are not subject to selection. Then, the gamete frequencies x' are already the m allele frequencies in the next generation. Again the question comes up, whether there are stationary distributions of the allele length. It follows from Eq.(31) that a distribution (x } of allele length is stationary, m if the frequencies x satisfy the equations: m (l+Xl)X 2 -
(32)
2(1-Xl)X
1 = 0 ,
(l+Xl)(X3-X2)-(1-Xl)(X2-2Xl)
= 0
(l+Xl)(Xm+l-Xm)-(1-Xl)(Xm-Xm_l)
, = 0
It was shown above that the correspondingEqs.(14) solutions
if and only
in model la have the
(15). In the same way, it can be demonstrated that the complete set
of solutions of Eqs.(32) is given by:
(33)
O -~ x I -~ 1
arbitrary,
x m = 2 X l \ 1 + Xl ]
(m=2,3,...)
For $0 < x_1 < 1$, the numbers $x_1, x_2, \ldots$ calculated from Eq. (33) are, indeed, frequencies (i.e. they are non-negative, and their sum is 1). Hence, they form a stationary distribution of allele length. The distribution (33) is almost a geometrical distribution with the parameter $2x_1/(1+x_1)$: only the frequency of $B_1$ is decreased as compared with its corresponding value in a geometrical distribution. Accordingly, the frequencies of the other alleles are increased by a constant factor (namely $1 + x_1$), because the sum of frequencies must be equal to 1. Again, the question whether the distribution of allele length converges from a given initial distribution towards a stationary distribution cannot be answered generally. As in case 2.3.3.1.1., only examination of special cases can help to solve this problem. If there is convergence, the stationary distribution does not depend on P, but is determined uniquely by the initial distribution. Just as in 2.3.3.1.1., this follows from the invariance of the mean allele length E(m). For the special initial distribution (16), the stationary frequency $\tilde{x}_1$ of $B_1$ results as:

$$
\tilde{x}_1 = -\left(1 - x_1^{(1)}\right) + \sqrt{\left(1 - x_1^{(1)}\right)^2 + 1}\,.
\tag{34}
$$
It follows from Eqs. (33) and (34) that $B_1$ is the most frequent allele in the stationary state. Besides, the stationary frequency of $B_1$ is always higher and, correspondingly, that of $B_2$ is lower than the initial frequency. Model 2a has these properties in common with model 1a.
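The properties stated for model 2a can be checked numerically. The following is a minimal sketch, not part of the original paper: it evaluates the stationary frequencies of Eq. (33), the residuals of Eqs. (32), and the stationary frequency of $B_1$ from Eq. (34). The function names, the truncation length `m_max`, and the initial frequency 0.5 are illustrative assumptions.

```python
import math

def stationary_2a(x1, m_max=200):
    """Eq. (33): x_m = 2*x1*((1-x1)/(1+x1))**(m-1) for m >= 2.
    Returns the truncated list [x_1, ..., x_m_max] (m_max is an assumption)."""
    r = (1.0 - x1) / (1.0 + x1)
    return [x1] + [2.0 * x1 * r ** (m - 1) for m in range(2, m_max + 1)]

def residuals_32(x):
    """Left-hand sides of Eqs. (32) for a distribution x, where x[0] is x_1."""
    x1 = x[0]
    res = [
        (1 + x1) * x[1] - 2 * (1 - x1) * x1,                     # case m = 1
        (1 + x1) * (x[2] - x[1]) - (1 - x1) * (x[1] - 2 * x1),   # case m = 2
    ]
    for i in range(2, len(x) - 1):                               # cases m >= 3
        res.append((1 + x1) * (x[i + 1] - x[i]) - (1 - x1) * (x[i] - x[i - 1]))
    return res

def stationary_x1_from_initial(x1_init):
    """Eq. (34): stationary frequency of B_1 when only B_1 and B_2 exist initially."""
    a = 1.0 - x1_init
    return -a + math.sqrt(a * a + 1.0)

x1_init = 0.5                                  # hypothetical initial frequency of B_1
x1_stat = stationary_x1_from_initial(x1_init)  # about 0.618
dist = stationary_2a(x1_stat)
total = sum(dist)
mean = sum(m * f for m, f in enumerate(dist, start=1))
max_residual = max(abs(v) for v in residuals_32(dist))
```

With these inputs the frequencies sum to 1 and the mean allele length equals the initial mean, 2 - 0.5 = 1.5, which illustrates the invariance of E(m) used in the derivation of Eq. (34).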
3. Discussion
In the following, we shall discuss some of the formal and biological consequences of the examined models.
3.1. The First Event. Establishment of Duplications in the Population
As mentioned above, the event which leads to the first cistron duplication has to be treated separately from u.c.o., as some other mechanism is required for its occurrence. In model 1a, it was assumed that this first event cannot occur any more, i.e. that the genotype $B_1B_1$ is unable to form gametes $B_2$. In this case, the first duplication must have been introduced into the population from outside, or it must have occurred as a unique event in the population itself (this means a probability of almost 0), and must have found a way to establish itself. In model 1b, it was assumed that gametes $B_2$ are formed at a recurrent rate (= with a small, but not negligible probability) from genotype $B_1B_1$.
At first glance, only the second model seems to be realistic. This, however, is true only if one sticks to the deterministic model and to the assumption of random mating. With regard to such very rare events, however, the limitations of this model become obvious. The calculations show that with these limitations (deterministic model and random mating), a duplication needs a long time in order to reach a frequency at which u.c.o. can occur at a non-negligible rate (Fig. 10). This remains true even with an (unrealistically) high probability d of the first event (cf. 2.3.3.2.), and with a selective advantage. The reason is that two rare events are required: 1. The duplication must occur, and 2. the duplication must become homozygous, before u.c.o. can lead to any allele containing more than two copies of the cistron (allele $B_m$; $m > 2$).
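The second requirement, homozygosity of the duplication, can be put into numbers with a short sketch. This is an illustration under stated assumptions and not a calculation from the paper: the gamete frequency `q` of $B_2$ is a hypothetical value, and the inbreeding coefficient F (here 1/16, the standard value for offspring of first cousins) is a quantity the deterministic random-mating models above do not contain.

```python
def homozygote_frequency(q, inbreeding_f=0.0):
    """Expected frequency of B2B2 zygotes for a B2 gamete frequency q.

    inbreeding_f = 0 gives the random-mating (Hardy-Weinberg) value q**2;
    inbreeding_f > 0 applies the standard correction q**2 + F*q*(1 - q).
    """
    return q * q + inbreeding_f * q * (1.0 - q)

q = 1e-4                                   # hypothetical frequency of the duplication
random_mating = homozygote_frequency(q)    # q**2 = 1e-8
first_cousins = homozygote_frequency(q, inbreeding_f=1 / 16)
ratio = first_cousins / random_mating      # several hundred times more homozygotes
```

For a rare duplication the term proportional to F dominates, so homozygotes arise far more readily in consanguineous matings than under random mating.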
The latter event could occur by chance in a random mating population. Considering the breeding structure of higher organisms, however, it is much more likely that it occurs in a consanguineous mating. Besides, in a small population group, the duplication may attain a relatively high frequency within a short time. If the group starts growing in number, the conditions for the process of gene amplification are given. This is the reason why the model 1a has also been examined.
Comparison of the two models has shown that an appreciable difference of their long-time consequences can only be observed in the absence of selection. As expected, a continuous "mutation" pressure $B_1 \to B_2$ leads to a (very slow) increase of the mean allele length, whereas in model 1a (in absence of the "mutation" pressure), the mean allele length remains constant. In both models the maximum allele length increases from generation to generation (the distribution is scattered more and more). With selection, however (assumption of an optimal allele length), it turns out that the production of $B_2$ alleles from $B_1B_1$ homozygotes does not affect the form of the final distribution at all. The rate at which it is approached is affected very little (Fig. 10). This is one of the interesting results of this study: For the final results of u.c.o., together with a selective advantage and a realistic selection model, it is unimportant whether the primary event occurs seldom, or never.
3.2. The Consequences of u.c.o., if There is No Selection (Critical Evaluation of the Models)
The calculations were limited to the case that the shift of structure homologous cistrons does not exceed one cistron length. This limitation is undoubtedly an oversimplification: It seems to be plausible that $P_k$ (probability of u.c.o.) will decrease with increasing distance of the cistrons involved: The chromatid loops (cf. Figs. 1 and 2) will be longer, and the probability will increase with increasing distance k that the random mutations have deformed cistrons so much that mispairing is made impossible, because the distance between cistrons within an allele will be the larger, the longer the time which has elapsed since these cistrons have originated from a common ancestor. However, there is no reason for the assumption that $P_k$ will become 0 with $k > 1$. Therefore, the question has to be discussed how the results would change if this oversimplification would be abandoned. This discussion is still lacking; however, it may be presumed with caution that the variability of allele length will generally be increased compared with the simplified model.
Which conclusions can be drawn from these models, if there is no selection? (Models 1a, 1b and 2). As mentioned above, all the models (1a, 1b, 2) lead to an increase of the maximum allele length from generation to generation, and, correspondingly, also to an increase of the variance of the allele length, although this increase becomes smaller and smaller. On the other hand, in each model, $B_1$ will finally be the most frequent allele whether it has this property already in the initial state or not. In models 1a and 2a, the mean allele length remains constant from one generation to the next, whereas in model 1b (formation of additional allele $B_2$ from $B_1B_1$) the mean is growing gradually (presumably, this is true also for model 2b). In models 1a and 2a, the allele length distribution converges to a stationary distribution, at least if it has started from the special state where only $B_2$ exists apart from $B_1$. In model 1a, the stationary frequencies of the alleles $B_1, B_2, \ldots$ form a geometric sequence (with a factor less than 1). In model 2a, this is true only from $B_2$ onwards, but in this model, too, the complete sequence of the stationary allele frequencies is monotonically decreasing. In model 1b, on the other hand, a stationary distribution is not reached even after extremely many generations. Alteration of the allele frequencies, however, diminishes gradually, and the allele length distribution takes a shape similar to that found in model 1a: the system of allele frequencies forms a monotonically decreasing sequence. For all practical purposes, this means that without selection, the variability remains relatively small, even if longer and longer alleles are formed at a small frequency. The alleles with one or a few copies of the cistron are always more frequent than the alleles containing many copies. Probably, elimination of the simplifying assumption that mispairing can only lead to a shift of one cistron length would not change this picture appreciably; possibly, it would enhance the range of variability.
3.3. Selection Models and Their Consequences
The assumption seems to be plausible that the alleles formed as a consequence of u.c.o. are subject to different selection pressures. In our models, the possibility has not been considered that the probability P of u.c.o. is itself influenced by selection. Concerning selective advantages and disadvantages of the alleles, different assumptions were made:
Selection type 1 (2.3.2., 2.3.3.1.2.1.) can be described as "the longer, the better". Here, the bulk of the distribution shifts constantly towards higher allele length. No stationary distribution is reached. Not only the mean, but the standard deviation, as well, increases with the number of generations, albeit not quite as much as the mean. The distribution of allele length becomes more and more widespread. It looks as if these results could be generalized beyond the special cases examined. With this type of selection, one would expect long stretches of duplicated cistrons together with an appreciable amount of variability and a certain clustering of individual values around the mean. First results of hybridization experiments with Gamma globulin loci for the labile chain seem to confirm the great number of loci predicted from protein sequence data (Delovitch & Baglioni, 1973; see also Storb, 1972). Also on a priori reasons, the principle "the more, the better" would be reasonable in this case, if the number of these cistrons defines the versatility of the antibody response. Indirect evidence for an appreciable intraindividual variability of the immune response could be derived from many clinical and serological observations in man. It can be hoped that hybridization experiments will soon allow to determine this range of variability directly.
In selection type 2, an optimum allele has been assumed, up to which the fitness increases, whereas above this value, it decreases again. For the exact manner of this decrease, two different possibilities were considered: One, in which the fitness decreases up to a certain maximum value, above which it is 0; and a second one, in which it approaches asymptotically to a limit greater than 0. For technical reasons, only the first possibility was examined more thoroughly. Besides, the differences of the two models are relatively unimportant. This mode of selection was examined together with models 1a (2.3.3.1.2.2.) and 1b (2.3.3.2.2.). As mentioned above, the most remarkable result of our examinations was that the difference between these two models is so small; i.e. that a "mutation" pressure of new $B_2$ alleles from $B_1B_1$ zygotes has no influence at all on the limiting distribution, and so little influence on the rate of approach to this distribution. In both models, the allele distributions tend towards stationary states, in which the optimum allele is more frequent than all the others, and the frequency of other alleles decreases with increasing difference of cistron numbers from the optimum state. The range of variability increases with increasing probability of u.c.o., and with decreasing fitness difference between different alleles ($s_1$) (cf. Fig. 11). The stationary distribution is independent of the initial distribution.

Fig. 11. Standard deviation of allele length in the stationary state for model 1a and selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$, and 3 values of $s_1$ (abscissa: P)

Qualitatively, this result is the same as that of Crow & Kimura (1970) who theoretically derived a normal distribution as (continuous) approximation for the stationary distribution in their model (cf. 1.3), with the variance

$$
V = \sqrt{\frac{u\,\sigma_k^2}{2c}}\,,
\tag{35}
$$

where u is the probability of the allele $B_m$ changing over to something else (i.e. essentially the probability of u.c.o.), $\sigma_k^2$ the variance of the distribution of shift number k, and c a measure of the "strength" of selection (comparable with $s_1$).
The practical consequence of these results is the following: If we find in a special case a certain sequence of structure homologous cistrons for which u.c.o. is possible, and if the allele length in this case clusters around a certain value, then the assumption is justified that this value represents an optimum allele length with maximum fitness. If the range of variability is small, this can either mean large differences in fitness between alleles of similar length, or a small probability P of u.c.o. A large variability indicates either small fitness differences, or a relatively high value of P.
Again, it may be asked in which direction abandoning of the simplifying assumption that mispairing with a shift of more than one cistron length does not occur would influence the result. We presume that it would enhance somewhat the standard deviation in the stationary state: In the general model, not only the neighbouring but all alleles are involved in the process of u.c.o. If the effect of this additional involvement could be compared with an increase of P, this would mean that the standard deviation $\sigma_m$ would be increased, as $\sigma_m$ increases with P. This would be in agreement with the inference which may be drawn from Eq. (35): The variance V is proportional to $\sigma_k$, and $\sigma_k$ will increase if shift numbers greater than 1 can also occur. However, this problem needs further examination.
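The dependence expressed by Eq. (35) can be illustrated with a few lines of code. This is a sketch under assumptions: Eq. (35) is read as $V = \sqrt{u\,\sigma_k^2/(2c)}$, the form consistent with the statement that V is proportional to $\sigma_k$, and the numerical values of u, $\sigma_k$ and c are hypothetical and not taken from the paper or from Crow & Kimura (1970).

```python
import math

def stationary_variance(u, sigma_k, c):
    """Variance of the continuous approximation, read as V = sqrt(u*sigma_k**2/(2*c)).

    u       : probability that an allele changes to another one (about P)
    sigma_k : standard deviation of the shift number k
    c       : strength of selection around the optimum
    """
    return math.sqrt(u * sigma_k ** 2 / (2.0 * c))

base = stationary_variance(u=1e-3, sigma_k=1.0, c=0.1)
wider_shifts = stationary_variance(u=1e-3, sigma_k=2.0, c=0.1)        # larger shifts allowed
stronger_selection = stationary_variance(u=1e-3, sigma_k=1.0, c=0.4)  # fourfold selection
```

Doubling $\sigma_k$ doubles V, while a fourfold increase of c halves it. This matches the qualitative statements above: larger shifts or weaker selection widen the stationary distribution.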
Looking around for an example, one would be inclined to think of the haptoglobin polymorphism in man. Here, $Hp^2$ is the most frequent allele in many populations (for ref. cf. Giblett, 1969). In our notation, it would represent very nearly the allele $B_2$. The Johnson type alleles, on the other hand, which represent type $B_3$ (or other alleles of still higher order), seem to be very rare. This would point towards a selective advantage of allele $Hp^2$ (possibly only under certain conditions, as $Hp^1$ is the most frequent allele in some other populations), and towards a definite disadvantage of all higher alleles $B_m$ ($m = 3, 4, \ldots$).
3.4. Selection Relaxation
Modern civilization has led to a sharp decrease of mortality especially in the age groups before and during the reproductive period. One consequence is the relaxation of selection especially by agents which were responsible in former times for most of this early mortality. Therefore, the question is important how much a (sudden) selection relaxation would influence a system which had been built up before by u.c.o. under the influence of selection, for example the type 2a discussed above. An answer can be derived from an application of the selection-free case of model 1a on the stationary distribution of the selection phase as initial state. If the allele length distribution converges at all from this initial state - this is not yet proven but may be presumed - it tends to a geometrical distribution which is completely determined by the mean allele length in the stationary state of the selection phase. This follows from the fact that the mean does not alter if no selection works. More precisely, the parameter $x_1$ of the geometrical limiting distribution is the reciprocal of the mean. In our special case, the mean is approximately equal to the optimal allele length. Hence, the distribution would become dispersed more and more in both directions from the formerly optimal allele length; the maximum cistron number would increase, more and more alleles with fewer and fewer cistrons would also be formed, including $B_1$ - up to the point when the geometrical distribution is reached. For the optimal allele length 10, e.g., the final frequency of $B_1$ would be approximately 10%, and those of $B_2, B_3, \ldots$ approximately 9%, 8.1% etc. The allele $B_{20}$ would still have a frequency of nearly 1.4%, and $B_{30}$ of nearly 0.5%. When the optimal allele length increases, the final frequency of $B_1$ in the relaxation phase will become lower but, at the same time, the decrease of the other allele frequencies with increasing allele length will become slower: the final state will more and more approach a uniform distribution.
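The percentages quoted for an optimal allele length of 10 follow directly from the geometrical distribution with parameter $x_1 = 1/10$. A minimal sketch (the function name and the cut-off `m_max` are illustrative assumptions):

```python
def geometric_allele_frequencies(mean_length, m_max):
    """Frequencies x_m = x1 * (1 - x1)**(m - 1), m = 1..m_max, with x1 = 1/mean_length."""
    x1 = 1.0 / mean_length
    return [x1 * (1.0 - x1) ** (m - 1) for m in range(1, m_max + 1)]

freqs = geometric_allele_frequencies(mean_length=10, m_max=30)
b1, b2, b3 = freqs[0], freqs[1], freqs[2]   # 0.1, 0.09, 0.081
b20, b30 = freqs[19], freqs[29]             # about 0.0135 and 0.0047
```

The values reproduce the 10%, 9% and 8.1% given for $B_1$, $B_2$, $B_3$, and give about 1.4% for $B_{20}$ and about 0.5% for $B_{30}$.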
In the haptoglobin case mentioned above, populations in which selection had relaxed many generations before would be characterized, on the one side, by a higher $Hp^1$ frequency, and, on the other hand, by a higher frequency of Johnson alleles than populations with the same initial gene frequencies, in which selection is still at work.
For the Gamma globulin loci, selection relaxation is especially likely because of their significance for infection protection. Here, in absence of other factors, spreading of the allele length distribution together with maintenance of a high mean would lead, on the one hand, to more and more individuals who have only very few such cistrons, and will possibly suffer from immune defects. On the other hand, it would also lead to many persons with an extremely high number of such cistrons. It is not unreasonable to suspect that this could enhance the number of autoimmune diseases and allergies. These diseases are more or less neutral from the point of view of selection, because they do not influence reproduction very much. But especially in middle and higher age, they can be a real danger.⁵
⁵ According to Schull & Neel (1965), the frequency of allergic diseases seems to be decreased in children from consanguineous marriages. This would favour the hypothesis that very many different genes concerned with antibody production enhance the danger of allergies.
3.5. Problems Which Go beyond the Examined Models: For Example Mutations
In our discussions, we have deliberately neglected some other factors which complicate the situation when the models are applied to concrete situations. The most important of these problems is the occurrence of random mutations. If a point mutation, for example a single base substitution, hits a cistron which is only present once, this will have one of three different consequences: 1. The function is not impaired, either because the change occurred within the degeneracy of the code, or because the amino acid substitution is inert functionally. 2. The function is altered in a way that does not impair the vitality of the individual very much. 3. The function is vitally altered or completely destroyed; the individual is seriously debilitated. The third possibility is a frequent one, because the destroyed function cannot be replaced. If there are many cistrons for a certain function, possibility no. 3 will hardly ever occur: Even a completely inert cistron can be substituted functionally by other cistrons. To return to the immune globulin cistrons mentioned above: Inactivation of some of them by mutation would lead to a slight decrease in the spectrum of different antigens against which the organism is able to produce antibodies. Especially under the conditions of relaxed selection, on the other hand, this would not have any dramatic consequences; the decline would occur very slowly. How the effects of random mutation combine with those of u.c.o., and especially, how selection relaxation influences their combined effects - this problem remains to be examined.
3.6. Possible Protective Mechanisms against u.c.o. The Highly Redundant DNA Parts
The DNA of higher organisms including man contains sequences of high redundancy. Conspicuous among them - and relatively well examined - are the nucleolus organizer regions, which produce rRNA. Bross & Krone (1972), performing hybridization experiments, have estimated that normal human beings have about 416 rRNA producing cistrons. These cistrons which are located close to each other within the nucleolus organizer regions should be exposed to relatively frequent u.c.o., because they are obviously structure homologous, but not position homologous. On the one hand it is difficult to figure out a mechanism by which selection should maintain a very specific optimum number of these cistrons. Furthermore, the system is expected to be relatively insensitive towards inactivation by random mutation, for reasons explained above (the function of all these cistrons is identical). In conclusion, one would expect an enormous amount of variability for these cistrons. From the results of the above-mentioned authors, it appears as if there would, indeed, be some variability. The individual values, however, seem to be clustered around a mean. The variability seems to be smaller than expected. We do not deny that a plausible selection model could be constructed for explanation. However, another possibility has to be considered: Does the organism have a special mechanism for protection against u.c.o.?
In this connection, it is interesting to remember that normal crossing over does not occur at random either, but is subject to certain restrictions. It has been shown repeatedly in different organisms that constitutive heterochromatin does not take part in crossing over. According to Natarajan & Gropp (1971), the following pattern of meiotic behavior is observed: "a) The heterochromatic segments remain condensed in all stages of meiotic prophase, in contrast to euchromatin, b) they pair homologously, similar to euchromatin, till pachytene, c) they separate in early diplotene, because of the absence of any chiasma in these region". According to these authors, the same behavior was observed by others in the plants Salvia and Plantago, in Drosophila, and in the tomato. Evidence is accumulating that nucleolus organizer regions coding for rRNA are located in these regions (Hedgehogs: Natarajan & Gropp, 1971; Microtus agrestis: Natarajan & Sharma, 1971). Natarajan & Gropp (1971) - following a suggestion of Yunis & Yasmineh (1971) - discuss the possibility that the lack of chiasmata may be a mechanism to protect vital genes from crossing over and mutation. In our opinion, the protection from u.c.o. could be still more important.
3.7. Problems for Further Research
By our theoretical investigations, a number of additional problems are suggested: What is the actual probability P of u.c.o.? On which conditions does it depend? In which way is it influenced by the distance between the structure-homologous, but not position-homologous cistrons? How much is it decreased by smaller differences between these cistrons which are produced by random mutation? How does selection work to build up long series of neighbouring structure-homologous, but not position-homologous cistrons? Is u.c.o. the only mechanism involved here, or are other mechanisms also possible, for example, multiple replication cycles of restricted DNA areas during G1? Keyl (1966) discovered that two subspecies of Chironomus thummi showed marked differences in the DNA content only of certain chromosome bands, which can most easily be explained in this way. How large is the interindividual variability of allele length for systems which contain many redundant copies of identical (or very similar) cistrons? Knowledge on this variability could supply information on P, if $s_1$ (selection) is known, or on $s_1$, if P is known. The work mentioned above using hybridization techniques seems to be promising. Which are the consequences of selection relaxation for these systems, especially if not only u.c.o., but also the effects of random mutation are considered? How do special protective mechanisms interfere with u.c.o.?
References
Black, J.A., Dixon, G.H. (1968). Amino acid sequence of the alpha chains of human haptoglobins and their possible relation to the immunoglobin light chains. Nature 218, 736
Bridges, C.B. (1936). The bar "gene", a duplication. Sci. 83, 210
Britten, R.J., Kohne, D.E. (1968). Repeated sequences in DNA. Sci. 161, 529
Bross, K., Krone, W. (1972). On the number of ribosomal RNA genes in man. Humangenetik 14, 137
Crow, J.F., Kimura, M. (1970). An introduction to population genetics theory. New York-Evanston-London: Harper & Row
Dayhoff, M.O. (1972). Atlas of protein sequence and structure, Vol. 5. Silver Spring, Maryland: Nat. Biomed. Res. Found.
Delovitch, T.L., Baglioni, C. (1973). Estimation of light-chain gene reiteration of mouse immunoglobin by DNA-RNA hybridization. Proc. Nat. Acad. Sci. Wash. 70, 173
Dixon, G.H. (1966). Mechanisms of protein evolution. Essays Biochem. 2, 148
Fisher, R.A. (1922). On the dominance ratio. Proc. Roy. Soc. Edinburgh 42, 321
Giblett, E.R. (1969). Genetic markers in human blood. Oxford-Edinburgh: Blackwell
Haldane, J.B.S. (1932). The causes of evolution. London: Longmans Green
Harris, H. (1970). The principles of human biochemical genetics. Amsterdam-London: North Holland
Hilschmann, N., Barnikol, H.U., Hess, M., Langer, B., Ponstingl, H., Steinmetz-Kayne, M., Suter, L., Watanabe, S. (1969). Structure and formation of antibodies. In: Current problems in immunology (Bayer-Symposium I), O. Westphal, H.E. Bock, E. Grundmann, Eds., p. 69. Berlin-Heidelberg-New York: Springer
Keyl, H.G. (1966). Increase of DNA in chromosomes. In: Chromosomes today, Vol. 1, C.D. Darlington, K.R. Lewis, Eds., p. 99. London: Oliver & Boyd
Kohne, D.E. (1970). Evolution of higher-organism DNA. Q. Rev. Biophys. 3, 327
Lewis, E.B. (1951). Pseudoallelism and gene evolution. Cold Spring Harbor Symp. Quant. Biol. 16, 159
Li, C.C. (1955). Population genetics. Chicago: Univ. of Chicago Press
May, H.G. (1917). Selection for higher and lower facet numbers in the bar-eyed race of Drosophila and the appearance of reverse mutations. Biol. Bull. 33, 361
Mayo, O. (1970). The role of duplications in evolution. Heredity 25, 543
Metz, C.W. (1947). Duplication of chromosome parts as a factor in evolution. Amer. Natur. 81, 81
Nance, W.E. (1963). Genetic control of hemoglobin synthesis. Sci. 141, 123
Natarajan, A.T., Gropp, A. (1971). The meiotic behavior of autosomal heterochromatic segments in hedgehogs. Chromosoma (Berl.) 35, 143
Natarajan, A.T., Sharma, R.P. (1971). Tritiated uridine induced chromosome aberrations in relation to heterochromatin and nuclear organization in Microtus agrestis L. Chromosoma (Berl.) 34, 168
Nei, M., Kojima, K.-I., Schaffer, H.E. (1967). Frequency changes of new inversions in populations under mutation-selection equilibria. Genet. 57, 741
Ohno, S. (1970). Evolution by gene duplication. Berlin-Heidelberg-New York: Springer
Ohno, S. (1972). Origin, maintenance and significance of genetic polymorphism. In: The biological significance of the histocompatibility antigens. Report on a colloquium held at Titisee (Schwarzwald), October 14-15, 1971, E. Günther, E. Albert, F. Kueppers, K. Bender, Eds., Humangenetik 14, 173
Schull, W.J., Neel, J.V. (1965). The effect of inbreeding on Japanese children. New York: Harper & Row
Smithies, O. (1964). Chromosomal rearrangements and protein structure. Cold Spring Harbor Symp. Quant. Biol. 29, 309
Smithies, O., Connell, G.E., Dixon, G.H. (1962). Chromosomal rearrangements and the evolution of haptoglobin genes. Nature 196, 232
Spofford, J.B. (1969). Heterosis and the evolution of duplications. Amer. Natur. 103, 407
Stephens, S.G. (1951). Possible significance of duplication in evolution. Adv. Genet. 4, 247
Storb, U. (1972). Quantitation of immunoglobin genes by nucleic acid hybridization with RNA from myeloma and spleen microsomes. J. Immunol. 108, 755
Sturtevant, A.H. (1925). The effects of unequal crossing over at the bar locus in Drosophila. Genet. 10, 117
Sturtevant, A.H., Morgan, T.H. (1923). Reverse mutation of the bar gene correlated with crossing over. Sci. 57, 746
Tice, S.C. (1914). A new sex-linked character in Drosophila. Biol. Bull. 26, 221
Wright, S. (1929). The dominance of bar over infra-bar in Drosophila. Amer. Natur. 63, 1034
Yunis, J.J., Yasmineh, W.G. (1971). Heterochromatin, satellite DNA, and cell function. Sci. 174, 1200
Zeleny, C. (1919). A change in the bar gene of Drosophila involving further decrease in facet number and increase in dominance. J. Gen. Physiol. 2, 69
Zeleny, C. (1921). The direction and frequency of mutation in the bar-eye series of multiple allelomorphs of Drosophila. J. Exp. Zool. 34, 203
Zeleny, C. (1922). The effect of selection for eye facet number in the white bar-eye race of Drosophila melanogaster. Genet. 7, 1
Prof. Dr. F. Vogel
Institut für Anthropologie und Humangenetik der Universität
D-6900 Heidelberg
Mönchhofstr. 15a
Federal Republic of Germany