Population genetics of unequal crossing over


J. Mol. Evol. 4, 201-247 (1975)
© by Springer-Verlag 1975

J. Krüger and F. Vogel
Institut für Anthropologie und Humangenetik der Universität Heidelberg

Received December 1, 1973; in revised form July 22, 1974

Summary. The population genetics of unequal crossing over was examined for an infinite population with random mating. The following cases were considered: 1. There is an initial portion of duplicated genes which offer the opportunity for unequal crossing over, but the primary event leading to the duplication does not occur any more (model 1a). 2. This primary event occurs with a certain (small) probability (model 1b). For both possibilities the long-term consequences for the distribution of "alleles" (i.e. the single gene, the duplicated gene, the triplicated gene etc.) were considered with the following additional assumptions: 1. No selection. 2. Selection with maximum fitness for an optimum "allele length" (i.e. number of gene repeats). 3. For model 1a, selection with general advantage of longer alleles over shorter ones was also examined.

The results are briefly the following: In model 1a under assumption 1 the distribution of allele length tends with increasing generation number to a stationary state which depends on the initial allele distribution (i.e. on the initial frequency of the duplicated gene) but not on the frequency, P, of unequal crossing over; the stationary frequencies of the alleles decrease with increasing allele length. Under assumption 2 there is likewise a stationary allele distribution, but this depends on P and on the strength of selection and not on the initial allele distribution; it is concentrated more or less tightly around the optimal allele length. Under assumption 3 no stationary state seems to be reached: the mean and the standard deviation of the allele distribution increase steadily with the generation number. In model 1b under assumption 1, with certainty no stationary distribution exists. Under assumption 2 the situation is the same as that in model 1a; the stationary distribution of allele length is identical with that in model 1a for the same P and same selection strength, quite independent of the probability of the primary event.

The results were discussed with respect to empirical examples in which unequal crossing over is expected to be important, for example human haptoglobins, immune globulin determining cistrons, and nucleolus organizer regions. The consequences of selection relaxation were considered.

Key words: Crossing over - Unequal Crossing over - Gene Duplication - Selection Relaxation.


1. The Problem

1.1. Unequal Crossing over

Sturtevant (1925) was, to the best of our knowledge, the first to describe unequal crossing over. The object of his investigation was the bar mutation of Drosophila melanogaster, a sex-linked dominant character, which had been discovered in one individual by Tice (1914). May (1917) reported that the bar gene occasionally reverts to normal, and Zeleny (1919, 1921, 1922), studying this mutation extensively, found that the frequency of the reversion is variable, but that in many stocks, about 1 in 1,600 offspring from a pure bar stock receives a non-bar allele. Zeleny also concluded that the reversion probably occurs chiefly (or perhaps exclusively) in females. (In Drosophila, crossing over occurs exclusively in female, not in male germ cells.) He also found that homozygous bar gives rise to a new and more extreme allele, which he called "ultra-bar", and which was renamed by Sturtevant as "double bar".

Sturtevant & Morgan (1923) showed that the combination of double bar and bar also gives rise to reversions to the normal state. They reported six reversions which, at the same time, gave genetic evidence for crossing over in the same region. Therefore, Sturtevant planned an experiment to test whether this reversion was always connected with crossing over. This being confirmed in spite of the fact that both external mutations used for the examination of crossing over were located close to the bar locus, he concluded that the reversions as well as the double bar mutations were due to an unequal crossing over: In his opinion, this leads to one chromosome with two bar loci (double bar), together with another chromosome with no bar locus at all (reverted round). He estimated the reversion frequency in a homozygous bar stock as 1 in 2,920 germ cells.

An explanation of this unusually strong tendency to unequal crossing over was first given by Wright (1929) who formulated a little ambiguously: "The fact that no difference has been detected between the round eye, which arises by demonstrable loss of the bar gene, and ordinary round eye has suggested that the relation of bar to round is a real example of presence and absence, which probably implies that the original mutation from round to bar was a translocation. Such an origin might also be related to the so far unique behavior in crossing over".

This explanation was corroborated by Bridges (1936). Using the salivary gland giant chromosomes, he showed that the simple, dominant bar mutation is due to a duplication of some bands. The reversion corresponds to the normal state, whereas double bar is caused by a triplication. Both types can be produced by one single event of unequal crossing over. In this paper, Bridges did not formulate clearly the obvious reason for this event: the mispairing of "structure-homologous", but not "position-homologous" chromosome sites.

Smithies, Connell & Dixon (1962) seem to have been the first to invoke the process of unequal crossing over for a phenomenon in human genetics. First, they discovered that the haptoglobin cistron Hp 2 has almost twice the length of the alleles Hp 1S and Hp 1F, as evidenced by the length of the polypeptide chain. Secondly, they showed - and it has been confirmed later - that in the Hp 2 chain, the amino acid sequence of the Hp 1 alleles is repeated almost completely. They concluded that this allele might have been produced by unequal crossing over. Furthermore, they predicted that unequal crossing over might again occur with a relatively high probability between two Hp 2 alleles, rendering, on the one hand, an allele similar to Hp 1, and, on the other hand, an allele containing the genetic information almost in triplicate. Repeated occurrence of this event might lead to still longer alleles, and, hence, to a polymorphism of allele length in the population.

Smithies (1964) stressed clearly the essential difference between the first unique event producing the (almost) double cistron Hp 2 from the single cistron Hp 1, and the unequal but homologous crossing over which becomes possible as soon as the first duplicated cistron is present in the population. He concluded that this process would lead occasionally to almost triplicate cistrons, and explained the Johnson-type haptoglobins in this way. This prediction seems to have been confirmed by Dixon (1966). Nance (1963) and Smithies (1964) applied the concept to the hemoglobin cistrons, discussing especially the closely linked, and very similar β and δ cistrons. In this connection, Smithies explained the Lepore-type hemoglobins as due to the pairing of β and δ cistrons and intracistronic unequal crossing over. These considerations were generally accepted (see for example Harris, 1970), and unequal crossing over is now accepted as a relatively frequent event between duplicated cistrons.

Black & Dixon (1968; see also Giblett, 1969) discussed the possibility that the first, unique event which led to the Hp 2 allele, might also be caused by mispairing and subsequent unequal crossing over. They examined the base sequences corresponding to the amino acids in positions 9-17 of the N-terminal portion, and 67-75 of the C-terminal portion of the haptoglobin 1 chains. Considering only the 18 unambiguous positions, that is, the first two bases in each of the nine corresponding codons, these authors found that there are 11 identities. According to their opinion, one would expect from the presence of six guanines and three cytosines that strong hydrogen bonding could occur between the transcribed DNA strand for positions 9-17 and the non-transcribed (complementary) strand for positions 67 to 75 (or vice versa).

1.2. Duplications

In the meantime different lines of evidence had led many authors to the opinion that duplication of genetic material is an important genetic mechanism of evolution. Haldane (1932) seems to have been among the first to stress this point. Metz (1947) made it the topic of his presidential talk, Stephens (1951) gave a review, and Lewis (1951) discussed the problem in connection with pseudoallelism. Recently it has been elaborated by Ohno (1970). He envisaged two main mechanisms of gene duplication: First, tandem duplication involving part of one linkage group at a time; second, duplication of the entire genome (polyploidization). As one important mechanism of the first-mentioned kind, he considered unequal, but homologous crossing over.

In the meantime, evidence had accumulated that in higher organisms, including man, there are many duplicated DNA sites. This was shown directly by hybridization experiments (Britten & Kohne, 1969; Kohne, 1970) - and indirectly by comparative analysis of related polypeptide chains which are obviously determined by different cistrons (for ref. see Dayhoff, 1972). Now, the process of biochemical and functional differentiation of duplicated genes and genomes is being followed up with different enzymes and isoenzymes (ref. in Ohno, 1970). One of the most conspicuous examples is the genetic determination of the immune globulins. There is ample evidence of gene duplication when the different stable parts of gammaglobulin heavy and light chains are compared with each other. Moreover, the leading hypothesis on the genetic determination of the labile parts of these chains (Hilschmann et al., 1969) assumes a succession of many closely linked cistrons which are, up to a certain degree, structure-homologous, and of which only one is active in each cell. From the similarities and differences in the labile parts of the myeloma κ, λ and H chains analysed so far, it can be anticipated that several hundreds or even thousands of different cistrons may exist in every individual, which can even be arranged into phylogenetic trees. Recent DNA-RNA hybridization experiments seem to confirm this hypothesis directly (Storb, 1972; Delovitch & Baglioni, 1973).

All this evidence together, only a part of which could be mentioned here, shows that gene duplication must have occurred on a large scale during evolution. Duplication, however, means chance for unequal crossing over. Therefore, reshuffling of genetic material by unequal but homologous crossing over must have been extensive in the past, and is still frequent. It is all the more surprising that so little work seems to have been done to elucidate the population genetics of this phenomenon.

1.3. Former Work in Population Genetics

The papers of Spofford (1969) and Mayo (1970) deal with the probability for incorporation of duplications into a population; a process which had been investigated first by Fisher (1922) for a single mutation, and by Nei et al. (1967) for inversions. Spofford showed that for the incorporation of a new duplication, an initial advantage is almost essential. As a possible mechanism for this advantage, she invoked single locus heterosis, i.e. the interaction of the two products of the duplicated cistrons, leading to the formation of heterodimers. Mayo showed that, even assuming duplications to be very rare events, their rates of incorporation are not negligible. He assumed either neutrality, or a reasonable selective advantage ($10^{-4}$). He also considered unequal crossing over, deriving a formula for the frequency of reversion towards a single cistron $b = B_1$ in an infinite population in dependence on the probability of unequal crossing over. Besides, he carried out some calculations on the formation of triplications etc. in a finite population (of size 500). They were assumed to be either lethal or not, in the latter case the genotypes having fitnesses symmetrical about the "peak" genotype $B_2B_2$, where $B_2$ is the duplication of b. The mean frequencies of the disadvantageous allele $B_1$ for 50 successive generations were calculated with values 0.01 and 0.001 for the probability r of unequal crossing over, each with four assumptions about selection. This frequency seems to be higher for the higher value of r and higher for lower values of the selection coefficient s. Due to the special nature of the assumptions and the exclusive consideration of the frequency of $B_1$ only, the results of these calculations are not very revealing.

Crow & Kimura (1970, l.c. pp. 294-296) considered the formation of larger numbers of cistron repeats by unequal crossing over and algebraically derived a continuous approximation for the (discontinuous) stationary distribution of repeat numbers in the population under the influence of a special type of selection: They assumed that the fitness of the genotypes (measured in Malthusian parameters) has a maximum for an intermediate repeat number and decreases with the square of the deviation from this optimum number. Thus they obtained a normal distribution. But their approach, intended only as a text-book example for stabilizing selection, is also not broad enough.

The problem seems to deserve a new approach. It will be presented in the following. In this paper, the stochastic aspect will deliberately be neglected. This does not mean that we consider it unimportant. However, this reduction is expected to give a clearer picture in this early stage of the discussion.

2. Our Own Examinations

2.1. Definition

Two cistrons on homologous chromosomes may be named position-homologous, if they are located in corresponding positions of these chromosomes. They may be named structure-homologous, if they have the same nucleotide sequence. Position-homologous cistrons are also more or less structure-homologous. On the other hand, when a chromosome site has been duplicated, there can be pairs of cistrons on homologous chromosomes, which are completely (or almost completely) structure-homologous without being position-homologous. In this case, it may occur that two cistrons are pairing during synapsis which are not position-homologous, but structure-homologous. If crossing over happens to occur within this "incorrectly" paired site, then we use the term (homologous) unequal crossing over (u.c.o.). The consequence is an exchange of chromosome portions of unequal length.

2.2. The First Event

At the onset, we assume a pair of homologous chromosomes. Both chromosomes are chains of different cistrons; the position-homologous cistrons - and only these - are also structure-homologous. Therefore, the two chromosomes are only able to pair at meiosis in the ordinary, classical way. u.c.o. is impossible. What we need is an initial duplication of (at least) one cistron. Mechanisms for such a duplication are known in cytogenetics, the simplest one being two breaks at slightly different sites in adjacent homologous chromatids during meiosis, and subsequent crosswise reunion. Another mechanism, mispairing due to accidental homology of short base sequences within the same or different cistrons, has been discussed by Black & Dixon (1968). If the sites of breakage are separated just by the length of one cistron, this event results in two gametes which do not contain this cistron at all, together with two other gametes containing the cistron in duplicate. - For the sake of simplicity, we disregard the first-mentioned gametes (with the deletion). They may be lethal.

On the other hand, if a gamete with the duplication comes into a diploid individual by fertilization - and if this individual forms germ cells - there is for the first time a risk for mispairing of structure-homologous cistrons, and hence, for u.c.o. The consequences are seen in Fig.1: As long as the duplication remains heterozygous, all gametes will contain either one or two copies of the cistron. When the duplication becomes homozygous, however, other types of gametes may be formed: u.c.o. may lead, on the one hand, to gametes with only one copy, and, on the other hand, to gametes containing three, and in subsequent generations, more than three copies (Fig.2).

Fig. 1 a and b. Unequal crossing over, (a) if the primary duplication is heterozygous, (b) if the primary duplication is homozygous. Formation of longer alleles is only possible in case b. Diagrams on the left show the chromosomes before the breakage and reunion event

Fig. 2. Mispairing of homologous chromosomes with shifting of $B_n$ versus $B_m$ by k cistrons to the left, and subsequent unequal crossing over. The upper diagram shows the chromosomes before the breakage and reunion event

Gene amplification by u.c.o. has started. The formal aspects will be examined in the following. As usual in population genetics, the real processes will be approximated by simplified models, which are necessarily arbitrary up to a certain degree. Later on, the results will be discussed considering especially the limitations imposed by the models themselves and the special assumptions made for the calculations.

2.3. The Process of Codon Amplification by u.c.o.

In the following, the process of codon amplification by u.c.o. is described for an infinite population, i.e. in a deterministic way. With regard to the basic event described above, the first gene duplication, two different models are considered alternatively:

Model 1a. The duplication, which might be named bb, is already present in the population, - either together with the single cistron b, or alone. It is not formed newly from b.

Model 1b. The duplication is being formed in individuals homozygous for b with a constant frequency which is, of course, independent of the frequency of u.c.o.

Let

$$B_m = b_1 b_2 \ldots b_m \tag{1}$$

be a consecutive sequence of repetitions of cistron b on a chromosome. Here, the index i in $b_i$ (i = 1,2,...,m) means that $b_i$ is the copy of b on the i-th position of the sequence. Neglecting possible small structural differences between these copies which could be produced by point mutations, we regard these copies of b as identical (structure-homologous). This means that, for example, also the cistron sequence $b_1 b_2 \ldots b_k b_1 b_2 \ldots b_{m-k}$, which consists of the initial part of size k of the sequence $B_m = b_1 b_2 \ldots b_m$ ($k \le m$) and the initial part of size m-k of the sequence $B_n = b_1 b_2 \ldots b_n$ ($m-k \le n$), is identical with $B_m$. Special cases are: $B_1 = b_1$, the simple cistron b itself; $B_2 = b_1 b_2$, a duplication of b; $B_3 = b_1 b_2 b_3$, a triplication of b, etc. In spite of the fact that the $B_m$ (m = 1,2,...) are, of course, not alleles sensu strictiori, they will be regarded in the following as the "alleles" of this polymorphism. Accordingly, the constellation of the alleles $B_m$ and $B_n$ on homologous chromosomes will be designated as genotype $B_m B_n$. The multiplicity m of cistron b in the allele $B_m$ is called the length of $B_m$.

During meiosis, the genotype $B_m B_n$ forms - apart from the "normal" gametes $B_m$ and $B_n$ - by u.c.o. the gametes $B_{m+k}$ and $B_{n-k}$ ($1 \le k < n$) or $B_{m-k}$ and $B_{n+k}$ ($1 \le k < m$). In the first case, cistron $b_1$ of $B_m$ is mispaired with cistron $b_{k+1}$ of $B_n$ and, consequently, $b_2$ of $B_m$ with $b_{k+2}$ of $B_n$, $b_3$ of $B_m$ with $b_{k+3}$ of $B_n$, etc. This means a shift of $B_n$ versus $B_m$ by k cistrons to the left (Fig.2). In the second case, cistron $b_1$ of $B_n$ pairs with cistron $b_{k+1}$ of $B_m$ and, accordingly, $b_2$ of $B_n$ with $b_{k+2}$ of $B_m$ etc. This means a shift of $B_n$ versus $B_m$ by k cistrons to the right¹. u.c.o. need not occur at the point indicated in Fig.2. It might take place anywhere within the mispaired section $b_1 b_2 \ldots b_{n-k}$ / $b_{k+1} b_{k+2} \ldots b_n$ of the homologous chromosomes. Obviously, the resulting chromosomes - and, therefore, the gametes - are independent of the exact point of u.c.o.

For a discussion of the frequency of u.c.o., it is conveniently subdivided into two separate events: 1. Mispairing of the alleles $B_m$ and $B_n$; 2. The event of unequal crossing over which follows. The probability of the first event (mispairing) is assumed to depend on the "size" k of the shift of $B_n$ versus $B_m$: $|k|$ cistron units to the left (k < 0) or to the right (k > 0). It is reasonable to presume that this probability decreases with increasing $|k|$. On the other hand, the assumption is made that this probability is independent of the length of the mispaired section (n-k cistron units in Fig.2) - and, hence, of the allele lengths m and n. For the sake of simplicity the further assumption is made that the conditional probability of u.c.o. at a given k (shift of $B_n$ versus $B_m$) is independent not only of k (this seems to be plausible) but of the length of the mispaired section - and this means, of m and n - as well. The last-mentioned assumption is not plausible: One would presume that the frequency of the second event, u.c.o., increases with the length of the mispaired section. The assumptions may be formulated more exactly as follows:

¹ The condition $m \ge n$ in Fig. 1 is not essential. It is only meant to secure that the chromosome loop to the right occurs in the chromosome containing $B_m$. In the case m < n this loop would occur either in the $B_m$ containing chromosome, or in the chromosome with $B_n$, depending on the value of k.

Model 1. The probability $w_k(m,n)$ for mispairing of the alleles $B_m$ and $B_n$ with a shift of $B_n$ versus $B_m$ of the size k and of a subsequent u.c.o. equals $p_k$, provided that a shift of this size is possible:

$$w_k(m,n) = \begin{cases} p_k & \text{for } -(n-1) \le k \le m-1 \ (k \ne 0), \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Here, the values $p_k$ (k = ±1, ±2, ...) are nonnegative numbers, which are independent of the allele lengths m and n, and fulfil the conditions

$$p_{-k} = p_k \quad (k = 1,2,\ldots) \tag{3}$$

$$\sum_{k=1}^{r} p_k \le \tfrac{1}{2} C \quad (r = 1,2,\ldots) \tag{4}$$

with fixed C < 1. Eq.(3) is necessary for reasons of symmetry (a shift of $B_n$ versus $B_m$ to the left is equivalent to a shift of $B_m$ versus $B_n$ for the same number of cistron units to the right), Eq.(4) is meant to secure that the total probability

$$w(m,n) = \sum_k w_k(m,n) \quad \text{(summation over all integers } k \ne 0)$$

of an u.c.o. between $B_m$ and $B_n$ is smaller than a constant less than 1 for all m and n. Therefore, the genotype $B_m B_n$ produces by u.c.o. with probability $p_k$ the gametes²

$$\tfrac12 B_{m-k} + \tfrac12 B_{n+k} \quad (k = 1,2,\ldots,m-1)$$

if m > 1, - and with the same probability $p_k$ the gametes

$$\tfrac12 B_{m+k} + \tfrac12 B_{n-k} \quad (k = 1,2,\ldots,n-1)$$

if n > 1. Besides, the "normal gametes"

$$\tfrac12 B_m + \tfrac12 B_n$$

are formed with probability 1 - w(m,n) (by "normal" crossing over or without any genetic recombination).

² The notation means that the two types of gametes are formed with the same frequency.

In the following, only one special case of model 1 will be considered: Let $p_k$ for k > 1 be so small in comparison with $p_1$ that it can be neglected. In this case, one may set:

$$p_k = \begin{cases} \tfrac12 P & \text{for } |k| = 1, \\ 0 & \text{for } |k| > 1. \end{cases} \tag{5}$$

This means that u.c.o. with shift of $B_n$ versus $B_m$ of more than one cistron unit (to the left or to the right) does not occur. In this case, the total probability w(m,n) of an u.c.o. between $B_m$ and $B_n$ is:

$$w(m,n) = \begin{cases} 0 & \text{if both m and n equal 1,} \\ \tfrac12 P & \text{if only one of the numbers m and n equals 1,} \\ P & \text{if both m and n are greater than 1.} \end{cases}$$

In addition to model 1, the following variant will be considered theoretically:

Model 2. The total probability w(m,n) of an u.c.o. between $B_m$ and $B_n$ does not depend on the allele lengths m and n. It is always equal to P if u.c.o. is possible at all, i.e. if at least one of the values of m and n is greater than 1. In this case, the probability $w_k(m,n)$ of an u.c.o. with a shift k of $B_n$ versus $B_m$ is:

$$w_k(m,n) = \begin{cases} \dfrac{P \, p_k}{\sum_{l=1}^{m-1} p_l + \sum_{l=1}^{n-1} p_l} & \text{for } -(n-1) \le k \le m-1 \ (k \ne 0), \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

Here, the $p_k$ (k = ±1, ±2, ...) are nonnegative numbers which are independent of m and n, and satisfy condition (3). For this model, too, only the special case will be considered, that a shift of $B_n$ versus $B_m$ for more than 1 cistron unit does not occur, i.e. that $w_k(m,n) = 0$ for $|k| > 1$. Obviously, this holds true if $p_1 > 0$, but $p_k = 0$ for k > 1. In this case:

$$w_{-1}(m,n) = w_1(m,n) = 0 \quad \text{for } m = n = 1,$$
$$w_{-1}(m,n) = P, \ w_1(m,n) = 0 \quad \text{for } m = 1, \ n > 1,$$
$$w_{-1}(m,n) = w_1(m,n) = \tfrac12 P \quad \text{for } m > 1, \ n > 1.$$

The difference between model 1 and model 2 is that in the last-mentioned model, the over-all probability of u.c.o. has always the same value, P - whether a shift of the alleles $B_m$ and $B_n$ against each other is possible only in one direction (if m = 1, n > 1, or m > 1, n = 1) - or in both directions (if m > 1, n > 1). In model 1, the probability of u.c.o., if shifting is possible in one direction only, is half the same probability, if shifting is possible in both directions. Both models are considered, because both have a certain a priori appeal, and, to the best of our knowledge, there are no empirical reasons for or against a decision in one or the other direction.

As mentioned above, a very large (practically infinite) population is considered. The frequencies of the alleles $B_1, B_2, \ldots$ in this population are named $x_1, x_2, \ldots$, with

$$\sum_{m=1}^{\infty} x_m = 1. \tag{7}$$

(For purely formal reasons, it is practical to treat the sequence $\{x_m\}$ as infinite in spite of the fact that only a finite number of its terms are different from 0, i.e.: $x_m = 0$ for m > M with sufficiently large M). For a given initial state

$$x_m = x_m^{(1)} \quad (m = 1,2,\ldots)$$

of the distribution of the alleles $B_1, B_2, \ldots$ in generation 1, the following problems are interesting: a) the alterations of this distribution with increasing number of generations, and b) the question, whether stationary distributions exist.
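The special cases of models 1 and 2 in which only shifts of one cistron unit occur can be sketched in a few lines of Python. This is only an illustrative sketch under the assumptions stated in Eqs. (2), (5) and (6); the function names `shift_probability`, `total_probability` and `gametes` are hypothetical and do not come from the paper.

```python
from fractions import Fraction


def shift_probability(m, n, k, P, model):
    """w_k(m, n) when only shifts of one cistron unit (|k| = 1) occur."""
    # Direction k = -1 needs n > 1 and k = +1 needs m > 1 (range in Eq. 2).
    allowed = {-1: n > 1, 1: m > 1}
    if k not in allowed or not allowed[k]:
        return Fraction(0)
    if model == 1:
        return Fraction(P) / 2              # Eq. (5): fixed P/2 per direction
    if model == 2:
        directions = sum(allowed.values())  # 1 or 2 possible directions
        return Fraction(P) / directions     # Eq. (6): the total is always P
    raise ValueError("model must be 1 or 2")


def total_probability(m, n, P, model):
    """w(m, n): total probability of an u.c.o. in genotype B_m B_n."""
    return sum(shift_probability(m, n, k, P, model) for k in (-1, 1))


def gametes(m, n, P, model):
    """Return {allele length: probability} for the gametes of B_m B_n."""
    out = {}

    def add(length, prob):
        if prob:
            out[length] = out.get(length, Fraction(0)) + prob

    normal = 1 - total_probability(m, n, P, model)
    add(m, normal / 2)
    add(n, normal / 2)
    right = shift_probability(m, n, 1, P, model)   # B_m shortened, B_n lengthened
    add(m - 1, right / 2)
    add(n + 1, right / 2)
    left = shift_probability(m, n, -1, P, model)   # B_m lengthened, B_n shortened
    add(m + 1, left / 2)
    add(n - 1, left / 2)
    return out
```

For example, `gametes(2, 2, Fraction(1, 100), 1)` gives the three allele lengths 1, 2 and 3, which is the situation of a homozygous duplication. The two models differ only for genotypes in which exactly one allele has length 1.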

Random mating for the $B_m$ locus is assumed. First, the case will be examined that genotype $B_1B_1$ is unable to form gametes of type $B_2$ (models 1a and 2a). In this case, establishment of a polymorphism of the $B_m$ is possible only, if in the initial distribution at least one of the allele frequencies $x_m^{(1)}$ with m > 1 is not 0. For example, $x_2^{(1)} = 1$, $x_m^{(1)} = 0$ ($m \ne 2$) is an initial distribution compatible with this model. "Alleles" $B_2, B_3, \ldots$ must have come into this population in earlier times or from outside, or by a unique event which is not examined further.

Later on, the case will be treated that genotype $B_1B_1$ can form - apart from the "normal" gametes $B_1$ - also $B_2$ gametes with a certain probability d (models 1b and 2b). Here, it is obligatory that gametes are formed in the same frequency which contain the deletion $B_0$ (i.e. "no b"). These gametes are assumed to be lethal. Models 1a and 2a are special cases (with d = 0) of models 1b (and 2b).

2.3.2. Selection

For both models, the properties of the distribution $\{x_m\}$ will first be treated without selection (i.e. all genotypes $B_mB_n$ (m,n = 1,2,...), excepting $B_1B_1$ in models 1b and 2b, are assumed to have the same fitness). Later on, selection will be included, i.e. fitnesses of the different genotypes will be assumed to be different. Here, relative fitness of genotype $B_mB_n$ will be taken as $f_m f_n$ (m,n = 1,2,...). With random mating, this is formally equivalent with the assumption that selection acts exclusively on gametes (gametic selection; Li, 1955) and $f_m$ is the (relative) fitness of the allele $B_m$. Regarding the dependence of fitness $f_m$ on the allele length m, two models will be examined:

Selection Type 1. The $f_m$ (m = 1,2,...) form a monotonically increasing sequence, i.e. $B_m$ has a selective advantage over $B_n$, if m > n. This model is realized by the following formula:

$$f_m = 1 - s_m, \quad s_m = s_1 q^{m-1} \quad (m = 1,2,\ldots) \tag{8}$$

with $0 < s_1 < 1$ and $0 < q < 1$ as given parameters of the model.

Selection Type 2. There is an "optimum" allele length $m = m_{opt}$, up to which the $f_m$ increase. For $m > m_{opt}$, they decrease with increasing m. Either, the decrease leads to $f_m = 0$ for all m greater than a certain maximum allele length $m_{max}$ (selection type 2a), or, with increasing m, $f_m$ approaches a positive limit (selection type 2b). Selection type 2a is realized as follows:

$$f_m = 1 - s_m \quad (m = 1,2,\ldots)$$

with

$$s_m = \begin{cases} s_1 q_1^{m-1} & \text{for } m \le m_{opt}, \\ s_{m_{opt}} / q_2^{\,m - m_{opt}} & \text{for } m_{opt} < m \le m_{max}, \\ 1 & \text{for } m > m_{max}, \end{cases} \tag{9}$$

where $m_{opt}$, $m_{max}$, $0 < s_1 < 1$, and $0 < q_1 < 1$ are given parameters, whereas $q_2$ is determined by:

$$q_2^{\,m - m_{opt}} = s_{m_{opt}} = s_1 q_1^{m_{opt}-1} \quad \text{for } m = m_{max} + 1$$

(cf. Fig.5). In selection type 2b, fitness could, for example, be defined as follows:

$$f_m = 1 - s_m \quad (m = 1,2,\ldots),$$

with

$$s_m = \begin{cases} s_1 q^{m-1} & \text{for } m \le m_{opt}, \\ s_1 - (s_1 - s_{m_{opt}}) \, q^{\,m - m_{opt}} & \text{for } m > m_{opt} \end{cases} \tag{10}$$

($0 < s_1 < 1$, $0 < q < 1$). The sequence $\{f_m\}$ as defined by Eq.(10) approaches for $m > m_{opt}$ asymptotically the value $1 - s_1$, i.e. the fitness $f_1$ of the allele $B_1$. It is largely arbitrary, which of the selection types 2a or 2b is used in the examinations. The authors decided in favour of type 2a, mainly for technical reasons (less numerical calculations).
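The three fitness schemes of Eqs. (8), (9) and (10) can be written down directly, which makes their shapes easy to compare. The following is a minimal sketch; the function names are hypothetical, and the parameter values in the usage note are chosen only for illustration and are not taken from the paper.

```python
def fitness_type1(m, s1, q):
    """Eq. (8): fitness increases monotonically with allele length m."""
    return 1 - s1 * q ** (m - 1)


def fitness_type2a(m, s1, q1, m_opt, m_max):
    """Eq. (9): optimum at m_opt, fitness 0 for all m > m_max."""
    s_opt = s1 * q1 ** (m_opt - 1)
    if m <= m_opt:
        s = s1 * q1 ** (m - 1)
    elif m <= m_max:
        # q2 is fixed by the condition q2**(m_max + 1 - m_opt) == s_opt.
        q2 = s_opt ** (1 / (m_max + 1 - m_opt))
        s = s_opt / q2 ** (m - m_opt)
    else:
        s = 1.0
    return 1 - s


def fitness_type2b(m, s1, q, m_opt):
    """Eq. (10): optimum at m_opt, fitness tends to 1 - s1 for large m."""
    s_opt = s1 * q ** (m_opt - 1)
    if m <= m_opt:
        s = s1 * q ** (m - 1)
    else:
        s = s1 - (s1 - s_opt) * q ** (m - m_opt)
    return 1 - s
```

With, say, `s1 = 0.5`, `q1 = 0.5`, `m_opt = 5` and `m_max = 10`, type 2a rises up to length 5, falls off between lengths 6 and 10, and is exactly zero from length 11 onwards.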

2.3.3. Allele Distributions Derived from the Different Models

2.3.3.1. Model 1a (Model 1 with the Additional Condition that Genotype $B_1B_1$ is Unable to Form Gametes of Type $B_2$)

Table 1 shows the frequencies of the genotypes $B_iB_j$ (i,j = 1,2,...) in the population. They were calculated under the assumption of random mating (Hardy-Weinberg-Law) from the allele frequencies $x_m$ (m = 1,2,...). The table contains also the probabilities with which the different genotypes form the different types of gametes. They were calculated taking into account (2) and (5). The frequency $x'_m$ of $B_m$ gametes in the population is calculated as the sum of the products of the frequencies of the different genotypes $B_iB_j$ with the probabilities of $B_m$ gametes being formed by these genotypes³. Calculation for m = 1, for example, gives the following result:

$$\begin{aligned} x'_1 &= x_1^2 \ \text{(from } B_1B_1) + x_1 x_2 \ \text{(from } B_1B_2) + \sum_{j=3}^{\infty} x_1 x_j \left(1 - \tfrac12 P\right) \ \text{(from } B_1B_3, B_1B_4, \ldots) \\ &\quad + \tfrac12 x_2^2 P \ \text{(from } B_2B_2) + \tfrac12 \sum_{j=3}^{\infty} x_2 x_j P \ \text{(from } B_2B_3, B_2B_4, \ldots) \\ &= x_1 + \tfrac12 P \left[ \sum_{j=2}^{\infty} x_2 x_j - \sum_{j=3}^{\infty} x_1 x_j \right]. \end{aligned} \tag{11'}$$

³ This probability is 0, if the genotype does not form any $B_m$ gametes.

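Table 1. Production of gametes in model 1a

Genotype                     Frequency in population   Gamete type   Probability
$B_1B_1$                     $x_1^2$                   $B_1$         $1$
$B_1B_j$ ($j \ge 2$)         $2x_1x_j$                 $B_1$         $\frac{1}{2}(1 - \frac{1}{2}P)$
                                                       $B_2$         $\frac{1}{4}P$
                                                       $B_{j-1}$     $\frac{1}{4}P$
                                                       $B_j$         $\frac{1}{2}(1 - \frac{1}{2}P)$
$B_iB_i$ ($i \ge 2$)         $x_i^2$                   $B_{i-1}$     $\frac{1}{2}P$
                                                       $B_i$         $1 - P$
                                                       $B_{i+1}$     $\frac{1}{2}P$
$B_iB_j$ ($2 \le i < j$)     $2x_ix_j$                 $B_{i-1}$     $\frac{1}{4}P$
                                                       $B_i$         $\frac{1}{2}(1 - P)$
                                                       $B_{i+1}$     $\frac{1}{4}P$
                                                       $B_{j-1}$     $\frac{1}{4}P$
                                                       $B_j$         $\frac{1}{2}(1 - P)$
                                                       $B_{j+1}$     $\frac{1}{4}P$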

Here, Eq.(7) was used. In the same way, it follows:

$$x'_m = x_m + \tfrac{1}{2}P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right] \qquad \text{for } m > 1. \tag{11''}$$

For the following, the distinction has to be made whether the alleles $B_m$ ($m = 1,2,\ldots$) are subject to selection or not.
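As a cross-check of the recurrence, the gamete frequencies can be computed in two ways: by summing over all genotypes with the gamete probabilities of Table 1, and by the closed forms $x'_1 = x_1 + \frac{1}{2}P(x_2 - x_1 + x_1^2)$ and $x'_m = x_m + \frac{1}{2}P[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})]$. The following Python sketch does both. The gamete probabilities coded here are the ones implied by the terms of Eq.(11'), and the function names are hypothetical; the sketch assumes a distribution with finitely many alleles whose frequencies sum to 1.

```python
def step_closed_form(x, P):
    """One generation of model 1a via the closed forms (11') and (11'').

    x[0] is the frequency of B_1; the result has one more entry than x,
    because the longest allele can grow by one cistron.
    """
    M = len(x)

    def get(m):
        return x[m - 1] if 1 <= m <= M else 0.0

    x1 = get(1)
    new = [x1 + 0.5 * P * (get(2) - x1 + x1 * x1)]
    for m in range(2, M + 2):
        bracket = (get(m + 1) - get(m)) - (1.0 - x1) * (get(m) - get(m - 1))
        new.append(get(m) + 0.5 * P * bracket)
    return new


def step_from_table(x, P):
    """The same generation step, summed genotype by genotype (Table 1)."""
    M = len(x)
    new = [0.0] * (M + 1)

    def add(m, weight):
        new[m - 1] += weight

    for i in range(1, M + 1):
        for j in range(i, M + 1):
            freq = x[i - 1] * x[j - 1] * (1.0 if i == j else 2.0)
            if j == 1:                      # B_1 B_1: no unequal crossing over
                add(1, freq)
            elif i == 1:                    # B_1 B_j, j >= 2
                add(1, freq * 0.5 * (1.0 - 0.5 * P))
                add(j, freq * 0.5 * (1.0 - 0.5 * P))
                add(2, freq * 0.25 * P)
                add(j - 1, freq * 0.25 * P)
            elif i == j:                    # B_i B_i, i >= 2
                add(i, freq * (1.0 - P))
                add(i - 1, freq * 0.5 * P)
                add(i + 1, freq * 0.5 * P)
            else:                           # B_i B_j, 2 <= i < j
                add(i, freq * 0.5 * (1.0 - P))
                add(j, freq * 0.5 * (1.0 - P))
                for m in (i - 1, i + 1, j - 1, j + 1):
                    add(m, freq * 0.25 * P)
    return new


if __name__ == "__main__":
    x = [0.5, 0.3, 0.2]
    print(step_closed_form(x, 0.1))
    print(step_from_table(x, 0.1))
```

For the illustrative input above both routes should give the same frequencies up to rounding, and the frequencies should still sum to 1.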

2.3.3.1.1. No Selection. In this case, the gamete frequencies $x'_m$ ($m = 1,2,\ldots$) are already the allele frequencies of the next generation. Therefore, the formulas (11) permit a calculation of the allele distribution in the daughter generation from that in the parent generation. Thus, they are recurrence formulas for the allele frequencies: Starting from the distribution $x_m = x_m^{(1)}$ ($m = 1,2,\ldots$) of the alleles $B_m$ in generation 1, it is possible in principle to calculate the distribution in all subsequent generations by repeated application of these recurrence formulas.

The sequence (11) is only formally an infinite number of equations, because in every allele distribution actually occurring all terms on the right-hand side of Eq.(11) vanish for sufficiently large $m$:

$$\begin{aligned} x'_m &= x_m + \tfrac{1}{2}P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right] \qquad (2 \le m < M), \\ x'_M &= x_M + \tfrac{1}{2}P\left[-x_M - (1 - x_1)(x_M - x_{M-1})\right], \\ x'_{M+1} &= \tfrac{1}{2}P(1 - x_1)x_M, \\ x'_m &= 0 \qquad \text{for } m > M + 1. \end{aligned}$$

The maximum allele length, $M$, increases each generation by one, provided $x_1 < 1$ in the first generation. The latter, however, is a condition of model 1a.

It follows from Eq.(11) by some transformations:

$$\sum_{m=1}^{\infty} m x'_m = \sum_{m=1}^{\infty} m x_m, \tag{12}$$

i.e.: the mean of the allele length $m$ in the population is invariant from generation to generation. This result is obvious even without calculation: In the absence of selection, u.c.o. only causes a rearrangement of b-cistrons. Therefore, this result can be regarded as a control for the correct derivation of the recurrence formulas (11).

The next question is: Does a stationary distribution of the allele length exist? A stationary distribution is defined by

$$x'_m = x_m \qquad (m = 1,2,\ldots). \tag{13}$$

It follows from Eq.(13), and from the steady increase of the maximum allele length from generation to generation which has been shown above, that a stationary distribution cannot have a maximum allele length (an exception is the trivial distribution $\{x_1 = 1,\ x_m = 0 \text{ for } m > 1\}$). Therefore, it can never be reached exactly from any initial distribution: Only a limiting distribution could be stationary. It follows from Eqs.(11) and (13) that a distribution $\{x_m\}$ of the allele length is stationary if and only if the allele frequencies $x_m$ satisfy the equations

$$\begin{aligned} x_2 - x_1 + x_1^2 &= 0, \\ x_{m+1} - x_m - (1 - x_1)(x_m - x_{m-1}) &= 0 \qquad (m = 2,3,\ldots). \end{aligned} \tag{14}$$

These equations can be transformed as follows:

$$\begin{aligned} x_2 &= x_1(1 - x_1), \\ x_{m+1} &= x_m(2 - x_1) - x_{m-1}(1 - x_1) \qquad (m = 2,3,\ldots). \end{aligned}$$

This means that in a stationary distribution, every frequency $x_m$ can be expressed by $x_1$ alone. It is easily verified, for example by induction from $m$ to $m + 1$, that the following relationship holds:

$$x_m = x_1(1 - x_1)^{m-1} \qquad (m = 1,2,\ldots). \tag{15}$$

Conversely, for a given $x_1$ with $0 < x_1 \le 1$, the numbers $x_m$ as given by Eq.(15) satisfy the Eqs.(14), as can be verified by substitution. They form a complete system of allele frequencies because of

$$\sum_{m=1}^{\infty} x_1(1 - x_1)^{m-1} = \frac{x_1}{1 - (1 - x_1)} = 1.$$

Therefore, they constitute a stationary distribution. Hence, in model 1a, and in absence of selection, the stationary distributions correspond exactly to all geometrical probability distributions. They form a special class with one parameter, this parameter being the frequency $x_1$ of the allele $B_1$. All these distributions have the common property that the frequencies $x_1, x_2, x_3, \ldots$ form a monotonically decreasing sequence: $B_1$ is always the most frequent allele in the stationary state.

The Problem of Convergence. Of course, the existence of stationary distributions of allele length does not necessarily mean that the allele length distribution converges in the course of generations from a given initial distribution towards a stationary one. It is conceivable that this could depend on the form of the initial distribution, or on the probability $P$ of u.c.o. Theoretically, it would even be possible that convergence does not occur at all. This problem will not be examined in a general way (such an examination would be complicated). The following, however, can be stated: If the allele length distribution $\{x_m\}$ converges from a given initial distribution $\{x_m^{(1)}\}$ to a stationary distribution $\{\bar{x}_m\}$, then this stationary distribution is determined uniquely by the initial distribution. It does not depend on $P$. This follows from the invariance of the mean allele length together with the fact that the parameter $\bar{x}_1$ of the stationary distribution (and, hence, this distribution itself) is determined by this mean allele length:

$$E(m) = \sum_{m=1}^{\infty} m x_m^{(1)} = \sum_{m=1}^{\infty} m \bar{x}_m = \sum_{m=1}^{\infty} m \bar{x}_1(1 - \bar{x}_1)^{m-1} = \frac{1}{\bar{x}_1} \qquad \text{(mean of the geometrical distribution)}.$$
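The stationarity of the geometrical distribution can be checked numerically. The Python sketch below applies one step of the recurrence $x'_1 = x_1 + \frac{1}{2}P(x_2 - x_1 + x_1^2)$, $x'_m = x_m + \frac{1}{2}P[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})]$ to $x_m = x_1(1 - x_1)^{m-1}$ and reports the largest change, together with the mean allele length, which should equal $1/x_1$. Truncating the distribution at 200 alleles is an assumption made only for this illustration (the neglected tail is far below rounding error for $x_1 = 0.5$); the function names are hypothetical.

```python
def step_1a(x, P):
    """One generation of model 1a without selection (recurrence formulas (11))."""
    M = len(x)

    def get(m):
        return x[m - 1] if 1 <= m <= M else 0.0

    x1 = get(1)
    new = [x1 + 0.5 * P * (get(2) - x1 + x1 * x1)]
    for m in range(2, M + 2):
        bracket = (get(m + 1) - get(m)) - (1.0 - x1) * (get(m) - get(m - 1))
        new.append(get(m) + 0.5 * P * bracket)
    return new


def geometric(x1, M):
    """First M terms of the geometrical distribution x_m = x1 * (1 - x1)**(m - 1)."""
    return [x1 * (1.0 - x1) ** (m - 1) for m in range(1, M + 1)]


def max_change(x, P):
    """Largest absolute change of any allele frequency after one generation."""
    x_next = step_1a(x, P)
    return max(abs(a - b) for a, b in zip(x_next, x + [0.0]))


def mean_length(x):
    return sum(m * xm for m, xm in enumerate(x, start=1))


if __name__ == "__main__":
    x_bar = geometric(0.5, 200)
    print("largest change after one generation:", max_change(x_bar, 0.05))
    print("mean allele length:", mean_length(x_bar))
```

The largest change should be at the level of rounding error for any $P$, in line with the statement that the stationary distribution does not depend on $P$.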

It follows for the special class of initial distributions

$$x_2^{(1)} = 1 - x_1^{(1)}, \qquad x_m^{(1)} = 0 \quad (m \ge 3) \tag{16}$$

(i.e. only the alleles $B_1$ and $B_2$ are present):

$$\bar{x}_1 = \frac{1}{E(m)} = \frac{1}{x_1^{(1)} + 2\left(1 - x_1^{(1)}\right)} = \frac{1}{2 - x_1^{(1)}}. \tag{17}$$

For some of the initial distributions (16) and for several values of the probability $P$, the convergence of the allele length distribution $\{x_m\}$ has been examined by numerical calculation (iteration using the recurrence formulas (11') and (11''))$^4$. The values of $x_1^{(1)}$ and $P$ which were included in these calculations are shown in Fig.3. In all these cases, the distribution $\{x_m\}$ converges towards the stationary distribution determined by Eq.(17). Hence, it is reasonable to presume that $\{x_m\}$ converges for every combination of values $x_1^{(1)}$ and $P$.

As mentioned above, the stationary limiting distribution $\{\bar{x}_m\}$ for fixed $x_1^{(1)}$ is independent of $P$. The speed of convergence, however, is strongly influenced by $P$. As a measure of this speed, the time (in number of generations) required for the distribution $\{x_m\}$ to approach the limiting distribution $\{\bar{x}_m\}$ up to a distance of 0.001 (= maximum of absolute values of the frequency differences $x_m - \bar{x}_m$) was determined. As shown in Fig.3, this time increases with decreasing $P$, if $x_1^{(1)}$ is fixed. This means: The higher $P$, the faster the convergence. More exactly: It looks as if this "convergence time" (with fixed $x_1^{(1)}$) is inversely proportional to $P$, as would be plausible. On the other hand, the convergence time increases with decreasing $x_1^{(1)}$, if $P$ is fixed. I.e. it is the longer, the rarer $B_1$ is in the initial distribution.

It can be concluded from Eq.(17) (see also Fig.3) that in the stationary state the allele $B_1$ is always more frequent, and accordingly, the other alleles $B_2, B_3, \ldots$ are less frequent, than in the initial distribution. It follows - and this is important for practical applications - that the model 1a cannot generate any noteworthy polymorphism of the alleles $B_m$ ($m = 1,2,\ldots$), unless there is selection.

$^4$ The Siemens 2002 computer of the Astronomisches Recheninstitut, Heidelberg, was used, which is a loan of the Deutsche Forschungsgemeinschaft.

Fig. 3. Stationary distributions of allele length in model 1a without selection for 3 different initial states (16), each with the "approximation times" for several values of $P$. Some of these $P$ values are unrealistically high. But this followed from the requirement to cover, on the one hand, a relatively broad range of $P$ values and to avoid, on the other hand, too small $P$ values for technical reasons (convergence too slow). (Panels: initial state and stationary state; abscissa: length of allele; listed alongside: $P$ and the number of generations required to approach the stationary distribution up to a distance of 0.001.)
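The numerical experiment described above can be sketched as follows. Starting from an initial distribution with only $B_1$ and $B_2$ present, the recurrence formulas (11) are iterated until the largest absolute difference from the geometrical limit with $\bar{x}_1 = 1/(2 - x_1^{(1)})$ drops to 0.001. The function names, the cap of 5000 generations and the chosen parameter values are assumptions of this illustration, not details taken from the paper.

```python
def step_1a(x, P):
    """One generation of model 1a without selection (recurrence formulas (11))."""
    M = len(x)

    def get(m):
        return x[m - 1] if 1 <= m <= M else 0.0

    x1 = get(1)
    new = [x1 + 0.5 * P * (get(2) - x1 + x1 * x1)]
    for m in range(2, M + 2):
        bracket = (get(m + 1) - get(m)) - (1.0 - x1) * (get(m) - get(m - 1))
        new.append(get(m) + 0.5 * P * bracket)
    return new


def convergence_time(x1_init, P, tol=1e-3, max_gen=5000):
    """Generations needed until max |x_m - x_bar_m| <= tol, or None if not reached."""
    x1_bar = 1.0 / (2.0 - x1_init)      # Eq. (17)
    r = 1.0 - x1_bar
    x = [x1_init, 1.0 - x1_init]        # initial distribution (16)
    for gen in range(max_gen + 1):
        dist = max(abs(xm - x1_bar * r ** (m - 1))
                   for m, xm in enumerate(x, start=1))
        # alleles longer than len(x) still have frequency 0 in x
        dist = max(dist, x1_bar * r ** len(x))
        if dist <= tol:
            return gen
        x = step_1a(x, P)
    return None


if __name__ == "__main__":
    for P in (0.1, 0.05):
        print("P =", P, "convergence time:", convergence_time(0.8, P))
```

If the convergence time is indeed roughly inversely proportional to $P$, halving $P$ should about double the printed number of generations.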

2.3.3.1.2. Selection. In this case, the frequency of the allele $B_m$ in the next generation becomes:

$$x''_m = f_m x'_m / \bar{f} \qquad (m = 1,2,\ldots). \tag{18}$$

Here, $x'_m$ is the frequency of the $B_m$ gametes as calculated from Eqs.(11') and (11''), $f_m$ is their fitness, and

$$\bar{f} = \sum_{n=1}^{\infty} f_n x'_n$$

is the mean fitness of all gametes in the population. In distinction from the case of no selection, the following inequality holds true in most cases:

$$\sum_{m=1}^{\infty} m x''_m \ne \sum_{m=1}^{\infty} m x_m,$$

i.e. the mean allele length is not invariant any more from generation to generation. The obvious explanation is that the gametes $B_{i-1}$, $B_{j+1}$ (or $B_{i+1}$, $B_{j-1}$) produced by the genotype $B_iB_j$ by u.c.o. are not subject to the same selection conditions as the parental gametes $B_i$ and $B_j$ - in spite of the fact that they contain together the same number of cistrons b.
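The selection step of Eq.(18) amounts to weighting each gamete frequency by its fitness and renormalizing by the mean fitness. A minimal Python sketch follows; the gamete frequencies and fitness values in the example are purely illustrative and are not taken from the paper.

```python
def apply_selection(x_gamete, fitness):
    """Eq. (18): x''_m = f_m * x'_m / f_bar, with f_bar = sum_n f_n * x'_n."""
    mean_fitness = sum(f * xg for f, xg in zip(fitness, x_gamete))
    return [f * xg / mean_fitness for f, xg in zip(fitness, x_gamete)]


def mean_length(x):
    """Mean allele length E(m) = sum_m m * x_m (x[0] belongs to B_1)."""
    return sum(m * xm for m, xm in enumerate(x, start=1))


if __name__ == "__main__":
    x_gamete = [0.5, 0.3, 0.2]      # illustrative gamete frequencies x'_1..x'_3
    fitness = [0.8, 0.9, 1.0]       # illustrative fitness values favouring longer alleles
    x_next = apply_selection(x_gamete, fitness)
    print(x_next, mean_length(x_gamete), mean_length(x_next))
```

With fitness increasing in $m$, the mean allele length after selection exceeds the mean before selection, which illustrates why the invariance of the mean is lost.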

Quite as in 2.3.3.1.1., the question arises whether stationary distributions of the allele length exist. It follows from Eq.(18) that a distribution $\{x_m\}$ of allele length is stationary if

$$x_m = f_m x'_m \Big/ \sum_{n=1}^{\infty} f_n x'_n \qquad (m = 1,2,\ldots). \tag{19}$$

If $x'_m$ is expressed, according to Eq.(11') and (11''), by $x_1$ and $x_2$ (for $m = 1$) or by $x_{m-1}$, $x_m$ and $x_{m+1}$ (for $m > 1$), then, for every $m = 1,2,\ldots$, an equation is obtained in which all $x_1, x_2, \ldots$ occur simultaneously. A general solution of these (infinitely many) equations cannot be achieved. Therefore, the restricting assumption was made that $f_m > 0$ for all $m = 1,2,\ldots$, and only stationary distributions $\{x_m\}$ with $x_m > 0$ for all $m = 1,2,\ldots$ were considered. It follows from Eq.(19) that these distributions satisfy the equations:

$$\frac{x_{m+1}}{x_m} = \frac{f_{m+1}\,x'_{m+1}}{f_m\,x'_m} \qquad (m = 1,2,\ldots), \tag{20}$$

and hence, by Eqs.(11), also the equations:

$$f_2 x_1\left(x_2 + \tfrac{1}{2}P\left[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\right]\right) = f_1 x_2\left(x_1 + \tfrac{1}{2}P\left(x_2 - x_1 + x_1^2\right)\right),$$

$$f_{m+1} x_m\left(x_{m+1} + \tfrac{1}{2}P\left[(x_{m+2} - x_{m+1}) - (1 - x_1)(x_{m+1} - x_m)\right]\right) = f_m x_{m+1}\left(x_m + \tfrac{1}{2}P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right]\right) \qquad (m \ge 2).$$

In principle, these equations make it possible to express $x_m$ for $m = 3,4,\ldots$ successively as function of $x_1, x_2, P, f_1, f_2, \ldots, f_{m-1}$:

$$x_m = \varphi_m(x_1, x_2, P, f_1, f_2, \ldots, f_{m-1}) \qquad (m = 3,4,\ldots). \tag{21}$$

Conversely, to given frequencies $x_1$ and $x_2$, the numbers $x_3, x_4, \ldots$ calculated by Eq.(21) together with $x_1$ and $x_2$ satisfy Eqs.(20) provided that:

$$x_m > 0 \qquad \text{for all } m = 1,2,\ldots \tag{22}$$

Hence, they also satisfy Eqs.(19), if additionally:

$$\sum_{m=1}^{\infty} x_m = 1. \tag{23}$$

Thus, the search for stationary distributions $\{x_m\}$ with $x_m > 0$ ($m = 1,2,\ldots$) - if they exist at all - involves the finding of frequencies $x_1$ and $x_2$ for which the $x_m$ ($m = 3,4,\ldots$) calculated according to Eq.(21) together with $x_1$ and $x_2$ satisfy the conditions (22) and (23). (Instead, it can also be presupposed that $f_m > 0$ for $1 \le m \le M$, $f_m = 0$ for $m > M$, and stationary distributions $\{x_m\}$ are looked for with $x_m > 0$ for $1 \le m \le M$, $x_m = 0$ for $m > M$. In this case, the Eqs.(20 ff.) exist only for $m < M$ or $m \le M$).

But for the solution of this problem, too, no practicable way seems to exist. Furthermore, even if the existence of stationary distributions of the allele length could be proven for a special system $\{f_m\}$ of fitness values and a special probability $P$, the question whether a stationary distribution would necessarily be reached from a given initial distribution would remain open. This is the same situation as in the case of no selection.

The last-mentioned problem, being alone of interest from a practical point of view, has been examined empirically for selection types 1 and 2a, applying formulas (11) and (18). Again, we started with the special initial distribution (16).

2.3.3.1.2.1. Selection Type 1 ($B_m$ Has a Selective Advantage over $B_n$ if $m > n$). For the parameters of the fitness model (8), the values $s_1 = 0.2$, $q = 0.9$ were chosen. With the fitness values $f_m$ ($m = 1,2,\ldots$) determined in this way, the calculation was carried out for $x_1^{(1)} = 0.95$ and $P = 0.01$, - and for $x_1^{(1)} = 0$ (only the allele $B_2$ is present at the beginning) and $P = 0.05$. Table 2 shows the mean allele length $E(m)$ together with the standard deviation

$$\sigma_m = \sqrt{\sum_{m=1}^{\infty} \left(m - E(m)\right)^2 x_m} \tag{24}$$

for both cases. Besides, the allele length with the highest frequency is

Table 2. Distribution of allele length in model 1a for selection type 1 with $s_1 = 0.2$, $q = 0.9$, starting from the initial distribution (16)

             $x_1^{(1)} = 0.95$, $P = 0.01$                        $x_1^{(1)} = 0$, $P = 0.05$
Generation   Most frequent   Frequency   $E(m)$    $\sigma_m$      Most frequent   Frequency   $E(m)$    $\sigma_m$
             allele length                                         allele length
   1              1           0.9500      1.050     0.2179              2           1.0000      2.000     0
  20              1           0.9227      1.078     0.2699              2           0.4872      2.176     0.8615
  40              1           0.8797      1.123     0.3361              3           0.2807      2.654     1.2306
  60              1           0.8168      1.192     0.4160              3           0.2435      3.438     1.5738
  80              1           0.7293      1.296     0.5121              4           0.2081      4.507     1.8678
 100              1           0.6155      1.451     0.6271              6           0.1886      5.751     2.0859
 140              2           0.4006      1.999     0.9184              8           0.1685      8.315     2.3805
 180              3           0.3164      2.992     1.2347             11           0.1515     10.709     2.6040
 220              4           0.2732      4.364     1.4654             13           0.1427     12.880     2.7929
 260              6           0.2440      5.867     1.6211             15           0.1345     14.844     2.9600
 300              7           0.2299      7.358     1.7406             16           0.1276     16.628     3.1124
 400             11           0.2010     10.758     1.9547             20           0.1158     20.469     3.4511
 500             13           0.1857     13.637     2.1113             23           0.1062     23.658     3.7501
 600             16           0.1791     16.093     2.2412             26           0.0995     26.387     4.0224
 700             18           0.1705     18.226     2.3557             28           0.0931     28.773     4.2748
 800             20           0.1632     20.111     2.4597
 900             22           0.1552     21.799     2.5562
1000             23           0.1515     23.330     2.6468
1100             25           0.1447     24.731     2.7326
1200             26           0.1423     26.023     2.8145
1300             27           0.1387     27.223     2.8931
1400             28           0.1350     28.343     2.9687
1500             29           0.1317     29.394     3.0418

tabulated.

In both cases, the tendency of the most frequent allele length as well as of the mean allele length shows that the bulk of the distribution shifts constantly towards higher allele length with increasing number of generations (cf. also Fig.4). The calculation was broken off when the most frequent allele length (and also $E(m)$) became about 30, but the observed tendency permits the conclusion that the distribution of allele length does not tend to a stationary distribution in either case. Not only $E(m)$, but also the standard deviation $\sigma_m$ increases with the number of generations. This means that the distribution of allele length becomes more and more extended.
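The two summary statistics used in Table 2 are the mean allele length $E(m) = \sum_m m\,x_m$ and the standard deviation $\sigma_m$, the square root of $\sum_m (m - E(m))^2 x_m$. A small Python sketch, with a hypothetical function name, computes both for an allele distribution given as a list whose first entry is the frequency of $B_1$:

```python
import math


def mean_and_sd(x):
    """Return (E(m), sigma_m) for allele frequencies x, where x[0] belongs to B_1."""
    mean = sum(m * xm for m, xm in enumerate(x, start=1))
    variance = sum((m - mean) ** 2 * xm for m, xm in enumerate(x, start=1))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # The two initial distributions used for selection type 1
    print(mean_and_sd([0.95, 0.05]))   # x_1 = 0.95, x_2 = 0.05
    print(mean_and_sd([0.0, 1.0]))     # only B_2 present
```

For the two initial distributions this gives $E(m) = 1.05$ with $\sigma_m \approx 0.2179$, and $E(m) = 2$ with $\sigma_m = 0$, in agreement with the first row of Table 2.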

Fig. 4. Propagation of the distribution of allele length with time in model 1a with $P = 0.01$, $x_1^{(1)} = 0.95$ for selection type 1 with $s_1 = 0.2$, $q = 0.9$. (Panels: generations 100, 200, 300, 500 and 1000; abscissa: allele length.)

However, the increase of $\sigma_m$ is slower than that of $E(m)$. Therefore, the lower $3\sigma$-limit of the distribution increases constantly.

It is reasonable to presume that this tendency, which was found for special values of the parameters, holds true generally for selection type 1, independently of $x_1^{(1)}$ and the probability $P$ of u.c.o. For a definite conclusion, however, further calculations with other parameter values would be needed. Besides, the results do not permit any conclusions as to the exact way in which the growth rate of $E(m)$ and $\sigma_m$ depends on the parameter values.

2.3.3.1.2.2. Selection Type 2a (There is an Optimum Allele Length). A great number of combinations of model parameters is possible. They cannot be examined completely. Therefore, it was decided to work with fixed values of the parameters $m_{opt}$, $m_{max}$, and $q_1$ of the fitness model (9): $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. This leaves free the parameter $s_1$, for which the values 0.1, 0.2, and 0.4 were chosen. Fig. 5 shows the fitness values for these parameter values depending on the allele length $m$.

Fig. 5. Relative fitness, $f_m$, of allele $B_m$ ($m = 1,2,\ldots$) in dependence on allele length, $m$, for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$ and 3 values of $s_1$. (Abscissa: length of allele, $m$.)

The calculation of the allele length distribution $\{x_m\}$ was carried out for all combinations of one of the three values of $s_1$ with one of the five values each of $P$: $10^{-4}$, $5 \times 10^{-4}$, $10^{-3}$, $5 \times 10^{-3}$, and $10^{-2}$. For the parameter $x_1^{(1)}$ of the initial distribution (16), the two values 0.95 and 0.05 were chosen. Altogether, these were 30 different cases. In all of them, $\{x_m\}$ tended to a stationary distribution. The calculations were broken off when the allele frequencies $x_m$ did not change by more than $10^{-8}$ from one generation to the next. For all combinations of values of $P$ and $s_1$, the stationary distribution is independent of the initial distribution (of $x_1^{(1)}$), but depends on $s_1$ and $P$ (Tables 3-5, Figs. 6 and 7). Hence, there is a difference from case 2.3.3.1.1. (model 1a, no selection) in which the stationary distribution depends on the initial distribution, but not on $P$.

Fig. 6. Stationary distribution of allele length in model 1a with $P = 0.01$ for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. (Panels: $s_1 = 0.1$, $0.2$, $0.4$; ordinate: frequency; abscissa: length of allele.)

Fig. 7. Stationary distribution of allele length in model 1a with $P = 10^{-4}$ for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$. (Panels: $s_1 = 0.1$, $0.2$, $0.4$; ordinate: frequency; abscissa: length of allele.)

In all cases, the most frequent allele length (the mode) is the same, and is

Table 3. Stationary distribution of allele length in model 1a for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.1$, $q_1 = 0.9$

                                    $P$
$m$       $10^{-4}$   $5 \times 10^{-4}$   $10^{-3}$   $5 \times 10^{-3}$   $10^{-2}$
 4                                                                           0.0000
 5                                                      0.0000               0.0002
 6                                         0.0000       0.0004               0.0018
 7        0.0000      0.0000               0.0001       0.0037               0.0098
 8        0.0001      0.0012               0.0037       0.0263               0.0421
 9        0.0106      0.0445               0.0722       0.1270               0.1310
10        0.9562      0.8082               0.6753       0.3496               0.2630
11        0.0325      0.1345               0.2114       0.2802               0.2467
12        0.0005      0.0111               0.0336       0.1408               0.1634
13        0.0000      0.0006               0.0035       0.0522               0.0853
14                    0.0000               0.0003       0.0153               0.0370
15                                         0.0000       0.0037               0.0137
16                                                      0.0008               0.0044
17                                                      0.0001               0.0013
18                                                      0.0000               0.0003
19                                                                           0.0001
20                                                                           0.0000
$E(m)$    10.023      10.112               10.210       10.612               10.832

identical with the "optimum" allele length $m_{opt} = 10$. However, with fixed $s_1$, the frequency of this allele length increases, and the distribution concentrates more tightly around this allele length, with decreasing $P$. A similar concentration is seen when $P$ is fixed, and $s_1$ is enhanced. Generally, the mean allele length $E(m)$ in the stationary state is somewhat larger than the mode $m_{opt}$. This is a consequence of the skewness of the distribution, which is very obvious from Tables 3-5 (see also Figs. 6 and 7), and results from the asymmetry of fitness (see Fig.5).

Apart from the shape of the stationary distribution, the rate of convergence of $\{x_m\}$ towards this distribution, too, depends on $s_1$ and $P$. On the other hand, the shape of the initial distribution (i.e. the value of $x_1^{(1)}$) seems not to influence the convergence rate itself. Instead, if $s_1$ and $P$ are fixed, the distribution $\{x_m\}$ starting from $0.05\,B_1 + 0.95\,B_2$ reaches every state between initial and final distribution by a nearly constant number of generations earlier than the distribution which starts from $0.95\,B_1 + 0.05\,B_2$. The extent of this "lead" depends on $s_1$ and $P$ too.

Fig.8 shows the time (in generations) after which the maximum deviation of the frequencies $x_m$ from the stationary frequencies has reduced to $10^{-3}$. With constant $P$, this "time of convergence" decreases with increasing $s_1$, and increases with increasing $x_1^{(1)}$. When $s_1$ and $x_1^{(1)}$ are held fixed, on the other hand, the dependence on $P$ is not constant in its direction. First, as expected, the convergence time increases with decreasing $P$. After having reached a maximum, however, it decreases again. Fig.8 represents only one of different possibilities to fit smooth curves to the points rendered by the paired values of $P$ and convergence time. (Polynomials of degree 4 in $\log P$ were chosen.) However, the definite shift of the maxima towards higher $P$-values, which is observed with increasing value of $s_1$, seems to be real. This behaviour of convergence time is unexpected; its reason is not evident.

Table 4. Stationary distribution of allele length in model 1a for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.2$, $q_1 = 0.9$

                                    $P$
$m$       $10^{-4}$   $5 \times 10^{-4}$   $10^{-3}$   $5 \times 10^{-3}$   $10^{-2}$
 4
 5                                                                           0.0000
 6                                                      0.0000               0.0003
 7                    0.0000               0.0000       0.0008               0.0029
 8        0.0000      0.0003               0.0010       0.0109               0.0218
 9        0.0052      0.0233               0.0411       0.0964               0.1110
10        0.9743      0.8806               0.7827       0.4382               0.3223
11        0.0203      0.0910               0.1581       0.2982               0.2833
12        0.0002      0.0046               0.0159       0.1159               0.1598
13        0.0000      0.0002               0.0011       0.0317               0.0675
14                    0.0000               0.0001       0.0067               0.0229
15                                         0.0000       0.0011               0.0065
16                                                      0.0002               0.0016
17                                                      0.0000               0.0003
18                                                                           0.0001
19                                                                           0.0000
20
$E(m)$    10.016      10.077               10.150       10.538               10.777

Table 5. Stationary distribution of allele length in model 1a for selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.4$, $q_1 = 0.9$

                                    $P$
$m$       $10^{-4}$   $5 \times 10^{-4}$   $10^{-3}$   $5 \times 10^{-3}$   $10^{-2}$
 6                                                      0.0000               0.0000
 7                    0.0000               0.0000       0.0001               0.0005
 8        0.0000      0.0001               0.0002       0.0033               0.0081
 9        0.0024      0.0111               0.0206       0.0614               0.0794
10        0.9846      0.9262               0.8600       0.5450               0.3984
11        0.0129      0.0606               0.1116       0.2898               0.3109
12        0.0001      0.0020               0.0072       0.0820               0.1418
13        0.0000      0.0000               0.0003       0.0158               0.0463
14                                         0.0000       0.0023               0.0117
15                                                      0.0003               0.0024
16                                                      0.0000               0.0004
17                                                                           0.0001
18                                                                           0.0000
19
20
$E(m)$    10.011      10.053               10.106       10.444               10.698

The results may be summarized as follows: Model 1a in connection with selection type 2a, and with additional, biologically meaningful assumptions about the probability $P$ of u.c.o., leads to stationary distributions of the allele length, representing a polymorphism of a few alleles $B_m$ with moderate lengths.

Now, the (unrealistic) assumption will be abandoned that at the beginning of the process alleles $B_m$ with $m > 1$ are already present in the population. Instead, the following initial distribution will be used:

$$x_1^{(1)} = 1, \qquad x_m^{(1)} = 0 \quad \text{for } m > 1. \tag{25}$$

Besides, the existence of an event is assumed, by which genotype $B_1B_1$, apart from gametes $B_1$, occasionally produces gametes carrying the duplication $B_2$.

Fig. 8. Dependence of "convergence time" (= number of generations required to approach the stationary distribution up to a distance 0.001) on $P$ for models 1a and 1b and selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$, and some paired values of $s_1$ and $x_1^{(1)}$ (model 1a) or $s_1$ and $d$ (model 1b). (Ordinate: number of generations; abscissa: $P$.)

This "initial event" has been discussed above; here, its special nature will not be considered. In this case, the distribution of gametes for genotype $B_1B_1$ would be:

$$\tfrac{1}{2}d\,B_0 + (1 - d)\,B_1 + \tfrac{1}{2}d\,B_2. \tag{26}$$

Here, $d$ is the probability of duplication, $B_0$ is the deletion of cistron b. In order to simplify the model, $B_0$ gametes are assumed to be lethal. In this case, the frequency $x'_m$ of $B_m$ gametes in the population can be derived from Table 1 after modification according to Eq.(26) analogous to model 1a:

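$$x'_1 = \frac{1}{1 - \frac{1}{2}dx_1^2}\left\{x_1 + \tfrac{1}{2}P\left(x_2 - x_1 + x_1^2\right) - dx_1^2\right\} = x_1 + \frac{P\left(x_2 - x_1 + x_1^2\right) - dx_1^2(2 - x_1)}{2 - dx_1^2}, \tag{27'}$$

$$x'_2 = \frac{1}{1 - \frac{1}{2}dx_1^2}\left\{x_2 + \tfrac{1}{2}P\left[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\right] + \tfrac{1}{2}dx_1^2\right\} = x_2 + \frac{P\left[(x_3 - x_2) - (1 - x_1)(x_2 - x_1)\right] + dx_1^2(1 + x_2)}{2 - dx_1^2}, \tag{27''}$$

$$x'_m = \frac{1}{1 - \frac{1}{2}dx_1^2}\left\{x_m + \tfrac{1}{2}P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right]\right\} = x_m + \frac{P\left[(x_{m+1} - x_m) - (1 - x_1)(x_m - x_{m-1})\right] + dx_1^2x_m}{2 - dx_1^2} \qquad (m \ge 3). \tag{27'''}$$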

Further analysis will be carried out separately for the cases without selection and with selection against the alleles $B_m$ ($m = 1,2,\ldots$).

2.3.3.2.1. No Selection. Here, the frequencies of gametes $x'_m$ ($m = 1,2,\ldots$) are already the allele frequencies in the next generation, i.e. formulas (27) are again recurrence formulas for the distribution of allele length, $\{x_m\}$. They permit in principle, starting from the initial distribution (25), to calculate $\{x_m\}$ successively for every subsequent generation.

Unlike 2.3.3.1.1., the mean allele length $E(m)$ is not invariant from one generation to the next: Eq.(12) is now replaced by

$$E'(m) = \sum_{m=1}^{\infty} m x'_m = \frac{1}{1 - \frac{1}{2}dx_1^2}\sum_{m=1}^{\infty} m x_m = \frac{E(m)}{1 - \frac{1}{2}dx_1^2}, \tag{28}$$

i.e. $E(m)$ increases, unless $x_1$ equals 0 (meaning that no transitions $B_1 \to B_2$ occur).
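One generation of model 1b without selection can be sketched in Python as follows. The sketch assumes that the gamete frequencies are the model 1a expressions, corrected by $-dx_1^2$ for $B_1$ and $+\frac{1}{2}dx_1^2$ for $B_2$, and divided by $1 - \frac{1}{2}dx_1^2$ to account for the lethal $B_0$ gametes. It then checks that the frequencies still sum to 1 and that the mean allele length grows by the factor $1/(1 - \frac{1}{2}dx_1^2)$. The function names and the example values are hypothetical.

```python
def step_1b(x, P, d):
    """One generation of model 1b without selection (assumed reading of Eqs. (27))."""
    M = len(x)

    def get(m):
        return x[m - 1] if 1 <= m <= M else 0.0

    x1 = get(1)
    norm = 1.0 - 0.5 * d * x1 * x1          # lethal B_0 gametes are removed

    def bracket(m):
        return (get(m + 1) - get(m)) - (1.0 - x1) * (get(m) - get(m - 1))

    new = [(x1 + 0.5 * P * (get(2) - x1 + x1 * x1) - d * x1 * x1) / norm]
    new.append((get(2) + 0.5 * P * bracket(2) + 0.5 * d * x1 * x1) / norm)
    for m in range(3, M + 2):
        new.append((get(m) + 0.5 * P * bracket(m)) / norm)
    return new


def mean_length(x):
    return sum(m * xm for m, xm in enumerate(x, start=1))


if __name__ == "__main__":
    x = [0.5, 0.3, 0.2]                      # illustrative allele frequencies
    P, d = 0.1, 0.01
    x_next = step_1b(x, P, d)
    print(sum(x_next))
    print(mean_length(x_next), mean_length(x) / (1.0 - 0.5 * d * x[0] ** 2))
```

Starting instead from `[1.0]`, i.e. only $B_1$ present, the first step produces $B_2$ with frequency $\frac{1}{2}d / (1 - \frac{1}{2}d)$, which is how the duplication enters the population in this model.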

Again, the next problem to be examined is the question for stationary distributions. From Eq.(27) it can be seen that the allele frequencies $x_m$ ($m = 1,2,\ldots$) form a stationary distribution if and only if they satisfy the equations

$$\begin{aligned} x_2 &= \left(1 - (1 - 2\delta)x_1 - \delta x_1^2\right)x_1, \\ x_3 &= \left(2 - x_1 - \delta x_1^2\right)x_2 - \left(1 - (1 - \delta)x_1\right)x_1, \\ x_{m+1} &= \left(2 - x_1 - \delta x_1^2\right)x_m - (1 - x_1)x_{m-1} \qquad (m \ge 3). \end{aligned} \tag{29}$$

Here, $\delta = d/P$. On the other hand, $x_1$ must be 0 in the stationary state, otherwise $E(m)$ would increase according to Eq.(28). With $x_1 = 0$, however, Eqs.(29) have only the trivial solution $x_1 = x_2 = x_3 = \ldots = 0$, which, in turn, represents no distribution. In the case of no selection, therefore, a stationary distribution of the allele length does not exist. On the contrary, the allele length distribution is expanding still more towards higher allele lengths with increasing generation number.

2.3.3.2.2. Selection in Model 1b. Quite as shown above for model 1a, the allele frequencies in the next generation can be derived from the gamete frequencies $x'_m$ ($m = 1,2,\ldots$) using formula (18). The general considerations in 2.3.3.1.2 about the existence of stationary distributions could be made analogously. However, they would lead still much less to any result of practical usefulness. Therefore, the problem will again be treated numerically. Furthermore, selection type 2a (one fitness optimum; presence of a maximum allele length) will exclusively be examined: It can be concluded from the results with selection type 1 (general advantage of longer alleles over shorter ones) in the case $d = 0$ (model 1a) that for $d > 0$, a fortiori, no convergence to a stationary distribution can be expected for this selection type.

Starting with the initial distribution (25), model 1b with selection type 2a was calculated for the same numerical values of the parameters $m_{opt}$, $m_{max}$, $q_1$, $s_1$, and $P$ as had been used in the case $d = 0$ (model 1a). Each of the 15 paired values of $s_1$ and $P$ was combined with the values $10^{-4}$, $10^{-5}$, and $10^{-6}$ of $d$. The results were as follows:

In each of the 45 cases mentioned, the allele length distribution $\{x_m\}$ converges to a stationary distribution. This distribution is independent of $d$, and coincides exactly with the stationary distribution for the same $(s_1, P)$-combination in the case $d = 0$. Only the speed of convergence depends - besides on $s_1$ and $P$ - on $d$, too. In Fig.8, the "convergence time" (= time in generations, after which the maximum deviation of the allele frequencies $x_m$ from their stationary values has become $10^{-3}$) is also given for the 45 cases $d > 0$. With fixed values for $s_1$ and $P$, the convergence time increases with decreasing $d$. This result is plausible. Besides, the convergence time in each of the 3 cases $d > 0$ is longer than for the two comparable cases $d = 0$. At first glance, however, this result seems to be a little surprising. It can be explained if the different starting positions of the two distributions are taken into account: In the case $x_2^{(1)} > 0$, the distribution $\{x_m\}$ has a certain lead over the distribution in the case $x_2^{(1)} = 0$, with which the additional production of $B_2$ gametes from genotype $B_1B_1$ cannot pull up. Figs. 9 and 10 demonstrate this for the special case of $s_1 = 0.2$, $P = 10^{-3}$ more exactly. Here, the alteration of the distribution with time is shown side by side for the cases $d = 0$ with $x_1^{(1)} = 0.95$, $d = 10^{-5}$, and $d = 10^{-4}$. In Fig. 9 the three distributions are compared in the same generation (1, 100, 200, 500 and 1,000). The distribution for $d = 0$ has a certain lead as compared with $d > 0$. This lead diminishes gradually, but even at generation 1,000, the other distributions have not pulled up (furthermore, also the distribution for $d = 10^{-4}$ has a lead compared with $d = 10^{-5}$). Therefore, in Fig. 10 the same distributions are not compared at the same number of generations, but the generation numbers are shifted against each other. (The distributions for $d = 0$ are given a "handicap" in order to secure identical starting positions.) The shift $t_d$ (depending on $d$) has been chosen in such a way that the distribution for $d > 0$ at the time $t_d + 1$ (generations) agrees as exactly as possible with the distribution for $d = 0$ in the 1st generation. The shift $t_d$ is 225 generations for $d = 10^{-5}$ and 133 generations for $d = 10^{-4}$. With this "handicap" the situation is quite different: Now, the distributions for $d > 0$ have a certain lead as compared to $d = 0$. This lead, however, is small, and has almost disappeared after about 500 generations. Hence, the longer convergence time for $d > 0$ (cf. Fig.8) is, indeed, only a consequence of the unequal starting conditions.

Fig. 9. Convergence of the allele length distribution to the stationary state for model 1a/1b and selection type 2a with $P = 10^{-3}$, $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.2$, $q_1 = 0.9$, and 3 values of $d$. I. Comparison of the distributions at the same times. (Panels: $d = 0$, $d = 10^{-5}$, $d = 10^{-4}$; generations 1, 100, 200, 500, 1000 and the stationary distribution; abscissa: length of allele.)

Fig. 10. Convergence of the allele length distribution to the stationary state for model 1a/1b and selection type 2a with $P = 10^{-3}$, $m_{opt} = 10$, $m_{max} = 100$, $s_1 = 0.2$, $q_1 = 0.9$, and 3 values of $d$. II. Comparison of the distributions at times shifted to compensate for the different starting conditions (see text for details). (Panels: $d = 0$ with $x_1^{(1)} = 0.95$ at generations 100, 200, 500, 1000; $d = 10^{-5}$ with $x_1^{(1)} = 1$ at generations 325, 425, 725, 1225; $d = 10^{-4}$ with $x_1^{(1)} = 1$ at generations 233, 333, 633, 1133; stationary distribution; abscissa: length of allele.)

2.3.3.3. Model 2 (Alternative Assumption about $P$)

Now, the alternative model, which has been mentioned in 2.3.1., will be examined briefly. Only the case $d = 0$ (model 2a) will be considered.

of the different

the different

genotypes,

probabilities

for the genotypes

types of gametes,

which are produced by

can again be taken from Table BIB j (j=2,3,...).

I excepting

These genotypes

the have now

the following gamete distribution:

(30)

I I-P) Bj + ½P B2 + ½P Bj_ I ( l - P ) B I + ~(
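As an illustration, Eq. (30) can be read as a small lookup rule. The following Python sketch returns the gamete frequencies produced by a genotype $B_1B_j$. The function name `gametes_b1_bj` and the dictionary representation are assumptions made here for illustration and do not appear in the original analysis.

```python
from collections import defaultdict

def gametes_b1_bj(j, P):
    """Gamete distribution of genotype B1Bj (j >= 2) according to Eq. (30).

    Returns a dict mapping allele length m to the frequency of B_m gametes.
    Name and data layout are illustrative assumptions.
    """
    if j < 2:
        raise ValueError("Eq. (30) applies to j >= 2 only")
    freq = defaultdict(float)
    freq[1] += 0.5 * (1 - P)   # parental B1, no unequal crossing over
    freq[j] += 0.5 * (1 - P)   # parental Bj, no unequal crossing over
    freq[2] += 0.5 * P         # recombinant B2 after unequal crossing over
    freq[j - 1] += 0.5 * P     # recombinant B(j-1) after unequal crossing over
    return dict(freq)

example = gametes_b1_bj(4, 1e-3)   # gametes B1, B2, B3 and B4
```

For $j = 2$ the recombinant gametes coincide with the parental types, so that $B_1B_2$ still yields $B_1$ and $B_2$ in equal proportions.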

The frequency $x_m'$ of the $B_m$ gametes in the population can be taken from Table 1 after modification according to Eq. (30), analogously to 2.3.3.1 and 2.3.3.2:

$$x_1' = x_1 + \tfrac{1}{2}P\,\bigl[(1+x_1)\,x_2 - 2(1-x_1)\,x_1\bigr] \tag{31'}$$

$$x_2' = x_2 + \tfrac{1}{2}P\,\bigl[(1+x_1)(x_3-x_2) - (1-x_1)(x_2-2x_1)\bigr] \tag{31''}$$

$$x_m' = x_m + \tfrac{1}{2}P\,\bigl[(1+x_1)(x_{m+1}-x_m) - (1-x_1)(x_m-x_{m-1})\bigr] \quad (m \ge 3) \tag{31'''}$$

The structure of formulas (31) is similar to the structure of the corresponding formulas (11) for model 1a, with the one difference that not only the case $m = 1$, but also the case $m = 2$ must be treated separately.

In the following, the case will be examined that the alleles $B_m$ $(m = 1, 2, \ldots)$ are not subject to selection. Then, the gamete frequencies $x_m'$ are already the allele frequencies in the next generation. Again the question comes up whether there are stationary distributions of the allele length. It follows from Eq. (31) that a distribution $\{x_m\}$ of allele length is stationary if and only if the frequencies $x_m$ satisfy the equations:

$$\begin{aligned}
(1+x_1)\,x_2 - 2(1-x_1)\,x_1 &= 0, \\
(1+x_1)(x_3-x_2) - (1-x_1)(x_2-2x_1) &= 0, \\
(1+x_1)(x_{m+1}-x_m) - (1-x_1)(x_m-x_{m-1}) &= 0 \quad (m \ge 3).
\end{aligned} \tag{32}$$

It was shown above that the corresponding Eqs. (14) in model 1a have the solutions (15). In the same way, it can be demonstrated that the complete set of solutions of Eqs. (32) is given by:

$$0 \le x_1 \le 1 \ \text{arbitrary}, \qquad x_m = 2x_1 \left(\frac{1-x_1}{1+x_1}\right)^{m-1} \quad (m = 2, 3, \ldots) \tag{33}$$

For $0 < x_1 < 1$, the numbers $x_1, x_2, \ldots$ calculated from Eq. (33) are, indeed, frequencies (i.e. they are non-negative, and their sum is 1). Hence, they form a stationary distribution of allele length. The distribution (33) is almost a geometrical distribution with the parameter $2x_1/(1+x_1)$: only the frequency of $B_1$ is decreased as compared with its corresponding value in a geometrical distribution. Accordingly, the frequencies of the other alleles are increased by a constant factor, because the sum of frequencies must be equal to 1.

Again, the question whether the distribution of allele length converges from a given initial distribution towards a stationary distribution cannot be answered generally. As in case 2.3.3.1.1., only examination of special cases can help to solve this problem. If there is convergence, the stationary distribution does not depend on $P$, but is determined uniquely by the initial distribution. Quite as in 2.3.3.1.1., this follows from the invariance of the mean allele length $E(m)$. For the special initial distribution (16), the stationary frequency $\tilde{x}_1$ of $B_1$ results as:

$$\tilde{x}_1 = -\bigl(1 - x_1^{(1)}\bigr) + \sqrt{\bigl(1 - x_1^{(1)}\bigr)^2 + 1} \tag{34}$$

It follows from Eqs. (33) and (34) that $B_1$ is the most frequent allele in the stationary state. Besides, the stationary frequency of $B_1$ is always higher and, correspondingly, that of $B_2$ is lower than the initial frequency. Model 2a has these properties in common with model 1a.
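The statements on model 2a without selection can be checked numerically. The sketch below iterates the recursion (31) from an initial distribution containing only $B_1$ and $B_2$ and compares the result with Eqs. (33) and (34). It is an illustration under stated assumptions, not the authors' program: the allele length is truncated at `M_MAX`, the starting value `x1_initial = 0.5` is arbitrary, and $P = 0.1$ is chosen far larger than the values used in the paper merely to shorten the run (the stationary state is stated above not to depend on $P$).

```python
import math

def next_generation(x, P):
    """Apply Eqs. (31'), (31'') and (31''') once.

    x[m] is the frequency of allele B_m; x[0] is an unused placeholder.
    Frequencies beyond the last list entry are treated as zero (truncation,
    an assumption of this sketch).
    """
    n = len(x) - 1
    x1 = x[1]
    up, down = 1 + x1, 1 - x1
    new = [0.0] * (n + 1)
    new[1] = x[1] + 0.5 * P * (up * x[2] - 2 * down * x1)
    new[2] = x[2] + 0.5 * P * (up * (x[3] - x[2]) - down * (x[2] - 2 * x1))
    for m in range(3, n + 1):
        longer = x[m + 1] if m < n else 0.0
        new[m] = x[m] + 0.5 * P * (up * (longer - x[m]) - down * (x[m] - x[m - 1]))
    return new

def mean_length(x):
    """Mean allele length E(m)."""
    return sum(m * f for m, f in enumerate(x))

# Initial distribution of type (16): only B1 and B2 are present.
x1_initial = 0.5                 # illustrative starting value
M_MAX = 80                       # truncation length, illustrative
x = [0.0] * (M_MAX + 1)
x[1], x[2] = x1_initial, 1 - x1_initial
mean_start = mean_length(x)

for _ in range(5000):
    x = next_generation(x, P=0.1)

# Eq. (34): predicted stationary frequency of B1
x1_predicted = -(1 - x1_initial) + math.sqrt((1 - x1_initial) ** 2 + 1)
# Eq. (33) for m = 2, evaluated with the simulated x[1]
x2_predicted = 2 * x[1] * (1 - x[1]) / (1 + x[1])
```

Comparing `x[1]` with `x1_predicted`, and `mean_length(x)` with `mean_start`, shows whether the simulated distribution has reached the state described by Eqs. (33) and (34) while keeping its mean.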

3. Discussion

In the following, we shall discuss some of the formal and biological consequences of the examined models.

3.1. The First Event. Establishment of Duplications in the Population

As mentioned above, the event which leads to the first cistron duplication has to be treated separately from u.c.o., as some other mechanism is required for its occurrence. In model 1a, it was assumed that this first event cannot occur any more, i.e. that the genotype $B_1B_1$ is unable to form gametes $B_2$. In this case, the first duplication must have been introduced into the population from outside, or it must have occurred as a unique event in the population itself (this means a probability of almost 0), and must have found a way to establish itself. In model 1b, it was assumed that gametes $B_2$ are formed at a recurrent rate (= with a small, but not negligible probability) from genotype $B_1B_1$.

At first glance, only the second model seems to be realistic. This, however, is true only if one sticks to the deterministic model and to the assumption of random mating. With regard to such very rare events, however, the limitations of this model become obvious. The calculations show that with these limitations (deterministic model and random mating), a duplication needs a long time in order to reach a frequency at which u.c.o. can occur at a non-negligible rate (Fig. 10). This remains true even with an (unrealistically) high probability $d$ of the first event (cf. 2.3.3.2.), and with a selective advantage. The reason is that two rare events are required: 1. The duplication must occur, and 2. the duplication must become homozygous, before u.c.o. can lead to any allele containing more than two copies of the cistron (allele $B_m$; $m > 2$).

The latter event could occur by chance in a random mating population. Considering the breeding structure of higher organisms, however, it is much more likely that it occurs in a small population group, in a consanguineous mating. In such a group, the duplication may attain a relatively high frequency within a short time. If the group starts growing in number, the conditions for the process of gene amplification are given.

This is the reason why model 1a has also been examined. Comparison of the two models has shown that an appreciable difference of their long-time consequences can only be observed in the absence of selection. As expected, a continuous "mutation" pressure $B_1 \to B_2$ leads to a (very slow) increase of the mean allele length, whereas in model 1a (in absence of the "mutation" pressure), the mean allele length remains constant. In both models the maximum allele length increases from generation to generation (the distribution is scattered more and more). With selection (assumption of an optimal allele length), however, it turns out that the production of $B_2$ alleles from $B_1B_1$ homozygotes does not affect the form of the final distribution for the allele frequencies at all. Besides, the rate at which it is approached is affected very little (Fig. 10). This is one of the interesting results of this study: For the final results of u.c.o., together with a selective advantage and a realistic selection model, it is unimportant whether the primary event occurs seldom, or never.

3.2. The Consequences of u.c.o. if There is No Selection (Critical Evaluation of the Models)

The calculations were limited to the case that the shift of structure homologous cistrons does not exceed one cistron length. This limitation is undoubtedly an oversimplification: It seems to be plausible that $P_k$ (probability of u.c.o.) will decrease with increasing distance of the cistrons involved: The chromatid loops (cf. Figs. 1 and 2) will be longer, and the probability will increase with increasing distance $k$ that the random mutations have deformed cistrons so much that mispairing is made impossible, because the larger the distance between cistrons within an allele, the longer the time which has elapsed since these cistrons have originated from a common ancestor. However, there is no reason for the assumption that $P_k$ will become 0 with $k > 1$. Therefore, the question has to be discussed how the results would change if this oversimplification would be abandoned. This discussion is still lacking; however, it may be presumed with caution that the variability of allele length will generally be increased compared with the simplified model.

Which conclusions can be drawn from these models, if there is no selection (models 1a, 1b and 2)? As mentioned above, all the models (1a, 1b, 2) lead to an increase of the maximum allele length from generation to generation and, correspondingly, also to an increase of the variance of the allele length which, however, diminishes more and more. On the other hand, in each model, $B_1$ will finally be the most frequent allele, whether it has this property already in the initial state or not. In models 1a and 2a, the mean allele length remains constant from one generation to the next, whereas in model 1b (formation of additional alleles $B_2$ from $B_1B_1$) the mean grows gradually (presumably, this is true also for model 2b). In models 1a and 2a, the allele length distribution converges to a stationary distribution, at least if it has started from the special state where only $B_2$ exists apart from $B_1$. In model 1a, the stationary frequencies of the alleles $B_1, B_2, \ldots$ form a geometric sequence (with a factor less than 1). In model 2a, this is true only from $B_2$ onwards, but in this model, too, the complete sequence of the stationary allele frequencies is monotonically decreasing. In model 1b, on the other hand, a stationary distribution is not reached even after extremely many generations. Alteration of the allele frequencies, however, diminishes gradually, and the allele length distribution takes a shape similar to that found in model 1a: the system of allele frequencies forms a monotonically decreasing sequence. For all practical purposes, this means that without selection, the variability remains relatively small, even if longer and longer alleles are formed at a small frequency. The alleles with one or a few copies of the cistron are always more frequent than the alleles containing many copies. Probably, elimination of the simplifying assumption that mispairing can only lead to a shift of one cistron length would not change this picture appreciably; possibly, it would enhance the range of variability.

3.3. Selection Models and Their Consequences

The assumption seems to be plausible that the alleles formed as a consequence of u.c.o. are subject to different selection pressures. In our models, the possibility has not been considered that the probability $P$ of u.c.o. is itself influenced by selection. Concerning selective advantages and disadvantages of the alleles, different assumptions were made:

Selection type 1 (2.3.2., 2.3.3.1.2.1.) can be described as "the longer, the better". Here, the bulk of the distribution shifts constantly towards higher allele length. No stationary distribution is reached. Not only the mean, but the standard deviation, as well, increases with the number of generations, albeit not quite as much as the mean. The distribution of allele length becomes more and more widespread. It looks as if these results could be generalized beyond the special cases examined.

With this type of selection, one would expect long stretches of duplicated cistrons together with an appreciable amount of variability and a certain clustering of individual values around the mean. First results of hybridization experiments with Gamma globulin loci for the labile chain seem to confirm the great number of loci predicted from protein sequence data (Delovitch & Baglioni, 1973; see also Storb, 1972). Also on a priori reasons, the principle "the more, the better" would be reasonable in this case, if the number of these cistrons defines the versatility of the antibody response. Indirect evidence for an appreciable intraindividual variability of the immune response could be derived from many clinical and serological observations in man. It can be hoped that hybridization experiments will soon allow to determine this range of variability directly.

In selection type 2, an optimum allele has been assumed, up to which the fitness increases, whereas above this value, it decreases again. For the exact manner of this decrease, two different possibilities were considered: One, in which the fitness decreases up to a certain maximum value, above which it is 0; and a second one, in which it approaches asymptotically to a limit greater than 0. For technical reasons, only the first possibility was examined more thoroughly. Besides, the differences of the two models are relatively unimportant.

This mode of selection was examined together with models 1a (2.3.3.1.2.2.) and 1b (2.3.3.2.2.). As mentioned above, the most remarkable result of our examinations was that the difference between these two models is so small; i.e. that a "mutation" pressure of new $B_2$ alleles from $B_1B_1$ zygotes has no influence at all on the limiting distribution, and so little influence on the rate of approach to this distribution.

In both models, the allele distributions tend towards stationary states, in which the optimum allele is more frequent than all the others, and the frequency of other alleles decreases with increasing difference of cistron numbers from the optimum state. The range of variability increases with increasing probability of u.c.o., and with decreasing fitness difference between different alleles ($s_1$) (cf. Fig. 11). The stationary distribution is independent of the initial distribution. Qualitatively, this result is the same as that of Crow & Kimura (1970) who theoretically derived a normal distribution as (continuous) approximation for the stationary distribution in their model (cf. 1.3), with the variance

$$V = \sqrt{\frac{u\,\sigma_k^2}{2c}} \tag{35}$$

where $u$ is the probability of the allele $B_m$ changing over to something else (i.e. essentially the probability of u.c.o.), $\sigma_k^2$ the variance of the distribution of shift number $k$, and $c$ a measure of the "strength" of selection (comparable with $s_1$).

Fig. 11. Standard deviation of allele length in the stationary state for model 1a and selection type 2a with $m_{opt} = 10$, $m_{max} = 100$, $q_1 = 0.9$, and 3 values of $s_1$

The practical consequence of these results is the following: If we find in a special case a certain sequence of structure homologous cistrons for which u.c.o. is possible, and if the allele length in this case clusters around a certain value, then the assumption is justified that this value represents an optimum allele length with maximum fitness. If the range of variability is small, this can either mean large differences in fitness between alleles of similar length, or a small probability $P$ of u.c.o. A large variability indicates either small fitness differences, or a relatively high value of $P$.

Again, it may be asked in which direction abandoning of the simplifying assumption that mispairing with a shift of more than one cistron length does not occur would influence the result. We presume that it would enhance somewhat the standard deviation in the stationary state: In the general model, not only the neighbouring but all alleles are involved in the process of u.c.o. If the effect of this additional involvement could be compared with an increase of $P$, this would mean that the standard deviation $\sigma_m$ would be increased, as $\sigma_m$ increases with $P$. This would be in agreement with the inference which may be drawn from Eq. (35): The variance $V$ is proportional to $\sigma_k$, and $\sigma_k$ will increase if shift numbers greater than 1 can also occur. However, this problem needs further examination.

Looking around for an example, one would be inclined to think of the haptoglobin polymorphism in man. Here, $Hp^2$ is the most frequent allele in many populations (for ref. cf. Giblett, 1969). In our notation, it would represent very nearly the allele $B_2$. The Johnson type alleles, on the other hand, which represent type $B_3$ (or other alleles of still higher order), seem to be very rare. This would point towards a selective advantage of allele $Hp^2$ (possibly only under certain conditions, as $Hp^1$ is the most frequent allele in some other populations), and towards a definite disadvantage of all higher alleles $B_m$ $(m = 3, 4, \ldots)$.

3.4. Selection Relaxation

Modern civilization has led to a sharp decrease of mortality, especially in the age groups before and during the reproductive period. One consequence is the relaxation of selection, especially by agents which were responsible in former times for most of this early mortality. Therefore, the question is important how much a (sudden) selection relaxation would influence a system which had been built up before by u.c.o. under the influence of selection, for example the type 2a discussed above. An answer can be derived from an application of the selection-free case of model 1a on the stationary distribution of the selection phase as initial state. If the allele length distribution converges at all from this initial state (this is not yet proven but may be presumed), it tends to a geometrical distribution which is completely determined by the mean allele length in the stationary state of the selection phase. This follows from the fact that the mean does not alter if no selection works. More precisely, the parameter $x_1$ of the geometrical limiting distribution is the reciprocal of the mean. In our special case, the mean is approximately equal to the optimal allele length. Hence, the distribution would become dispersed more and more in both directions from the formerly optimal allele length; the maximum cistron number would increase, and more and more alleles with fewer and fewer cistrons would also be formed, including $B_1$, up to the point when the geometrical distribution is reached. For the optimal allele length 10, e.g., the final frequency of $B_1$ would be approximately 10%, and those of $B_2, B_3, \ldots$ approximately 9%, 8.1% etc. The allele $B_{20}$ would still have a frequency of nearly 1.4%, and $B_{30}$ of nearly 0.5%. When the optimal allele length increases, the final frequency of $B_1$ in the relaxation phase will become lower but, at the same time, the decrease of the other allele frequencies with increasing allele length will become slower: the final state will more and more approach a uniform distribution.
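The percentages quoted above follow directly from a geometrical distribution whose parameter $x_1$ is the reciprocal of the mean allele length. A minimal Python sketch of this arithmetic is given below; the function name `relaxed_frequency` is an assumption introduced here for illustration.

```python
def relaxed_frequency(m, mean_length):
    """Frequency of allele B_m in the geometrical limiting distribution
    whose parameter x_1 is the reciprocal of the mean allele length."""
    x1 = 1.0 / mean_length
    return x1 * (1.0 - x1) ** (m - 1)

# Mean allele length of about 10 (the formerly optimal length in the text).
frequencies = {m: relaxed_frequency(m, 10) for m in (1, 2, 3, 20, 30)}
```

With a larger mean, `relaxed_frequency(1, mean_length)` becomes smaller and the ratio of successive frequencies, $1 - x_1$, moves closer to 1, which is the flattening towards a uniform distribution described in the text.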

In the haptoglobin case mentioned above, populations in which selection had relaxed many generations before would be characterized, on the one side, by a higher $Hp^1$ frequency and, on the other hand, by a higher frequency of Johnson alleles than populations with the same initial gene frequencies in which selection is still at work.

For the Gamma globulin loci, selection relaxation is especially likely because of their significance for infection protection. Here, spreading of the allele length distribution in absence of other factors, together with maintenance of a high mean, would lead, on the one hand, to more and more individuals who have only very few such cistrons, and will possibly suffer from immune defects. On the other hand, it would also lead to many persons with an extremely high number of such cistrons. It is not unreasonable to suspect that this could enhance the number of autoimmune diseases and allergies. These diseases are more or less neutral from the point of view of selection, because they do not influence reproduction very much. But especially in middle and higher age, they can be a real danger.⁵

⁵ According to Schull & Neel (1965), the frequency of allergic diseases seems to be decreased in children from consanguineous marriages. This would favour the hypothesis that very many different genes concerned with antibody production enhance the danger of allergies.

3.5. Problems Which Go beyond the Examined Models: For Example Mutations

In our discussions, we have deliberately neglected some other factors which complicate the situation when the models are applied to concrete situations. The most important of these problems is the occurrence of random mutations. If a point mutation, for example a single base substitution, hits a cistron which is only present once, this will have one of three different consequences:

1. The function is not impaired, either because the change occurred within the degeneracy of the code, or because the amino acid substitution is inert functionally.
2. The function is altered in a way that does not impair the vitality of the individual very much.
3. The function is vitally altered or completely destroyed; the individual is seriously debilitated.

The third possibility is a frequent one, because the destroyed function cannot be replaced. If there are many cistrons for a certain function, possibility no. 3 will hardly ever occur: Even a completely inert cistron can be substituted functionally by other cistrons. To return to the immune globulin cistrons mentioned above: Inactivation of some of them by mutation would lead to a slight decrease in the spectrum of different antigens against which the organism is able to produce antibodies. Especially under the conditions of relaxed selection, on the other hand, this would not have any dramatic consequences; the decline would occur very slowly. How the effects of random mutation combine with those of u.c.o., and especially, how selection relaxation influences their combined effects: this problem remains to be examined.

3.6. Possible Protective Mechanisms against u.c.o. The Highly Redundant DNA Parts

The DNA of higher organisms including man contains sequences of high redundancy. Conspicuous among them, and relatively well examined, are the nucleolus organizer regions, which produce rRNA. Bross & Krone (1972), performing hybridization experiments, have estimated that normal human beings have about 416 rRNA producing cistrons. These cistrons, which are located close to each other within the nucleolus organizer regions, should be exposed to relatively frequent u.c.o., because they are obviously structure homologous, but not position homologous. On the one hand, it is difficult to figure out a mechanism by which selection should maintain a very specific optimum number of these cistrons. Furthermore, the system is expected to be relatively insensitive towards inactivation by random mutation, for reasons explained above (the function of all these cistrons is identical). In conclusion, one would expect an enormous amount of variability for these cistrons. From the results of the above-mentioned authors, it appears as if there were, indeed, some variability. The individual values, however, seem to be clustered around a mean. The variability seems to be smaller than expected. We do not deny that a plausible selection model could be constructed for explanation. However, another possibility has to be considered: Does the organism have a special mechanism for protection against u.c.o.?

In this connection, it is interesting to remember that normal crossing over does not occur at random either, but is subject to certain restrictions. It has been shown repeatedly in different organisms that constitutive heterochromatin does not take part in crossing over. According to Natarajan & Gropp (1971), the following pattern of meiotic behavior is observed: "a) The heterochromatic segments remain condensed in all stages of meiotic prophase, in contrast to euchromatin, b) they pair homologously, similar to euchromatin, till pachytene, c) they separate in early diplotene, because of the absence of any chiasma in these region". According to these authors, the same behavior was observed by others in the plants Salvia and Plantago, in Drosophila, and in the tomato. Evidence is accumulating that nucleolus organizer regions coding for rRNA are located in these regions (Hedgehogs: Natarajan & Gropp, 1971; Microtus agrestis: Natarajan & Sharma, 1971). Natarajan & Gropp (1971), following a suggestion of Yunis & Yasmineh (1971), discuss the possibility that the lack of chiasmata may be a mechanism to protect vital genes from crossing over and mutation. In our opinion, the protection from u.c.o. could be still more important.

3.7. Problems for Further Research

By our theoretical investigations, a number of additional problems are suggested: What is the actual probability $P$ of u.c.o.? On which conditions does it depend? In which way is it influenced by the distance between the structure-homologous, but not position-homologous cistrons? How much is it decreased by smaller differences between these cistrons which are produced by random mutation? How does selection work to build up long series of neighbouring structure-homologous, but not position-homologous cistrons? Is u.c.o. the only mechanism involved here, or are other mechanisms also possible, for example, multiple replication cycles of restricted DNA areas during G1? Keyl (1966) discovered that two subspecies of Chironomus thummi showed marked differences in the DNA content only of certain chromosome bands, which can most easily be explained in this way. How large is the interindividual variability of allele length for systems which contain many redundant copies of identical (or very similar) cistrons? Knowledge of this variability could supply information on $P$, if $s_1$ (selection) is known, or on $s_1$, if $P$ is known. The work mentioned above using hybridization techniques seems to be promising. Which are the consequences of selection relaxation for these systems, especially if not only u.c.o., but also the effects of random mutation are considered? How do special protective mechanisms interfere with u.c.o.?

References

Black, J.A., Dixon, G.H. (1968). Amino acid sequence of the alpha chains of human haptoglobins and their possible relation to the immunoglobin light chains. Nature 218, 736
Bridges, C.B. (1936). The bar "gene", a duplication. Sci. 83, 210
Britten, R.J., Kohne, D.E. (1968). Repeated sequences in DNA. Sci. 161, 529
Bross, K., Krone, W. (1972). On the number of ribosomal RNA genes in man. Humangenetik 14, 137
Crow, J.F., Kimura, M. (1970). An introduction to population genetics theory. New York-Evanston-London: Harper & Row
Dayhoff, M.O. (1972). Atlas of protein sequence and structure, Vol. 5. Silver Spring, Maryland: Nat. Biomed. Res. Found.
Delovitch, T.L., Baglioni, C. (1973). Estimation of light-chain gene reiteration of mouse immunoglobin by DNA-RNA hybridization. Proc. Nat. Acad. Sci. Wash. 70, 173
Dixon, G.H. (1966). Mechanisms of protein evolution. Essays Biochem. 2, 148
Fisher, R.A. (1922). On the dominance ratio. Proc. Roy. Soc. Edinburgh 42, 321
Giblett, E.R. (1969). Genetic markers in human blood. Oxford-Edinburgh: Blackwell
Haldane, J.B.S. (1932). The causes of evolution. London: Longmans Green
Harris, H. (1970). The principles of human biochemical genetics. Amsterdam-London: North Holland
Hilschmann, N., Barnikol, H.U., Hess, M., Langer, B., Ponstingl, H., Steinmetz-Kayne, M., Suter, L., Watanabe, S. (1969). Structure and formation of antibodies. In: Current problems in immunology (Bayer-Symposium I), O. Westphal, H.E. Bock, E. Grundmann, Eds., p. 69. Berlin-Heidelberg-New York: Springer
Keyl, H.G. (1966). Increase of DNA in chromosomes. In: Chromosomes today, Vol. 1, C.D. Darlington, K.R. Lewis, Eds., p. 99. London: Oliver & Boyd
Kohne, D.E. (1970). Evolution of higher-organism DNA. Q. Rev. Biophys. 3, 327
Lewis, E.B. (1951). Pseudoallelism and gene evolution. Cold Spring Harbor Symp. Quant. Biol. 16, 159
Li, C.C. (1955). Population genetics. Chicago: Univ. of Chicago Press
May, H.G. (1917). Selection for higher and lower facet numbers in the bar-eyed race of Drosophila and the appearance of reverse mutations. Biol. Bull. 33, 361
Mayo, O. (1970). The role of duplications in evolution. Heredity 25, 543
Metz, C.W. (1947). Duplication of chromosome parts as a factor in evolution. Amer. Natur. 81, 81
Nance, W.E. (1963). Genetic control of hemoglobin synthesis. Sci. 141, 123
Natarajan, A.T., Gropp, A. (1971). The meiotic behavior of autosomal heterochromatic segments in hedgehogs. Chromosoma (Berl.) 35, 143
Natarajan, A.T., Sharma, R.P. (1971). Tritiated uridine induced chromosome aberrations in relation to heterochromatin and nuclear organization in Microtus agrestis L. Chromosoma (Berl.) 34, 168
Nei, M., Kojima, K.-I., Schaffer, H.E. (1967). Frequency changes of new inversions in populations under mutation-selection equilibria. Genet. 57, 741
Ohno, S. (1970). Evolution by gene duplication. Berlin-Heidelberg-New York: Springer
Ohno, S. (1972). Origin, maintenance and significance of genetic polymorphism. In: The biological significance of the histocompatibility antigens. Report on a colloquium held at Titisee (Schwarzwald), October 14-15, 1971, E. Günther, E. Albert, F. Kueppers, K. Bender, Eds., Humangenetik 14, 173
Schull, W.J., Neel, J.V. (1965). The effect of inbreeding on Japanese children. New York: Harper & Row
Smithies, O. (1964). Chromosomal rearrangements and protein structure. Cold Spring Harbor Symp. Quant. Biol. 29, 309
Smithies, O., Connell, G.E., Dixon, G.H. (1962). Chromosomal rearrangements and the evolution of haptoglobin genes. Nature 196, 232
Spofford, J.B. (1969). Heterosis and the evolution of duplications. Amer. Natur. 103, 407
Stephens, S.G. (1951). Possible significance of duplication in evolution. Adv. Genet. 4, 247
Storb, U. (1972). Quantitation of immunoglobin genes by nucleic acid hybridization with RNA from myeloma and spleen microsomes. J. Immunol. 108, 755
Sturtevant, A.H. (1925). The effects of unequal crossing over at the bar locus in Drosophila. Genet. 10, 117
Sturtevant, A.H., Morgan, T.H. (1923). Reverse mutation of the bar gene correlated with crossing over. Sci. 57, 746
Tice, S.C. (1914). A new sex-linked character in Drosophila. Biol. Bull. 26, 221
Wright, S. (1929). The dominance of bar over infra-bar in Drosophila. Amer. Natur. 63, 1034
Yunis, J.J., Yasmineh, W.G. (1971). Heterochromatin, satellite DNA, and cell function. Sci. 174, 1200
Zeleny, C. (1919). A change in the bar gene of Drosophila involving further decrease in facet number and increase in dominance. J. Gen. Physiol. 2, 69
Zeleny, C. (1921). The direction and frequency of mutation in the bar-eye series of multiple allelomorphs of Drosophila. J. Exp. Zool. 34, 203
Zeleny, C. (1922). The effect of selection for eye facet number in the white bar-eye race of Drosophila melanogaster. Genet. 7, 1

Prof. Dr. F. Vogel, Institut für Anthropologie und Humangenetik der Universität, D-6900 Heidelberg, Mönchhofstr. 15a, Federal Republic of Germany
