Genetica 100: 281–294, 1997. © 1997 Kluwer Academic Publishers. Printed in the Netherlands.
Population genetics models of transposable elements

John F.Y. Brookfield & Richard M. Badge
Department of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, UK

Accepted 22 April 1997
Key words: copy number, Drosophila, inbreeding, population genetics, transposable elements
Abstract

The control of transposable element copy number is of considerable theoretical and empirical interest. Under simple models, copy numbers may increase without limit. Mechanisms that can prevent such an increase include those in which the effect of selection increases with copy number, those in which the rate of transposition decreases with copy number, and those where unlimited increase in copy number is prevented by the consequences of functional heterogeneity in the transposable element family. Finite population sizes may attenuate the power of natural selection to act on transposable element copy number in a number of ways that may be of particular importance in laboratory populations. First, a small host population size will create occasional periods in which the variance between individuals in copy number is diminished, and with it the power of natural selection, even when the expected variance is Poisson. Second, small population sizes will produce high-frequency transposable element sites, systematically reducing the variance in copy number. The consequences will be particularly profound when the selective damage of transposable elements follows from their heterozygosity, as when ectopic exchange limits copy number.

Introduction

There are millions of species of organisms on earth, and each shows significant genetic variation. In each species or strain, vast numbers of experiments could be carried out to investigate development, genetics, physiology, ecology, and behaviour. In the face of this effectively infinite number of potential experiments for biological scientists to perform, the only rational course is to choose some organisms, to investigate them in detail, and then to assume that the features found will hold true in others. This is the idea behind experimental model systems in biology.
Almost always, the grounds for inference from such model systems are not generalised principles of scientific induction, but homology, whereby properties of the model system are held in common by many species because all have retained features found in their common ancestor. Thus, one can be perfectly confident that lions (Panthera leo) will have introns in their genes and that the new Escherichia coli strain O157 will not, despite neither having been tested, not because of any specific adaptive argument,
but because all the closest relatives of these species that have been examined have shown these characteristics. Population geneticists describe and model the variability present in populations. Their main justification is that, in the Darwinian framework, evolution always proceeds through changes in the frequencies of genotypes. They use model experimental systems as much as do other biologists. However, their process of inference from these systems is made more complex by the fact that the genetic variation existing within a species (and the evolutionary changes that it is undergoing) comprises a collection of alleles at each locus that are not normally individually homologous to corresponding alleles in other species. However, while a particular set of alleles may be unique to a given species, this may not be true of the forces determining the levels and types of variability that are seen. Thus, in describing variability within populations, population geneticists are constantly aware of the need to distill from this the information that can form a basis for inference to other species. This inference usually takes the form of mathematical models. Such models are riddled with
assumptions of constancy for parameters that might be expected to change with time, and therefore between species, such as mutation rates, effective population sizes and selection coefficients. The result is that their predictions are inevitably wrong in detail when applied to individual species. Notwithstanding such inaccuracy, these models continue to play a useful role in the process of inference.

Transposable genetic elements present an interesting new form of genetic variability that we need to understand. Their variation is, however, analogous to the other forms of genetic variation that have formed the focus of mainstream population genetics. An example is the variability between individuals within populations in the positions occupied by transposable elements in their chromosomes. Using neutral models described by a parameter $\theta$, determined by the product of transposition rate and population size (Charlesworth & Charlesworth, 1983; Langley, Brookfield & Kaplan, 1983), a reasonable fit can be made to data derived from in situ hybridisations to polytene chromosomes of Drosophila melanogaster (Montgomery & Langley, 1983; Biémont et al., 1994; Aulard et al., 1995). Variation between individuals in the positions of the transposable elements is very high, consistent with $\theta$ values considerably greater than one.
Mechanisms of copy number control

Population genetics can also consider the forces determining transposable element copy number. Most authors view transposable genetic elements as parasitic or selfish DNA, for which an increase in copy number through replicative transposition is balanced by selection operating against individuals with copy numbers above average (Orgel & Crick, 1980; Doolittle & Sapienza, 1980). A transposition-selection balance of this kind is not, however, formally analogous to a mutation-selection equilibrium from single-locus population genetics. In the latter, the rate of gain of harmful mutant alleles is weakly negatively dependent upon the number already present at a locus, whereas the rate of loss through selection is positively dependent upon the number that exist. Hence the equilibrium is stable. For a transposition-selection balance, both the rate of gain of new copies by transposition and the rate of loss of copies by selection are positively copy-number dependent. This means that the conditions for an equilibrium are more restrictive (Brookfield, 1982). Charlesworth and Charlesworth (1983) and Charlesworth (1985) showed that the change in mean copy number per generation is given by

$$\Delta \bar{n} = V_n \frac{\partial \ln \bar{w}}{\partial \bar{n}} + \bar{n}\,(u_{\bar{n}} - \nu) \qquad (1)$$
Here $\bar{n}$ is the mean copy number, $u_{\bar{n}}$ is the rate of transposition in an individual with $\bar{n}$ copies, $V_n$ is the variance in copy number, $\bar{w}$ is the mean population fitness, and $\nu$ is the rate of deletion. The consequences of (1) are central to this work. The first term quantifies the impact of selection. This depends linearly on the variance in copy number, reflecting the expected attenuation in the effectiveness of selection resulting from inbreeding, which reduces all kinds of genetic variability. When site frequencies are low and there is no linkage disequilibrium between chromosomal sites occupied by transposable elements, this variance is Poisson and equal to the mean $\bar{n}$. This is usually seen in wild populations of D. melanogaster. The formula has a further outcome. Suppose it is combined with a simple model of fitness variation with copy number, in which each extra copy lowers fitness by the same proportional amount, and with a Poisson copy number variance. The $\partial \ln \bar{w} / \partial \bar{n}$ term then becomes independent of $\bar{n}$. Consequently, if the rate of transposition, $u_n$, is also independent of $n$, the rate of increase of $\bar{n}$ will be linearly dependent upon $\bar{n}$. If an element family can invade the host, it will increase copy number exponentially, reducing host fitness to zero.

What are the possible mechanisms through which copy number can be controlled and this increase without limit be prevented? Such mechanisms can be grouped into three classes. The first is when $\partial \ln \bar{w} / \partial \bar{n}$ decreases with increasing $\bar{n}$. It has been suggested that this situation could arise through ectopic recombination between transposable element copies (Langley et al., 1988; Charlesworth & Langley, 1989). If a transposable element is present on only one of the homologous chromosomes at meiosis, as is normal in D. melanogaster, pairing may occur between the transposable element and a member of the same family located elsewhere.
Recombination within the transposable element sequence would generate aneuploid gametes. This idea has considerable theoretical appeal. The involvement of two copies in the process leads to an expectation that the resulting reduction in fitness may increase with the square of the copy number. This would stabilise copy number in the face of replicative transposition occurring at a constant rate per copy. Also, in organisms with high heterozygosity for transposable element sites, such as D. melanogaster, one would expect the maintenance of large numbers of sequence families, each at relatively low copy numbers, as are seen. In mammals, which show low heterozygosity for the position of their interspersed repetitive DNAs, the interspersed repetitive fraction of the genome can be dominated by a small number of sequence families, as with the LINE-1 and Alu sequences of humans (Deininger et al., 1992). An experimental prediction is that the density of transposable elements in regions of low recombination should be higher at equilibrium than in regions of high recombination. There is some weak evidence in favour of this, such as an elevation of the abundance of elements in rare inversions (Eanes, Wesley & Charlesworth, 1992). However, a major survey of the relationship between transposable element abundance and recombination rate in D. melanogaster failed to find the negative relationship expected (Hoogland & Biémont, 1996). A relationship between fitness and copy number that is of the right form for stability could, of course, arise for reasons that have nothing to do with ectopic recombination, for example from epistatic interactions between insertion mutations.

A second possibility allowing stable maintenance of copy number is that the transposition rate per copy drops with increasing copy number; in other words, that $u_n$ drops with $n$. Notwithstanding the fact that a transposable element family showing such dependence may allow its host population to survive when one failing to do so would not, no advantage accrues to a mutant element showing such restraint if it shares a genome with others that do not. The evolution of self-restraint is thus expected only under very restricted conditions (Charlesworth & Langley, 1986).
However, there will clearly be host functions that are required for transposition, and if the number of elements in the genome and thus the rates of transposition are high, these functions may become limiting, and the per-copy transposition rate may drop. Particularly interesting in this context is the possibility that multimers of host factors required for transposition may be required to generate transposition complexes, and that, in conditions of host factor limitation, total transposition may drop because few complete complexes are formed. There may thereby be generated a high copy number/low transposition state. However, with most families of Drosophila elements undergoing much less than one transposition per genome during the full complement of germ line cell divisions, it is unlikely that
transposition normally occurs often enough for such host factor limitation to be relevant. A third mechanism, frequently seen for class II transposable elements, which move via DNA intermediates, is control being produced through increasing numbers of non-autonomous elements in a functionally heterogeneous family. Brookfield (1991, 1996) considered theoretically this situation and showed that, if it is transposition itself that generates the selective cost of transposable elements, and if a stable balance is created between transposition and selection (and sometimes when it is not), non-autonomous elements can usually invade and will replace the autonomous forms.
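The behaviour of equation (1) under the first two kinds of control can be illustrated numerically. The sketch below is a minimal illustration rather than anything taken from the original analysis: it assumes Poisson variance (so the variance equals the mean), treats the derivative of log mean fitness as a function supplied by the caller, and uses arbitrary parameter values. The function names `delta_n` and `trajectory` are hypothetical.

```python
def delta_n(n_bar, dlnw_dn, u, nu):
    """One-generation change in mean copy number from equation (1),
    taking the copy number variance to be Poisson (V_n = n_bar)."""
    variance = n_bar
    return variance * dlnw_dn(n_bar) + n_bar * (u(n_bar) - nu)


def trajectory(n_start, generations, dlnw_dn, u, nu=0.001):
    """Iterate equation (1) deterministically; all parameter values are illustrative."""
    n_bar = n_start
    path = [n_bar]
    for _ in range(generations):
        n_bar = max(n_bar + delta_n(n_bar, dlnw_dn, u, nu), 0.0)
        path.append(n_bar)
    return path


# Constant selection per copy and constant transposition: exponential increase.
unchecked = trajectory(1.0, 500, lambda n: -0.01, lambda n: 0.02)

# Selection gradient that steepens with copy number (log fitness falling with n squared).
selected = trajectory(1.0, 2000, lambda n: -0.001 * n, lambda n: 0.02)

# Per-copy transposition rate that declines with copy number (assumed functional form).
restrained = trajectory(1.0, 5000, lambda n: -0.01, lambda n: 0.05 / (1 + n / 10))

print(unchecked[-1], selected[-1], restrained[-1])
```

With these assumed values the first trajectory grows by a constant factor each generation, whereas the other two settle at the copy number where the bracketed gains and losses in equation (1) cancel.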
The influence of finite population size

The effective population size determines the impact of transposition and selection on the equilibrium frequency spectrum of transposable element sites (Charlesworth & Charlesworth, 1983; Langley, Brookfield & Kaplan, 1983). For a given transposition rate, the mean frequency of sites increases through drift as the population size drops. The high variability in transposable element positions in D. melanogaster implies that the reciprocal of the effective population size is less than the combined effects of transposition and selection. Few have considered the effect of finite population size on copy number control. Kaplan, Darden and Langley (1985) considered heterogeneous families of transposable elements in a neutral model in which autonomous copies supply transposase that is used by both themselves and non-autonomous copies. Their conclusions were that at stationarity the proportion of complete elements would be low, and the time to extinction of the element family may be short in small population sizes. A more detailed and specific model for the Drosophila P transposable element family, incorporating finite population size and also heterogeneity between copies, has been produced by Quesneville and Anxolabéhère (in press).

Finite population size can reduce the variance in copy number in two ways. If population size is small, the variance between individuals in a given generation may, by chance, be much less than expected. This reduces the variance in fitness, and thus the power of selection, and causes a rise in mean copy number. A series of chance rises could result in the reduction of fitness to a very low level. In small populations, furthermore, the frequencies of transposable element sites will be high, and the expected variance in copy number will drop relative to the Poisson variance.
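The second effect can be made concrete with a small calculation. The sketch below assumes haploid individuals and sites in linkage equilibrium, so that copy number is a sum of independent presence/absence events; the mean is then the sum of the site frequencies and the variance is the sum of p(1 - p) over sites. The function names and the two example frequency spectra are hypothetical.

```python
def copy_number_mean_and_variance(site_frequencies):
    """Mean and variance of haploid copy number when sites are independent
    (linkage equilibrium), each occupied with its own frequency p."""
    mean = sum(site_frequencies)
    variance = sum(p * (1.0 - p) for p in site_frequencies)
    return mean, variance


def variance_to_mean_ratio(site_frequencies):
    """Ratio that equals one for a Poisson distribution and falls as sites become common."""
    mean, variance = copy_number_mean_and_variance(site_frequencies)
    if mean == 0:
        raise ValueError("no elements present")
    return variance / mean


many_rare_sites = [0.02] * 100   # mean copy number 2, every site rare
few_common_sites = [0.5] * 4     # mean copy number 2, every site common

print(variance_to_mean_ratio(many_rare_sites))   # close to the Poisson value of one
print(variance_to_mean_ratio(few_common_sites))  # well below one
```

Both spectra have the same mean copy number, but the one made of a few common sites has only half the Poisson variance, which is the sense in which high site frequencies weaken selection.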
Materials and methods

Using simulation, we consider the quantitative impact of small host population sizes upon the control of copy number. In models 1 and 2 we consider a family of transposable elements that increase their copy number by transposition, but which are subject to selection at the level of the hosts which, following Brookfield (1991), we model as dependent upon the total transposition. Model 2 differs from model 1 by including the impact of population size upon the expected copy number variance. In model 3 we hypothesise that selection arises through ectopic exchange and increases with the number of heterozygous sites in the genome.

Brookfield (1991) produced a model for heterogeneous sequence families in which the rate of transposition was dependent upon an interaction between transposase and DNA. It was supposed that the rate-limiting step in transposition was the binding of a transposable element-encoded transposase protein to a transposable element. This was hypothesised to occur at a rate dependent upon the product of the concentration of unbound protein and unbound DNA. A quadratic equation gives the proportion of transposase protein bound to the target DNA at equilibrium, and this, coupled with the assumption that unbinding of transposase protein is accompanied by transposition, predicts the transposition rate. The model thus could be extended to non-autonomous elements that produce no protein but that titrate the proteins from autonomous elements. Here, while protein-DNA binding is retained, element heterogeneity is not, and the amount of transposase protein increases linearly with the number of element copies. A simpler model of transposition is introduced for model 3.

Model 1: Finite population size but Poisson variance in copy number

Sampling

In each generation the transposable element copy number in each of $N$ haploid individuals is sampled from a Poisson distribution whose mean is given by the mean copy number in the gametes of the previous generation. The copy number in the $i$th individual is symbolised by $C_i$.

Transposition

Following Brookfield (1991), it is assumed that the concentration of elements in the cell is given by the copy number $C_i$, and, because all elements are autonomous, the concentration of transposase molecules is also given by $C_i$, which is the number of elements encoding them. If, at any time, a proportion $b$ of the transposase is bound to its DNA targets, the rate of binding is $k C_i^2 (1 - b)^2$, since $C_i (1 - b)$ represents both the concentration of unbound DNAs and of unbound proteins, and $k$ is a rate constant. The rate of unbinding is $b C_i / h$, where $h$ is a half-life of the interaction. This unbinding of transposase and target is accompanied by replicative transposition, which therefore also has this rate. By equating the two rates of binding and unbinding, a quadratic equation gives the proportion of protein bound to DNA at equilibrium as
$$b = \frac{1 + 2khC_i - \sqrt{1 + 4hkC_i}}{2hkC_i}$$
And the rate of replicative transposition, which occurs at unbinding, is

$$\frac{b C_i}{h} = \frac{1 + 2khC_i - \sqrt{1 + 4hkC_i}}{2h^2 k} \qquad (2)$$

For the $i$th individual this is symbolised by $T_i$. In our simulations, $k = h = 1$.
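The binding equilibrium and equation (2) translate directly into two small functions. This is a minimal sketch with hypothetical function names; it takes the smaller root of the quadratic (the one that keeps the bound proportion below one) and defaults to the values k = h = 1 used in the simulations.

```python
import math


def transposition_rate(copies, k=1.0, h=1.0):
    """Equation (2): rate of replicative transposition for an individual with `copies` elements."""
    return (1 + 2 * k * h * copies - math.sqrt(1 + 4 * h * k * copies)) / (2 * h * h * k)


def bound_fraction(copies, k=1.0, h=1.0):
    """Equilibrium proportion b of transposase bound to DNA; defined as zero when no copies exist."""
    if copies == 0:
        return 0.0
    return transposition_rate(copies, k, h) * h / copies


# Check that binding and unbinding rates balance at the returned value of b.
copies, k, h = 3.7, 0.5, 2.0
b = bound_fraction(copies, k, h)
binding = k * copies**2 * (1 - b) ** 2
unbinding = b * copies / h
print(binding, unbinding)
```

Computing the transposition rate first and deriving b from it avoids the division by zero that the expression for b would otherwise hit when an individual carries no elements.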
Fitness and inheritance

Following Brookfield (1991), the fitness of the $i$th individual is given by

$$w_i = \exp(-s\,T_i^2) \qquad (3)$$

In an infinite population size, the dependence of fitness on the square of the transposition rate results in a stable equilibrium between transposition and selection. The mean population fitness, $\bar{w}$, is the mean of $w_i$ across the $N$ individuals in the population in this generation, i.e., it is

$$\bar{w} = \sum_{i=1}^{N} \frac{w_i}{N} \qquad (4)$$

The mean copy number in the gametes of this generation is thus

$$\sum_{i=1}^{N} \frac{(C_i + T_i)\, w_i}{N \bar{w}}$$

Poisson sampling from a distribution with this mean gives the next generation. If the mean population fitness drops below $10^{-6}$, the death of the population is said to have occurred. If the mean copy number drops to zero, the extinction of the element is said to have occurred.
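One generation of model 1, as described by equations (2) to (4) and the gametic mean, can be sketched as follows. This is an illustrative reconstruction, not the authors' program: the function names are hypothetical, the Poisson sampler is a simple textbook method chosen to keep the example within the standard library, and the seed and run length in the usage line are arbitrary.

```python
import math
import random

DEATH_FITNESS = 1e-6  # mean fitness below which the host population is said to have died


def transposition_rate(copies, k=1.0, h=1.0):
    """Equation (2)."""
    return (1 + 2 * k * h * copies - math.sqrt(1 + 4 * h * k * copies)) / (2 * h * h * k)


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means that arise here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def gamete_mean(copies, s, k=1.0, h=1.0):
    """Return (mean copy number in gametes, mean fitness); the mean is None if the hosts die."""
    rates = [transposition_rate(c, k, h) for c in copies]
    fitness = [math.exp(-s * t * t) for t in rates]          # equation (3)
    w_bar = sum(fitness) / len(copies)                       # equation (4)
    if w_bar < DEATH_FITNESS:
        return None, w_bar
    weighted = sum((c + t) * w for c, t, w in zip(copies, rates, fitness))
    return weighted / (len(copies) * w_bar), w_bar


def simulate_model1(n_hosts, s, generations, seed=0, start_mean=1.0):
    """Run model 1 and report ('alive' | 'dead' | 'extinct', generation reached, final mean)."""
    rng = random.Random(seed)
    mean = start_mean
    for generation in range(1, generations + 1):
        copies = [poisson_draw(rng, mean) for _ in range(n_hosts)]
        mean, _ = gamete_mean(copies, s)
        if mean is None:
            return "dead", generation, None
        if mean == 0.0:
            return "extinct", generation, 0.0
    return "alive", generations, mean


print(simulate_model1(n_hosts=1000, s=0.3, generations=60, seed=1))
```

Checking the mean fitness before dividing by it also guards against the case where every individual's fitness has underflowed to zero.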
Simulations

All simulations were started with a mean number of one element per individual. For each of the three values of $s$ used ($s = 0.1$, $s = 0.2$, and $s = 0.3$), the equilibrium copy number in a large population was estimated by allowing ten populations of $N = 5000$ to evolve for a hundred generations, and then finding the mean of their final values. For smaller populations, between 100 and 1000 replicates were performed, each lasting 500 generations, and the times when populations died or elements became extinct were observed. At the end of 500 generations, the mean copy number was calculated for all populations that still possessed the element. For each set of conditions the mean population death rate and the mean element extinction rate were calculated. The haploid population size, $N$, was lowered until one or other of these rates rose to around one event every thousand generations or above.

Model 2

This differs from model 1 solely in that the transposable elements now occupy specific genomic sites (100 in all), each of which has a population frequency in each of the generations. All sites are in linkage equilibrium in the gametes. Transposition is replicative and copies a transposable element to a site unoccupied in that individual. The frequencies of transposable elements at each of the sites are affected by drift, transposition, selection, and a low frequency excision process.

Sampling

At the start of each generation, each of the 100 sites has a frequency of occupation in the gametes from the previous generation. The gametic frequency at the $j$th site is called $p_j$. In the $i$th individual the presence or absence of a transposable element at the $j$th site is determined by randomly sampling the gametes, such that the probability that the site is occupied is given by $p_j$ in the previous generation. The probabilities of the presence of elements at each of the sites are independent, indicating linkage equilibrium.

Transposition, fitness, and inheritance

The number of occupied sites in the $i$th individual is $C_i$, as above. The level of transposition in the individual, $T_i$, the fitness of this individual, $w_i$, and the population mean fitness, $\bar{w}$, are given by (2), (3), and (4) above. The impact of these events is felt in the spectrum of gametic frequencies, $p_j$, at the 100 sites. Consider the $j$th site. The number of individuals in the population that have the element at this site can be called $P_j$. ($P_j$ is, of course, binomially sampled from $N$, with a mean of $N p_j$.) Let us first consider these $P_j$ individuals who have the site occupied. Their contribution to $p_j$ in the next generation will be

$$\sum_{i=1}^{P_j} \frac{(1 - \nu)\, w_i}{\bar{w} N} \qquad (5)$$
Because $\nu$ shows the rate of loss of elements by deletion from their chromosomal sites, $(1 - \nu)$ is the proportion that remain. With $\nu = 0$, fixation of elements at a given site becomes an absorbing barrier, and the mean copy number must inexorably increase in a finite population. In all our simulations $\nu$ was fixed at 0.001. Transposition copies elements from the sites occupied in a given individual to new sites that are unoccupied. The effect of transposition on $p_j$ is thus felt in the $N - P_j$ individuals that lack the $j$th site. This is given by

$$\sum_{i=1}^{N - P_j} \frac{T_i\, w_i}{(100 - C_i)\, \bar{w} N} \qquad (6)$$

Thus, in the $i$th individual $T_i$ new insertions are created in a typical gamete as a result of transposition. These are distributed evenly amongst the $(100 - C_i)$ unoccupied sites. In (6) as in (5) the contributions of the $i$th individual to $p_j$ are weighted by $w_i / \bar{w}$. The value for $p_j$ used for the next generation is thus the sum of (5) and (6).
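The update of the site frequencies in model 2 can be sketched by accumulating the contributions (5) and (6) individual by individual. The code below is a hedged illustration: the function names and the list-of-booleans genotype representation are hypothetical, the deletion rate defaults to the 0.001 stated in the text, and no clipping of the resulting frequencies is applied because none is described.

```python
import math
import random


def transposition_rate(copies, k=1.0, h=1.0):
    """Equation (2)."""
    return (1 + 2 * k * h * copies - math.sqrt(1 + 4 * h * k * copies)) / (2 * h * h * k)


def sample_genotypes(rng, site_frequencies, n_hosts):
    """Each site is occupied independently with its gametic frequency (linkage equilibrium)."""
    return [[rng.random() < p for p in site_frequencies] for _ in range(n_hosts)]


def next_site_frequencies(genotypes, s, nu=0.001, k=1.0, h=1.0):
    """Sum of contributions (5) and (6) to each site's gametic frequency."""
    n_hosts = len(genotypes)
    n_sites = len(genotypes[0])
    copies = [sum(g) for g in genotypes]
    rates = [transposition_rate(c, k, h) for c in copies]
    fitness = [math.exp(-s * t * t) for t in rates]
    w_bar = sum(fitness) / n_hosts
    new_p = [0.0] * n_sites
    for genotype, c, t, w in zip(genotypes, copies, rates, fitness):
        weight = w / (w_bar * n_hosts)
        for j, occupied in enumerate(genotype):
            if occupied:
                new_p[j] += (1 - nu) * weight            # retained element, equation (5)
            else:
                new_p[j] += t / (n_sites - c) * weight   # new insertion, equation (6)
    return new_p


rng = random.Random(0)
start = [0.5, 0.5] + [0.0] * 98          # the starting spectrum described for model 2
hosts = sample_genotypes(rng, start, n_hosts=200)
print(sum(next_site_frequencies(hosts, s=0.2)))
```

With the deletion rate set to zero, the frequencies returned for a single individual sum to its copy number plus its transposition rate, which is a quick check that (5) and (6) together conserve the expected number of copies.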
Simulations

The populations were initiated with a mean copy number of one element per individual, with this comprising two sites each having $p_j$ of 0.5, and the 98 other sites having $p_j$ of zero. As in model 1, $k = h = 1$, and for each of the $s$ values initially 10 replicates of populations of size 5000 were allowed to evolve for 100 generations. At the end of this time, the mean copy number, the copy number variance, and the ratio of variance to mean were noted for each population. The fact that individual sites show non-infinitesimal frequencies results in this ratio having an expected value less than the unity found in a Poisson distribution. As above, for each $s$ value, decreasing values of $N$ were tried, until the rate of death of the populations became high. For populations with high rates of death, only 100 rather than 500 generations were simulated.

Model 3: Ectopic recombination

This model differs from models 1 and 2 in that selection is not the result of transposition but rather arises entirely from the harmful effects of ectopic recombination, the frequency of which is determined by the number of heterozygous sites of transposable elements. The model is thus diploid.

Sampling

Each of the one hundred sites has a frequency of occupation in the gametes, with the $j$th site having frequency $p_j$. The genotype of the $i$th individual at the $j$th site, which could be homozygous for the presence of an element, homozygous for its absence, or heterozygous, is determined by sampling two alleles. Hardy-Weinberg equilibrium and linkage equilibrium are assumed, and all sites are assumed to be autosomal.

Transposition

The transposition model is simplified, with each individual having a copy number $C_i$ expressed per haploid genome and equal to the number of sites homozygous for the transposable element's presence plus half the number of heterozygous sites. The number of new sites generated is the copy number multiplied by a constant $t$.

Fitness and inheritance

Selection depends upon the opportunities for ectopic pairing at meiosis. We assume that this occurs only at heterozygous sites, and thus the number of ectopic recombination events will depend upon the number of these. Thus, if $H$ is the number of heterozygous sites in an individual, there are $H(H - 1)/2$ pairings possible between these sites. We assume that the expected number of ectopic recombinations is given by $cH(H - 1)/2$, where $c$ is a constant, and that the probability of $r$ ectopic recombination events, conditional upon $H$, $P(r|H)$, is Poisson distributed with this mean. Such an ectopic pairing is assumed to result in the production of two unbalanced chromosomes, the presence of either of which always results in an inviable gamete. It is only through the production of such gametes that fitness is reduced.

Suppose that there is a single ectopic recombination event. This involves two heterozygous sites. We assume, for simplicity, that all heterozygous sites are on different chromosomes. A single event will thus create two inviable chromosomes. However, since these are heterozygous, each has a 50% chance of being passed on. A viable gamete requires all chromosomes to be viable, so, with two chromosomes destroyed, fertility will be one quarter. Generally, if $x$ chromosomes are destroyed, fertility will be $1/2^x$. Chromosome destruction requires the presence of the transposable element, and thus the probability that the element-bearing homologue at a heterozygous site will be included in a gamete, conditional upon $x$ chromosomes being destroyed in the diploid, is $(H - x)/2H$. The 2 is because the site is heterozygous, and the factor $(H - x)/H$ results from the requirement that the particular site considered is not among the $x$ that are destroyed. Although every recombination event destroys two chromosomes, we assume that, because recombination leaves the transposable element itself intact, multiple events may involve the same site.

Consider when the number of unequal recombination events, $r$, is two. For a total of two chromosomes destroyed ($x = 2$), the second event must involve the same two sites as the first. The probability of this is $2/[H(H - 1)]$. For $x = 3$, the second recombination event must involve one of the two sites involved in the first event, and one of the other $(H - 2)$ sites. The probability of this is $4(H - 2)/[H(H - 1)]$. For $x = 4$, the second event must involve two sites not involved in the first recombination, occurring with probability $(H - 2)(H - 3)/[H(H - 1)]$.

Generally, the probability of $x$ chromosomes destroyed by $r$ recombination events, $P(x|r)$, is given by a series in which:

$$P(x|r) = \frac{P(x-2|r-1)\,(H + 2 - x)(H + 1 - x) + P(x-1|r-1)\,(H + 1 - x)\cdot 2(x - 1) + P(x|r-1)\cdot x(x - 1)}{H(H - 1)}$$

Thus, we combine the dependence of $r$ on $H$, and of $x$ on $r$ and $H$, to calculate the fitnesses and inheritance of heterozygous sites dependent upon $r$, $H$ and $x$. The fitness of an individual with $H$ heterozygous sites, $w_H$, is given by

$$w_H = \sum_{r=0}^{\infty} P(r|H) \sum_{x=0}^{\infty} P(x|r) \cdot 2^{-x}$$

The proportion of the gametes of such an individual who inherit an element at a given heterozygous site, which we call $I_H$, is

$$I_H = (w_H H)^{-1} \sum_{r=0}^{\infty} P(r|H) \sum_{x=0}^{\infty} P(x|r) \cdot (H - x) \cdot 2^{-(x+1)}$$

In the $i$th individual, with $H$ heterozygous sites, and $C_i$ transposable element copies per haploid genome, the contribution to $p_j$ for a site will be:

For a homozygous element site:

$$\frac{(1 - \nu)\, w_H}{N \bar{w}}$$

For a heterozygous site:

$$\frac{w_H \left( I_H + (1 - I_H)\, t C_i\, (100 - C_i)^{-1} \right)}{N \bar{w}}$$

(The $w_H I_H$ term is the contribution to $p_j$ by inheritance of the element-bearing chromosome at the heterozygous site, while the $w_H (1 - I_H)$ term is the contribution to $p_j$ of new transpositions into the empty homologue.)

For a site homozygous for the absence of the element:

$$\frac{w_H\, t C_i\, (100 - C_i)^{-1}}{N \bar{w}}$$

The value for $p_j$ in the gametes for the $j$th site is the sum of these contributions across the $N$ diploid individuals in the population.

Simulations

Populations were started with a mean of one element per haploid individual, but now with five sites each with a frequency of 0.2. Populations occasionally showed rapid increases in copy number. In models 1 and 2, such increases are associated with very marked reductions in the mean fitness. Here, however, the mean copy number can, under certain circumstances, increase until transposable elements are fixed at all available sites. Such fixations result in all sites being homozygous, and the resulting population fitness is unity. Thus, the test for an unrestricted increase in transposable element copy number here did not involve mean fitness, but rather the mean copy number. If mean copy number exceeded 40, an ‘invasion’ was said to have occurred. With the rate of transposition, $t$, set at 0.1, and $\nu$ of 0.001, the recombination constant $c$ was varied from 0.01 to 0.05. To speed the analysis, the largest population sizes tried here were diploid populations of $N = 200$. $N$ was then, for each set of conditions, systematically lowered until many populations exhibited ‘invasion’ of the elements to high numbers within 500 generations of simulation.

Results
Table 1 shows simulation results for the model in which population size is small but the variance in copy number between individuals was Poisson. There are occasional occurrences of a rapid rise in copy number. These arise because of chance reductions in the copy number variance between individuals. This creates a high copy number in all the offspring, and if such an effect persists for more than one or two generations, the fitness drops enormously (below 10 6 of full fitness, which is taken here as signifying extinction of the host). Thus, the row ‘D .. ’ shows the mean number of times that the host population dies per thousand generations. The probability of these death events is very strongly dependent upon population size, and also on the strength of selection. This is shown in Figure 1. When selection is strong (s 0.3), the equilibrium copy number is low, and the probability of a rise in copy number through random loss of copy number variance is less. As selection gets weaker, the equilibrium copy number increases, and host population deaths arise fre-
=
=
288 Table 1. The impact of small population sizes on the rates of death of the host and extinction of the transposable element when copy number variance is Poisson. M is the mean copy number, n the number of replicates, and D and E are respectively the rates of population death and element extinction per 1000 generations Population size (N )
s = 0.1
s = 0.2
s = 0.3
5000
M = 7.49 (n = 10) D = 0.000 E = 0.000 M = 7.65 (n = 100) D = 0.000 E = 0.000 M = 7.73 (n = 500) D = 0.000 E = 0.000 M = 7.80 (n = 500) D = 0.048 E = 0.000 M = 7.99 (n = 500) D = 0.541 E = 0.000 M = 8.44 (n = 500) D = 1.801 E = 0.000 M = 8.91 (n = 1000) D = 4.648 E = 0.000
M = 3.64 (n = 10) D = 0.000 E = 0.000 M = 3.68 (n = 100) D = 0.000 E = 0.000 M = 3.71 (n = 500) D = 0.000 E = 0.000 M = 3.73 (n = 500) D = 0.000 E = 0.000 M = 3.71(n = 500) D = 0.000 E = 0.000 M = 3.82 (n = 500) D = 0.000 E = 0.000 M = 3.78 (n = 500) D = 0.000 E = 0.000 M = 3.78 (n = 500) D = 0.000 E = 0.000 M = 3.82 (n = 500) D = 0.016 E = 0.000 M = 3.80 (n = 500) D = 0.078 E = 0.000 M = 3.99 (n = 1000) D = 0.298 E = 0.024 M = 4.12 (n = 1000) D = 0.618 E = 0.015 M = 4.12 (n = 1000) D = 1.234 E = 0.034 M = 4.07 (n = 1000) D = 2.397 E = 0.120 M = 4.30 (n = 1000) D = 3.980 E = 0.127
M = 2.34 (n = 10) D = 0.000 E = 0.000 M = 2.36 (n = 100) D = 0.000 E = 0.000 M = 2.38 (n = 500) D = 0.000 E = 0.000 M = 2.38 (n = 500) D = 0.000 E = 0.000 M = 2.34 (n = 500) D = 0.000 E = 0.000 M = 2.44 (n = 500) D = 0.000 E = 0.000 M = 2.40 (n = 500) D = 0.000 E = 0.005 M = 2.37 (n = 500) D = 0.000 E = 0.010 M = 2.37 (n = 500) D = 0.000 E = 0.051 M = 2.32 (n = 500) D = 0.000 E = 0.110 M = 2.39 (n = 500) D = 0.004 E = 0.389 M = 2.45 (n = 500) D = 0.009 E = 0.649 M = 2.42 (n = 1000) D = 0.032 E = 0.964 M = 2.46 (n = 1000) D = 0.109 E = 1.321 M = 2.48 (n = 1000) D = 0.320 E = 2.217
100
50
40
30
25
20
18
16
14
12
11
10
9
8
quently, even when the population size is fairly large. These increases in death rate are reflected in increases in the mean copy number in extant populations as the population size drops.
Figure 1. The relationship between the rate of population death and the haploid population size in model 1 (which assumes Poisson variance in copy number). The rates shown are the expected numbers of deaths per 1000 generations. Values are given for three values of s, which expresses the strength of selection against individuals with high levels of transposition. A death is defined as occurring when the host mean fitness drops below 10⁻⁶. k = h = 1.

Figure 2. The relationship between the rate of host population death and haploid population size for model 2, in which individual transposable element sites have frequencies. For a given s, significant rates of death now start at larger population sizes than are seen in model 1. k = h = 1, = 0.001.

Figure 3. The relationship between the rate of ‘invasion’ of the population by transposable elements, expressed per 1000 generations, and the diploid population size in model 3. An ‘invasion’ is said to have occurred if the mean copy number rises above 40. The curves differ in their values of c, which expresses the relationship between the frequency of ectopic recombinations and the number of heterozygous transposable element sites. = 0.001, t = 0.1.

For s = 0.3 and the resulting small equilibrium copy number, elements often become extinct when host population size is small, through a process analogous to drift. This is shown by the high values of E.

In model 2, small population sizes create high site frequencies, and the mean value of the ratio of the copy number variance to its mean drops increasingly below one. This reduction in the copy number variance decreases the power of selection. This is reflected in the increase in mean copy numbers in Table 2 relative to those with corresponding selection constants and population sizes in Table 1. The increase in mean copy number and the decreased copy number variance render the population increasingly vulnerable to the catastrophic increases in copy number (which lead to host population death) that arise when copy number variance is temporarily diminished. Thus, comparing Figure 2 to Figure 1, it is clear that the host death rate starts to rise at a higher population size for model 2 than with an equal s in model 1. The columns of Tables 1 and 2 showing data for s = 0.3, however, show that the presence of site frequencies lowers the extinction rate of the elements. When population size is low, and site frequencies high, the sampling variance in copy number is also low. The probability of random loss of all copies of the element in the population is thus reduced.

The most profound effects of small population size on copy number come in model 3, when selection occurs through ectopic recombination. Table 3 and Figure 3 show the results for the mean copy numbers observed with various values of c (the rate of ectopic exchange per heterozygous site²; here ‘site²’ expresses the dependence of ectopic exchange upon the numbers of possible pairings between sites, not the numbers of sites). When c is small, then, even with diploid population sizes of one hundred or more, there can be major increases in copy number. Only a few cases of extinction of the elements were observed, all occurring after a small number of generations.

It is clear from the simulations, however, that the dynamics of rapid increases in copy number in model 3 are qualitatively different from those of models 1 and 2. For models 1 and 2 the probability of host death is approximately constant during the 500 generations of the simulations. The population evolves to an approximately steady mean copy number that can, nevertheless, be disrupted by chance decreases in the copy number variance, leading to massive copy number increase. However, for model 3, the ‘invasions’ occur mainly in the last 100 generations of the 500 generations of simulation. This is shown in Figure 4, in which the proportions of populations that have not succumbed to massive increases in the copy number are plotted against the number of generations simulated. Seven sets of values are given, chosen to yield
Table 2. The impact of population size when site frequencies are specified. N is the population size, and R is the mean value for the ratio of the variance in copy number to its mean. * = populations run for 100 generations.

        s = 0.1                            s = 0.2                            s = 0.3
N       M     n      R     D      E        M     n      R     D      E        M     n      R     D      E
5000    8.28  10     0.92  0.000  0.000    3.76  10     0.96  0.000  0.000    2.42  10     0.97  0.000  0.000
100     8.48  100    0.85  0.082  0.000    3.98  100    0.93  0.000  0.000    2.50  100    0.96  0.000  0.000
90      8.72  100    0.84  0.185  0.000    3.92  100    0.93  0.000  0.000    2.47  100    0.94  0.000  0.000
80      9.00  100    0.86  0.400  0.000    4.00  100    0.92  0.000  0.000    2.56  100    0.93  0.000  0.000
70      9.15  100    0.85  1.640  0.000    3.95  100    0.90  0.000  0.000    2.53  100    0.92  0.000  0.000
60                                         4.06  100    0.91  0.000  0.000    2.57  100    0.92  0.000  0.000
50                                         4.04  100    0.89  0.000  0.000    2.60  100    0.93  0.000  0.000
40                                         4.09  100    0.86  0.000  0.000    2.61  100    0.92  0.000  0.000
30                                         4.26  500    0.88  0.057  0.000    2.61  500    0.90  0.000  0.000
25                                         4.45  500    0.85  0.668  0.000    2.64  500    0.89  0.000  0.000
20                                         5.03  1000*  0.77  3.240  0.000    2.79  500    0.83  0.102  0.000
18                                                                            2.88  500    0.81  0.448  0.000
16                                                                            3.13  1000*  0.78  0.980  0.000
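The D and E entries in Tables 1 and 2 are pooled rates: events are counted in each 100-generation period and divided by the number of populations that entered that period with elements still present. A minimal sketch of that arithmetic follows, using the counts reported in the text for model 1 with s = 0.2 and N = 9. The function name and its arguments are illustrative assumptions, not part of the original simulation code.

```python
def pooled_rate_per_1000(events, at_risk, period=100):
    """Pooled event rate, rescaled to events per 1000 generations.

    events[i]  : events counted during the i-th period
    at_risk[i] : populations that began the i-th period with elements present
    period     : length of each counting period, in generations
    """
    if len(events) != len(at_risk):
        raise ValueError("events and at_risk must have the same length")
    per_period = sum(events) / sum(at_risk)
    return per_period * (1000 / period)


# Counts quoted in the text for model 1, s = 0.2, N = 9.
starting = [1000, 760, 550, 401, 289]
deaths = [205, 203, 139, 106, 66]
extinctions = [35, 7, 10, 6, 1]

death_rate = pooled_rate_per_1000(deaths, starting)
# Most extinctions fall in the first 100 generations, so E is pooled
# over generations 100 to 500 only (the first period is dropped).
extinction_rate = pooled_rate_per_1000(extinctions[1:], starting[1:])

print(round(death_rate, 3), round(extinction_rate, 3))  # 2.397 0.12
```

The printed values agree with the D = 2.397 and E = 0.120 entries for N = 9, s = 0.2 in Table 1.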
reasonably high rates of death or ‘invasion’ over the 500 generations. However, while the four data sets simulating models 1 and 2 show decays in the proportions of populations surviving that are approximately exponential, in model 3 the rate at which populations succumb to invasion increases greatly in the course of the simulations. The absolute rates after 500 generations are not higher than some death rates seen in models 1 and 2, but are clearly still increasing.

The values of D and E are calculated by counting, every hundred generations, the proportion of populations in which the fitness has dropped below 10⁻⁶ (a death event, D) and the proportion of populations in which copy number has dropped to zero (an extinction event, E). The values given are the average values of these, expressed relative to the numbers of populations that started the period of 100 generations with elements still present in them. For example, consider the case of model 1, with s = 0.2 and a population size, N, of nine. After 100 generations, 205 of the 1000 starting populations had died, and the elements were extinct in 35 of the remainder. One hundred generations later, 203 of the 760 remaining populations had died, and there were a further 7 extinctions. The next 100 generations saw 139 deaths and 10 extinctions from 550 starting populations. The next 100 generations saw 106 deaths and 6 extinctions from 401 starting populations, then, finally, there were 66 deaths and 1 extinction from 289 starting populations in the last 100 generations. The death rate was (205 + 203 + 139 + 106 + 66)/(1000 + 760 + 550 + 401 + 289) = 0.2397 per hundred generations, or 2.397 per 1000 generations. Here, as in all simulations, the majority of extinctions occurred in the first 100 generations, showing the increased probability of loss of the elements when copy number is initially small. Thus, the extinction rate E was calculated for generations 100 to 500, giving (7 + 10 + 6 + 1)/(760 + 550 + 401 + 289) = 0.012 per 100 generations, or 0.120 per 1000 generations. A similar approach was applied to the invasion rate with model 3.

Figure 4. The changes with time in the proportion of populations (among those that have retained the transposable element) that have succumbed to neither population death (for models 1 and 2) nor invasion (for model 3). The seven sets of parameters illustrated all give high rates of one or other of these processes. ‘M1’ symbolises model 1, etc.

Discussion
These simulations reveal that the powerful forces of transposition and selection, in combination with a finite population size, can produce rapid increases in copy number, possibly greatly reducing host fitness, if the effects of small population size cause the variance in copy number to drop. Are these effects biologically realistic? They appear to paint a picture in which wild populations teeter on a knife-edge, with outbreeding being necessary to prevent extinction through their transposable elements running riot. Undoubtedly, the host death effects in models 1 and 2 are exaggerated. It is assumed that transposition rate increases more than linearly with copy number, and stability in infinite population size models is possible only because of the accelerating impact of transposition on fitness. As a result, even without the catastrophic increases seen in copy number, mean fitness is low, which may overstate the impact of transposition on fitness in wild populations. Furthermore, it seems highly probable that the very high transposition rates included in the model would be prevented by other forces, such as titration of host factors.

The outcomes of model 3 are different in kind from those of models 1 and 2, in that the probability of ‘invasion’ increases during the 500 generations of simulation. This cannot simply be due to the rate of increase in copy number through replicative transposition being too slow for copy number to attain the necessary mean value before then, because an exponential increase in mean copy number at rate t = 0.1 could increase the mean from 1 to 40 within 40 generations. Rather, the attainment of high copy number comes through what is, at first, a slow and gradual accumulation of high frequency sites. Because selection operates only against heterozygous sites, each site will have an unstable equilibrium frequency at 50%, above which copy number will be selected upwards through homozygous individuals being fitter than heterozygotes.
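Two quantitative points in this argument can be checked with a short sketch. The first function computes how many generations unchecked growth by a factor (1 + t) per generation needs to carry the mean copy number from 1 to 40. The second applies one generation of selection at a single site under a deliberately simplified scheme, assumed here for illustration only: random mating, fitness 1 for both homozygotes and 1 - d for heterozygotes. This is not model 3 itself, in which selection arises from pairwise ectopic exchange between sites, and the function names and the value of d are hypothetical.

```python
import math


def generations_to_reach(target, start=1.0, t=0.1):
    """Generations needed for a mean copy number multiplying by (1 + t)
    each generation to grow from `start` to at least `target`."""
    return math.ceil(math.log(target / start) / math.log(1.0 + t))


def next_site_frequency(p, d):
    """Site frequency after one generation of selection under random mating.

    Assumed fitnesses: 1 for both homozygous classes, 1 - d for
    heterozygotes (heterozygote disadvantage at a single site).
    """
    q = 1.0 - p
    mean_fitness = 1.0 - 2.0 * p * q * d
    return p * (1.0 - q * d) / mean_fitness


print(generations_to_reach(40))  # number of generations, below 40
for p in (0.4, 0.5, 0.6):
    # Frequencies below 0.5 fall, 0.5 stays put, frequencies above 0.5 rise.
    print(p, next_site_frequency(p, d=0.05))
```

Under these assumptions the change in frequency is p q d (2p - 1)/(1 - 2 p q d), so its sign is that of (2p - 1), which is the unstable equilibrium at 50% described in the text.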
As time goes by, the number of sites that have reached a stable high frequency will increase. These sites are still the source of transposition, and thus will increase the
Table 3. Model in which selection results from ectopic recombination between heterozygous sites. N is the diploid population size. I = rate of ‘invasion’ per 1000 generations. †: in these cases all populations had been invaded to high copy number after 500 generations and no means and ratios were produced.

        c = 0.01                          c = 0.03                          c = 0.05
N       M      n    R     I      E        M      n    R     I      E        M      n    R     I      E
200     7.44   100  0.90  0.000  0.000    2.01   100  0.96  0.000  0.000    1.19   100  0.97  0.000  0.000
150     7.74   100  0.89  0.000  0.000    2.05   100  0.95  0.000  0.000    1.20   100  0.96  0.000  0.000
120     8.28   100  0.88  0.00   0.000    2.06   100  0.94  0.000  0.000    1.22   100  0.94  0.000  0.000
100     9.60   100  0.84  0.120  0.000    2.10   100  0.95  0.000  0.000    1.24   100  0.94  0.000  0.000
90      12.21  100  0.77  0.263  0.000    2.19   100  0.92  0.000  0.000    1.30   100  0.93  0.000  0.000
80      14.74  100  0.73  1.220  0.000    2.23   100  0.92  0.000  0.000    1.30   100  0.92  0.000  0.000
70                                        2.39   100  0.90  0.000  0.000    1.33   100  0.92  0.000  0.000
60                                        2.56   100  0.90  0.000  0.000    1.37   100  0.92  0.000  0.000
50                                        3.10   100  0.81  0.000  0.000    1.79   100  0.84  0.000  0.000
40                                        4.28   100  0.72  0.000  0.000    2.27   100  0.78  0.000  0.000
30                                        8.50   100  0.54  0.020  0.020    3.69   100  0.63  0.000  0.020
25                                        16.03  100  0.45  0.980  0.000    5.43   100  0.55  0.000  0.000
20                                        †      100  †     2.786  0.000    10.54  100  0.37  0.240  0.000
18                                                                          14.96  100  0.32  0.713  0.020
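The ratio R in Tables 2 and 3 is the variance in copy number divided by its mean, which equals one when copy number is Poisson distributed. The sketch below shows why high site frequencies push R below one. It assumes, for illustration only, that each site is occupied independently with its own frequency x_i in a haploid genome, so that copy number is a sum of independent Bernoulli variables with mean Σx_i and variance Σx_i(1 - x_i). The function names are hypothetical, and the use of the population variance rather than the sample variance is also an assumption.

```python
from statistics import mean, pvariance


def variance_to_mean_ratio(copy_numbers):
    """R for an observed list of per-individual copy numbers."""
    m = mean(copy_numbers)
    if m == 0:
        raise ValueError("mean copy number is zero; R is undefined")
    return pvariance(copy_numbers) / m


def expected_ratio_from_site_frequencies(freqs):
    """Expected R when sites are occupied independently with the given
    frequencies: sum(x * (1 - x)) / sum(x), i.e. 1 - sum(x**2) / sum(x)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("all site frequencies are zero; R is undefined")
    return sum(x * (1.0 - x) for x in freqs) / total


# Same mean copy number (4 per haploid genome), different site frequencies.
many_rare_sites = [0.01] * 400   # large population: many low-frequency sites
few_common_sites = [0.5] * 8     # small population: a few high-frequency sites

print(expected_ratio_from_site_frequencies(many_rare_sites))   # close to 0.99
print(expected_ratio_from_site_frequencies(few_common_sites))  # 0.5
```

With the mean held fixed, concentrating copies at a few high-frequency sites halves the variance in this example, which is the direction of the fall in R seen in the tables as population size decreases.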
upward pressure on frequencies at the remaining sites. Thus, it would seem that populations are subject to this slow and cumulative invasion process, with the early stages being speeded by the population size being low enough, and the selection against heterozygotes being weak enough, for sites to have a reasonable chance of drifting up over the 50% frequency threshold.

Notwithstanding the apparent implausibility of these models, something similar seems to happen in the case of the Drosophila melanogaster retrotransposon copia. Nuzhdin, Pasyukova, and Mackay (1996) report that in some inbred lines of the Harwich strain, the number of copia elements has greatly increased, to over 130 relative to a starting number of around 50. This increase was accompanied by a very great increase (over 100 fold) in the total copia transposition rate. The cells of the high copy number lines had indeed become filled with virus-like particles, indicating a likely major drop in fitness. These results imply that the reductions in fitness that are often seen in inbred lines of Drosophila may result not merely from the exposure of recessive mutations of low fitness, but also from increases in transposable element copy number due to the elimination of the selection that formerly prevented this.

Do the differing models make different predictions about the expected frequency spectra of transposable element sites in wild populations of Drosophila melanogaster? Various types of experiments indicate that the effective size of D. melanogaster populations is very large (over 10⁵), and this means that, even for processes such as transposition, which generates variability at reasonably low rates per generation, the heterozygosity will be high. This is reflected in the low frequencies of sites seen in surveys of transposable element positions. This is approximately in agreement with the Poisson variance in copy number between individuals assumed by model 1, but the non-negligible site frequencies observed imply that model 2 will fit the data better. The predicted site frequencies in model 3 may be highly bimodal, in that, as we have seen, there is heterozygote disadvantage, and sites will be driven up in frequency by selection once they pass a 50% frequency threshold. No such bimodal distribution is seen in wild populations. This is consistent with selection being sufficiently powerful to prevent any sites from rising to this threshold. However, since it is recombination that generates the selection, and large regions of the D. melanogaster genome are almost completely recombinationally inert, one might expect the drift to fixation of elements at sites in these regions. Yet the virtual absence of pseudogenes in Drosophila suggests that no sequences can rise to high frequency in these large populations without some selection in their favour, a result probably to be explained by the very high rate of loss through deletion of unselected DNA sequences in this genus (Petrov et al., 1996).

While the models presented here thus may have some applicability to the Drosophila transposable elements, there are clearly cases which they do not fit. The human genome contains both very high copy numbers and very high homozygosity for the LINE-1 and Alu families, without an exponential copy number increase being observed. The explanation appears to reside in only a few copies of these elements (or maybe only a single ‘master’ copy) being the source of all replicative transpositions (Britten et al., 1988). This means that the rate of transposition has not increased in proportion to the number of copies. The reasons for this heterogeneity between elements in their probabilities of acting as sources of transposition are not clear. Certainly, many sites are very old, and, because elements at given sites are not subject to any selection to maintain their transposability, many will have lost by random mutations the cis-acting sequences required for mobility. Many (most in the case of LINE-1 sequences) will have become transpositionally inactivated due to mutations at the moment of insertion. There will also be context effects, in which the rate of transcription of elements, and with it the rate of transposition, will depend upon the genomic sequences surrounding the site of insertion.
Acknowledgements

We thank the Molecular Evolution Laboratory, University of Nottingham, for use of computing facilities. RMB has been in receipt of a BBSRC Research Studentship.
References

Aulard, S., F. Lemeunier, C. Hoogland, N. Chaminade, J.F. Brookfield & C. Biémont, 1995. Chromosomal distribution and population dynamics of the 412 retrotransposon in a natural population of Drosophila melanogaster. Chromosoma 103: 693–699.
Biémont, C., F. Lemeunier, M.P. Garcia Guerriero, J.F. Brookfield, C. Gautier, S. Aulard & E.G. Pasyukova, 1994. Population dynamics of the copia, mdg1, mdg3, gypsy and P transposable elements in a natural population of Drosophila melanogaster. Genet. Res. Camb. 63: 197–212.
Britten, R.J., W.F. Baron, D.B. Stout & E.H. Davidson, 1988. Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4770–4774.
Brookfield, J.F.Y., 1982. Interspersed repetitive DNA sequences are unlikely to be parasitic. J. Theor. Biol. 94: 281–300.
Brookfield, J.F.Y., 1991. Models of repression of transposition in PM hybrid dysgenesis by P cytotype and by zygotically-encoded repressor proteins. Genetics 128: 471–486.
Brookfield, J.F.Y., 1996. Models of the spread of non-autonomous selfish transposable elements when transposition and fitness are coupled. Genet. Res. Camb. 67: 199–209.
Charlesworth, B., 1985. The population genetics of transposable elements, pp. 213–232 in Population Genetics and Molecular Evolution, edited by T. Ohta and K. Aoki. Springer-Verlag, Berlin.
Charlesworth, B. & D. Charlesworth, 1983. The population dynamics of transposable elements. Genet. Res. Camb. 42: 1–27.
Charlesworth, B. & C.H. Langley, 1986. The evolution of self-regulated transposition of transposable elements. Genetics 112: 359–383.
Deininger, P.L., M.A. Batzer, C.A. Hutchinson & M.H. Edgell, 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8: 307–311.
Doolittle, W.F. & C. Sapienza, 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601–603.
Eanes, W.F., C. Wesley & B. Charlesworth, 1992. Accumulation of P elements in minority inversions in natural populations of Drosophila melanogaster. Genet. Res. Camb. 59: 1–9.
Hoogland, C. & C. Biémont, 1996. Chromosomal distribution of transposable elements in Drosophila melanogaster: test of the ectopic recombination model for maintenance of insertion site number. Genetics 144: 197–204.
Kaplan, N.L., T. Darden & C.H. Langley, 1985. Evolution and extinction of transposable elements in Mendelian populations. Genetics 109: 459–480.
Langley, C.H., J.F.Y. Brookfield & M.L. Kaplan, 1983. Transposable elements in Mendelian populations. I. A theory. Genetics 104: 457–471.
Langley, C.H., E.A. Montgomery, R. Hudson, N. Kaplan & B. Charlesworth, 1988. On the role of unequal exchange in the containment of copy number. Genet. Res. Camb. 52: 223–235.
Montgomery, E.A. & C.H. Langley, 1983. Transposable elements in Mendelian populations. II. Distribution of three copia-like elements in a natural population of Drosophila melanogaster. Genetics 104: 473–483.
Nuzhdin, S.V., E.G. Pasyukova & T.F.C. Mackay, 1996. Positive association between copia transposition rate and copy number in Drosophila melanogaster. Proc. Roy. Soc. Lond. B 263: 823–831.
Orgel, L.E. & F.H.C. Crick, 1980. Selfish DNA: the ultimate parasite. Nature 284: 604–607.
Petrov, D.A., E.R. Lozovskaya & D.L. Hartl, 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384: 346–349.
Quesneville, H. & D. Anxolabéhère, 1997. Dynamics of transposable elements in metapopulations: a model of P element invasion in Drosophila. Theoret. Pop. Biol., in press.