Mamm Genome (2008) 19:221–225 DOI 10.1007/s00335-008-9104-2
COMMENTARY
From ENU mutagenesis to population genetics N. Avrion Mitchison · Bryan Clarke
Received: 16 January 2008 / Accepted: 17 February 2008 / Published online: 26 March 2008 © Springer Science+Business Media, LLC 2008
Introduction Current approaches to population genetics focus on evolutionary information extracted from genomics, where large-scale sequencing is telling us much about the operation of natural selection. Within this field an important issue is the nature and evolution of genetic dominance. Historically, inheritance in natural populations was viewed in terms of dominant ‘‘wild-type’’ alleles, following the discovery of the reserve of recessive alleles present as heterozygotes in wild Drosophila as revealed by inbreeding (reviewed by Keightley 1994). This reserve was thought to provide the feedstock for evolution in response to environmental change, a view that the new genomics has verified and much extended, not only in man but also in natural animal and plant populations (Eyre-Walker and Keightley 2007; Mitchell-Olds et al. 2007). The importance of inheritance pattern is not confined to the human population. The relative contribution of dominant and recessive variation is an important consideration in animal and plant breeding and in the management of wild populations (Edmands 2007). Inbreeding depression, the hallmark of recessive inheritance (and sometimes of heterozygous advantage), has long been known as an important factor in the design of agricultural breeding programs (Falconer 1989).
N. A. Mitchison (✉) Institute of Ophthalmology, University College London, London, UK e-mail: [email protected]
B. Clarke Institute of Genetics, School of Biology, University of Nottingham, Nottingham, UK
Experimental mutagenesis can potentially provide information valuable for population genetics. A key procedure in the new genomics is to compare a variable that is subject to natural selection with a control variable supposedly unaffected in this way, typically nonsynonymous compared with synonymous nucleotide substitution (although there is evidence of at least some selection on synonymous mutations). This type of analysis has not yet been applied to the population genetics of dominance because of the lack of an appropriate neutral control. Our purpose here is to inquire how data from mutagenesis might fill this gap. The form of information from mutagenesis most relevant to man is the relative frequency of mutants in the first (G1) and third (G3) generations in mice, where dominant mutations are detected in G1 and recessive ones in G3. Provided selective effects within the mutagenesis screen can be avoided, this ratio should provide an estimate of the relative rates of dominant and recessive mutation that is unaffected by natural selection. The ratio could then be compared with the ratio recorded for human monofactorial disease, where natural selection has operated over many generations on recessive but not on dominant mutant alleles. We also wish to alert the mutagenesis community to this need for G1:G3 ratio data. In addition, the dominant and recessive mutants, as they accumulate, should contribute much to understanding the cell biology of dominance.
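As a minimal sketch of how the G1:G3 comparison might be tabulated, the following Python snippet estimates the proportion of dominant mutants from screen counts and attaches a Wilson score interval. The function name, the counts, and the simplification that every G1 mutant counts as dominant and every G3 mutant as recessive are illustrative assumptions, not figures or methods from any screen; real counts would also need scaling by the number of animals screened in each generation.

```python
import math

def dominant_fraction(n_dominant, n_recessive, z=1.96):
    """Return (estimate, lower, upper) for the proportion of dominant mutants.

    The bounds are the Wilson score interval for a binomial proportion;
    z=1.96 corresponds to an approximate 95% interval.
    """
    n = n_dominant + n_recessive
    if n == 0:
        raise ValueError("no mutants recorded")
    p = n_dominant / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half

# Hypothetical counts, for illustration only (not data from any screen)
p, lo, hi = dominant_fraction(n_dominant=30, n_recessive=70)
print(f"P_dom = {p:.2f} (approx. 95% interval {lo:.2f} to {hi:.2f})")
```

An interval of this kind makes clear how many mutants a screen would need to report before its dominant:recessive ratio could usefully be set against the human figures.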
The nature of dominance For most gene products the precise quantity is not crucial, and we can get by on half the normal amount. These are the wild-type genes in relation to which most deleterious mutations are recessive. Nevertheless, mutations do occur
that on their own give rise to defective phenotypes, a form of inheritance first identified by Mendel as dominant. The forces of natural selection responsible for dominance of the wild type have been the subject of much debate, still not fully resolved (Hurst and Randerson 2000). The dominance or recessiveness of an allele often depends on the particular pleiotropic effects of the allele that is being observed. It can also depend on the circumstances of observation; for example, sickle hemoglobin heterozygotes (HbAHbS) show the recessiveness of the HbS allele at normal oxygen levels, but show its dominance at lower ones when heterozygous erythrocytes collapse. A mutation may sometimes result in complex phenotypes in which one feature is inherited as a dominant and another as a recessive. Such mutations can contribute to our understanding of the evolution of dominance.
Cell biological mechanisms responsible for dominance In principle, the mechanisms that give rise to dominance comprise (1) haploinsufficiency ‘‘where half a loaf is not enough’’ (Seidman and Seidman 2002) and (2) gain-of-function where the mutated protein disrupts an assembly of, for example, collagen or rhodopsin, renders the assembly unstable, or interferes with normal trafficking as in some rhodopsin mutants (Mendes et al. 2005). A third mechanism has been proposed: (3) dominant-negative activity where the mutant protein interferes with posttranscriptional maturation or some other normal function through a mechanism such as overproduction of a rate-limiting enzyme. The distinction between categories (2) and (3) is uncertain, even in the prototype example of p53 mutations in cancer (Blagosklonny 2000). Furthermore, the mechanism cannot always be assigned with certainty. For instance, the dominant mutations commonly affecting transcription and splicing factors have been interpreted in terms of haploinsufficiency (Seidman and Seidman 2002) or alternatively as dominant-negative effects (Lopez-Bigas et al. 2006). Other circumstances that make dominance more likely have been noted. For instance, among enzyme cascades that give rise to human disease, those that are intracellular (e.g., in porphyria) exhibit dominant mutation more often than those that are extracellular (e.g., in complement deficiency), presumably because the intracellular concentration is more finely tuned. The mode of action of dominant mutations in the human population is revealed primarily by disease pathology, accompanied by detailed cell biology. Gene knockouts in mice provide further insight into mechanism, as does knockdown of C. elegans homologs (Furney et al. 2006). A powerful general identifier of dominant function is transfection of the mutant gene into cells in culture (Johnston
et al. 1998; Mendes et al. 2005), although this could be cumbersome if testing becomes necessary in an expanded panel of cell lines. These are all approaches applicable to the G1 products of mouse mutagenesis screens, which between them can be expected to generate a clearer picture of the nature of dominance.
Genomics of dominance The primary source of information about dominance in the human population is the invaluable OMIM database (Online Mendelian Inheritance in Man, accessed via the NCBI website). OMIM keeps track of its own size and currently holds entries for 3850 diseases. Many of these entries contain information about inheritance, generally sourced from family histories. The various disease specialists who contribute to OMIM collect this information because of its importance for diagnosis and counseling. Hope for gene therapy is now an additional reason for doing so. Currently, OMIM lists 387 genes with known sequence and phenotype, of which ~30% can be identified as having dominant and ~40% as having recessive mutations. This proportionality suffers from ascertainment bias (dominant inheritance is easier to detect) and so cannot be directly compared with the outcome of mutagenesis. The appropriate comparison is with the frequency of dominant versus recessive disease, which is often not recorded in OMIM but can still sometimes be found in the literature. The genomics of dominant and recessive disease genes listed in OMIM has been well studied (Furney et al. 2006; Jimenez-Sanchez et al. 2001; Kondrashov and Koonin 2004; Lopez-Bigas et al. 2006), with the last of these providing a comprehensive survey of previous work. For gene function, analysis via the GO (Gene Ontology) database yields transcription regulators and structural genes as those most likely to undergo dominant mutation, in line with the scattered observations already mentioned. These analyses do not distinguish between genes prone to dominant mutation because of their function and those where dominant mutations predominate because recessive alleles have been lost through natural selection (e.g., through selective loss of alleles with nearly recessive expression).
For understanding gene evolution, a key tool is the nonsynonymous/synonymous substitution rate ratio, denoted Ka/Ks or Kn/Ks (Lynch and Conery 2000). Here the synonymous substitution rate serves simply as a biological clock with which evolution via nonsynonymous substitution can be compared. The force of selection has been measured in this way by comparing paralogs (pairs of genes within the same species arising by duplication) and
orthologs (where divergent evolution of a gene can be tracked in two related species). Both human paralogs and human-chimpanzee orthologs have been analyzed for the OMIM collection of dominant and recessive genes. As judged in this way the dominant disease genes are better conserved than the run of nondisease genes, indicating that they are subject to stronger selective constraints, while recessive disease genes are less constrained in this way. This may be because recessive mutations with a slightly deleterious effect tend to accumulate in natural populations (Charlesworth and Eyre-Walker 2007; Nielsen et al. 2007). Overall, the mode of inheritance of a gene is clearly an important determinant of its rate of evolution. A broadly similar analysis is being applied to the general human population, where Ka and Ks substitutions are compared over large numbers of genes from many human individuals (Nielsen et al. 2007; Williamson et al. 2005). The analysis provides strong evidence of recent population growth, in line with other estimates (Hawks et al. 2000). When the Ka (nonsynonymous) substitutions are categorized according to various biometric measures as ‘‘benign,’’ ‘‘possibly damaging,’’ or ‘‘probably damaging,’’ a clear relationship to selective pressure emerges, as expected. The strategy is not immediately applicable to inheritance pattern because of the lack of any clear-cut sequence signature of dominance, but our expectations are clear. Disadvantageous dominant and X-linked pathogenic alleles are eliminated relatively rapidly, while comparable recessive alleles are likely to persist much longer, exposing them to prolonged selective pressure. For that reason recessive mutations should be more influenced by past demographic bottlenecks, which in general would be expected to flush them out and eliminate them.
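To make the Ka/Ks comparison concrete, here is a minimal Python sketch of the final step of a Nei-Gojobori-style calculation, in which observed proportions of nonsynonymous and synonymous differences are corrected for multiple hits with the Jukes-Cantor formula. It assumes that the numbers of differences and of sites have already been counted from an alignment; the function names and the counts are hypothetical and are not taken from any of the studies cited.

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka, ks, ka / ks

# Hypothetical counts for one ortholog pair, for illustration only
ka, ks, ratio = ka_ks(nonsyn_diffs=6, nonsyn_sites=700,
                      syn_diffs=12, syn_sites=300)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, Ka/Ks = {ratio:.2f}")
```

A ratio well below 1, as in this made-up case, is the usual sign of selective constraint on the protein sequence, with Ks acting as the relatively unselected control.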
In addition, prolonged selective pressure on heterozygotes, however slight in the case of nearly recessive alleles, will affect not only the frequency of the pathogenic mutation itself, but also the frequency of nearby polymorphic alleles (‘‘selective sweep’’) (Nielsen et al. 2007). When making either ortholog or paralog comparisons, the Ks substitution rate provides an essential negative (i.e., relatively unselected) control. This point underlines the potential value of mutagenesis screens, which could provide a control of a different type but able to serve a similar purpose. The way that natural selection operates predominantly on recessive pathogenic alleles, leaving monoallelic (dominant and X-linked) disease relatively unaffected, is summarized in the Appendix.
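The contrast between rapid loss of dominant alleles and long persistence of recessive ones can be illustrated with the standard deterministic one-locus selection recursion. The sketch below is a textbook model, not a calculation from this commentary: it assumes an infinite, randomly mating population with no new mutation or drift, and the selection coefficient s, the dominance coefficient h, and the starting frequencies are arbitrary illustrative values.

```python
def next_freq(q, s, h):
    """Frequency of the deleterious allele after one generation of selection.

    Genotype fitnesses are 1 (wild-type homozygote), 1 - h*s (heterozygote)
    and 1 - s (mutant homozygote); h=1 is fully dominant, h=0 fully recessive.
    """
    p = 1.0 - q
    w_het = 1.0 - h * s
    w_hom = 1.0 - s
    mean_w = p * p + 2 * p * q * w_het + q * q * w_hom
    return (p * q * w_het + q * q * w_hom) / mean_w

def generations_to_fall(q_start, q_end, s, h, max_gen=1_000_000):
    """Count generations until the allele frequency drops to q_end or below."""
    q, gen = q_start, 0
    while q > q_end and gen < max_gen:
        q = next_freq(q, s, h)
        gen += 1
    return gen

# Illustrative values only: same selection coefficient, different dominance
gens_dominant = generations_to_fall(0.01, 0.001, s=0.5, h=1.0)
gens_recessive = generations_to_fall(0.01, 0.001, s=0.5, h=0.0)
print(gens_dominant, gens_recessive)
```

Under these assumptions the dominant allele falls tenfold in a handful of generations, whereas the recessive allele takes well over a thousand, because selection acts only on the rare homozygotes.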
The contribution from mutagenesis (i) reservations Charlotte Auerbach, who discovered chemical mutagenesis, realized that it would provide a valuable tool but that it
could not be expected to mimic natural mutation exactly; this has turned out to be correct. In both cases larger genes are more vulnerable, as expected of a randomly distributed effect. Systematic surveys reveal A:T to T:A transversions and A:T to G:C transitions leading to missense mutation as the most common form of mutation induced by ENU in phenotype-driven mouse screens (Barbaric et al. 2007; Justice et al. 1999; Takahasi et al. 2007). In their large series Barbaric et al. (2007) found an apparent influence from neighboring DNA in the form of excess local G+C content. These changes do not mimic precisely the mutations found in human disease databases, where CpG doublet changes predominate (Vihinen et al. 2001). As a general rule these minor differences would be expected to even out over the length of a gene, leaving little reason to suspect that ENU mutagenesis would provide misleading information about the proportion of dominant and recessive mutations occurring naturally in the human population.
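A mutation spectrum of this kind can be tabulated with a few lines of Python. The sketch below classifies single-base changes as transitions or transversions and labels them in the base-pair notation used above (for example "A:T to T:A"), folding the two strands together. The function names and the list of changes are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def _check(ref, alt):
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    return ref, alt

def substitution_class(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    ref, alt = _check(ref, alt)
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def base_pair_change(ref, alt):
    """Label a change strand-independently, e.g. T>A and A>T both give 'A:T to T:A'."""
    ref, alt = _check(ref, alt)
    if ref in PYRIMIDINES:  # report from the strand carrying the purine
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]} to {alt}:{COMPLEMENT[alt]}"

# Hypothetical list of (reference, mutant) bases, for illustration only
changes = [("A", "T"), ("T", "A"), ("T", "C"), ("A", "G"), ("G", "A")]
spectrum = Counter(base_pair_change(r, a) for r, a in changes)
print(spectrum.most_common())
```

The same tally applied to a human disease database and to an ENU screen would give the two spectra whose differences are discussed here.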
The contribution from mutagenesis (ii) frequencies and mechanisms Numerous statements have been made about the value of mouse mutagenesis screens. Here we briefly emphasize the contributions likely to be made to the understanding of genetic dominance and monofactorial human genetic disease, and more generally to how information gleaned from mutagenesis will intersect with evolutionary genomics. It is clear that we have much to learn from comparing the outcome of mouse mutagenesis with OMIM. The comparison emphasized here is with the relative proportion of dominantly and recessively inherited monofactorial disease in man. More generally, the comparison will test how much we have in common as mammals and where man and mouse differ. We will learn about bias in the ascertainment of genetic disease: the defects that already attract so much attention and those where little is so far known. Through learning from these two independent sources how genes go wrong, we will gain insight into natural selection. The comparison will, we expect, draw attention to the shortage of information in OMIM about the proportion of dominant and recessive inheritance and the need for further genetic epidemiology. A second comparison of interest will be between mutation of gene orthologs in man and mouse and of paralogs within mice, particularly those already identified as under selective pressure during human-chimpanzee divergence (Nielsen et al. 2005). This type of comparison is of particular value when hooked up to gene expression analysis (Kafri et al. 2006; Tirosh et al. 2007), now being used to characterize the divergence of function needed for survival of duplicate genes (Lynch and Conery 2000).
A third valuable input from mutagenesis screens will be to provide many more examples of dominant and recessive mutation. These should strengthen the understanding of the cell biology of dominance and test the validity of the current views about mechanisms outlined above. A fourth question from the present standpoint concerns the genes mentioned above, such as transcription regulators, which have a relatively high proportion of dominant mutations. The question is whether this results solely from the function of these genes or whether there is an additional contribution from natural selection. Comparison with the outcome of mutagenesis should help answer this question. Two areas seem particularly well placed to lead the way in this analysis: vision and immunity. Both have a substantial and well-characterized range of monofactorial variation, and in both cases monoallelic (autosomal dominant or X-linked) disease occurs more frequently, even though as usual more genes undergo recessive mutation (Daiger et al. 2007; Eades-Perner et al. 2007). Both include monofactorial defective phenotypes acquired through mutation in any one of many single genes (e.g., retinitis pigmentosa, primary immune deficiency). Both are areas of current mutagenesis screening (Cook et al. 2006; Jablonski et al. 2005; Pinto et al. 2004). As judged by the human-chimpanzee genome comparison, the two areas have evolved at very different speeds (Nielsen et al. 2005), something that the outcome of mutagenesis could clarify. In the future, interest will grow in the area of brain research, hopefully in parallel with a growing understanding of psychometric genetics. The first findings look encouraging (Cook et al. 2007; Godinho and Nolan 2006; Reijmers et al. 2006). This is an area that we identify with our own humanity and where we are extraordinarily sensitive to individual differences. However, the chimpanzee genome comparison so far finds little evidence of strong genetic selection. 
Where do we stand in relation to dominant and recessive mutations in mice? Most of the current ENU screens have been set up to detect both types of mutation (Justice et al. 1999; The Mouse Phenotype Database Integration Consortium 2007), including the Mouse Mutagenesis Programme at Harwell, UK (Nolan et al. 2000), the GSF-National Research Center, Neuherberg, Germany (Hrabé de Angelis et al. 2000), the RIKEN Genomic Sciences Center, Yokohama (Masuya et al. 2004), the Canberra program (Hoyne and Goodnow 2006), and the North American programs (Clark et al. 2004). Yet, over the years since these programs started, only the German program seems to have published its mutation frequencies, and only in preliminary form. We can only hope that as the work proceeds further dominant/recessive frequency data will accumulate and will eventually be published. Meanwhile, the excitement
has been in exploiting particular mutants and one fears that the matter of frequency is being left on the back burner. The screeners should please remember that their figures for the observed frequency of autosomal dominant, autosomal recessive, and X-linked mutations would be much appreciated. They are needed for fundamental population genetics, for medical genetics, and for the management of plant and animal species.
The contribution from mutagenesis (iii) sequences and hotspots As ENU studies progress, and as high-throughput sequencing becomes ever less expensive, we can expect sequences of the mutated genes to accumulate. In the first instance these will surely be compared with the sequences of major OMIM genes already available, such as the X-linked BTK (>228 alleles) and DMD (>942 alleles), the autosomal dominants MYH7 (>36 alleles) and RHO (>23 alleles), and the autosomal recessives CPN3 (>438 alleles) and CNGA3 (>46 alleles). Of special interest from the present standpoint are genes that allow autosomal dominant (AD) and autosomal recessive (AR) mutations to be compared, such as DFNB1/DFNA3. For the monoallelic pathogenic mutations (X-linked and AD), the expectation is that the sequence variation generated by mutagenesis should match that occurring naturally in the human population, with matching hotspots and so on (hotspots have been identified in ~5% of the genes listed in OMIM). Indeed, if they do not match, the mutagenesis screens will need to worry about ascertainment bias. However, when it comes to the G3 autosomal recessives, the comparison becomes more interesting because they should lack the features imposed on their human counterparts by many generations of natural selection. We do not quite know what to expect. For instance, they might have more variation outside the OMIM-defined hotspots and might proportionately have more synonymous variation. Acknowledgment The authors thank Shomi Bhattacharya for help in preparing this commentary.
Appendix The relationship between mutagenesis and population genetics from the present perspective can be summarized thus:
$$\left(1 - P_{\mathrm{dom}}^{\mathrm{OMIM}}\right) : \left(1 - P_{\mathrm{dom}}^{\mathrm{ENU}}\right) :: K_a : K_s$$
where $1 - P_{\mathrm{dom}}^{\mathrm{OMIM}}$ denotes the proportion of recessively inherited disease as referenced in the OMIM (Online Mendelian Inheritance in Man) database, not the proportion of dominant mutations, and $1 - P_{\mathrm{dom}}^{\mathrm{ENU}}$ is the frequency of recessive mutations obtained by ENU mutagenesis in mouse (i.e., the G3/G1 ratio). $K_a$ and $K_s$ (written Ka and Ks in the main text) are the frequencies, respectively, of nonsynonymous and synonymous nucleotide substitutions observed in the comparable part of the human population. The ‘‘::’’ sign signifies similarity of function but not numerical equality and should be read as ‘‘as.’’ On either side a parameter that is subject to natural selection is compared with a supposedly unselected control.
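The analogy can be spelled out as two ratios of the same form, each dividing a quantity exposed to natural selection by its supposedly unselected control. The Python sketch below is only an illustration of that structure: the helper name and all four input values are hypothetical, and, as stated above, the two ratios are not expected to be numerically equal.

```python
def selected_to_control_ratio(selected, control):
    """Ratio of a quantity exposed to selection to its unselected control."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return selected / control

# Hypothetical inputs, for illustration only
p_dom_omim = 0.6     # assumed proportion of dominant disease in OMIM
p_dom_enu = 0.2      # assumed proportion of dominant mutants in an ENU screen
ka, ks = 0.01, 0.04  # assumed nonsynonymous and synonymous substitution rates

inheritance_ratio = selected_to_control_ratio(1 - p_dom_omim, 1 - p_dom_enu)
substitution_ratio = selected_to_control_ratio(ka, ks)
print(f"recessive OMIM:ENU = {inheritance_ratio:.2f}, Ka:Ks = {substitution_ratio:.2f}")
```

In both cases a ratio below 1 would indicate that selection has depleted the selected class relative to its control.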
References
Barbaric I, Wells S, Russ A, Dear TN (2007) Spectrum of ENU-induced mutations in phenotype-driven and gene-driven screens in the mouse. Environ Mol Mutagen 48:124–142
Blagosklonny MV (2000) p53 from complexity to simplicity: mutant p53 stabilization, gain-of-function, and dominant-negative effect. FASEB J 14:1901–1907
Charlesworth J, Eyre-Walker A (2007) The other side of the nearly neutral theory, evidence of slightly advantageous back-mutations. Proc Natl Acad Sci U S A 104:16992–16997
Clark AT, Goldowitz D, Takahashi JS, Vitaterna MH, Siepka SM et al (2004) Implementing large-scale ENU mutagenesis screens in North America. Genetica 122:51–64
Cook MC, Vinuesa CG, Goodnow CC (2006) ENU-mutagenesis: insight into immune function and pathology. Curr Opin Immunol 18:627–633
Cook MN, Dunning JP, Wiley RG, Chesler EJ, Johnson DK et al (2007) Neurobehavioral mutants identified in an ENU-mutagenesis project. Mamm Genome 18:559–572
Daiger SP, Bowne SJ, Sullivan LS (2007) Perspective on genes and mutations causing retinitis pigmentosa. Arch Ophthalmol 125:151–158
Eades-Perner AM, Gathmann B, Knerr V, Guzman D, Veit D et al (2007) The European internet-based patient and research database for primary immunodeficiencies: results 2004–06. Clin Exp Immunol 147:306–312
Edmands S (2007) Between a rock and a hard place: evaluating the relative risks of inbreeding and outbreeding for conservation and management. Mol Ecol 16:463–475
Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8:610–618
Falconer DS (1989) Introduction to Quantitative Genetics, 3rd edn. Longman, London
Furney SJ, Alba MM, Lopez-Bigas N (2006) Differences in the evolutionary history of disease genes affected by dominant or recessive mutations. BMC Genomics 7:165
Godinho SI, Nolan PM (2006) The role of mutagenesis in defining genes in behaviour. Eur J Hum Genet 14:651–659
Hawks J, Hunley K, Lee SH, Wolpoff M (2000) Population bottlenecks and Pleistocene human evolution. Mol Biol Evol 17:2–22
Hoyne GF, Goodnow CC (2006) The use of genomewide ENU mutagenesis screens to unravel complex mammalian traits: identifying genes that regulate organ-specific and systemic autoimmunity. Immunol Rev 210:27–39
Hrabé de Angelis MH, Flaswinkel H, Fuchs H, Rathkolb B, Soewarto D et al (2000) Genome-wide, large-scale production of mutant mice by ENU mutagenesis. Nat Genet 25:444–447
Hurst LD, Randerson JP (2000) Dosage, deletions and dominance: simple models of the evolution of gene expression. J Theor Biol 205:641–647
Jablonski MM, Wang X, Lu L, Miller DR, Rinchik EM et al (2005) The Tennessee Mouse Genome Consortium: identification of ocular mutants. Vis Neurosci 22:595–604
Jimenez-Sanchez G, Childs B, Valle D (2001) Human disease genes. Nature 409:853–855
Johnston JA, Ward CL, Kopito RR (1998) Aggresomes: a cellular response to misfolded proteins. J Cell Biol 143:1883–1898
Justice MJ, Noveroske JK, Weber JS, Zheng B, Bradley A (1999) Mouse ENU mutagenesis. Hum Mol Genet 8:1955–1963
Kafri R, Levy M, Pilpel Y (2006) The regulatory utilization of genetic redundancy through responsive backup circuits. Proc Natl Acad Sci U S A 103:11653–11658
Keightley PD (1994) The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138:1315–1322
Kondrashov FA, Koonin EV (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet 20:287–290
Lopez-Bigas N, Blencowe BJ, Ouzounis CA (2006) Highly consistent patterns for inherited human diseases at the molecular level. Bioinformatics 22:269–277
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155
Masuya H, Nakai Y, Motegi H, Niinaya N, Kida Y et al (2004) Development and implementation of a database system to manage a large-scale mouse ENU-mutagenesis program. Mamm Genome 15:404–411
Mendes HF, van der Spuy J, Chapple JP, Cheetham ME (2005) Mechanisms of cell death in rhodopsin retinitis pigmentosa: implications for therapy. Trends Mol Med 11:177–185
Mitchell-Olds T, Willis JH, Goldstein DB (2007) Which evolutionary processes influence natural genetic variation for phenotypic traits? Nat Rev Genet 8:845–856
Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB et al (2005) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol 3:e170
Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG (2007) Recent and ongoing selection in the human genome. Nat Rev Genet 8:857–868
Nolan PM, Peters J, Strivens M, Rogers D, Hagan J et al (2000) A systematic, genome-wide, phenotype-driven mutagenesis programme for gene function studies in the mouse. Nat Genet 25:440–443
Pinto LH, Vitaterna MH, Siepka SM, Shimomura K, Lumayag S et al (2004) Results from screening over 9000 mutation-bearing mice for defects in the electroretinogram and appearance of the fundus. Vision Res 44:3335–3345
Reijmers LG, Coats JK, Pletcher MT, Wiltshire T, Tarantino LM et al (2006) A mutant mouse with a highly specific contextual fear-conditioning deficit found in an N-ethyl-N-nitrosourea (ENU) mutagenesis screen. Learn Mem 13:143–149
Seidman JG, Seidman C (2002) Transcription factor haploinsufficiency: when half a loaf is not enough. J Clin Invest 109:451–455
Takahasi KR, Sakuraba Y, Gondo Y (2007) Mutational pattern and frequency of induced nucleotide changes in mouse ENU mutagenesis. BMC Mol Biol 8:52
The Mouse Phenotype Database Integration Consortium (2007) Integration of mouse phenome data resources. Mamm Genome 18:157–163
Tirosh I, Bilu Y, Barkai N (2007) Comparative biology: beyond sequence analysis. Curr Opin Biotechnol 18:371–377
Vihinen M, Arredondo-Vega FX, Casanova JL, Etzioni A, Giliani S et al (2001) Primary immunodeficiency mutation databases. Adv Genet 43:103–188
Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R et al (2005) Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci U S A 102:7882–7887