Biochemical Genetics, Vol. 37, Nos. 9/10, 1999
Note
PCR Error and Molecular Population Genetics
Norio Kobayashi,1–3 Koichiro Tamura,2 and Tadashi Aotsuka2
Received 2 Apr. 1999; Final 15 June 1999
1 Division of Biological Science, Graduate School of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan.
2 Department of Biology, Graduate School of Science, Tokyo Metropolitan University, 1-1 Minamiohsawa, Hachioji-shi, Tokyo 192-0397, Japan.
3 To whom correspondence should be addressed at Division of Biological Science, Graduate School of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan. Fax: +81-11-746-0862.
0006-2928/99/1000-0317$16.00/0 © 1999 Plenum Publishing Corporation

INTRODUCTION

The polymerase chain reaction (PCR) technique (Saiki et al., 1988) is a very useful tool for population genetic studies at the DNA level. However, the technique has an inherent problem: the amplified DNA fragments are subject to nucleotide substitution during the reaction. The rate and pattern of these artificial substitutions have already been estimated by several authors (e.g., Lundberg et al., 1991), and their influence on phylogenetic studies has also been discussed (Kwiatowski et al., 1991). However, the influence of PCR error on population genetic studies has not been well documented. We determined sequences from PCR products by two methods, cloning and direct sequencing. Assuming that the sequences obtained by direct sequencing are correct, we discuss the influence of the error on the estimates of various parameters commonly used in molecular population genetics. In addition, we evaluate how effectively Pfu polymerase improves the fidelity of PCR.

MATERIALS AND METHODS

Four species of Epilachna ladybird beetles, E. vigintioctomaculata, E. niponica, E. pustulosa, and E. yasutomii, were used in this study. To estimate the rate and pattern of PCR error using Taq polymerase and Taq plus Pfu polymerase, 33 and 22 beetles, respectively, were used. Total DNA for the PCR template was extracted individually from fresh or alcohol-preserved specimens following Steller (1990). For PCR amplification, primer designs were as described by Kobayashi et al. (1998).

The amount of mtDNA in the total genomic DNA extracted from living beetles was estimated to be 50 ng/individual. From the ratio of the total size of mtDNA (Tamura, unpublished) to the length of the COI region, the amount of the target DNA was estimated to be 5 ng/living beetle. As we used 1/100th of the genomic DNA extracted from a beetle, about 50 pg of the target DNA was considered to be included in the reaction mixture for PCR amplification.

The reaction mixture (100 µl) for PCR contained 10 mM Tris–HCl, pH 8.3, 50 mM KCl, 2 mM MgCl2, 0.001% gelatin, a 200 µM concentration of each dNTP, a 200 nM concentration of each primer, approximately 50 pg of target DNA, and 2.5 U of Taq. The amount of the PCR product was estimated from the intensity of the DNA band on the agarose gel. For PCR, several polymerases other than Taq are now available. Although Pfu is known to have better fidelity (Lundberg et al., 1991), this polymerase often fails to catalyze amplification when the target DNA is relatively large (more than 1.5 kb long) (Barnes, 1994). Barnes (1994) reported that the combination of Taq and a small amount of Pfu improved both the efficiency and the fidelity of PCR amplification (see also Cline et al., 1996). Because Epilachna target DNA (about 1.6 kb long) was not amplified by Pfu alone, we used a mixture of 2.5 U Taq + 0.0125 U Pfu and obtained sufficient amplified DNA.

Amplification was performed for 25 cycles in a DNA thermal cycler using the following parameters: 94°C for 30 sec, 60°C for 1 min, and 72°C for 1 min, except for the last cycle, in which the 72°C step lasted 8 min. For cloning an amplified DNA fragment, we used the plasmid vector pUC118 with E. coli K12 MV1184 as a host. Single-strand template DNA for sequencing was obtained by the pUC118/119-M13KO7 system (Vieira and Messing, 1987). PCR products were purified with the QIAquick PCR purification kit (Qiagen Inc.) to serve as the template for direct sequencing. The sequencing was performed using an ABI autosequencer according to the manufacturer's instructions. The nucleotide sites examined started from the 5′ terminus of the COI coding region and ran for 1 kb. The sequences determined by the direct sequencing method were deposited in the DDBJ database (accession numbers AB002182–AB002229).
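The template estimate described in the Materials and Methods is simple arithmetic, and a short Python sketch may make the chain of numbers easier to follow. The function names are hypothetical and introduced only for illustration; the inputs (50 ng of mtDNA and 5 ng of target DNA per beetle, 1/100th of the extract per reaction) are the values stated in the text.

```python
def target_fraction(target_ng_per_beetle, mtdna_ng_per_beetle):
    """Share of the mtDNA mass that belongs to the PCR target region."""
    return target_ng_per_beetle / mtdna_ng_per_beetle


def target_per_reaction_pg(target_ng_per_beetle, dilution):
    """Target DNA (pg) in one reaction when 1/dilution of the extract is used."""
    return target_ng_per_beetle * 1000.0 / dilution  # 1 ng = 1000 pg


# Values taken from the text: 50 ng mtDNA and 5 ng target DNA per beetle,
# with 1/100th of each extract added to the reaction mixture.
fraction = target_fraction(5.0, 50.0)
template_pg = target_per_reaction_pg(5.0, 100)

print(f"target share of mtDNA: {fraction:.2f}")
print(f"target DNA per reaction: {template_pg:.0f} pg")
```

The sketch only restates the estimates given above; it does not recover the unpublished mtDNA size from which the ratio was originally derived.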
RESULTS AND DISCUSSION

The results are summarized in Table I.

Table I. Fidelity and Rate of PCR Error for Taq and Taq Plus Pfu Polymerases

              Number of sequences
              with substitutions
Polymerase    0     1     2     3     Fidelity (%)a    Error rate (× 10⁻⁵)b
Taq           20    7     5     1     60.6             7.3
Taq + Pfu     19    3     0     0     86.4             1.6

a Proportion of correct sequence of 1000 bp.
b Rate of nucleotide substitution per site per duplication.

When only Taq was used, 20 additional substitutions compared to the correct sequences by direct sequencing were observed in 13 of 33 sequences determined. The fidelity, defined as the proportion of correct sequence, was thus 60.6%, suggesting that about 40% of the DNA fragments amplified contain at least one nucleotide different from the original target sequence of 1000 bp. Almost all the substitutions (19/20) observed were transitions, and among them AT → GC were predominant (15/19) (data not shown). The same mutation
bias in PCR with Taq was also reported by Tindall and Kunkel (1988). These observations may reflect the specific mispairs (A:C or T:G) that Taq polymerase tends to form.

We estimated the PCR error rate using the formula given by Hayes (1965), that is, 2 × observed error number/total DNA length examined/effective number of duplications (ed). The ed can be estimated from the template–product ratio. As the amount of PCR product amplified from 50 pg template DNA was measured to be 5 µg, the amplification efficiency and the ed were estimated to be 10⁵ and 16.6, respectively. Therefore, the error rate was computed to be 7.3 × 10⁻⁵/site/duplication. This value is consistent with the reports by Lundberg et al. (1991) and Flaman et al. (1994) but is somewhat higher than the estimate of Cline et al. (1996).

With the Taq plus Pfu mixture, 3 of the 22 sequences carried a single substitution each (Table I). Although the number of substitutions was very small, it may be said that the predominant bias in site preference is common to Taq and Pfu. As the amplification efficiency was the same as that in the case of Taq only, the rate of PCR error was estimated to be 1.6 × 10⁻⁵/site/duplication. This error rate agrees with those of previous reports (Lundberg et al., 1991; Cline et al., 1996). These studies and our observations indicate that the use of Pfu or the addition of a small amount of Pfu to Taq is very effective in reducing PCR error.

To assess the effects of PCR error on population genetic studies, in which many closely related sequences must be compared, we estimated several population parameters commonly used for describing or analyzing DNA polymorphism, i.e., the number of haplotypes (k), the number of segregating sites (s), and the nucleotide diversity (π̂) according to Nei and Li (1979). These parameters were estimated for 16 individuals of E. vigintioctomaculata. Table II shows the population parameters estimated from cloned DNA and original PCR products.
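The error-rate estimates reported earlier in this section can be retraced with a few lines of Python. This is a minimal sketch, not the authors' own procedure: it assumes that the effective number of duplications is the base-2 logarithm of the product-to-template ratio (an assumption that reproduces the stated value of 16.6), and the function names are hypothetical. The counts are those given in the text and Table I: 20 substitutions in 33 sequences for Taq, 3 substitutions in 22 sequences for Taq plus Pfu, each sequence 1000 bp long.

```python
import math


def effective_duplications(template_g, product_g):
    """Effective number of duplications, assumed here to be log2(product/template)."""
    return math.log2(product_g / template_g)


def pcr_error_rate(n_errors, n_sequences, length_bp, ed):
    """Hayes-style estimate: 2 x errors / total length examined / duplications."""
    total_length = n_sequences * length_bp
    return 2.0 * n_errors / total_length / ed


# 50 pg of template amplified to 5 ug of product (values from the text).
ed = effective_duplications(50e-12, 5e-6)

rate_taq = pcr_error_rate(20, 33, 1000, ed)     # Taq only
rate_taq_pfu = pcr_error_rate(3, 22, 1000, ed)  # Taq plus Pfu

print(f"effective duplications: {ed:.1f}")
print(f"Taq:       {rate_taq:.1e} per site per duplication")
print(f"Taq + Pfu: {rate_taq_pfu:.1e} per site per duplication")
```

Under these assumptions the sketch returns values that round to the 7.3 × 10⁻⁵ and 1.6 × 10⁻⁵ per site per duplication listed in Table I.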
Table II. Various Population Parameters Estimated for 16 E. vigintioctomaculata COI Gene Sequences Obtained from Cloned DNA and Original PCR Products

                          ka     sb     π̂c
Cloned DNA                14     27     9.2
Original PCR products     8      20     8.1

a Number of haplotypes.
b Number of segregating sites.
c Nucleotide diversity.

All parameters estimated from the data including PCR error (cloned DNA) were larger than those estimated from the correct data (original PCR products), k and s considerably so. This will lead to an overestimate of the amount of DNA polymorphism in a population. Tajima (1989) developed a statistic, D, for testing the neutral mutation hypothesis; it is computed from s and π̂. In our
case, Tajima's D was estimated to be 0.51 from the data including the PCR error, although this value is 1.37 from the correct sequences. PCR error is thought to contribute to increasing s but adds very little to π̂. Thus, Tajima's D estimated from the data including PCR error is expected to be smaller than the true value in this case. Although the two values of D were not significantly different from zero, the effect of PCR error on the estimation of D might be serious in other cases and may lead to misinterpreting the mechanisms of maintaining DNA polymorphism. In our study, the PCR error was observed at random nucleotide positions throughout the target DNA (data not shown). If this is always the case, PCR error would likely conceal the underlying factors, such as natural selection, operating on DNA polymorphism in nature.

At this time, the best way to obtain error-free sequence data from PCR products seems to be direct sequencing. Recent improvements in sequencing technologies have enabled us to determine DNA sequences by using the direct sequencing method with ease. However, this method is not always applicable. The target sequence of PCR for a nuclear gene is often heterozygous. In this case, the sequence of neither allele can be determined correctly by direct sequencing and we are obliged to obtain the sequence data from cloned DNA. To obtain sequence data free from PCR error, we have to determine the sequence for several clones. Since only 60% of PCR products are expected to retain the original target sequence when only Taq is used, frequently three clones must be examined even if the gene is homozygous. When the target gene is heterozygous, more sequences must be determined. However, by adding a small amount of Pfu to Taq, the fidelity of PCR increased substantially (86.4%).

ACKNOWLEDGMENTS

We would like to thank two anonymous reviewers who gave helpful comments on an early draft of the manuscript. We express our thanks to Miss M. Kumagai for her technical assistance throughout the experiment. This work was partly supported by a Grant-in-Aid for Scientific Research (C) from the Ministry of Education, Japan (No. 06640906), to T. Aotsuka.
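As a supplementary check, the Tajima's D values quoted in the Results and Discussion can be approximately re-derived from Table II with Tajima's (1989) formula. The Python sketch below rests on an assumption that is not stated explicitly in the text: that the π̂ values in Table II are given in units of 10⁻³ per site over the 1000 sites examined, so that the mean number of pairwise differences is 9.2 and 8.1. The function name is hypothetical.

```python
import math


def tajima_d(n, s, pi):
    """Tajima's (1989) D.

    n  : number of sequences sampled
    s  : number of segregating sites
    pi : mean number of pairwise differences per sequence (not per site)
    """
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


SITES = 1000  # length examined, from the Materials and Methods

# ASSUMPTION: Table II reports nucleotide diversity in units of 1e-3 per site.
d_cloned = tajima_d(16, 27, 9.2e-3 * SITES)  # data including PCR error
d_direct = tajima_d(16, 20, 8.1e-3 * SITES)  # direct sequencing

print(f"D (cloned DNA):            {d_cloned:.2f}")
print(f"D (original PCR products): {d_direct:.2f}")
```

Under that assumption the sketch gives roughly 0.54 and 1.38, close to the reported 0.51 and 1.37; the small gap would be expected if the published π̂ values are rounded. The ordering of the two values is the same as in the text: the data containing PCR error yield the smaller D.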
REFERENCES

Barnes, W. M. (1994). PCR amplification of up to 35-kb DNA with high fidelity and high yield from λ bacteriophage templates. Proc. Natl. Acad. Sci. USA 91:2216.
Cline, J., Braman, J. C., and Hogrefe, H. H. (1996). PCR fidelity of Pfu polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24:3546.
Flaman, J. M., Frebourg, T., Moreau, V., Charbonnier, F., Martin, C., Ishioka, C., Friend, S. H., and Iggo, R. (1994). A rapid PCR fidelity assay. Nucleic Acids Res. 22:3259.
Hayes, W. (1965). The Genetics of Bacteria and Their Viruses, Wiley, New York.
Kobayashi, N., Tamura, K., Aotsuka, T., and Katakura, H. (1998). Molecular phylogeny of twelve Asian species of epilachnine ladybird beetles (Coleoptera, Coccinellidae) with notes on the direction of host shifts. Zool. Sci. 15:147.
Kwiatowski, J., Skarecky, D., Hernandez, S., Pham, D., Quijas, F., and Ayala, F. J. (1991). High fidelity of polymerase chain reaction. Mol. Biol. Evol. 8:884.
Lundberg, K. S., Shoemaker, D. D., Adams, M. W. W., Short, J. M., Sorge, J. A., and Mathur, E. J. (1991). High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene 108:1.
Nei, M., and Li, W. H. (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76:5269.
Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487.
Steller, H. (1990). Rapid small scale isolation of Drosophila DNA and RNA. In Rubin Laboratory Method Book, 2nd ed., Rubin Laboratory Press, CA, pp. 185–186.
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585.
Tindall, K. R., and Kunkel, T. A. (1988). Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry 27:6008.
Vieira, J., and Messing, J. (1987). Production of single-stranded plasmid DNA. Methods Enzymol. 153:3.