Molecular Diagnosis Vol. 5 No. 4 December 2000
Automated Mass Spectrometry: A Revolutionary Technology for Clinical Diagnostics
JAMES LEUSHNER, MSc, PhD, NORMAN H. L. CHIU, BSc, MSc, PhD
San Diego, California
For various diagnostic analyses and the studies of functional genomics, the use of an accurate and cost-effective analytic platform to analyze large numbers of samples is essential. An automated platform called MassArray (Sequenom, Inc, San Diego, CA), designed for high-throughput diagnostic analyses, has recently been validated. The platform combines miniaturized, two-dimensional chip arrays with proven high-fidelity enzymatic procedures and matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Nanoliter dispensing of samples in high-density formats of 384 or greater results in improved throughput and reduced costs. Automation prevails from the initial assay design through sample processing and data analysis, for the most part eliminating the labor component of assay development and implementation. The MassArray platform is being used in the following areas: (1) molecular diagnosis of genetic disease and infectious agents, (2) pharmacogenomics, (3) paternity and/or identity testing, and (4) agriculture (e.g., marker-assisted breeding). MALDI-TOF mass spectrometry can also be used for analyzing proteins; therefore, genotype/phenotype testing can be performed on a single platform. Key words: mass spectrometry, single nucleotide polymorphism.
From Sequenom, Inc, San Diego, CA.
Reprint requests: James Leushner, MSc, PhD, Sequenom, Inc, 11555 Sorrento Valley Rd, San Diego, CA 92121. Email: [email protected]
Copyright © 2000 by Churchill Livingstone®
1084-8592/00/0504-0010$10.00/0
doi: 10.1054/modi.2000.19574

Mass spectrometry is perhaps one of the most important analytic techniques developed in the last century. By measuring intrinsic molecular mass, mass spectrometry can analyze many different types of chemicals. The distinct molecular masses of the four natural bases of DNA make DNA suitable for mass spectrometric measurement. With the recent development of such soft ionization techniques as matrix-assisted laser desorption/ionization (MALDI) [1,2], intact molecular ions can be produced. In MALDI, a DNA sample is cocrystallized with excess light-absorbing matrix. Under high vacuum, the sample crystal is irradiated with an ultraviolet laser pulse that vaporizes and ionizes both DNA and matrix at the same time. Because the matrix absorbs most of the laser energy, DNA fragmentation does not usually occur. In time-of-flight (TOF) mass spectrometry, the molecular ions are accelerated and pass through a flight tube, in which they are separated according to their mass-to-charge ratios [3]. The smaller the mass-to-charge ratio, the sooner the ion reaches the end of the tube. By recording the arrival of ions at the end of the tube over a short time window, a mass spectrum is generated. Under optimal conditions, MALDI produces predominantly singly charged DNA molecular ions, which simplifies the calculation of molecular masses. Using MALDI-TOF mass spectrometry, molecular masses can be determined with very high accuracy (0.01% to 0.1%). The entire process, including data acquisition, can be completed in approximately 3 seconds.
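As a rough illustration of the time-of-flight relationship described above, the following Python sketch computes the drift time of a singly charged ion in an idealized linear TOF analyzer and the mass window implied by a given relative accuracy. The accelerating voltage (20 kV), the tube length (1 m), and the function names are assumptions chosen for illustration; they are not instrument parameters taken from this article.

```python
import math

DALTON_KG = 1.66053906660e-27          # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge


def flight_time_us(mass_da, charge=1, accel_voltage_v=20_000.0, tube_length_m=1.0):
    """Ideal linear-TOF drift time in microseconds.

    An ion of charge z*e accelerated through V volts gains kinetic energy
    z*e*V = m*v**2 / 2, so it crosses a field-free tube of length L in
    t = L * sqrt(m / (2*z*e*V)). Voltage and length defaults are assumed.
    """
    mass_kg = mass_da * DALTON_KG
    energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v
    velocity_m_s = math.sqrt(2.0 * energy_j / mass_kg)
    return tube_length_m / velocity_m_s * 1e6


def mass_window_da(mass_da, relative_accuracy):
    """Absolute mass uncertainty (Da) for a given relative accuracy."""
    return mass_da * relative_accuracy


if __name__ == "__main__":
    for mass in (5000.0, 6000.0, 8000.0):
        print(f"{mass:7.0f} Da: t = {flight_time_us(mass):6.2f} us, "
              f"window at 0.01% = {mass_window_da(mass, 1e-4):.2f} Da, "
              f"at 0.1% = {mass_window_da(mass, 1e-3):.2f} Da")
```

Because the drift time scales with the square root of the mass-to-charge ratio, quadrupling the mass of a singly charged ion doubles its flight time in this idealized model; real instruments add corrections (for example, for the initial velocity spread) that the sketch ignores.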
Fig. 1. Pictorial representation of MassArray technology. In the MassEXTEND reaction, a primer is hybridized at a position adjacent to the polymorphic site. (Top) Through primer extension, the polymorphism is determined by the mass of the extended primer (Panel 2), followed by the crystallization of the extended primer with a selected matrix on a SpectroCHIP. (Panel 3) Using the SpectroREADER (MALDI-TOF mass spectrometer), the molecular mass of the extended primer is accurately determined. (Bottom) The acquisition and interpretation of the mass spectrum are automatically performed by a MassArray workstation.
Based on MALDI-TOF mass spectrometry, a high-throughput platform called MassArray (Sequenom, Inc, San Diego, CA) has been developed for the detection of genetic variation (Fig. 1) [4]. This fully integrated system is faster, less expensive, and more versatile than oligoarray-based methods. The technology has been used extensively for the analysis of nucleic acids. The power of MassArray resides in its ability to distinguish changes in the mass of DNA without using a reporting label.

On the MassArray, a universal method, namely MassEXTEND (Sequenom, Inc), is used to detect substitution, deletion, and insertion mutations and/or polymorphisms. In the MassEXTEND method (Fig. 1), a specific region of genomic DNA is first amplified. The amplification product is then used as a template for a specific primer extension reaction. A primer anneals at an upstream position adjacent to a polymorphic site and is extended by a high-fidelity DNA polymerase in the presence of a preselected mixture of dideoxynucleotide triphosphates and deoxynucleotide triphosphates. After the extended primer is eluted from the template, nanoliter quantities of it are dispensed in parallel onto a silicon chip, the SpectroCHIP (Sequenom, Inc), which consists of a miniaturized array of preloaded matrix. Nanoliter sample volumes improve the reproducibility of crystallization and compatibility with automated data acquisition [5]. Using a high-throughput MALDI-TOF mass spectrometer, the SpectroREADER (Sequenom, Inc), the molecular mass of each extended primer is subsequently determined with high accuracy. Femtomolar quantities of DNA can be detected. Interpretation of the mass spectra is performed automatically by a MassArray workstation. Up to 3,840 different samples can be analyzed in each measurement. All the results can be easily linked to other customized databases. The system requires no special maintenance.
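The MassEXTEND readout described above can be sketched in a few lines of Python: extend a primer along each allele until the first terminator base is incorporated, then compute the expected mass of each product so that a measured peak can be assigned to an allele. This is an illustrative sketch, not Sequenom's software. The primer and the extension products are those given for the hemoglobin S assay later in this article; the bases written beyond each terminating A follow the published β-globin coding sequence and do not affect the result. The residue masses are standard average values, and the function names, the 0.1% matching tolerance, and the example peak list are assumptions.

```python
# Average masses (Da) of DNA nucleotide residues within a chain
# (nucleoside monophosphate minus water); standard reference values.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # add H2O, remove HPO3: free 5'-OH and 3'-OH ends
OXYGEN = 16.00           # a dideoxy terminator lacks the 3'-hydroxyl oxygen

PRIMER = "CCATGGTGCACCTGACTC"  # Pcod5, as given in the hemoglobin S section
# Bases added downstream of the primer for each allele (same sense as primer).
ALLELE_DOWNSTREAM = {
    "HbA": "CTGAGGAGAAG",
    "HbS": "CTGTGGAGAAG",
    "HbC": "CTAAGGAGAAG",
}


def extend(primer, downstream, terminators):
    """Add bases until a terminator base is incorporated.

    Returns (product_sequence, terminated_flag).
    """
    product = primer
    for base in downstream:
        product += base
        if base in terminators:
            return product, True
    return product, False


def oligo_mass(seq, dideoxy_end=False):
    """Average mass (Da) of a single-stranded DNA oligonucleotide."""
    mass = sum(RESIDUE_MASS[base] for base in seq) + END_CORRECTION
    if dideoxy_end:
        mass -= OXYGEN
    return round(mass, 2)


def expected_products(primer, alleles, terminators):
    """Map each allele name to the expected mass of its extension product."""
    masses = {}
    for name, downstream in alleles.items():
        product, terminated = extend(primer, downstream, terminators)
        masses[name] = oligo_mass(product, dideoxy_end=terminated)
    return masses


def call_alleles(peaks, expected, rel_tol=0.001):
    """Return the alleles whose expected mass matches any observed peak."""
    called = set()
    for peak in peaks:
        for allele, mass in expected.items():
            if abs(peak - mass) <= rel_tol * mass:
                called.add(allele)
    return sorted(called)


EXPECTED = expected_products(PRIMER, ALLELE_DOWNSTREAM, terminators="A")

if __name__ == "__main__":
    print("unextended primer:", oligo_mass(PRIMER))
    print("expected products:", EXPECTED)
    # Illustrative peak list (Da) for a sample carrying two alleles.
    print("call:", call_alleles([5436.9, 6659.2, 7624.8], EXPECTED))
```

Under these assumptions the normal and sickle products differ by roughly 960 Da, far more than the 0.01% to 0.1% mass accuracy quoted earlier, which is why no label is needed to tell the alleles apart.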
Each MassEXTEND assay costs less than a dollar, which is similar to many other currently available assays. Multiplexing both the PCR and MassEXTEND reactions allows for greater efficiencies, reducing costs per test to less than 20 cents. Several thousand single nucleotide polymorphisms (SNPs) have been validated using the MassArray system. When the identity of a nucleobase at a particular position varies at a frequency greater than 1% within a population, the variation is defined as an SNP. In the human genome, SNPs are estimated to occur about once per 1,000 bp, making them the most frequent type of genetic variation. Many scientists now believe the determination of disease-related SNPs may hold the key to the development of personalized medicine [6]. Because DNA labeling is not required, each assay on the MassArray platform can be easily adapted to analyze multiple SNPs [6]. A select group of validated applications for the MassArray is described next.
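Returning to the 1% threshold in the SNP definition above, a minimal Python sketch shows how a variable site could be classified from a sample of genotyped chromosomes. The function names and the example counts are hypothetical, and a real study would also account for sampling error in the frequency estimate.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common base observed at one position."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # no variation observed at this position
    second_most_common = counts.most_common(2)[1][1]
    return second_most_common / sum(counts.values())


def is_snp(alleles, threshold=0.01):
    """Apply the >1% population-frequency definition quoted in the text."""
    return minor_allele_frequency(alleles) > threshold


if __name__ == "__main__":
    common_variant = ["C"] * 985 + ["T"] * 15  # 1.5% of 1,000 chromosomes
    rare_variant = ["C"] * 995 + ["T"] * 5     # 0.5% of 1,000 chromosomes
    print(is_snp(common_variant), is_snp(rare_variant))
```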
Selected Applications

The normal target plate in mass spectrometers requires large volumes of sample and a time-consuming hit-and-miss multiple sampling procedure. Conversely, the SpectroCHIP requires 1/1,000th of the volume (nanoliters) and miniaturizes the target such that in most cases, only one laser shot is required. These miniaturized mass spectrometric measurements on the SpectroCHIP provide unprecedented accuracy, high throughput through automation, and low-cost identification of genetic variations. The spectral tracing of defined probe masses provides highly reproducible and distinct allelic signatures. High-fidelity polymerases are used in the reaction; therefore, the measured diagnostic signals are of sufficient quality to be error free. This provides a simple, rapid assay procedure that can be adapted to high-throughput robotic systems.

[Fig. 1 workflow labels: PCR across the polymorphic site; MassEXTEND primer extension; nanoliter dispensing onto SpectroCHIPs; mass spectrometry using SpectroREADER; data analysis using MassArray workstation.]

Hemoglobin S Mutation Detection

Direct DNA detection of variant hemoglobins has usually relied on hybridization with allele-specific oligonucleotide probes or restriction endonuclease analysis. Amplification methods have greatly facilitated implementation of all these detection methods. In the MassEXTEND assay for hemoglobin S (HbS), allele-specific hybridization procedures are replaced by allele-specific extension reactions, and stringency is obtained at three levels: the specific amplification, hybridization, and extension of the MassEXTEND primer. In most probe-based or restriction fragment length polymorphism-based tests, the interpretation of the results can be labor intensive, which makes these tests unsuitable for large-scale screening programs because of prohibitive costs [7,8]. The HbS assay developed for the MassArray system has been easily automated and therefore can be used in large-scale applications.

The HbS assay is also a good example of the dual power of the MALDI-TOF platform. The SpectroCHIP preloaded with a protein-specific matrix can be used to identify hemoglobins extracted directly from red blood cells. This simple change in matrix allows both genotyping and phenotyping to be performed on the same assay chip. This dual capability can be extended to other systems, allowing a universal platform for concomitant proteomic and genomic analyses. This is of primary importance when the ultimate translated product of the gene of interest undergoes extensive posttranslational modifications before activity.

Most states in the United States have a screening program for sickle cell disease, making it an ideal candidate for a high-throughput analysis platform. Sickle cell disease refers to a collection of autosomal recessive genetic disorders characterized by a hemoglobin variant called HbS. Sickle cell trait refers to the heterozygous individual, HbS/hemoglobin A (HbA). The molecular nature of this hemoglobin variant is the substitution of valine for glutamic acid at the sixth amino acid position in the β-globin gene.
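The protein phenotyping side of the assay rests on the mass change that a single amino acid substitution produces in the β-chain. The short Python sketch below estimates that shift from standard average residue masses. The HbA β-chain mass is the value reported later in this article (15,867.22 Da); the Glu-to-Lys change used for hemoglobin C is general biochemical knowledge rather than something stated up to this point, and the function names are hypothetical.

```python
# Average amino acid residue masses (Da); standard reference values.
RESIDUE_MASS = {"Glu": 129.1155, "Val": 99.1326, "Lys": 128.1741}

HBA_BETA_CHAIN_DA = 15867.22  # HbA beta-chain mass as reported in this article


def substitution_shift(original, replacement):
    """Mass change (Da) when one residue replaces another in a chain."""
    return RESIDUE_MASS[replacement] - RESIDUE_MASS[original]


def variant_mass(reference_mass, original, replacement):
    """Expected chain mass after a single-residue substitution."""
    return reference_mass + substitution_shift(original, replacement)


if __name__ == "__main__":
    for name, new_residue in (("HbS", "Val"), ("HbC", "Lys")):
        shift = substitution_shift("Glu", new_residue)
        mass = variant_mass(HBA_BETA_CHAIN_DA, "Glu", new_residue)
        print(f"{name}: shift {shift:+.2f} Da, expected mass {mass:.2f} Da")
```

The Glu-to-Val change lowers the chain mass by about 30 Da and the Glu-to-Lys change by about 1 Da, which agrees with the HbS and HbC masses the article reports for the phenotyping assay.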
The change from glutamic acid to valine in sickle hemoglobin was first reported in the late 1950s [9]. The MassEXTEND primer, Pcod5, binds adjacent to codon 6 in codon 5 and consists of the sequence CCATGGTGCACCTGACTC. Predetermined MassEXTEND reactions (Fig. 2) are performed to produce DNA fragments of distinct masses corresponding to each allele, defined by the polymorphism at the designated position. In the presence of the dideoxynucleotide dideoxyadenosine and the remaining deoxynucleotides, Pcod5 is extended by four bases (CCATGGTGCACCTGACTCCTGA) in the presence of a normal allele, and by seven nucleotides (CCATGGTGCACCTGACTCCTGTGGA) with the HbS allele. In addition, an extension product of three nucleotides (CCATGGTGCACCTGACTCCTA) defines the hemoglobin C (HbC) allele. An additional mutation in codon 5 found in Thai populations results in a two-base extension (CCATGGTGCACCTGACTCGA). The mass spectra obtained from these various genotypes are shown in Fig. 2.

Miniaturized protein sample preparation on a SpectroCHIP, followed by MALDI-TOF mass spectrometry, provides accurate phenotypes from erythrocytes: HbA (15,867.22 Da), HbS (15,837.22 Da), and HbC (15,866.22 Da; Fig. 3). Accurate phenotyping is able to distinguish HbS/HbA from HbS/HbC, even though the HbA and HbC masses differ by only 1 Da. Phenotyping results confirm the genotype without ambiguity. Correct HbA-HbS ratios (60:40) are preserved in protein spectra. For patients administered transfusions, the genotype SS or SC can only be identified by DNA analysis of leukocytes, because their erythrocytes are dominated by the transfused phenotype. Phenotypes of newborn samples are characterized by the presence of fetal hemoglobin (α- and γ-chains).

SNP Mapping
Fig. 2. Genotyping β-hemoglobin assay. (Top) The mass spectrum of the homozygous sickle cell genotype; (panel 2) compound heterozygote hemoglobin S and hemoglobin C; (panel 3) the normal homozygous hemoglobin A individual; (bottom) the sickle cell trait individual with both hemoglobin A and hemoglobin S alleles.
[Figure image not reproduced. Legible labels: x-axis m/z, 5,000 to 8,000; panels marked homozygote SS, heterozygote SC, wild-type homozygote AA, and heterozygote SA; unextended PROBE primer at 5,436.9 Da; product peaks labeled 6,659.2, 6,661.1, 7,619.3, 7,621.3, and 7,624.8 Da.]

Any two human genomes differ by approximately 0.1%. Biallelic SNPs comprise four distinct types: one transition, C↔T (G↔A), and three transversions, C↔A (G↔T), C↔G (G↔C), and T↔A (A↔T). More than 60% of SNPs are of the C↔T (G↔A) variety, whereas the other three types represent approximately 10% each. In addition, 50% of all coding-sequence SNPs result in nonsynonymous codon changes. The typical frequency of SNPs in an entire population is approximately 1/300 bp, and in genomic DNA from two equivalent chromosomes, it is approximately 1/1,000 bp (nucleotide diversity). By screening more individuals (more chromosomes), more base differences can be found, but the nucleotide diversity index remains unchanged. Rare variants in the population are at a frequency of less than 1%, each represented by only a small number of (or individual) chromosomes. Many of those polymorphic SNP sites are relevant in disease-association studies. Therefore, the new diagnostic target is the SNP map. Hundreds of
thousands of disease-related SNPs have been determined using MassArray. At least 4,000 treatable diseases have a major genetic component. The future of diagnostics is in the many ongoing SNP discovery programs that are mining the clinically relevant, and thus diagnostically relevant, SNPs. Many scientists now believe the determination of disease-related SNPs may hold the key to the development of personalized medicine [5]. Common polymorphisms in drug targets dictate that DNA sequence variations be considered in the genomic screening processes aimed at new drug development. This will aid in the development of medications targeted to pathways in disease pathogenesis and medications that can be used to prevent diseases in genetically predisposed individuals. These pharmacogenomic studies will allow for the development of therapeutics targeted to genetically identifiable subgroups of the population. The present strategy is the development of medications that are safe and effective for every member of the population. This strategy is sound from a marketing perspective but is a pharmacologic long shot because patients are genetically diverse, and they can have diseases with heterogeneous subtypes.

Fig. 3. Phenotyping β-hemoglobin assay. (Top left) Normal hemoglobin A (HbA) from a 61-year-old patient; (top right) HbA plus hemoglobin S (HbS) from a young individual with sickle cell trait; (bottom left) from an HbS plus hemoglobin C (HbC) compound heterozygote expressing fetal hemoglobin (HbF); (bottom right) from a patient with sickle cell trait expressing fetal hemoglobin at 22 years of age.
[Figure image not reproduced. Legible panel labels: HbA, age 61 years; HbA + HbS, age 13 years; HbS + HbC + HbF, age unknown; HbS + HbF, age 22 years.]

Because DNA labeling is not required, each assay on the MassArray platform can be easily adapted to analyze multiple genetic variations. For example, multiple SNP-based tests (the combined polymorphisms in the factor V Leiden, factor V HR2, prothrombin, and MTHFR genes) are already analyzed in a combined MassEXTEND assay and in large numbers in young women of childbearing age. Thousands of these tests are performed every week. This test is being performed routinely by MALDI-TOF. Additional high-throughput multiple SNP-based tests, such as hemochromatosis, cystic fibrosis, Tay-Sachs, P450, NAT1, NAT2, and many others, have been validated on the platform.

Bioinformatics has a major role in SNP assay design because SNP validation studies can involve thousands of assays. The assay design software by Sequenom, Inc, automatically inputs sequences and designs amplification and MassEXTEND primers, termination reactions, and the degree of multiplexing. Thousands of assays can be designed every day. The success rate for using the software to develop new assays is repeatedly greater than 85%. The failures are usually caused by the common problems of primer design, such as highly GC-rich regions and repeated sequences surrounding a polymorphism.

Association studies used to find novel susceptibility genes involved in complex, multigene diseases or to determine drug-friendly versus nonfriendly genotypes may require the characterization, or scoring, of tens or ideally hundreds of thousands of SNPs among each of thousands or tens of thousands of individuals. This is the realm of industrial genomics and pharmacogenomics. No gel-based or fluorescence-based technology can handle these new multi-SNP-based tests and be commercially viable.

Paternity and/or Identity Testing

SNP-based assays can also be used in the area of paternity and/or identity testing. Sequenom has adopted a series of 40 widely distributed SNPs to form a simple assay for human identity. The development of a genetic map based on SNPs has many advantages with respect to population-based analysis of the human genome. (1) SNPs have low mutation rates in populations, and their allele frequencies can be estimated easily in any population. (2) SNPs are highly stable genetic markers compared with short-tandem repeat (STR) markers, in which the high mutation rates can confound genetic analysis in populations. (3) Present amplification technologies produce artifacts that make STR determinations difficult to interpret. (4) Sequenom's MassEXTEND reactions allow SNPs to be assayed in an automated fashion, reducing the costs and complexity for the testing of samples. All data are automatically interpreted by the SpectroTYPER software (Sequenom, Inc).

Nearly all 40 SNPs are amplified in a single multiplex PCR, which is then divided into two or three MassEXTEND reactions according to the termination requirements of each SNP. In the multiplexed reactions, the masses of each MassEXTEND primer and their terminated products are designed to be distinct. After the primer extension, all extended primers that correspond to different SNPs can be easily separated in a single mass spectrum because their molecular masses differ by at least 50 Da. The major advantage of this approach is a significant reduction of the cost for detecting each individual SNP. In comparison, STR-based tests have proved to be expensive and not always easy to interpret. A key question is whether 40 markers are adequate to provide discriminatory power. Because SNPs are biallelic systems, each individual will have two alleles for a particular marker.
This makes the information content per SNP marker relatively low compared with STR markers, which may have more than 10 alleles. SNPs have the same potential discriminatory power as highly polymorphic STRs and variable number tandem repeats, but more of them are needed. This is obviously true for forensic identification, such as complex problems involving siblingship or drawing inferences from such forensic samples as mixed stains. SNPs gain in power when extended haplotyping can be performed, tying several SNPs together. Several projects are ongoing to provide extended haplotypes over large distances, and these data will increase the discriminatory power of SNP paternity testing.
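The question of whether 40 biallelic markers give enough discriminatory power can be explored with a simple probability sketch. The Python code below computes the chance that two unrelated individuals share a genotype at one locus and multiplies that chance across a panel. It assumes Hardy-Weinberg proportions, independent (unlinked) loci, and equal allele frequencies; these are simplifying assumptions for illustration and are not properties of the actual 40-SNP panel, whose allele frequencies are not given here. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def locus_match_probability(allele_freqs):
    """Chance that two unrelated people share a genotype at one locus.

    Under Hardy-Weinberg proportions a homozygote has frequency p_i**2 and
    a heterozygote 2*p_i*p_j; the match probability is the sum of squared
    genotype frequencies.
    """
    total = 0.0
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        if i == j:
            genotype_freq = allele_freqs[i] ** 2
        else:
            genotype_freq = 2.0 * allele_freqs[i] * allele_freqs[j]
        total += genotype_freq ** 2
    return total


def panel_match_probability(loci):
    """Combined match probability across independent loci."""
    probability = 1.0
    for allele_freqs in loci:
        probability *= locus_match_probability(allele_freqs)
    return probability


def snps_equivalent_to(other_locus, snp_locus=(0.5, 0.5)):
    """Number of such SNPs needed to match the power of one other locus."""
    return (math.log(locus_match_probability(other_locus))
            / math.log(locus_match_probability(snp_locus)))


if __name__ == "__main__":
    snp = (0.5, 0.5)    # idealized biallelic SNP
    strm = (0.1,) * 10  # idealized STR with 10 equally common alleles
    print("per-SNP match probability:", locus_match_probability(snp))
    print("per-STR match probability:", locus_match_probability(strm))
    print("40-SNP panel:", panel_match_probability([snp] * 40))
    print("SNPs per STR:", snps_equivalent_to(strm))
```

Under these idealized assumptions a single SNP with equally common alleles has a match probability of 0.375, and about four such SNPs are needed to equal one ten-allele STR, which is the sense in which more SNP markers are required. Real panels with unequal allele frequencies or related individuals would discriminate less well than this sketch suggests.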
Conclusions

With the creation of a genome-wide SNP map in the near future now a virtual certainty, there is a running debate about how to use it. Nobody knows how large test populations have to be or how many SNPs per DNA sample must be examined to yield meaningful data. Of course, the numbers will vary immensely according to the application. Sequenom is in the process of forming collaborations with academic and commercial groups to expand our list of SNP candidates. The MassArray platform allows for the automated sample processing and handling that can make multi-SNP analysis of thousands of individuals commercially viable.

Present development has produced the SpectroCHIP in a 384 format that can be miniaturized further because each matrix spot (diameter, 300 µm) is usually enough for obtaining a satisfactory MALDI mass spectrum. Thus, with even more miniaturization, the reaction volumes could be reduced more than 100-fold. This will further reduce the cost of genotyping. Such miniaturization also will enable higher-density DNA chips that will minimize sample-stage movement in mass spectrometers and increase the speed of signal acquisition for high-throughput analysis. All these future steps seem practical, and they ensure that the genotyping cost per SNP soon will be significantly less (less than 1 cent should be possible). Even with present technology that uses pooled samples in multiplex formats, the cost per SNP is low.

Many alternative methods for the generation of DNA fragments are being developed, some based on restriction enzyme cleavage and others based on the incorporation of unique bases that can be cleaved by defined enzymes [12]. The system has also been used for the detection of unknown SNPs, microsatellite markers, and phenotypic mutations in proteins, and for signature DNA sequencing [12-18]. These latter techniques allow more robust sequencing of large polymorphic tracts and microsatellite repeats. Infrared MALDI will allow the analysis of DNA sequences up to 1,000 bases, which will broaden the use of the technique, especially in the area of sequencing, and improve the analysis of hypervariable regions, such as those found in class I and II HLA molecules. Many of the potential applications in the area of viral genotyping, such as the 5' nonstructural region of hepatitis C virus and bacterial typing using the 16S gene, will also become available with the advent of new fragmentation methods and instruments that are presently being validated.

Received May 5, 2000. Received in revised form July 14, 2000. Accepted August 15, 2000.
References

1. Karas M, Hillenkamp F: Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 1988;60:2299-2301
2. Fitzgerald MC, Smith LM: Mass spectrometry of nucleic acids: The promise of matrix-assisted laser desorption-ionization (MALDI) mass spectrometry. Annu Rev Biophys Biomol Struct 1995;24:117-140
3. Cotter RJ: The new time-of-flight mass spectrometry. Anal Chem 1999;71:445A-451A
4. Jurinke C, van den Boom D, Cantor CR, Koster H: Automated genotyping using the DNA MassArray technology. Methods Mol Biol (in press)
5. Cantor CR: Pharmacogenetics becomes pharmacogenomics: Wake up and get ready. Mol Diagn 1999;4:287-288
6. Little DP, Braun A, O'Donnell MJ, Koster H: Mass spectrometry from miniaturized arrays for full comparative DNA analysis. Nat Med 1997;3:1413-1416
7. Saiki RK, Chang C-A, Levenson CH, et al.: Diagnosis of sickle cell anemia and beta-thalassemia with enzymatically amplified DNA and nonradioactive allele-specific oligonucleotide probes. New Engl J Med 1988;319:537-541
8. Saiki RK, Scharf S, Faloona F, et al.: Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 1985;230:1350-1354
9. Ingram VM: A specific chemical difference between the globins of normal human and sickle-cell anemia haemoglobin. Nature 1956;178:792-794
10. O'Donnell MJ, Tang K, Koster H, Smith CL, Cantor CR: High density, covalent attachment of DNA to silicon wafers for analysis by MALDI-TOF mass spectrometry. Anal Chem 1997;69:2438-2443
11. Koster H, Tang K, Fu D, et al.: A strategy for rapid and efficient DNA sequencing by mass spectrometry. Nat Biotechnol 1996;14:1123-1128
12. Laken SJ, Jackson PE, Kinzler KW, et al.: Genotyping by mass spectrometric analysis of short DNA fragments. Nat Biotechnol 1998;16:1352-1356
13. Fu D, Tang K, Braun A, et al.: Sequencing exons 5 to 8 of the p53 gene by MALDI-TOF mass spectrometry. Nat Biotechnol 1998;16:381-384
14. Braun A, Little DP, Koster H: Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clin Chem 1997;43:1151-1158
15. Braun A, Little DP, Reuter D, Müller-Mysok B, Koster H: Improved analysis of microsatellites using mass spectrometry. Genomics 1997;46:18-23
16. Little DP, Cornish TJ, O'Donnell MJ, Braun A, Cotter RJ, Koster H: MALDI on a chip: Analysis of arrays of low- to sub-femtomole quantities of synthetic oligonucleotides and DNA diagnostic products dispensed by piezoelectric pipette. Anal Chem 1997;69:4540-4546
17. Little DP, Braun A, O'Donnell MJ, Koster H: Mass spectrometry from miniaturized arrays for full comparative DNA analysis. Nat Med 1997;3:1413-1416
18. Berkenkamp S, Kirpekar F, Hillenkamp F: Infrared MALDI mass spectrometry of large nucleic acids. Science 1998;281:260-262