Fresenius J Anal Chem (2000) 366 : 677–690
© Springer-Verlag 2000
REVIEW
Martin R. Larsen · Peter Roepstorff
Mass spectrometric identification of proteins and characterization of their post-translational modifications in proteome analysis
Received: 16 December 1999 / Accepted: 17 December 1999
Abstract High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425 000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell – the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The currently dominant strategies for identification of proteins from gels, based on mass spectrometric peptide mass fingerprinting and partial sequencing by mass spectrometry, are described. After identification of the proteins the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterization of these directly from gel-separated proteins are described and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.
Introduction
As a consequence of new developments in advanced techniques for the determination of DNA sequences, the publicly available genome information has grown exponentially over the last decade. Genome sequencing projects that sounded optimistic only ten years ago have been completed and today more than 20 genomes from model organisms have been fully sequenced, ranging from simple organisms like E. coli to more complex organisms like Saccharomyces cerevisiae, with the human genome, as the largest and
M. R. Larsen · P. Roepstorff (✉) Department of Molecular Biology, University of Southern Denmark, Odense University, DK-5230 Odense M, Denmark e-mail:
[email protected]
most complex, planned to be completed in the beginning of the next century. Single-pass sequencing of cDNA has furthermore resulted in an explosion of the information contained in the expressed sequence tag (EST) databases, which currently contain approximately 3 million ESTs. Of these more than 1.5 million originate from the human genome and are estimated to cover between 50 and 90% of the total genome [1]. Since the majority of the entries in the common sequence databases are derived from cDNA or other genomic DNA information, very little if any information is contained in the databases about the expression level of the gene products, the protein location, function and regulation in the cell or about the post-translational modifications of the expressed proteins. Therefore, the need for development of reproducible and sensitive methods for analyzing the proteins expressed in a cell or cell type seemed obvious. The term proteome was introduced in 1995 [2] and later defined as the entire protein complement expressed by a cell or a cell culture at a given time [3]. Investigation of the proteome of different cell types is predominantly accomplished by the presently best method to separate the majority of the proteins in a cell, i.e., two-dimensional gel electrophoresis (2 DE), combined with a very sensitive method to identify the separated proteins relative to the genome, e.g., mass spectrometry. In 1993 a number of groups independently demonstrated that it was possible to identify proteins relative to their gene sequences by mass spectrometric peptide mass mapping [4–7]. Later, mass spectrometric sequencing of peptides was launched as an alternative method to correlate the protein with the genome [8]. The increased sensitivity of the mass spectrometric methods (femtomole level) has ousted other methods such as traditional amino acid sequencing for identification of electrophoretically separated proteins.
Protein identification by mass spectrometric methods is now fully accepted and its use is widespread in numerous applications. Several studies have successfully used amino acid sequences generated by tandem mass spectrometry to search the EST databases (e.g. [9]) and for generation of DNA probes for cloning of unknown proteins [10].
Once the proteins are identified on the gel, the next obvious step is characterization of their post-translational modifications, if such are present in the proteins of interest. In higher eukaryotes a majority of all proteins are modified and these modifications are often essential for the function of the protein. Some of the modifications change the solubility of the protein, others are used as molecular switches to activate, inactivate or modify the biological activity of the protein and yet others are used to locate the proteins to different compartments in the cell. Since a modification either increases or decreases the molecular mass of the affected amino acid, mass spectrometry, with its unique sensitivity, high mass accuracy and ability to deal with complex mixtures, has proven to be the method of choice for micro-characterization of post-translational modifications in general [11–14]. It will be advantageous to be able to perform full characterization of the proteins directly from the small amounts of protein present in a spot on an electrophoretic gel instead of performing the time-consuming and often difficult purification of the protein using traditional purification methods. However, only a limited number of studies using gel electrophoresis followed by mass spectrometric analysis for characterization of post-translational modifications in proteins directly from the gel have been reported. Below, we will give a short overview of the principles of protein identification in proteome analysis by different mass spectrometric methods. The main emphasis in this paper, however, will be on mass spectrometric micro-characterization of post-translational modifications in electrophoretically separated proteins. The examples described here will mainly be based on our current studies performed in collaboration with the Center for Proteome Analysis in Life Science at Odense University in Denmark [15].
Since a comprehensive review of the field is not intended, examples from other research groups will only be described when appropriate.
Protein identification
Strategy
The presently preferred protein identification strategy in proteome analysis is entirely based on mass spectrometric methods because these methods have reached a level of confidence at least comparable to that obtained by traditional amino acid sequencing of the electrophoretically separated proteins. Moreover, mass spectrometry is faster, simpler, tolerant towards a number of low molecular weight contaminants, much more sensitive than traditional sequencing (less than a nanogram of starting material), and allows identification of the proteins in simple mixtures. The proteins are identified by searching sequence databases with the mass spectrometric data set using specially designed algorithms. The strategy developed by Shevchenko and coworkers [16], illustrated in Fig. 1, has in a number of proteome studies proved to fulfill the criteria for high-fidelity protein identification from electrophoretic gels.
Fig. 1 Strategy for mass spectrometric identification of proteins separated by gel electrophoresis
The strategy is built on the complementary use of two different mass spectrometric methods. The fast, high-throughput protein identifications are made by peptide mass mapping by matrix assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS). A small portion, typically less than 10%, of the peptides derived from tryptic in-gel digestion of the protein is used for the analysis. The obtained peptide mass map is a unique fingerprint of the protein, which is searched against a comprehensive database containing the theoretically calculated peptide mass map for all the proteins contained in the database. The fidelity of the identification is evaluated by intelligently searching the peptide mass map for additional information as will be described below. It is our experience that the majority of the analyzed proteins can be identified unambiguously by this approach, provided that the protein sequence is contained in the searched database. For organisms where the genome is completely sequenced, only the proteins present in very low copy numbers or proteins that contain very few tryptic cleavage sites fail to be identified by this approach. Occasionally, heavily modified proteins are found to be very difficult to identify by peptide mass fingerprinting because most derived peptides will contain modifications and thereby not match the theoretically calculated peptide masses. In cases where the first approach fails, the remaining portion of the peptide mixture is desalted and analyzed by nanoelectrospray tandem mass spectrometry [17]. Using this technique a partial sequence or a fragment “fingerprint” of the selected peptide can be generated. The partial sequence, together with its associated masses, is assembled into a peptide sequence tag [18], which can be used as a highly specific probe to identify proteins in protein sequence databases or expressed sequence tag (EST) databases [9].
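The database search behind peptide mass mapping can be sketched in a few lines. The following is a minimal illustration only, not the algorithm of PeptideSearch or any other published program: it digests each database entry in silico with trypsin rules, computes monoisotopic [M+H]+ masses, and ranks entries by the number of observed masses matched within a ppm tolerance. The two protein sequences, the function names and the simple match-count score are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence, missed_cleavages=1):
    """Cleave C-terminal to K or R (not before P), allowing missed cleavages."""
    sites = [0]
    for i, aa in enumerate(sequence):
        if aa in "KR" and i + 1 < len(sequence) and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol_ppm=70.0, lo=800.0, hi=3500.0):
    """Number of observed masses explained by the theoretical digest."""
    theoretical = [m for m in map(mh_plus, tryptic_peptides(sequence)) if lo <= m <= hi]
    return sum(
        1 for obs in observed
        if any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def rank_database(observed, database, tol_ppm=70.0):
    """Rank database entries by number of matching peptides (highest first)."""
    scores = {name: count_matches(observed, seq, tol_ppm) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical two-entry database and a peak list simulated from PROT_A.
TOY_DATABASE = {
    "PROT_A": "MAGLSEVTAKDLLESGVHFGHQTRWNPEMAEYIFTAR",
    "PROT_B": "GSSDLVAEIWQKTPLHNVEGYFDMRAAGTLLPEVSHK",
}
observed = [mh_plus(p) for p in tryptic_peptides(TOY_DATABASE["PROT_A"], missed_cleavages=0)]
ranking = rank_database(observed, TOY_DATABASE)
print(ranking)
```

A real search would also weigh the distance between the best hit and the randomly scoring entries, as discussed in the text, rather than rely on the raw match count alone.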
Alternatively, the fragment “fingerprint” from the uninterpreted MS/MS spectra can be used by a fast Fourier
Fig. 2 Identification of two proteins co-migrating in one spot on a 2 D gel of a protein preparation from yeast cells. (A) A section of the 2 D gel covering the area from where the spot was excised. The spot is indicated with an arrow. (B) High mass accuracy MALDI peptide mass map of the peptide mixture generated by tryptic in-gel digestion. The peptides were purified on a microcolumn and eluted directly onto the MALDI target using α-cyano-4-hydroxy cinnamic acid in 70% acetonitrile as the matrix. Stars indicate the peptides assigned to the METL sequence and dots those assigned to the LYS9 sequence. The inset illustrates the signals observed upon cleavage at adjacent tryptic cleavage sites in the sequence …RRIDTVV……ADLR, which leads to two peptide ion signals at m/z 2154.18 and 2310.25 differing by 156.07 Da, corresponding to an arginine residue. (C) Two proteins were identified to be present in the spot with 13 and 11 matching peptides, respectively, by database searching
transformation algorithm (SEQUEST) to search sequence databases for identification of proteins [8, 19]. In cases where the protein of interest is unknown, tandem MS can be used to obtain enough information to generate specific DNA probes for cloning studies [20, 21].
Identification using peptide mass fingerprinting
The MALDI MS peptide mass fingerprinting strategy has been reviewed recently [22]. The concept is illustrated by protein identification from a yeast protein preparation separated by 2 DE and visualized by silver staining (Fig. 2 A). The estimated amount of protein in the spot is in the low picomole level (low ng). The specific 2 D gel had been stored dry for more than four years, but in spite of this, the quality of the data was as good as for a freshly prepared gel. From the gel coordinates the protein was estimated to have a molecular weight of 45 kDa and a pI value of 5.1. A protein spot was excised and subjected to in-gel digestion with trypsin as described previously [23]. A small portion, approximately 5–10%, of the in-gel generated peptides was desalted on a custom prepared micro-column [24] and analyzed by MALDI MS. The recorded peptide mass map is illustrated in Fig. 2 B. The peptide masses obtained were used to search the protein sequence database (NRDB, European Bioinformatics Institute, Hinxton, UK) using the database search program PeptideSearch [5, further developed at EMBL, Heidelberg, Germany]. A total of 49 peptides were used in the search. The protein METL from yeast (swissprot|P19358) was identified as the first hit with 13 matching peptides (Fig. 2 C). The second listed protein sequence, also with 13 matching peptides, is identical to METL but comes from another database (trembl). Detailed evaluation of the peptide mass map together with the second pass search feature in the database search program [22] revealed that one additional peptide could be assigned to the METL sequence due to incomplete cleavage, and that the matched peptides covered 35% of the sequence. The distance to the randomly scoring peptides was relatively large (6 peptides), except for the LYS9_YEAST sequence with 11 matching peptides, which might indicate that this protein is also present in the spot (see below). To confirm that the identification of METL was unambiguous, other parameters like molecular weight, the distance in matched peptides to the background of randomly scoring proteins, the appearance of alternative cleavage patterns in the sequence coverage [25] and methionine oxidation were considered. Alternative cleavages are present at tandem tryptic cleavage sites, i.e., Lys-Lys, Lys-Arg, Arg-Lys or Arg-Arg, which are commonly found in proteins. Cleavage at either one or another of these sites in the protein generates a mixed population of peptides that differ in mass by either one arginine or one lysine residue. The probability that a different protein contains peptides that differ by exactly these masses is very low, and such pairs can therefore support the identification of the protein. Three such alternative cleavages could be found in the MALDI peptide map and all could be used to support the identification of METL. One of these, covering the METL amino acid sequence 187/188–207, is illustrated in the inset of Fig. 2 B. The molecular weight and the pI value calculated based on the sequence were found to be consistent with those estimated from the position on the gel (Fig. 2 A). Thus, the protein could be considered unambiguously identified.
Fig. 3 MALDI peptide mass map of in-gel generated peptides from enolase 2. The matrix used in this experiment was α-cyano-4-hydroxy cinnamic acid in 70% acetonitrile. The inset illustrates the signal from a peptide containing an oxidized methionine residue and the corresponding metastable fragment ion signal
The high number of peptides detected in the MALDI
analysis compared to the relatively low number of matching peptides indicates the presence of another component in the gel spot. Co-migration of two or more proteins is a general feature of 1D SDS PAGE but is also frequently observed in 2 DE. This, together with contamination of protein samples with human keratins during sample preparation, generates a population of peptides derived from different components. Resolving such mixtures has previously required chromatographic methods and subsequent sequencing of the separated peptides. However, the improved mass resolution and accuracy obtained by delayed extraction MALDI MS [26] allow identification of proteins in simple mixtures [27]. In the example shown above the peptide masses assigned to the METL sequence were subtracted from the peptide mass list and the remaining masses were used in another database search, which unambiguously identified LYS 9 with 14 matching peptides, including those which could be assigned to missed cleavages and methionine oxidation. The remaining peptide masses in the peptide mass map that could not be related to either of the two sequences could be assigned to tryptic peptides derived from human keratin contamination or trypsin-related peptides generated by auto-proteolysis. Partial methionine oxidation induced during sample preparation or as a consequence of extended storage of the sample can also yield useful additional information for protein identification. Oxidation of methionine residues to methionine sulfoxide generates an ion signal in the peptide mass map 15.98 u above the non-oxidized species together with a pronounced metastable fragment ion originating from the loss of methanesulfenic acid (CH3SOH) from the oxidized ion [28]. The fragmentation takes place in the first field free drift region after the ion acceleration
in the MALDI instrument. Because the mass spectrum is recorded in reflector mode a less resolved peak will appear at a lower mass corresponding to the loss of methanesulfenic acid. This is illustrated in the MALDI peptide mass map (Fig. 3) from an in-gel digestion of a protein from the same yeast preparation as above. The peptide mass map identified the protein as enolase 2. The ion signal at m/z 2543.18 was assigned to the sequence stretch 32–55, which includes a methionine residue. The signal of the oxidized peptide is located 15.98 u above. The characteristic poorly resolved metastable fragment ion is located at a mass below that of the fully resolved non-oxidized and oxidized forms. Oxidation of tryptophan residues induced during the electrophoretic separation has been reported recently [29]. It might be used in a similar way, but is less frequently observed than methionine oxidation. The peptide fingerprinting strategy has a number of limitations. Firstly, the number of peptides included in the database search must generally be 8 or more to obtain a high fidelity identification. The main reason is that the protein sequence database is growing exponentially (at present more than 425 000 sequences) and consequently the number of accidentally matched proteins will increase. Thus, the protein must contain a sufficiently high number of tryptic cleavage sites in order to generate enough peptides in the mass range between 800 Da and 3500 Da suitable for identification. Secondly, highly modified proteins are difficult to identify by this approach because the modifications change the peptide masses and therefore they do not fit the theoretically calculated masses in the database. Thirdly, the rapidly growing databases increase the need for high mass accuracy (below 70 ppm) resulting in increasing instrument costs.
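The supporting evidence discussed above (peak pairs separated by one arginine or lysine residue at tandem cleavage sites, and pairs separated by one oxygen atom after methionine oxidation) amounts to a search for characteristic mass differences in the peak list. The sketch below is an assumption-laden illustration: the function name and the 0.05 Da tolerance are hypothetical, the mass differences are standard monoisotopic values, and the peak at 2559.16 is simply 2543.18 + 15.98 as implied by the text. The other three signals are quoted from Figs. 2 and 3, which come from two different spectra and are combined here only for demonstration.

```python
# Standard monoisotopic mass differences in Da (not taken from the paper).
SIGNATURES = {
    "Arg (alternative cleavage)": 156.10111,
    "Lys (alternative cleavage)": 128.09496,
    "Met oxidation": 15.99491,
}


def find_signature_pairs(peaks, tol_da=0.05):
    """Return (lower, higher, label) for peak pairs separated by a signature mass."""
    ordered = sorted(peaks)
    hits = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            delta = high - low
            for label, expected in SIGNATURES.items():
                if abs(delta - expected) <= tol_da:
                    hits.append((low, high, label))
    return hits


# 2154.18/2310.25 from Fig. 2; 2543.18 from Fig. 3; 2559.16 derived as 2543.18 + 15.98;
# 1800.00 is an arbitrary unrelated peak.
peak_list = [1800.00, 2154.18, 2310.25, 2543.18, 2559.16]
pairs = find_signature_pairs(peak_list)
for low, high, label in pairs:
    print(f"{low:.2f} -> {high:.2f}: {label}")
```

The tolerance has to be chosen with the instrument's mass accuracy in mind; a window that is too wide will start to confuse, for example, lysine (128.095 Da) with glutamine (128.059 Da) residue differences.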
We believe that the suppression effect caused by the presence of metal ions and other contaminants originating from the gel has for a long time been the real limitation for identification of very low amounts of proteins from gels. Recent developments in matrix preparation methods, e.g., inclusion of nitrocellulose in the matrix preparation [30], and desalting on micro-columns prior to MALDI MS analysis [24] have in our hands considerably improved the sensitivity (see Fig. 5). Since MALDI MS peptide mass mapping is relatively straightforward, several attempts to automate the identification procedure have been made [31–33]. However, it seems that the sensitivity obtained with the automation procedures presently cannot match that obtained by careful manual preparation. Consequently, automated identification by peptide mass mapping has only been reported successfully from gels in which the proteins have been visualized by Coomassie blue staining, i.e., relatively high protein amounts.
Identification using partial sequence information
When identification by peptide mass mapping fails, the remaining portion of the peptides from the in-gel digestion is desalted on a micro-column and analyzed by tandem
mass spectrometry to obtain partial sequence information [17, 34, 35]. With the development of the nanoelectrospray source for electrospray ionization with a flow rate below 30 nL/min [17] the sensitivity has increased dramatically and presently sub-picomole analysis is routine in several research groups. The additional use of a time-of-flight ion analyzer as the second mass analyzer [36, 37] allows full isotope resolution of the fragment ions and thereby assignment of the charge states, facilitating the interpretation of the often complex fragment ion spectrum. Three possible strategies can be used to identify proteins based on partial amino acid sequence generated by tandem mass spectrometry. Firstly, the obtained amino acid sequence can be searched against the protein sequence database using the algorithms developed for search based on traditional amino acid sequencing. However, due to the increasing size of the protein databases an unambiguous identification requires a continuous sequence of at least 6–8 amino acid residues. Such long sequences are often not readily assigned by tandem MS. Secondly, the mass of a selected peptide, the cleavage specificity (trypsin) and a partial amino acid sequence together with the associated masses can be assembled into a peptide sequence tag [18]. Thirdly, uninterpreted CID spectra obtained by LC-MS/MS of the generated peptide mixture can be used to identify the protein in the database [8, 19] using a special algorithm to match the uninterpreted CID data set with the theoretically derived fragmentation pattern for peptides derived by theoretical tryptic digestion of the proteins in the database. We generally use one of the first two concepts as illustrated with the above described peptide mixture derived from the spot containing the two proteins METL and LYS 9.
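The peptide sequence tag idea (a short stretch of sequence flanked by two masses) can be illustrated with a simplified matcher. This is a sketch under stated assumptions rather than the published tag algorithm [18]: the flanking masses are expressed as plain sums of residue masses instead of fragment ion m/z values, leucine and isoleucine are treated as identical, and the peptide used in the example is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(sequence):
    """Sum of residue masses (no terminal water, no proton)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def matches_tag(peptide, prefix_mass, tag, suffix_mass, tol_da=0.05):
    """True if `tag` occurs in `peptide` with the expected flanking masses.

    Simplification (assumption): prefix_mass and suffix_mass are residue-mass
    sums of the regions before and after the tag, not ion m/z values.
    """
    normalized = peptide.replace("I", "L")   # I and L are isobaric
    wanted = tag.replace("I", "L")
    start = normalized.find(wanted)
    while start != -1:
        before = residue_sum(peptide[:start])
        after = residue_sum(peptide[start + len(tag):])
        if abs(before - prefix_mass) <= tol_da and abs(after - suffix_mass) <= tol_da:
            return True
        start = normalized.find(wanted, start + 1)
    return False


# Hypothetical peptide and a tag derived from it: (mass before, "GGL", mass after).
candidate = "FVLGDSGGIATGR"
tag = (residue_sum("FVLGDS"), "GGL", residue_sum("ATGR"))
print(matches_tag(candidate, *tag))            # tag fits this peptide
print(matches_tag("FVLGDSGGIATGK", *tag))      # same tag, different C-terminal mass
```

Because the two flanking masses constrain the rest of the peptide, a tag of only two or three residues can be far more selective than the short sequence alone would suggest.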
A small portion of the peptide mixture (approximately 20%) was desalted on a micro-column and analyzed on a quadrupole time-of-flight mass spectrometer (Micromass, Manchester, UK) equipped with a nanoelectrospray ionization source. The doubly charged ions at m/z 722.92 and 754.91, corresponding to a peptide from each of the two proteins, respectively, were selected and fragmented in the collision cell using argon as collision gas. The fragment spectra are shown in Figs. 4 A and B. Peptides preferentially fragment at the amide bond to generate N-terminal (B-type) ions and/or C-terminal (Y-type) ions [38]. In the case of tryptic peptides it is our experience that Y-type ions are more pronounced in the higher mass area. From the collision induced dissociation (CID) spectrum of the doubly charged ion at m/z 722.92 (Fig. 4 A) a partial sequence consisting of 7 amino acid residues could be assigned from the Y-type ion series. A database search using this sequence identified the protein as METL from yeast (search not shown). However, a peptide sequence tag consisting of the GG(L/I) sequence, the mass of the region of the sequence up to the G residue (971.57 Da) and the mass from the leucine/isoleucine to the entire peptide mass, also identified the same protein unambiguously in the database. The other selected ion (Fig. 4 B) could, based on the sequence tag generated from the STV sequence and associated masses, be identified as derived from LYS 9. This clearly illustrates that the sequence of only a limited number of amino acids, i.e., 2–3, together with other information derived from the experiment is needed to accurately identify a protein in the database. Peptide sequence tags are also now routinely used to search the expressed sequence tag databases that contain DNA sequences [9].
Fig. 4 Peptide sequencing by tandem mass spectrometry using a quadrupole time-of-flight instrument (Micromass, Manchester, UK). (A) collision induced dissociation (CID) spectrum of a doubly charged ion at m/z 722.92 corresponding to the amino acid sequence 238–252 (FVIG...TGR) of yeast METL. The Y ion signal series is shown together with the deduced sequence. (B) CID spectrum of a doubly charged ion at m/z 754.91 corresponding to the amino acid sequence 91–104 (TDV...ALR) of yeast LYS9. The Y ion signal series is shown together with the deduced sequence
Alternative methods for protein identification in proteome studies based on partial sequence information have been developed. These mainly rely on separation of the peptides by liquid chromatography (LC) combined
with electrospray tandem mass spectrometry [39–41]. The main limiting factor in this approach is the sensitivity of the LC system. However, automated, high-throughput protein identification by LC-MS/MS has been demonstrated at the 2–5 pmol level [42] and Yates and coworkers have recently demonstrated a two dimensional chromatographic approach in combination with electrospray tandem mass spectrometry for identifying proteins in a proteomic perspective [43].
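Reading a partial sequence from a Y-type ion series, as done for the CID spectra discussed in this section, comes down to two small calculations: converting the m/z of a multiply charged precursor into a neutral mass, and assigning residues to the mass differences between consecutive fragment ions. The following sketch is illustrative only; the function names, the 0.02 Da tolerance and the test peptide AGLK are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass from the m/z of an [M + zH]z+ ion."""
    return charge * mz - charge * PROTON


def y_ion_series(peptide):
    """Singly charged Y-type ions, smallest (C-terminal residue only) first."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide):
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


def read_ladder(y_ions, tol_da=0.02):
    """Assign residues to the gaps between consecutive Y ions.

    Ascending Y ions add residues toward the N-terminus, so the result reads
    from the C-terminal side to the N-terminal side. Isobaric residues are
    reported together (e.g. 'L/I'); unexplained gaps are reported as '?'.
    """
    ordered = sorted(y_ions)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol_da]
        residues.append("/".join(hits) if hits else "?")
    return residues


print(round(neutral_mass(722.92, 2), 2))   # doubly charged precursor from Fig. 4 A
print(read_ladder(y_ion_series("AGLK")))   # hypothetical test peptide
```

With a tolerance of 0.02 Da the sketch separates lysine from glutamine, which differ by about 0.036 Da, whereas leucine and isoleucine remain indistinguishable, which is why the tag in the text is written as GG(L/I).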
Characterization of post-translational modifications in proteins separated by gel electrophoresis
The second level in proteome analysis
Once the proteins in the gel are identified relative to the genome sequence, the next obvious question is whether or not the proteins are co-/post-translationally modified. More than 200 different modifications are reported in proteins and the number is constantly growing. Some of the modifications are common, others are rare, but all post-translational modifications are in general essential for the function of the protein by inducing for example change in structure, solubility, biological activity, location in the cell, or interaction partners. The unique resolving power of 2 DE, especially the narrow range IPG gels [44], allows, in the majority of cases, separation of different modified forms of the same protein into distinct spots on the gel. Heterogeneous phosphorylation of a protein causes a series of spots with the same molecular weight but different pI values, i.e., spots located like pearls on a string horizontally over the gel. Glycosylation of a protein gives a series of spots with different molecular weight and sometimes also different pI values depending on the nature of the glycan structure. Hence, the traditional time-consuming purification procedures can be omitted if the characterization of the post-translational modifications can be performed directly from the small spot on the gel. Therefore, the next level in proteome analysis after identifying the protein sequence is to be able to characterize the protein with respect to post-translational modifications directly from the gel without further purification.
Detection of modified proteins in gels
Information about the nature of the modification in a protein is often advantageous prior to full characterization of the protein in the gel.
A number of methods are available to selectively visualize the presence of different kinds of modifications in gel separated proteins. Phosphorylated proteins can be detected very sensitively by incorporation of a radiolabeled phosphate (32/33P) followed by visualization by autoradiography. This method can be used to detect all the different forms of phosphorylation that can take place in a cell, i.e., on tyrosine, serine, threonine or histidine. Immunoblotting with antibodies against specific types of phosphorylation is an alternative method which allows distinction between different types of phosphorylation [45]. This method, however, requires transfer of the protein to a membrane prior to reaction with the antibodies. The sensitivity is therefore limited by the transfer efficiency from the gel to the membrane. Detection of glycosylated proteins on a gel or a western blot can be accomplished by utilizing any of four different methods [46]:
• Periodic acid/Schiff (PAS) staining (sensitivity 1–10 µg of glycosylated protein).
• Digoxigenin (DIG)/anti-DIG alkaline phosphatase labeling, an extension of the PAS method in which PAS is combined with an enzymatic detection method (sensitivity observed down to 0.1 µg glycoprotein).
• Affinity blotting using labeled or enzyme-conjugated lectins or other carbohydrate-binding proteins.
• Monosaccharide composition analysis of the liberated glycan using high-pressure anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD).
Glycosylphosphatidylinositol (GPI)-anchored proteins have recently been characterized by 2 DE after GPI-specific phospholipase C partitioning in Triton X-114 [47]. Other types of protein modification have been investigated by radioactive metabolic labeling followed by 2 DE and autoradiography (e.g. [48, 49]).
General mass spectrometric strategies for characterizing post-translational modifications in proteins
The general strategies for the characterization of post-translational modifications in proteins by mass spectrometry include [14]:
• Determination of the intact molecular weight by MS followed by comparison with the mass predicted from the amino acid sequence.
• Peptide mass mapping of peptide mixtures derived by proteolytic digestion to identify the modified peptide, either directly from the mixture or after separation of the peptides.
• Chemical or enzymatic removal of the modification from the peptide/protein monitored by mass spectrometry.
• Sequencing of the modified peptide to locate the modified residue.
The small amount of protein present in a spot from a 2 D gel does not allow this complete strategy to be followed. Although mass spectrometric analysis of intact proteins from gels after electro-blotting [50, 51], electroelution [52] or directly from the first dimension gel matrix [53] has been reported, these techniques are far from routine in most laboratories. They may require specially designed mass spectrometers and the sensitivity, especially for larger proteins, is insufficient. Therefore, the analysis must rely on the identification of the protein followed by careful evaluation of the peptide mass map obtained after in-gel digestion. A number of programs are available on the Internet which can assist such evaluations (e.g., PeptideSearch [54] or FindMod [55]). After identification of a presumably modified peptide, the presence of the modification can be verified by selective chemical or enzymatic removal of the modifying group, the exact location can be assessed by tandem MS,
Fig. 5 Desalting and concentration prior to mass analysis by MALDI MS. (A) MALDI peptide mass map of a small fraction of a tryptic peptide mixture generated by in-gel digestion of yeast GBLP (swissprot P38011). The peptide mixture was analyzed using α-cyano-4-hydroxy cinnamic acid (4HCCA) as matrix. A total of 9 peptides were detected resulting in a sequence coverage of 26%. (B) Another small fraction was desalted and concentrated on a micro-column. The peptides were eluted with the matrix solution (4HCCA) directly onto the MALDI target. A total of 17 peptide signals were detected resulting in a sequence coverage of 65%
or the peptide can be isolated by LC and characterized if sufficient amounts are present. Below, the general problems encountered when characterizing modified proteins in gels will be discussed. In addition, examples illustrating possible strategies for the characterization of specific types of modifications in proteins identified in proteome analysis will be given.
General problems associated with the characterization of post-translational modifications in gel-separated proteins
Mass spectrometry is presently the only technique that allows characterization of post-translational modifications in proteins separated by gel electrophoresis. However, a number of criteria must be fulfilled before full characterization of the protein can be performed from the gel. The amount of protein must be sufficient to perform the analysis. In general, a spot which is clearly visible with Coomassie blue staining will often provide enough data to localize and verify a modification. In some cases depending on the type
of modification a lower amount of starting material has been sufficient to characterize modifications in proteins from gels (see below). For complex heterogeneous modifications, e.g. N-linked glycans, even more starting material may be needed. Nearly complete sequence coverage, i.e., the fraction of the amino acid sequence that is represented by peptides in the peptide mass map, is essential to ensure observation of all modifications in the protein. The presence of contaminants, e.g., alkali metal ions, often causes reduced intensity or complete suppression of the peptide signals and consequently results in a decreased sequence coverage. Simultaneous removal of contaminants and sample concentration can be achieved by purification of the peptide mixture on micro-columns prior to mass analysis [24]. Comparison of Fig. 5 A and 5 B illustrates the effect of applying this procedure to peptide mixtures generated by in-gel digestion. The direct MALDI MS analysis of a small aliquot of the peptide mixture (Fig. 5 A) results in detection of 9 peptide signals with a relatively low signal-to-noise ratio, especially in the low mass area. The nine observed peptide masses unambiguously identified the protein as GBLP from yeast, covering 26% of the sequence. Another aliquot of the same peptide mixture was desalted and concentrated on a micro-column prior to analysis by MALDI MS (Fig. 5 B). This resulted in the detection of more than 17 peptide signals covering in total 65% of the GBLP sequence. The sequence coverage can frequently be further improved by dividing the sample into several fractions using stepwise elution from the column with increasing organic solvent concentration. The use of LC-ESI MS instead of MALDI peptide mass mapping also increases the sequence coverage [56] but is more time and sample consuming.
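Sequence coverage figures such as the 26% and 65% quoted above are obtained by marking every residue of the protein that lies inside at least one matched peptide, so that overlapping peptides are not counted twice. A minimal sketch follows; the function name and the toy sequence are hypothetical, and real software would map peptides by their assigned positions rather than by string search.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of residues covered by at least one matched peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy example: two overlapping peptides cover residues 1-5 of a 10-residue chain.
toy_protein = "ACDEFGHIKL"
print(sequence_coverage(toy_protein, ["ACD", "DEF"]))
```

Tracking coverage per residue rather than summing peptide lengths matters in practice, because peptides arising from missed cleavages overlap with their fully cleaved counterparts.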
Co-migration of proteins in the gel, as well as contamination with human keratins and other preparation-derived proteins, also represents a major problem because the proteolytic digestion results in a mixed population of peptides representing different proteins. Due to ion suppression effects the sequence coverage will be lower for each of the identified proteins in the mixture. The use of narrow pI gels, which have better resolution and can accept heavier protein loading, combined with very clean preparation work may solve these problems. The presence of certain modifications reduces the ion yield of the corresponding peptide, resulting in suppression of its signal. This is especially the case for acidic modifications, e.g., phosphorylation, and for glycosylation when the glycan contains sialic acids. In addition, the heterogeneity of the glycan results in reduced signal intensity for each molecular species compared to the non-modified peptide. In such cases chromatographic separation of the peptides in the mixture may be needed [57]. Modified proteins are often separated from the non-modified protein in the 2 D gel. However, a given spot may contain several molecular species caused by modification of one of several possible positions in a protein. This means that all molecules in a spot contain the same modification but not necessarily at the same position. In such cases each modified peptide is present only in a small amount alongside the corresponding non-modified peptides. This may result in failure to observe the modified peptide [58, 59].
Selected modifications
Phosphorylation
Fig. 6 Strategy to localize phosphorylated sites in gel-separated proteins by a combination of MALDI peptide mass mapping, phosphatase treatment and immobilized metal affinity chromatography
Fig. 7 Localization of phosphopeptides in peptide mass maps after in-gel digestion using alkaline phosphatase treatment monitored by MALDI MS. (A) MALDI peptide mass map of in-gel generated peptides from in vivo phosphorylated human calgranulin B. The inset shows signals corresponding to the non-phosphorylated and the phosphorylated peptide (m/z 2175.86 and 2255.85, respectively) and to the metastable fragment signals corresponding to loss of H3PO4 and HPO3, respectively. (B) MALDI peptide mass map of the same sample after on-target treatment with alkaline phosphatase. The peak for the phosphorylated peptide at m/z 2255.85 disappears, confirming that it was phosphorylated on a Ser or Thr residue. The matrix used in this experiment was α-cyano-4-hydroxy cinnamic acid in 70% acetonitrile
Reversible phosphorylation of cellular proteins is one of the most general principles for regulation of protein activity. It has been estimated that more than 30% of the entire protein complement in mammalian cells is phosphorylated at some stage [60]. The reversible modification mechanism is based on the interplay between two different enzyme groups: protein kinases, which phosphorylate specific residues in the proteins, mainly serine, threonine, tyrosine or histidine residues, and protein phosphatases, which catalyze the reverse reaction. Since protein phosphorylation results in a mass increase of 80 Da per phosphate group, mass spectrometry is the obvious method for characterizing this modification in a protein. The strategy used in our laboratory for site-specific determination of phosphorylation is illustrated in Fig. 6. Two approaches are used to identify the phosphorylated peptide: differential peptide mapping before and after treatment with alkaline phosphatase or another phosphatase (e.g., [61]), or analysis of the phosphorylated peptides purified by immobilized metal affinity chromatography (IMAC). The first approach will be illustrated by identification of the phosphorylation site in an in vivo phosphorylated human protein. Phosphorylated proteins were detected in the gel by autoradiography after incorporation of ³³P. The spot
of interest was excised according to the autoradiogram and submitted to in-gel tryptic digestion. An aliquot of the generated peptide mixture (2%) was desalted on a micro-column prior to MALDI MS (Fig. 7 A). The protein was unambiguously identified from the peptide mass map as human calgranulin B. Examination of the MALDI peptide map in Fig. 7 A revealed a set of ion signals possibly originating from a serine/threonine phosphorylated peptide (this area of the spectrum is enlarged). α-Cyano-4-hydroxy cinnamic acid (4HCCA) was used as matrix for MALDI MS because this matrix gives a characteristic metastable loss of H3PO4 and HPO3 [62], revealed in the spectrum by peaks with slightly poorer resolution. The peak corresponding to the non-phosphorylated peptide species (m/z 2175.86) most likely represents contamination from a neighboring spot containing the non-phosphorylated protein. The phosphorylated peptide (m/z 2255.85) was identified as the C-terminal tryptic peptide of calgranulin B (MHEGDEGPGHHHKPGLGEGTP). Within this peptide the threonine residue is the only potential phosphorylation site, provided that phosphorylation on the histidine residues has not taken place. This assumption is reasonable since histidine phosphorylation gives a different metastable fragmentation pattern [63]. To verify the presence of a phosphorylated threonine in the peptide, the peptide mixture was treated with alkaline phosphatase directly on the MALDI target [22, 64]. The resulting spectrum (Fig. 7 B) shows a mass decrease of 80 Da for the formerly phosphorylated peptide, confirming removal of a phosphate group. Frequently, the signals originating from phosphorylated peptides are suppressed in the presence of non-phosphorylated peptides. Therefore, selective isolation of the phosphopeptide(s) using immobilized metal affinity chromatography (IMAC) [e.g., 65–68] prior to MS analysis may be advantageous.
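The differential peptide mapping described above amounts to a simple comparison of two peak lists. The sketch below is an illustration under stated assumptions rather than the procedure used by the authors: the function names, the mass tolerance and the two extra peaks are hypothetical, while m/z 2175.86 and 2255.85 are the values quoted for calgranulin B. The nominal 80 Da shift is represented by the monoisotopic mass of HPO3 (79.96633 Da).

```python
HPO3 = 79.96633  # monoisotopic mass added per phosphate group, in Da


def has_peak(peaks, mz, tol):
    """True if any peak lies within +/- tol of mz."""
    return any(abs(p - mz) <= tol for p in peaks)


def dephosphorylated_candidates(before, after, tol=0.05, max_sites=3):
    """Find peaks that vanish on phosphatase treatment and reappear
    n * HPO3 lower in the treated sample (singly charged ions assumed).

    Returns a list of (mz_before, n_phosphate_groups).
    """
    hits = []
    for mz in before:
        if has_peak(after, mz, tol):
            continue  # signal survives treatment: not a phosphopeptide candidate
        for n in range(1, max_sites + 1):
            if has_peak(after, mz - n * HPO3, tol):
                hits.append((mz, n))
                break
    return hits


# 2175.86 and 2255.85 are taken from the text; 1500.00 and 1800.50 are
# hypothetical peaks from unmodified peptides.
before = [1500.00, 1800.50, 2175.86, 2255.85]
after = [1500.00, 1800.50, 2175.86]
print(dephosphorylated_candidates(before, after))
```

The tolerance of 0.05 Da is an assumption chosen to suit the mass accuracy of the quoted values; in practice it would be set from the calibration of the instrument.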
This procedure allows identification of the phosphorylation sites in β-casein from as little as 500 fmol protein applied on an SDS gel [68]. The IMAC column material is packed in constricted GeLoader tips and the phosphopeptides are either eluted with a matrix solution directly onto the MALDI target or eluted with a high pH buffer for further analysis. The approach is illustrated for the above-mentioned human calgranulin B. The MALDI peptide mass maps of a small aliquot of the peptide mixture derived by tryptic in-gel digestion and of a 5% aliquot of the peptide mixture purified on a micro IMAC column are shown in Fig. 8 A and 8 B, respectively. Comparison of these two figures shows very little non-specific binding. Upon on-target dephosphorylation with alkaline phosphatase a mass decrease of 80 Da is observed, confirming that the signal at m/z 2255.85 represents a phosphopeptide containing one phosphate group (Fig. 8 C). An alternative way to identify phosphorylation sites in proteins is by so-called parent ion scanning [69]. This technique detects modified peptides by recording the parent ions that give rise to specific diagnostic fragment ions. Different modifications yield different diagnostic low-mass ions during collision induced dissociation (CID); thus, phosphorylated peptides yield fragment ions at m/z
Fig. 8 Localization of phosphopeptides in peptide mixtures after in-gel digestion using immobilized metal affinity chromatography (IMAC). (A) MALDI peptide mass map of the in-gel generated peptide mixture. (B) MALDI spectrum of the IMAC purified phosphopeptide. Only the phosphopeptide signal together with the signals corresponding to losses of H3PO4 and HPO3 is observed. (C) On-target alkaline phosphatase treatment of the IMAC purified phosphopeptide confirms phosphorylation by a loss of 80 Da. The matrix used in this experiment was α-cyano-4-hydroxy cinnamic acid in 70% acetonitrile
63 and 79 corresponding to PO2⁻ and PO3⁻, respectively. This technique is mainly used in combination with LC-ESIMS, but has also successfully been applied to the identification of phosphorylation sites by direct mass spectrometric analysis of the peptide mixture derived by in-gel digestion of electrophoretically separated proteins (e.g., [70]).
Glycosylation
Glycosylation is the most common of all known protein modifications. It is observed in all eukaryotes and has recently also been found in prokaryotes [71]. The majority of the glycosylated proteins are secreted proteins or membrane associated proteins. The biological role of the glycans varies from conformational stability and protection against degradation to essential molecular and cellular recognition in, for example, development, growth, function and cellular communication [72, 73]. Glycans are often complex branched structures composed of several different carbohydrate residues. These glycans can be attached either to asparagine residues in the consensus sequence Asn-Xxx-Ser/Thr/Cys (N-linked glycosylation) or to serine or threonine (O-glycosylation). In rare cases other amino acid residues, e.g., cysteine or lysine, may also be glycosylated. Each glycosylated site may contain many different glycan structures, leading to pronounced heterogeneity (microheterogeneity). In addition, different sites may be only partially glycosylated (macroheterogeneity). Therefore, full characterization of glycoproteins becomes a major challenge. The use of mass spectrometry in glycoprotein analysis has recently been reviewed [13, 74, 75]. The general strategy for mass spectrometric characterization of purified glycoproteins is to measure the molecular weight of the intact protein by mass spectrometry, followed by identification of the sites of glycosylation. Frequently, differential peptide mapping before and after treatment of the peptide mixture with appropriate endoglycosidases is performed. Structural characterization of the glycan can be performed after chemical or enzymatic liberation of the glycan, monitored by high-performance anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD) or by mass spectrometry. Alternatively, direct mass spectrometric analysis of the glycopeptide, often in combination with sequential exoglycosidase treatment, can be performed [57]. The general methods used for glycan structure determination cannot be directly implemented in proteome analysis because:
• The amount of protein in a gel generally is too low.
• Glycopeptide signals are often suppressed in the presence of the non-glycosylated peptide, especially if the glycans are terminated with the negatively charged sialic acid residue.
• The glycan heterogeneity combined with frequent multiple metal ion adduct formation distributes the glycopeptide signal into several peaks, resulting in low abundant signals.
• The presence of sialic acid causes pronounced metastable and prompt fragmentation in MALDI MS, which complicates interpretation of the spectra.
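Candidate N-glycosylation sites can be listed from the protein sequence alone by scanning for the consensus sequence Asn-Xxx-Ser/Thr/Cys given above. The following minimal sketch is an illustration only: the function name and the test sequence are hypothetical, and the exclusion of proline at the Xxx position is a commonly applied rule that is assumed here rather than taken from the text.

```python
import re

# Asn, then any residue except Pro (assumption, see lead-in), then Ser/Thr/Cys.
# The lookahead lets overlapping sequons (e.g., N-N-S-T) all be reported.
SEQUON = re.compile(r"(?=N[^P][STC])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T/C sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence containing N-G-T, N-P-S (skipped), N-A-C, N-N-S and N-S-T.
print(find_sequons("AANGTKNPSLLNACQNNST"))
```

Such a scan only lists potential sites; whether a given site is occupied, and by which glycans, still has to be established experimentally as described in the text.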
In spite of these difficulties, a few reports have described characterization of N-linked glycans in glycoproteins separated by gel electrophoresis. In a study of human interferon-γ three different protein bands were observed by SDS-PAGE [57]. The intact molecular weights of the protein in the different bands, as determined by MALDI MS after electroelution of the proteins, indicated the presence of the non-, mono- and di-glycosylated species in the three bands, respectively. By direct MALDI MS analysis of the peptide mixture generated by in-gel digestion one of the glycosylated peptides was observed. Subsequent sequential treatment of the peptide mixture with different exoglycosidases and endoglycosidases monitored by mass spectrometry allowed determination of the glycan structure at this site. Signals for the peptide containing the other glycosylation site were not observed in the mass spectrum of the peptide mixture. After microbore HPLC separation of the mixture, a fraction containing the typical glycopeptide peak pattern was identified by MALDI mass spectrometry. Using the same strategy as above it was possible to demonstrate that it contained the second glycosylated peptide and to identify the glycan structures on this site. For N-linked glycans Kuster and coworkers have recently demonstrated that it is possible to perform in-gel endoglycosidase treatment followed by extraction of the liberated glycans and their mass spectrometric characterization [76]. By performing the treatment in a buffer containing H₂¹⁶O and H₂¹⁸O in a 1 : 1 proportion followed by in-gel tryptic digestion and mass spectrometric analysis of the resulting peptide mixture, it was possible to determine which asparagine residues were glycosylated, because the aspartic acid residues obtained by conversion of asparagine contain equal amounts of ¹⁶O and ¹⁸O [77]. The method, however, does not allow site-specific assignment of the different glycan structures. Packer and coworkers have characterized human alpha2-HS glycoprotein and human alpha1-protease inhibitor [78]. The different forms of the glycoproteins were separated on narrow pI range 2 DE gels (pH 4.5–5.5) and transferred to a PVDF membrane, followed by enzymatic (for N-linked glycans) or chemical (for O-linked glycans) release of the glycans. The glycans were extracted, separated by HPAEC-PAD, and the isolated glycans were analyzed by ESIMS.
Specific proteolytic processing
Proteolytic processing of proteins is normally associated with removal of signal peptides or activation of pro-proteins to create functional proteins. However, specific proteolytic processing is also observed during environmental changes (e.g., [79, 80]) and cell aging (e.g., [81]). The products of specific processing are observed in 2 DE as distinct protein spots, all identified as derived from the same protein but located below the full length protein and often at a different pI value. In contrast, non-specific processing results in a smear on the gel.
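The 1 : 1 H₂¹⁶O/H₂¹⁸O labelling used above to mark formerly glycosylated asparagines can be made concrete with a little arithmetic. The sketch below rests on assumptions that are not in the text: the function name and the peptide mass are hypothetical, conversion of Asn to Asp is taken to add 0.98402 Da, each incorporated ¹⁸O is taken to add a further 2.00425 Da, and the natural isotope envelope of the peptide (which overlaps the heavy signal) is ignored.

```python
from math import comb

ASN_TO_ASP = 0.98402     # Da, Asn -> Asp conversion (monoisotopic)
O18_MINUS_O16 = 2.00425  # Da, mass difference between 18O and 16O


def deglycosylated_pattern(unmodified_mh, n_sites=1, heavy_fraction=0.5):
    """Expected [M+H]+ values and relative abundances of a formerly
    N-glycosylated peptide after deglycosylation in mixed 16O/18O water.

    unmodified_mh: [M+H]+ of the peptide with Asn at every site.
    Returns a list of (mz, relative_abundance), one entry per number
    of incorporated 18O atoms (binomial distribution over the sites).
    """
    pattern = []
    for k in range(n_sites + 1):
        mz = unmodified_mh + n_sites * ASN_TO_ASP + k * O18_MINUS_O16
        abundance = (comb(n_sites, k)
                     * heavy_fraction ** k
                     * (1 - heavy_fraction) ** (n_sites - k))
        pattern.append((round(mz, 4), abundance))
    return pattern


# Hypothetical peptide with [M+H]+ 1500.0000 and one glycosylation site.
for mz, abundance in deglycosylated_pattern(1500.0):
    print(mz, abundance)
```

For a single site this gives two signals of equal abundance spaced by about 2 Da, which is the signature that distinguishes a deglycosylated asparagine from an aspartic acid already encoded in the sequence.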
Fig. 9 Localization of a C-terminal processing site in a truncated form of enolase 2 observed after stress exposure. (A) The m/z 1600 to 2150 region of a MALDI MS peptide mass map obtained after in-gel digestion using endoproteinase Asp-N in buffer containing H₂¹⁶O and H₂¹⁸O (1 : 1). (B) The isotope distribution of the signal for the C-terminal peptide in which incorporation of ¹⁸O has not taken place during digestion. The peak at m/z 1639.76 corresponds to the C-terminal peptide. (C) The isotopic distribution of the signals from internal peptides in which 50% ¹⁸O incorporation has taken place during digestion. The matrix used in this experiment was α-cyano-4-hydroxy cinnamic acid in 70% acetonitrile
Identification of the exact processing site is often essential to understand its biological significance. C-terminal peptides in peptide mixtures derived by tryptic digestion have been isolated by retaining all other peptides on anhydro-trypsin columns [82] or recognized by unexpected peptide masses observed during mass analysis of the peptides. In proteome analysis localization of the exact site of processing requires full sequence coverage or specific identification of N- and/or C-terminal peptides. In a recent study of stress induced changes in protein expression in Saccharomyces cerevisiae a total of 10 spots on a silver stained 2 D gel were identified as C- and/or N-terminally processed forms of enolase 2. The estimated amount of protein in each spot was at the low picomole level (low ng). The approximate sites of C-terminal processing were determined by comparing the high mass accuracy MALDI peptide mass maps obtained after in-gel tryptic digestion of the full length and the processed forms of the protein. However, all peptides in the processed forms were terminated with lysine or arginine, indicating either that the processing was caused by a trypsin-like activity or that the C-terminal peptide generated by the tryptic digestion was not detected in MALDI, possibly because it was too small (1–4 amino acid residues) and therefore masked by matrix signals. Another in-gel digestion was therefore performed with endoproteinase Asp-N in a buffer containing H₂¹⁶O and H₂¹⁸O in the proportion 1 : 1. This results in incorporation of ¹⁶O/¹⁸O (1 : 1) in all newly generated carboxy termini but not in the original C-terminus of the protein, so that all peptides except the C-terminal peptide give a double signal spaced by 2 Da. The peptide mass map obtained by this approach for one of the C-terminally processed enolase 2 spots is shown in Fig. 9 A. The enlarged parts of the spectrum illustrate the signal for the
C-terminal peptide (Fig. 9 B) and examples of signals derived from internal peptides (Fig. 9 C). The identity of the C-terminal peptide was further confirmed by digestion of the peptide mixture on a micro-column containing immobilized trypsin followed by mass mapping (resulting in a loss of 172 Da corresponding to removal of the two terminal amino acids, -TA), by methyl esterification of the acidic residues (resulting in a mass gain of 14.02 Da per acidic group) and by sequencing by tandem mass spectrometry. No labeling procedures have been found that allow identification of N-terminal processing. Instead, peptide mass mapping after in-gel digestion with two or more different enzymes was used to putatively locate the processing position.
Other types of modifications
Acylation is stable under the conditions for in-gel digestion and MALDI mass mapping. Therefore acylated peptides can be identified by correlating the masses of unassigned peaks with the masses of expected peptides for which no signals were obtained and the masses of possible modifying groups [83]. Using this method acetylated proteins have been identified in yeast proteome analysis. An example is shown in Fig. 5, where peptides that could not be matched to the GBLP sequence were investigated. This resulted in one peptide that could correspond to acetylation of the N-terminal peptide (m/z 1042.533). The identity of the peptide was verified by methyl esterification of an aliquot of the peptide mixture, resulting in incorporation of the expected number of methyl groups (2) in this peptide. This further illustrates the need for desalting and concentration prior to MALDI analysis to increase sequence coverage, because this peptide was not detected without this step. Using a similar concept, myristoylation of the N-terminus of the major bovine brain Go alpha isoforms has been reported [84]. Other post-translational modifications may be identified by using mass spectrometry in combination with gel electrophoresis through observation of unexpected peptide mass shifts during mass analysis. Thus, characterization of lysine methylation has recently been reported [85], in which the modified residues were identified by high mass accuracy MALDI MS and the degree of methylation was quantified by Fmoc (9-fluorenylmethyl chloroformate)-based amino acid analysis.
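The correlation of unassigned peaks with unobserved peptides and possible modifying groups, as described above for acylation, can be sketched as a small lookup. This is an illustration under stated assumptions and not the tool cited in the text: the function name, the peptide labels and the unmodified peptide masses are hypothetical, the 50 ppm window is borrowed from the upper end of the mass accuracy range quoted in this review, and only m/z 1042.533 is taken from the text. The modification masses are the monoisotopic increments for acetyl (C2H2O), myristoyl (C14H26O) and methyl (CH2).

```python
# Monoisotopic mass increments in Da
MODIFICATIONS = {
    "acetyl": 42.01057,
    "myristoyl": 210.19837,
    "methyl": 14.01565,
}


def match_modified_peptides(unassigned_peaks, unobserved_peptides, ppm=50.0):
    """Match unassigned [M+H]+ signals to unobserved peptides carrying one
    modifying group.

    unobserved_peptides: dict mapping a peptide label to its calculated
    unmodified [M+H]+ value.
    Returns a list of (observed_mz, peptide_label, modification, error_ppm).
    """
    hits = []
    for mz in unassigned_peaks:
        for label, unmodified_mh in unobserved_peptides.items():
            for name, delta in MODIFICATIONS.items():
                expected = unmodified_mh + delta
                error_ppm = (mz - expected) / expected * 1e6
                if abs(error_ppm) <= ppm:
                    hits.append((mz, label, name, round(error_ppm, 1)))
    return hits


# 1042.533 is the signal quoted in the text; 1300.000 and both calculated
# peptide masses are hypothetical values chosen for illustration.
unassigned = [1042.533, 1300.000]
unobserved = {"N-terminal peptide": 1000.522, "internal peptide": 1500.700}
for hit in match_modified_peptides(unassigned, unobserved):
    print(hit)
```

A match of this kind is only a hypothesis; as in the example in the text, it needs independent verification, for instance by methyl esterification or by sequencing.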
Conclusion and outlook
Mass spectrometric protein identification by high-mass accuracy (10–50 ppm) peptide mass mapping or partial sequencing by MS/MS, combined with improved bioinformatic tools, allows high-fidelity protein identification at femtomole sensitivity, i.e., of proteins in weakly silver stained spots or spots detected by autoradiography. The majority of all proteins from organisms for which the genome is known can be identified by mass spectrometric peptide mass mapping. Identification based on EST-databases or by homology requires generation of sequence information. Electrospray ionization tandem mass spectrometry provides a more sensitive and faster alternative to traditional amino acid sequencing by Edman degradation and normally provides sufficient information for such identifications and even for construction of oligonucleotide probes and subsequent cloning. The next natural step in proteome analysis after identification is characterization of post-translational modifications. Several techniques are now available for site-specific characterization of protein phosphorylation in proteome analysis. Characterization of protein processing, although requiring laborious procedures, is also possible at a sensitivity level compatible with 2 DE. Characterization of the site-specific glycan structures on glycoproteins during proteome analysis must still be considered a challenge, and new improved methodology needs to be developed in order to meet the sensitivity required in proteome analysis. Another major challenge is the ability to detect partially modified sites and to quantify the degree of modification. In such cases, the modified peptide will often be
ignored or absent because the non-modified peptide is observed. Methods for selective purification of such partially modified peptides are needed for reliable analysis and quantification. In conclusion, mass spectrometry is the method of choice for protein identification in proteome analysis and has given encouraging results in the characterization of post-translational modifications in electrophoretically separated proteins. However, further developments are still needed before full characterization of all the proteins visualized in the gel during proteome analysis will become possible.
Acknowledgements The Danish Biotechnology program is acknowledged for financial support. This work is part of the activities at the Center for Experimental Bioinformatics founded by the Danish National Research Foundation. Drs. Peter Mose Larsen and Stephen J. Fey from the Center for Proteome Analysis, Odense University, are acknowledged for supplying several of the 2 D gels used for the studies described.
References
1. Khan J, Saal LH, Bittner ML, Chen Y, Trent JM, Meltzer PS (1999) Electrophoresis 20 : 223–229
2. Wasinger VC, Cordwell SJ, Cerpa-Poljak A, Yan JX, Gooley AA, Wilkins MR, Duncan MW, Harris R, Williams KL, Humphery-Smith I (1995) Electrophoresis 16 : 1090–1094
3. Wilkins MR, Pasquali C, Appel RD, Ou K, Golaz O, Sanchez JC, Yan JX, Gooley AA, Hughes G, Humphery-Smith I, Williams KL, Hochstrasser DF (1996) Biotechnology (NY) 14 : 61–65
4. Henzel WJ, Billeci TM, Stults JT, Wong SC (1993) Proc Natl Acad Sci USA 90 : 5011–5015
5. Mann M, Højrup P, Roepstorff P (1993) Biol Mass Spectrom 22 : 338–345
6. Pappin DJC, Højrup P, Bleasby AJ (1993) Curr Biol 3 : 327–332
7. James P, Quadroni M, Carafoli E, Gonnet G (1993) Biochem Biophys Res Commun 195 : 58–64
8. Eng JK, McCormack AL, Yates JR (1994) J Am Soc Mass Spectrom 5 : 976–989
9. Neubauer G, King A, Rappsilber J, Calvio C, Watson M, Ajuh P, Sleeman J, Lamond A, Mann M (1998) Nat Genet 20 : 46–50
10. Chen RH, Shevchenko A, Mann M, Murray AW (1998) J Cell Biol 143 : 283–295
11. Annan RS, Carr SA (1997) J Protein Chem 16 : 391–402
12. Qin J, Chait BT (1997) Anal Chem 69 : 4002–4009
13. Burlingame AL (1996) Curr Opin Biotechnol 7 : 4–10
14. Andersen JS, Svensson B, Roepstorff P (1996) Nat Biotechnol 14 : 449–457
15. Center for Proteome Analysis in the Life Sciences, Science Park Odense, DK-5230 Odense M, Denmark
16. Shevchenko A, Jensen ON, Podtelejnikov AV, Sagliocco F, Wilm M, Vorm O, Mortensen P, Shevchenko A, Boucherie H, Mann M (1996) Proc Natl Acad Sci USA 93 : 14 440–14 445
17. Wilm M, Mann M (1996) Anal Chem 68 : 1–8
18. Mann M, Wilm M (1994) Anal Chem 66 : 4390–4399
19. Dongre AR, Eng JK, Yates JR 3rd (1997) Trends Biotechnol 15 : 418–425
20. Lingner J, Hughes TR, Shevchenko A, Mann M, Lundblad V, Cech TR (1997) Science 276 : 561–567
21. Shevchenko A, Wilm M, Mann M (1997) J Protein Chem 16 : 481–490
22. Jensen ON, Larsen MR, Roepstorff P (1998) Proteins: Structure, Function and Genetics 2 : 74–89
23. Nawrocki A, Larsen MR, Podtelejnikov AV, Jensen ON, Mann M, Roepstorff P, Görg A, Fey SJ, Larsen PM (1998) Electrophoresis 19 : 1024–1035
24. Gobom J, Nordhoff E, Mirgorodskaya E, Ekman R, Roepstorff P (1999) J Mass Spectrom 34 : 105–116
25. Jensen ON, Vorm O, Mann M (1996) Electrophoresis 17 : 938–944
26. Vestal ML, Juhasz P, Martin SA (1995) Rapid Commun Mass Spectrom 9 : 1044–1050
27. Jensen ON, Podtelejnikov AV, Mann M (1997) Anal Chem 69 : 4741–4750
28. Jiang X, Smith JB, Abraham EC (1996) J Mass Spectrom 31 : 1309–1310
29. Swiderek KM, Davis MT, Lee TD (1998) Electrophoresis 19 : 989–997
30. Kussmann M, Nordhoff E, Nielsen H, Larsen MR, Haebel S, Mirgorodskaya E, Jensen C, Roepstorff P (1997) J Mass Spectrom 32 : 593–601
31. Jensen ON, Mortensen P, Vorm O, Mann M (1997) Anal Chem 69 : 1706–1714
32. Traini M, Gooley AA, Ou K, Wilkins MR, Tonella L, Sanchez JC, Hochstrasser DF, Williams KL (1998) Electrophoresis 19 : 1941–1949
33. Quadroni M, James P (1999) Electrophoresis 20 : 664–677
34. Yates JR, Speicher S, Griffin PR, Hunkapiller T (1993) Anal Biochem 214 : 397–408
35. Jensen ON, Wilm M, Shevchenko A, Mann M (1999) Methods Mol Biol 112 : 571–588
36. Morris HR, Paxton T, Panico M, McDowell R, Dell A (1997) J Protein Chem 16 : 469–479
37. Shevchenko A, Chernushevich I, Ens W, Standing KG, Thomson B, Wilm M, Mann M (1997) Rapid Commun Mass Spectrom 11 : 1015–1024
38. Roepstorff P, Fohlmann J (1984) Biomed Mass Spectrom 11 : 601
39. Davis MT, Lee TD (1998) J Am Soc Mass Spectrom 9 : 194–201
40. Qin JHC, Zhang X (1998) Rapid Commun Mass Spectrom 12 : 209–216
41. Haynes PA, Fripp N, Aebersold R (1998) Electrophoresis 19 : 939–945
42. Ducret A, Van Oostveen I, Eng J, Yates JR, Aebersold R (1998) Protein Sci 7 : 706–719
43. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR 3rd (1999) Nat Biotechnol 17 : 676–682
44. Gianazza E, Celentano F, Magenes S, Ettori C, Righetti PG (1989) Electrophoresis 10 : 806–808
45. Soskic V, Gorlach M, Poznanovic S, Boehmer FD, Godovac-Zimmermann J (1999) Biochemistry 38 : 1757–1764
46. Packer NH, Ball MS, Devine PL (1999) Methods Mol Biol 112 : 341–352
47. Sherrier DJ, Prime TA (1999) Electrophoresis 20 : 2027–2035
48. Melkonian KA, Ostermeyer AG, Chen JZ, Roth MG, Brown DA (1999) J Biol Chem 274 : 3910–3917
49. Gromov P, Celis JE (1998) Electrophoresis 19 : 1803–1807
50. Strupat K, Karas M, Hillenkamp F, Eckerskorn C, Lottspeich F (1994) Anal Chem 66 : 464–470
51. Eckerskorn C, Strupat K, Schleuder D, Hochstrasser D, Sanchez JC, Lottspeich F, Hillenkamp F (1997) Anal Chem 69 : 2888–2892
52. Haebel SJC, Andersen SO, Roepstorff P (1995) Protein Sci 4 : 394–404
53. Loo JA, Brown J, Critchley G, Mitchell C, Andrews PC, Ogorzalek Loo RR (1999) Electrophoresis 20 : 743–748
54. http://www.mann.embl-heidelberg.de/Services/PeptideSearch/PeptideSearchIntro.html
55. http://www.expasy.ch/tools/findmod/
56. Roepstorff P, Schram KH, Andersen JS, Rafin K, Baldursson T, Kroll J, Poulsen K, Knudsen J, Kristiansen K (1995) Mol Biotechnol 4 : 1–12
57. Mortz E, Sareneva T, Haebel S, Julkunen I, Roepstorff P (1996) Electrophoresis 17 : 925–931
58. Spangfort MD, Ipsen H, Sparholt SH, Aasmul-Olsen S, Larsen MR, Mortz E, Roepstorff P, Larsen JN (1996) Protein Expr Purif 8 : 365–373
59. Kussmann M, Lassing U, Sturmer CA, Przybylski M, Roepstorff P (1997) J Mass Spectrom 32 : 483–493
60. Hubbard MJ, Cohen P (1993) Trends Biochem Sci 18 : 172–177
61. Zhang X, Herring CJ, Romano PR, Szczepanowska J, Brzeska H, Hinnebusch AG, Qin J (1998) Anal Chem 70 : 2050–2059
62. Annan RS, Carr SA (1996) Anal Chem 68 : 3413–3421
63. Medzihradszky KF, Phillipps NJ, Senderowicz L, Wang P, Turck CW (1997) Protein Sci 6 : 1405–1411
64. Larsen MR, Moertz E, Mose Larsen P, Roepstorff P (1996) Proc., 2nd Siena 2D Electrophoresis Meeting: From Genome to Proteome, Siena, Italy, Sept. 16–18, [abstract page 206]
65. Posewitz MC, Tempst P (1999) Anal Chem 71 : 2883–2892
66. Li S, Dass C (1999) Anal Biochem 270 : 9–14
67. Figeys D, Gygi SP, Zhang Y, Watts J, Gu M, Aebersold R (1998) Electrophoresis 19 : 1811–1818
68. Jensen ON, Stensballe A, Andersen S (2000) Electrophoresis (in press)
69. Carr SA, Huddleston MJ, Annan RS (1996) Anal Biochem 239 : 180–192
70. Neubauer G, Mann M (1999) Anal Chem 71 : 235–242
71. Moens S, Vanderleyden J (1997) Arch Microbiol 168 : 169–175
72. Varki A (1993) Glycobiology 3 : 97–130
73. Lis H, Sharon N (1993) Eur J Biochem 218 : 1–27
74. Harvey DJ, Kuster B, Naven TJ (1998) Glycoconj J 15 : 333–338
75. Packer NH, Harrison MJ (1998) Electrophoresis 19 : 1872–1882
76. Kuster B, Wheeler SF, Hunter AP, Dwek RA, Harvey DJ (1997) Anal Biochem 250 : 82–101
77. Kuster B, Mann M (1999) Anal Chem 71 : 1431–1440
78. Packer NH, Lawson MA, Jardine DR, Sanchez JC, Gooley AA (1998) Electrophoresis 19 : 981–988
79. Godon C, Lagniel G, Lee J, Buhler J-M, Kieffer S, Perrot M, Boucherie H, Toledano MB, Labarre J (1998) J Biol Chem 273 : 22 480–22 489
80. Brouquisse R, Gaudillere JP, Raymond P (1998) Plant Physiol 117 : 1281–1291
81. Hallak ME, Bongiovanni G (1997) Neurochem Res 22 : 467–473
82. Harris RJ, Chamow SM, Gregory TJ, Spellman MW (1990) Eur J Biochem 188 : 291–300
83. Wilkins MR, Gasteiger E, Gooley AA, Herbert BR, Molloy MP, Binz PA, Ou K, Sanchez JC, Bairoch A, Williams KL, Hochstrasser DF (1999) J Mol Biol 289 : 645–657
84. McIntire WE, Dingus J, Schey KL, Hildebrandt JD (1998) J Biol Chem 273 : 33 135–33 141
85. Yan JX, Sanchez JC, Binz PA, Williams KL, Hochstrasser DF (1999) Electrophoresis 20 : 749–754