Appl Microbiol Biotechnol DOI 10.1007/s00253-016-7940-7
BIOTECHNOLOGICALLY RELEVANT ENZYMES AND PROTEINS
Bioinformatic analysis of fold-type III PLP-dependent enzymes discovers multimeric racemases
Anders M. Knight 1,2 & Alberto Nobili 1,3,4 & Tom van den Bergh 5 & Maika Genz 1 & Henk-Jan Joosten 5 & Dirk Albrecht 6 & Katharina Riedel 6 & Ioannis V. Pavlidis 1,7 & Uwe T. Bornscheuer 1
Received: 10 August 2016 / Revised: 9 October 2016 / Accepted: 12 October 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract
Pyridoxal-5′-phosphate (PLP)-dependent enzymes are ubiquitous in nature and catalyze a variety of important metabolic reactions. The fold-type III PLP-dependent enzyme family primarily comprises decarboxylases and alanine racemases. In the development of a multiple structural alignment database (3DM) for the enzyme family, a large subset of 5666 uncharacterized proteins with high structural, but low sequence similarity to alanine racemase and decarboxylases was found. Compared to these two classes of enzymes, the protein sequences that are the object of this study completely lack the C-terminal domain, which has been reported to be important for the formation of the dimer interface in other fold-type III enzymes. The 5666 sequences cluster around four protein templates, which also share little sequence identity with each other. In this work, these four template proteins were solubly expressed in Escherichia coli and purified, and their substrate profiles were evaluated by HPLC analysis for racemase activity using a broad range of amino acids. They were found to be active only against alanine or serine, where they exhibited Michaelis constants within the range of typical bacterial alanine racemases, but with significantly lower turnover numbers. As the already described racemases were proposed to be active and appeared to be monomers as judged from their crystal structures, we also investigated this aspect for the four new enzymes. Here, size exclusion chromatography indicated the presence of oligomeric states of the enzymes, and a native-PAGE in-gel assay showed that the racemase activity was present only in an oligomeric state but not in the monomer. This suggests that these enzymes likely behave differently in solution than in crystalline form.
Keywords Decarboxylase . PLP-dependent enzymes . Protein-function analysis . Racemase
Anders M. Knight and Alberto Nobili contributed equally to this work
Electronic supplementary material The online version of this article (doi:10.1007/s00253-016-7940-7) contains supplementary material, which is available to authorized users.
* Uwe T. Bornscheuer
[email protected]
1 Institute of Biochemistry, Department of Biotechnology and Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany
2 Division of Biology and Bioengineering, California Institute of Technology, 1200 E. California Blvd. MC 210-41, Pasadena, CA 91125, USA
3 Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
4 Present address: Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
5 Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, the Netherlands
6 Institute for Microbiology, Department of Microbial Physiology and Molecular Biology, Greifswald University, Friedrich-Ludwig-Jahn-Str. 15, 17487 Greifswald, Germany
7 Present address: Department of Biochemistry, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany
Introduction
In recent years, the number of available protein sequences has increased tremendously (currently, almost 64 million protein sequences have been deposited at http://www.ncbi.nlm.nih.gov/refseq/) (Tatusova et al. 2014), followed by a parallel increase in available protein structures (Berman et al. 2000). Databases have become orphanages for protein sequences that often have unknown or even misleading annotations, leaving vast numbers of them without a clear identity or understood biochemical function. To enable access to these unexplored enzymes, in silico protein function analysis has developed as a tool to identify biocatalysts with novel activities. For example, we have used in silico prediction methods to identify novel enzymes by elucidating a distinct sequence motif that led to the discovery of esterases and lipases capable of accepting bulky tertiary alcohols (Henke et al. 2002) and to identify unique (R)-selective amine transaminases (Höhne et al. 2010). While multiple sequence alignment can give significant insight into the activities of uncharacterized proteins, these methods are less robust when comparing large families with low sequence identity (Joosten 2007). In comparison, structural information serves as the basis for a structure-guided multiple sequence alignment as used in the generation of a 3DM database, and this also made it possible to connect orphan enzyme structures to their function. Thus, four proteins could be confirmed experimentally as (S)-selective transaminases (Steffen-Munsberg et al. 2013). More recently, we focused on superfamilies of PLP-dependent enzymes and were able to sort an entire subfamily of fold-type I PLP-dependent enzymes comprising more than 12,000 sequences into 28 different enzymatic activities by identifying tight sequence-structure-function relationships (Steffen-Munsberg et al. 2015). PLP-dependent enzymes are of cardinal importance to the chemical industry.
The list of chemical reactions that this type of enzyme can perform is frequently updated. Currently, PLP-dependent enzymes have been described to take part in more than 236 different chemical reactions, making knowledge of PLP chemistry central to the biocatalytic field. Of the seven existing fold classes (Steffen-Munsberg et al. 2015), we focus in this work on the 3DM database for fold-type III of the PLP-dependent enzymes. Compared to other platforms that limit protein comparison to sequences whose identity is above 30 %, the commercially available 3DM database generates a structure-based alignment that is able to include all the publicly available sequences that belong to the fold-type III PLP-dependent enzymes. In the newly created 3DM database, a total of 23,981 homologous sequences with 214 crystal structures were prepared for comparison, with sequence identities as low as 4 %. The enzymes in fold-type III are commonly characterized as homodimers, where each monomer has a classical
α/β-barrel structure with a second β-strand domain necessary for dimerization (Eliot and Kirsch 2004; Percudani and Peracchi 2003; Schneider et al. 2000). The enzymes that belong to this category have been reported as having either alanine racemase or amino acid decarboxylase activity. Alanine racemases are employed in the biosynthesis of the peptidoglycan in the bacterial cell wall (Azam and Jayaram 2016). Such enzymes are indispensable for bacterial survival, but they do not have a human homolog. Therefore, they are attractive targets for the development of novel antibiotics that act as specific alanine racemase inhibitors (Anthony et al. 2011). For synthetic purposes, racemases represent an appealing alternative because they give access to the D-enantiomers of natural proteinogenic L-amino acids (Espaillat et al. 2014; Wu et al. 2012) and can be used in dynamic kinetic resolutions to achieve 100 % yield (Galkin et al. 1997; Soda et al. 2001). Amino acid decarboxylases are the enzymes responsible for the biosynthesis of polyamines. Cells that are depleted of polyamines stop their cell cycle, and for this reason, these enzymes are targets in studies against proliferative diseases (Jackson et al. 2004; Jackson et al. 2000). Despite the commonly accepted dimeric structure of enzymes in this subfamily, 23 % of the total sequences in our database lacked the C-terminal domain, which comprises part of the active site and is believed to be required for dimerization, as well as one of the residues that take part in proton transfer in the commonly accepted catalytic reaction mechanism. Only two prior publications have commented on these enzymes. The first publication, by Eswaramoorthy and coworkers, reports that the alanine racemase Ybl036c from Saccharomyces cerevisiae exists as a monomer and displays alanine racemase activity in vitro (Eswaramoorthy et al. 2003). Much later, Ito et al.
published the crystal structure of YggS, but surprisingly, despite being able to bind PLP, both Ito’s and Eswaramoorthy’s enzymes seemed to be unable to catalyze any racemization (Ito et al. 2013). There have been reports of alanine racemases characterized as monomers. A later analysis of these proteins showed that, while they did exist in solution as monomers, they were in a monomer-dimer equilibrium. It was hypothesized that their activity took place in the dimeric form (Ju et al. 2011). To clarify these conflicting literature reports, we have analyzed a cluster of 5666 sequences (23 % of the entire database) generated automatically by the 3DM algorithm due to their minimal sequence similarity (sometimes no similarity at all) with classical fold-type III PLP-dependent enzymes like the alanine racemase from Geobacillus stearothermophilus (1SFT) or the ornithine decarboxylase from Mus musculus (7ODC). These sequences clustered around the crystal structures of 1CT5 (S. cerevisiae), 3R79 (Agrobacterium tumefaciens), 3CPG (Bifidobacterium adolescentis), and 3SY1 (Escherichia coli). When searched on the RCSB PDB database (www.rcsb.org)
(Berman et al. 2000), each of these protein structures returned an entry in which the protein was described as “hypothetical protein” (1CT5), “uncharacterized protein” (3R79), “unknown protein” (3CPG), and as an unspecified “Northeast Structural Genomics Consortium Target OR70” yet to be published (3SY1). The visual inspection of these proteins, the existence of broad substrate racemases with a larger catalytic center (Espaillat et al. 2014), and the scarce documentation prompted us to investigate the function of these representative enzymes from the orphan subfamilies found in the 3DM database.
Materials and methods
Chemicals
All chemicals were purchased from Sigma-Aldrich (Munich, Germany), Fluka (Buchs, Switzerland), Roth GmbH (Karlsruhe, Germany), and Acros Organics (Geel, Belgium) at the highest purity and used without further purification.
Fig. 1 Visualization of fold-type III annotated activity and representative orphan enzyme comparisons. Left: Sequence distribution within the fold-type III 3DM database for PLP-dependent enzymes. The enzymes belonging to the racemase and the decarboxylase subfamilies have been studied and documented. Almost a fourth of the sequences available online are uncharacterized and hence orphans without a known function. Right: Evolutionary relationships of the 3DM subfamilies. In the figure, the orphan proteins studied here are marked with golden points, decarboxylases with blue points, and racemases with orange points
3DM database
A 3DM protein superfamily system was developed for the alanine racemase superfamily. The generation of 3DM systems has previously been described in more detail (Kuipers et al. 2010b). To develop the alanine racemase 3DM system, the crystal structure with PDB ID 1SFT, chain A (Ala racemase from G. stearothermophilus (Shaw et al. 1997)), was used as a template for an initial structural alignment that contained 214 crystal structures in total. From this structural alignment, 43 structures were selected that showed a sequence
identity of at most 80 % compared to the other structures in this selection. These structures are thus unique in sequence, and each represents a distinct part of the sequence space of the alanine racemase superfamily. Therefore, these 43 structures were used as templates for generating 43 distinct subfamilies. In this step, four structures that belong to the “orphan” subfamilies were selected. The 43 template structures were used in an all-to-all structural alignment method to determine the structurally conserved “core” of the superfamily. The four structures that belong to the orphan subfamilies contained a much smaller part of this conserved core and were structurally the most distant subfamilies included in the 3DM system. For each subfamily template structure, a BLAST search was performed. Using the iterative multiple sequence alignment method of 3DM (Kuipers et al. 2010b), proteins could be included for which no structures were available in the 3DM system. This resulted in a structure-based multiple sequence alignment that contained 23,981 sequences. A numbering scheme was applied to this alignment so that all structurally equivalent residues in the 3DM system were assigned the same number (3D numbers). For each protein in the alignment, data were gathered from online databases such as UniProt, and the literature was mined to extract mutation data using the Mutator module of 3DM (Kuipers et al. 2010a).
The sequences of the crystal structures contained in the 3DM database were taken from the RCSB protein data bank (Berman et al. 2000), and they were aligned using the online software Clustal Omega (Sievers et al. 2011). The evolutionary history was inferred using the minimum evolution (ME) method (Rzhetsky and Nei 1992). The optimal tree with the sum of branch length = 51.99702955 is shown in Fig. 1, drawn to scale with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method (Zuckerkandl and Pauling 1965) and are in the units of the number of amino acid substitutions per site. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The ME tree was searched using the Close-Neighbor-Interchange (CNI) algorithm (Nei and Kumar 2000) at a search level of 1. The neighbor-joining algorithm (Saitou and Nei 1987) was used to generate the initial tree. The analysis involved 43 amino acid sequences. All ambiguous positions were removed for each sequence pair. There were a total of 802 positions in the final dataset. Evolutionary analyses and method description were taken from MEGA7 (Kumar et al. 2016).
[Fig. 1, left panel, pie chart labels: fold-type III PLP-dependent enzymes; decarboxylases 46 %, racemases 31 %, orphans 23 %]
Plasmids and expression host
The sequences of the 3DM subfamily leader proteins were ordered as synthetic genes (E. coli codon optimized in case of 3SY1) and cloned into a pET vector (Genscript, Piscataway, USA) carrying a C-terminal His6 tag. The sequences connected to the crystal structures with PDB IDs 3CPG, 3R79, and 1CT5 (3CPG, UniProt A1A3G9, GenBank AP009256.1; 3R79, UniProt A9CHE9, GenBank AE007869.2; 1CT5, UniProt P38197, GenBank BK006936.2) were ordered without internal modifications. The sequence of the deposited crystal structure 3SY1 contained several variations (L33V, G56S, N58H, H81N, I83A, H102I, M165S, S202A, M205Q, R221A) when compared with approximately 1500 sequences belonging to the 3SY1 subfamily. All variations occurred at conserved positions. The wild-type residue is conserved in more than 92 % of the sequences, with the exception of H102, where the histidine is conserved among 80 % of the sequences. Therefore, the sequence of B2NCJ2, having the highest core similarity to 3SY1’s sequence, was ordered instead as a synthetic gene (in the text 3SY1*; codon-optimized sequence deposited at GenBank KX640989). Protein expression was performed in E.
coli BL21 (DE3) purchased from New England Biolabs (Ipswich, USA). Protein expression and purification LB media (5 ml) were inoculated with single colonies and grown overnight (37 °C, 180 rpm, with either ampicillin (100 μg ml−1) or kanamycin (50 μg ml−1) according to the pET vector type). These were used to inoculate 50 ml TB media, which were shaken at 180 rpm, 37 °C with ampicillin (100 μg ml−1) or kanamycin (50 μg ml−1). At OD600nm = 0.8, protein expression was induced with IPTG (final concentration 0.2 mM) and cultures were shaken at 180 rpm and 20 °C overnight. The cells were centrifuged at 4000×g for 45 min, and the pellets were frozen at −20 °C until later use. The
pellets were resuspended in buffer A (50 mM HEPES pH 7.5, 100 μM PLP, 300 mM NaCl) and sonicated on ice (5 min at 55 % power, 50 % cycle, repeated three times per sample with 5-min pause between runs) with a Sonopuls KE76 probe (Sigma, St. Louis, USA). The lysates were centrifuged at 10,000×g, 4 °C for 30 min, and the supernatant was filtered through 0.22-μm sterile filters. The filtrates were loaded onto hand columns packed with Roti®garose His/Co beads (Roth, Karlsruhe, Germany), which were pre-equilibrated with buffer A. Nonspecific binding proteins were washed away with 5 ml buffer A + 15 mM imidazole. The proteins were eluted with buffer A + 300 mM imidazole. The purified enzymes were desalted using PD-10 desalting columns (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) equilibrated with desalting buffer (50 mM HEPES pH 7.5, 100 μM PLP). Protein purity was monitored through SDS-PAGE, and protein concentrations were determined using the BCA Protein Assay Kit (Pierce Biotechnology, Rockford, USA). HPLC analysis of racemase activity Purified enzyme was used for the reactions (1 mM amino acid(s), 50 mM HEPES pH 7.5, 100 μM PLP). The reactions were run at 25 °C, 400 rpm overnight. Reaction samples were derivatized with 5 mM o-phthalaldehyde (OPA) and 15 mM N-isobutyryl-L-cysteine in sodium borate buffer, pH 9.5, to a final amino acid concentration of 250 μM. The derivatization reactions were centrifuged and transferred to GC vial inserts. The HPLC instrument used was a Hitachi L-7000 series. Samples were loaded on a Hypersil ODS C18 column (4.6 × 250 mm; 5 μm). The method was 7–47 % buffer B over 55 min, 1 ml min−1 flow rate (buffer A 23 mM sodium acetate, pH 6.0, buffer B 12:1 methanol:acetonitrile) (Brückner et al. 1994). Intensity values were collected at 330 nm to assess the formation of the OPA amino acid derivative. 
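The two numerical settings of the HPLC method above, the 7–47 % buffer B gradient over 55 min and the dilution of 1 mM reactions to 250 μM for derivatization, can be sketched as small helper functions. This is a minimal illustration, not the instrument method: the source does not state the gradient shape, so a linear ramp is assumed, and the function names `percent_b` and `dilution_factor` are hypothetical.

```python
def percent_b(t_min, start_pct=7.0, end_pct=47.0, duration_min=55.0):
    """Percent buffer B at time t_min, ASSUMING a linear ramp (not stated in the text)."""
    t = min(max(t_min, 0.0), duration_min)  # clamp to the gradient window
    return start_pct + (end_pct - start_pct) * t / duration_min


def dilution_factor(initial_uM, final_uM):
    """Fold dilution needed to go from initial_uM to final_uM."""
    return initial_uM / final_uM


if __name__ == "__main__":
    for t in (0.0, 27.5, 55.0):
        print(f"t = {t:5.1f} min -> {percent_b(t):.1f} % B")
    # 1 mM reaction mixture derivatized to a final 250 uM amino acid concentration
    print("dilution factor:", dilution_factor(1000.0, 250.0))
```

Under the linear assumption, the midpoint of the run (27.5 min) corresponds to 27 % buffer B, and the derivatization step is a fourfold dilution of the reaction mixture.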
Size exclusion chromatography Size exclusion chromatography (SEC) experiments were run on an Äkta purifier (GE Healthcare Bio-Sciences AB, Uppsala, Sweden). Purified protein samples were loaded on a room temperature HiLoad 16/600 Superdex 75 column (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), which had been pre-equilibrated with a filtered and degassed desalting buffer (50 mM HEPES pH 7.5, 100 μM PLP). The isocratic method was run at 1 ml min−1, and absorbance data were collected at 280 nm. Kinetic analysis via an enzyme-coupled assay The enzyme kinetics were determined using a D-amino acid oxidase and peroxidase coupled-enzyme colorimetric assay
(Holt and Palcic 2006). Briefly, 50 μl purified enzyme solution and 100 μl chromogenic solution (2 mM vanillic acid, 0.5 mM 4-amino-antipyrine, 10 U/ml D-amino acid oxidase, 100 U/ml horseradish peroxidase) were added to 96-well assay plates. Then, 50 μl of alanine solution was added to the wells at varying concentrations (0–25 mM final concentration), and the reactions were incubated at 30 °C. The reaction was monitored at 498 nm. The extinction coefficient was determined to be 3942 M−1 cm−1 (Fig. S4). The kinetic data were fit to the Michaelis-Menten equation by nonlinear least squares fitting with the SciPy package (SciPy, www.scipy.org).
Native-PAGE activity stain
Samples of 3R79 and 1CT5 enzymes were run on a 12 % native-PAGE gel, with each sample loaded in duplicate on the two halves of the gel. After electrophoresis, the gels were cut in half, yielding two duplicate native gels for each enzyme. One half was stained with Coomassie Brilliant Blue staining solution; the other half was soaked in the enzyme-coupled assay chromogenic solution with L-alanine (final concentration 5 mM) for 1 h. The chromogenic solution was decanted, and the gels were placed in a 37 °C incubator. The ColorPlus prestained ladder (New England Biolabs, Ipswich, USA) was loaded on both halves to compare the location of the bands. In addition, all four enzymes were loaded onto the same gel with the ColorPlus prestained ladder as reference. This gel was soaked for 1 h in the chromogenic solution with 25 mM L-alanine. The chromogenic solution was then decanted, and the gel was placed in the 37 °C incubator. The gels were monitored for color formation until bands were visible (1–2 h).
MALDI-TOF analysis
Gel bands were excised from stained 2-D gels and transferred into a microcentrifuge tube. The gel pieces were washed twice with 100 μl of a solution of 50 % CH3CN and 50 % 50 mM NH4HCO3 for 30 min.
After drying at 37 °C for 17 min, 10 μl trypsin solution containing 20 ng/μl trypsin (Promega, Madison, WI, USA) was added and incubated at 37 °C for 120 min. For extraction, gel pieces were covered with 60 μl 0.1 % TFA in 50 % CH3CN and incubated for 30 min. The peptide-containing supernatant was transferred into a new microcentrifuge tube, and the extraction was repeated with 40 μl of the same solution. The supernatants were dried completely at 40 °C. The dry peptides were resuspended in 0.9 μl of α-cyano-4-hydroxycinnamic acid matrix (3.3 mg/ml in 50/49.5/0.5 % (v/v/v) CH3CN/H2O/TFA), and 0.7 μl of this solution was deposited on the MALDI target plate. The samples were allowed to dry on the target for 10 to 15 min before measurement by MALDI-TOF.
The mass spectrometric data confirmed the absence of E. coli alanine racemase in the protein samples. The MALDI-TOF measurement was carried out on the AB SCIEX TOF/TOF™ 5800 Analyzer (AB Sciex/MDS Analytical Technologies). This instrument is designed for high-throughput measurement and can automatically measure the samples, calibrate the spectra, and analyze the data using the TOF/TOF™ Series Explorer™ Software v4.1.0. The spectra were recorded in a mass range from 900 to 3700 Da with a focus mass of 1700 Da. For one main spectrum, 25 sub-spectra with 100 shots per sub-spectrum were accumulated using a random search pattern. If the autolytic fragment of trypsin with the mono-isotopic (M + H)+ m/z at 2211.104 reached a signal-to-noise ratio (S/N) of at least 40, an internal calibration was automatically performed as one-point calibration using this peak. The standard mass deviation was less than 0.15 Da. If the automatic mode failed (in less than 1 % of cases), the calibration was carried out manually. The five most intense peaks from the TOF spectra were selected for MS/MS analysis. One MS/MS spectrum with 6375 partial spectra was generated (25 measuring points each with 255 shots, using a random search pattern). The internal calibration was automatically performed as one-point calibration if the mono-isotopic arginine (M + H)+ m/z at 175.119 or lysine (M + H)+ m/z at 147.107 reached a S/N of at least 5. The peak lists were created by using GPS Explorer™ Software version 3.6 (build 332) with the following settings for TOF-MS: mass range, 900–3700 Da; peak density, 20 peaks per 200 Da; minimum S/N ratio of 15; and a maximum of 65 peaks per spot. The TOF-TOF-MS settings were a mass range from 60 Da to precursor − 20 Da, a peak density of 50 peaks per 200 Da, and a maximum of 65 peaks per precursor. The peak lists were created for a minimum S/N ratio of 10.
For the database search, the Mascot search engine version 2.4.1 (Matrix Science Ltd., London, UK) with a specific sequence database was used.
Results
Identification of a novel subfamily of fold-type III PLP-dependent enzymes
A 3DM database was built to include all the crystal structures and sequences belonging to fold-type III of the PLP-dependent enzymes. As the template structure for the construction of the database, we chose that of the alanine racemase of G. stearothermophilus (PDB ID 1SFT) (Shaw et al. 1997). This structure was chosen because this enzyme is well documented (Eliot and Kirsch 2004; Inagaki et al. 1986; Sun and Toney 1999). The database includes 214 crystal structures, approximately 24,000 aligned sequences, and 661 documented mutations. The proteins compared in the alignment reached a minimum sequence identity of 4.0 %. Although fold-type III is commonly defined as the “alanine racemase superfamily,” due to structural similarity with the seed crystal structure, the database includes sequences that are annotated with a different function. In particular, the database includes 43 subfamilies, which can be split into the following three blocks: 20 subfamilies include only sequences annotated as alanine racemase (7436 sequences), 19 subfamilies comprise sequences annotated as decarboxylases (11,133 sequences), and 4 subfamilies cover 5666 sequences with uncharacterized function (Fig. 1). The leading sequences of the four orphan subfamilies are associated with the PDB IDs 3CPG, 3R79, 1CT5, and 3SY1. Among them, only the alanine racemase from S. cerevisiae (1CT5) has been connected to an alanine racemase activity by Eswaramoorthy and coworkers (Eswaramoorthy et al. 2003). The remaining structures have been deposited without characterization. The sequences have an average of 250 amino acids, while the classical alanine racemase from G. stearothermophilus (1BD0) has 388 amino acids. The four orphans share only between 32 and 38 % sequence identity with high query coverage (above 90 % with the exception of the comparison between 3CPG and 1CT5, where the query coverage drops to 82–88 %).
Table 1 Kinetic parameters for the racemization of L-alanine to D-alanine by the most active orphan templates, including turnover number (kcat [s−1]), Michaelis constant (KM [mM]), and specificity constant (kcat/KM [M−1 s−1])
Enzyme    kcat (s−1)      KM (mM)    kcat/KM (M−1 s−1)
1CT5      7.28 × 10−3     0.567      12.8
3R79      2.64 × 10−5     0.023      1.15
3CPG      2.19 × 10−4     0.264      0.83
Visual inspection of the protein crystal structures
The crystal structures of the orphan proteins 1CT5, 3R79, 3CPG, and 3SY1 resemble the classical TIM-barrel scaffold of the enzymes belonging to fold-type III of the PLP-dependent enzymes. Compared to the majority of the enzymes present in this fold type, the orphan proteins show a C-terminally truncated sequence that would lead to the complete exposure of the active site to the solvent. The PLP cofactor should be maintained in the correct position due to the formation of several polar contacts with the first shell residues (residues within 6 Å), which are also seen in other PLP-dependent enzymes. In particular, the PLP forms seven hydrogen bonds with the side or main chain of the residues S224, G241, and T242 (Fig. 2).
Fig. 2 Structural features of fold-type III racemases. a Side view of 1CT5 showing the typical TIM barrel of the fold-type III PLP-dependent enzymes, with PLP shown in orange and residues within 6 Å from the PLP cofactor colored in cyan. b Comparison of the monomeric 1CT5 (cyan) with the dimeric alanine racemase 1SFT (chain A in violet, chain B in orange). c Active site view of 1CT5. The residues within 6 Å from the cofactor are shown in cyan, while the residues that make polar contacts to the PLP are labeled with one-letter code. The PLP’s phosphate group makes seven hydrogen bonds with both the backbones and side chains of S224, G241, and T242
[Residue labels shown in the figure: K49, N70, S224, R239, G241, T242]
Expression and purification of representative proteins
All four proteins could be solubly expressed in E. coli BL21 and purified in high concentrations: 1CT5 3.31 mg ml−1, 3R79 5.30 mg ml−1, 3CPG 0.750 mg ml−1, and 3SY1* 0.263 mg ml−1. The purified protein fractions loaded onto the gel were all more than 95 % pure. In agreement with the monomeric crystal structure conformation, all the proteins ran at their theoretical molecular weights of 29.9 kDa (1CT5), 25.1 kDa (3R79), 30.1 kDa (3CPG), and 26.6 kDa (3SY1), typical of each protein with a His6 purification tag (Fig. S1). MALDI-TOF analysis on minor bands at
higher molecular weights confirmed that these are also the synthetic gene products.
Kinetic activity analysis via HPLC and substrate scope
Each of the enzymes showed activity against alanine after an overnight incubation in a 1 mM alanine solution, converting D-alanine to L-alanine and vice versa. 1CT5 and 3R79 also showed activity against L-serine in overnight reactions. Following a 1-h reaction, however, only alanine was converted. To probe whether these proteins exist as monomers in solution, which would lead to a solvent-exposed active site, we investigated the ability of these enzymes to racemize bulkier proteinogenic L-amino acids. The amino acid mixture contained aspartate, glutamate, serine, threonine, alanine, and valine. Each amino acid was at 1 mM final concentration (Table S1). All the substrates could be resolved, and thus run simultaneously, but only alanine and serine were converted (Figs. S2 and S3).
Fig. 3 Size exclusion chromatography plot (absorbance at 280 nm versus retention time in min) for the purified 3R79 and 1CT5. Comparison of the retention times in the plot with a calibration curve reveals that the two proteins in solution behave as oligomers (1CT5 142.3 kDa, 3R79 138.9 kDa) rather than monomers (1CT5 29.9 kDa, 3R79 25.1 kDa)
Fig. 4 Native PAGE with purified protein fractions (1CT5 left and 3R79 right) matched with a pre-stained protein ladder. The right and left halves were run as one identical gel. The left half was soaked in chromogenic solution + 5 mM L-Ala, and the right was stained with Coomassie staining solution. Importantly, the molecular weight can only be used as a visual reference because the commercial ladder consists of denatured proteins, while our sample is present in its native form
The most active templates were further characterized. The D-amino acid oxidase–horseradish peroxidase coupled enzyme assay (Holt and Palcic 2006) detected L-to-D alanine racemase activity in 1CT5, 3R79, and 3CPG as a function of the quinoneimine product formed (Fig. S4). The Michaelis-Menten kinetic parameters show that the Michaelis constant KM for all three enzymes is sub-millimolar. When corrected for the enzyme concentrations used in the reactions, the turnover numbers for the three enzymes ranged between 7.3 × 10−3 and 2.6 × 10−5 s−1 (Table 1).
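The kinetic parameters discussed above come from a nonlinear least-squares fit of the Michaelis-Menten equation, and the specificity constant follows from kcat and KM once KM is converted from mM to M. The sketch below illustrates both steps under stated assumptions: it uses `scipy.optimize.curve_fit`, which may or may not be the SciPy routine the authors called, the helper names are hypothetical, and the demo rates are synthetic, noise-free values rather than measured data. Only the kcat and KM values passed to `specificity_constant` are taken from Table 1.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * [S] / (KM + [S])."""
    return vmax * s / (km + s)


def fit_michaelis_menten(substrate_mM, rates):
    """Nonlinear least-squares estimate of (Vmax, KM); KM is returned in mM."""
    substrate_mM = np.asarray(substrate_mM, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Crude starting guesses: largest observed rate and the median concentration.
    p0 = [rates.max(), np.median(substrate_mM)]
    popt, _ = curve_fit(michaelis_menten, substrate_mM, rates,
                        p0=p0, bounds=(0.0, np.inf))
    return popt[0], popt[1]


def specificity_constant(kcat_per_s, km_mM):
    """kcat/KM in M^-1 s^-1, converting KM from mM to M."""
    return kcat_per_s / (km_mM * 1e-3)


# Synthetic, noise-free demo data (hypothetical values, NOT measured data).
S_DEMO = np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 25.0])  # mM
V_DEMO = michaelis_menten(S_DEMO, vmax=1.0, km=0.5)  # arbitrary rate units

if __name__ == "__main__":
    vmax, km = fit_michaelis_menten(S_DEMO, V_DEMO)
    print(f"Vmax = {vmax:.3f} (arbitrary units), KM = {km:.3f} mM")
    # Table 1 values for 1CT5: kcat = 7.28e-3 s^-1, KM = 0.567 mM
    print(f"1CT5 kcat/KM = {specificity_constant(7.28e-3, 0.567):.1f} M^-1 s^-1")
```

In this sketch the turnover number would be obtained as kcat = Vmax/[E], with Vmax and the enzyme concentration expressed in matching molar units. Applying `specificity_constant` to the three rows of Table 1 reproduces the tabulated kcat/KM values after rounding.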
Oligomeric analysis with SEC and activity detection in native PAGE
After confirming the alanine racemase activity in the template enzymes, we investigated the possibility of the formation of an oligomeric state of the purified protein in solution. This was examined via size exclusion chromatography by comparing the elution volume of the orphan proteins with proteins of known molecular weight. Surprisingly, the proteins did not run only as monomers but in multiple oligomeric states (Fig. 3). The primary peaks for 3R79 and 1CT5 eluted at 45 and 44 ml, which correspond to apparent molecular weights of 138.9 and 142.3 kDa, while 3R79 and 1CT5 as monomers have molecular weights of 25.1 and 29.9 kDa, respectively. Differences in their migration behavior could be due to differences in their isoelectric points. The native-PAGE gels run in parallel for 3R79 and 1CT5 both show the formation of the red quinoneimine dye product from the D-amino acid oxidase–horseradish peroxidase coupled enzyme assay, indicating that there is D-alanine formation in the gels. There were clear spots for both 3R79 and 1CT5 at approximately the same height on the native gels. Although the majority of the protein accumulated at a low molecular weight, the band height at which enzyme activity was present was not at the monomer size but rather at a higher oligomeric state. 3CPG and 3SY1* did not show detectable activity at any position on the gel (Fig. 4).
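The size comparison above can be made explicit by dividing each apparent molecular weight from SEC by the corresponding monomer molecular weight. The short sketch below does only this arithmetic with the values quoted in the text; the dictionary and function names are hypothetical, and the ratio is a rough indicator rather than a measured stoichiometry, since apparent molecular weights from SEC also depend on molecular shape.

```python
# Molecular weights in kDa, as quoted in the text for the His6-tagged proteins.
MONOMER_KDA = {"1CT5": 29.9, "3R79": 25.1}
APPARENT_KDA = {"1CT5": 142.3, "3R79": 138.9}


def apparent_subunit_ratio(apparent_kda, monomer_kda):
    """Ratio of the SEC-derived apparent mass to the monomer mass."""
    return apparent_kda / monomer_kda


RATIOS = {
    name: apparent_subunit_ratio(APPARENT_KDA[name], MONOMER_KDA[name])
    for name in MONOMER_KDA
}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: apparent/monomer = {ratio:.2f}")
```

The ratios come out at roughly 4.8 for 1CT5 and 5.5 for 3R79, which supports an assembly of several subunits without fixing the exact number.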
Discussion
The 3DM database generated for this study clearly showed thousands of uncharacterized proteins with structural similarity to the fold-type III racemases and decarboxylases. These results are another example showing that bioinformatic methods such as 3DM are valuable as an initial screen for the prediction of enzymatic activities for uncharacterized proteins. Future efforts to characterize proteins within the fold-type III PLP-dependent enzyme family could make use of this database, and the generation of new 3DM databases can be used for prediction among other evolutionarily related proteins. HPLC analysis shows that the C-terminally truncated members of the fold-type III PLP-dependent family indeed exhibited alanine racemase activity. The kinetics of their enzymatic activity were determined with the plate-based D-amino acid oxidase–peroxidase coupled colorimetric assay. The Michaelis constants for these enzymes are comparable to literature values for fold-type III alanine racemases, but their turnover numbers are orders of magnitude smaller. As the recombinant enzymes were purified, no E. coli alanine racemase should interfere with activity measurements, and this was confirmed through mass spectrometric analysis of the samples. While these proteins appeared to be possible candidates for a broad specificity PLP-dependent enzyme with a solvent-exposed active site, it is clear that, while they crystallize in monomeric form, their activity is only detected in an oligomeric state. The native-PAGE in-gel assay for L-to-D-alanine racemase activity was an adaptation of the plate-based screening method (Holt and Palcic 2006), which allowed for resolution of the oligomeric state in which enzymatic activity could be seen. From this assay, we see that activity is present only in a higher oligomeric form.
The majority of the enzyme is not present at the location of the active band in the gel, indicating that only a specific oligomeric state of the four orphan proteins is capable of alanine racemase activity. This in part explains the low turnover number, as the majority of the protein does not appear to be in an active form. Furthermore, it is consistent with tight control of the alanine racemase activity, which is probably exerted through oligomerization. It is likely that 1CT5, 3R79, 3CPG, and 3SY1 have a different native activity and that the alanine racemase activity is a promiscuous activity, owing to their high structural similarity to the alanine racemases within their fold type. As the KM for alanine lies at physiologically relevant concentrations, these enzymes might act on a similar, yet unknown, metabolite.
The presence of an oligomeric form is interesting, as the majority of the fold-type III dimeric interface cannot form in the orphan enzymes because of the large C-terminal truncation. TIM barrel proteins are known to be prone to the generation of very diverse oligomerization patterns. As these enzymes do form oligomers, they must be forming an oligomeric structure that is novel for the fold-type III PLP enzyme family. TIM barrel proteins readily form oligomeric structures, and the oligomerization is thought to contribute in part to thermodynamic stability (Romero-Romero et al. 2015). Whether these enzymes form a well-defined oligomeric structure or a nonspecific one is beyond the scope of this study. The lack of activity in monomeric form with a large, solvent-exposed active site could preclude these enzymes from the broad-specificity applications for which they were initially characterized.
Acknowledgments We thank the European Union (KBBE-2011-5, Grant No. 289350), the DFG (INST 292/118-1 FUGG), and the federal state Mecklenburg-Vorpommern for their financial support. A.M.K. thanks the Deutscher Akademischer Austauschdienst for financial support through the DAAD Study Scholarship. Furthermore, we thank Ina Menyes, Martin Weiss, and Dr. Mark Dörr (all Institute of Biochemistry, Greifswald University) for the analytical support.
Compliance with ethical standards
Conflict of interest All authors, except HJJ and TvdB as employees of Bioprodict, declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
References
Anthony KG, Strych U, Yeung KR, Shoen CS, Perez O, Krause KL, Cynamon MH, Aristoff PA, Koski RA (2011) New classes of alanine racemase inhibitors identified by high-throughput screening show antimicrobial activity against Mycobacterium tuberculosis. PLoS One 6(5):e20374. doi:10.1371/journal.pone.0020374
Azam MA, Jayaram U (2016) Inhibitors of alanine racemase enzyme: a review. J Enzyme Inhib Med Chem 31(4):517–526. doi:10.3109/14756366.2015.1050010
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
Brückner H, Wittner R, Haasmann S, Langer M, Westhauser T (1994) Liquid chromatographic determination of D- and L-amino acids by derivatization with o-phthaldialdehyde and chiral thiols: applications with reference to biosciences. J Chromatogr A 666:259–273
Eliot AC, Kirsch JF (2004) Pyridoxal phosphate enzymes: mechanistic, structural, and evolutionary considerations. Ann Rev Biochem 73:383–415. doi:10.1146/annurev.biochem.73.011303.074021
Espaillat A, Carrasco-López C, Bernardo-García N, Pietrosemoli N, Otero LH, Álvarez L, de Pedro MA, Pazos F, Davis BM, Waldor MK, Hermoso JA, Cava F (2014) Structural basis for the broad specificity of a new family of amino-acid racemases. Acta Cryst Sect D, Biol Cryst 70(Pt 1):79–90. doi:10.1107/S1399004713024838
Eswaramoorthy S, Gerchman S, Graziano V, Kycia H, Studier FW (2003) Structure of a yeast hypothetical protein selected by a structural genomics approach. Acta Cryst Sect D, Biol Cryst 59:127–135
Galkin A, Kulakova L, Yamamoto H, Tanizawa K, Tanaka H, Esaki N, Soda K (1997) Conversion of α-keto acids to D-amino acids by coupling of four enzyme reactions. J Ferment Bioeng 83(3):299–300
Henke E, Pleiss J, Bornscheuer UT (2002) Activity of lipases and esterases towards tertiary alcohols: insights into structure-function relationships. Angew Chem Int Ed 41(17):3211–3213. doi:10.1002/1521-3773(20020902)41:17<3211::AID-ANIE3211>3.0.CO;2-U
Höhne M, Schätzle S, Jochens H, Robins K, Bornscheuer UT (2010) Rational assignment of key motifs for function guides in silico enzyme identification. Nature Chem Biol 6(11):807–813. doi:10.1038/nchembio.447
Holt A, Palcic MM (2006) A peroxidase-coupled continuous absorbance plate-reader assay for flavin monoamine oxidases, copper-containing amine oxidases and related enzymes. Nat Protocols 1(5):2498–2505. doi:10.1038/nprot.2006.402
Inagaki K, Tanizawa K, Badet B, Walsh CT, Tanaka H, Soda K (1986) Thermostable alanine racemase from Bacillus stearothermophilus: molecular cloning of the gene, enzyme purification, and characterization. Biochemistry 25(11):3268–3274. doi:10.1021/bi00359a028
Ito T, Iimori J, Takayama S, Moriyama A, Yamauchi A, Hemmi H, Yoshimura T (2013) Conserved pyridoxal protein that regulates Ile and Val metabolism. J Bacteriol 195(24):5439–5449. doi:10.1128/JB.00593-13
Jackson LK, Baldwin J, Akella R, Goldsmith EJ, Phillips MA (2004) Multiple active site conformations revealed by distant site mutation in ornithine decarboxylase. Biochemistry 43(41):12990–12999. doi:10.1021/bi048933l
Jackson LK, Brooks HB, Osterman AL, Goldsmith EJ, Phillips MA (2000) Altering the reaction specificity of eukaryotic ornithine decarboxylase. Biochemistry 39(37):11247–11257
Joosten H-J (2007) 3DM: from data to medicine. PhD thesis, Wageningen University
Ju J, Xu S, Furukawa Y, Zhang Y, Misono H, Minamino T, Namba K, Zhao B, Ohnishi K (2011) Correlation between catalytic activity and monomer-dimer equilibrium of bacterial alanine racemases. J Biochem 149(1):83–89. doi:10.1093/jb/mvq120
Kuipers R, Van Den Bergh T, Joosten HJ, Lekanne dit Deprez RH, Mannens MMAM, Schaap PJ (2010a) Novel tools for extraction and validation of disease-related mutations applied to fabry disease. Hum Mutat 31(9):1026–1032. doi:10.1002/humu.21317
Kuipers RK, Joosten H-J, van Berkel WJH, Leferink NGH, Rooijen E, Ittmann E, van Zimmeren F, Jochens H, Bornscheuer UT, Vriend G, dos Santos VAPM, Schaap PJ (2010b) 3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins 78(9):2101–2113. doi:10.1002/prot.22725
Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. doi:10.1093/molbev/msw054
Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, USA
Percudani R, Peracchi A (2003) A genomic overview of pyridoxal-phosphate-dependent enzymes. EMBO Rep 4(9):850–854. doi:10.1038/sj.embor.embor914
Romero-Romero S, Costas M, Rodriguez-Romero A, Fernandez-Velasco DA (2015) Reversibility and two state behaviour in the thermal unfolding of oligomeric TIM barrel proteins. Phys Chem Chem Phys 17(32):20699–20714. doi:10.1039/C5CP01599E
Rzhetsky A, Nei M (1992) A simple method for estimating and testing minimum-evolution trees. Mol Biol Evol 9(5):945
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4):406–425
Schneider G, Käck H, Lindqvist Y (2000) The manifold of vitamin B6 dependent enzymes. Structure 8(1):1–6. doi:10.1016/S0969-2126(00)00085-X
Shaw JP, Petsko GA, Ringe D (1997) Determination of the structure of alanine racemase from Bacillus stearothermophilus at 1.9-Å resolution. Biochemistry 36(6):1329–1342. doi:10.1021/bi961856c
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1):n/a–n/a. doi:10.1038/msb.2011.75
Soda K, Oikawa T, Yokoigawa K (2001) One-pot chemo-enzymatic enantiomerization of racemates. J Mol Catal B Enzym 11:149–153
Steffen-Munsberg F, Vickers C, Kohls H, Land H, Mallin H, Nobili A, Skalden L, van den Bergh T, Joosten H-J, Berglund P, Höhne M, Bornscheuer UT (2015) Bioinformatic analysis of a PLP-dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol Adv 33:566–604
Steffen-Munsberg F, Vickers C, Thontowi A, Schätzle S, Tumlirsch T, Svedendahl Humble M, Land H, Berglund P, Bornscheuer UT, Höhne M (2013) Connecting unexplored protein crystal structures to enzymatic function. ChemCatChem 5:150–153
Sun S, Toney MD (1999) Evidence for a two-base mechanism involving tyrosine-265 from arginine-219 mutants of alanine racemase. Biochemistry 38(13):4058–4065. doi:10.1021/bi982924t
Tatusova T, Ciufo S, Fedorov B, O’Neill K, Tolstoy I (2014) RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Res 42(D1):D553–D559. doi:10.1093/nar/gkt1274
Wu H-M, Kuan Y-C, Chu C-H, Hsu W-H, Wang W-C (2012) Crystal structures of lysine-preferred racemases, the non-antibiotic selectable markers for transgenic plants. PLoS One 7(10):e48301. doi:10.1371/journal.pone.0048301
Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ (eds) Evolving Genes and Proteins. Academic Press, New York, pp. 97–166