ARTICLES Chinese Science Bulletin 2006 Vol. 51 No. 8 941—945
DOI: 10.1007/s11434-006-0941-7
Complete genome analysis of Ketogulonigenium sp. WB0104

YANG Fan1*, JIA Qian2*, XIONG Zhaohui1*, ZHANG Xiaobing1*, WU Hongtao2*, ZHAO Ying2, YANG Jian1, ZHU Junping1, DONG Jie1, XUE Ying1, SUN Lilian1, SHEN Yan3 & JIN Qi1

1. State Key Laboratory for Molecular Virology and Genetic Engineering, Beijing 100176, China;
2. North China Pharmaceutical Group Corporation, Shijiazhuang 050015, China;
3. Chinese National Human Genome Center, Beijing 100176, China

Correspondence should be addressed to Jin Qi (email: zdsys@vip.sina.com)
Abstract Ketogulonigenium sp. can convert L-sorbose into 2-keto-L-gulonic acid, the vitamin C precursor. The genome of Ketogulonigenium sp. WB0104 consists of a circular 2765030 bp chromosome with 61.69% G+C content and two circular plasmids of 267968 and 242707 bp. The chromosome contains 2727 open reading frames (ORFs). The systems for replication, transcription, translation, and carbohydrate and energy metabolism are intact, but the repair system is incomplete. About 640 predicted ORFs encode transporter proteins. They account for about one fourth of all predicted ORFs, a noticeably higher share than in other documented bacteria. This may reflect the adaptation of WB0104 to the soil environment.

Keywords: Ketogulonigenium sp., genome.
Ketogulonigenium is a new genus defined by Urbance and Bratina[1] in 2001. Its lineage is Proteobacteria, Alphaproteobacteria, Rhodobacterales, Rhodobacteraceae, Ketogulonigenium. Ketogulonigenium sp. is a Gram-negative, facultatively anaerobic, egg-shaped or rod-shaped bacterium with flagellum and capsule, first isolated from soil. The diameter of a single cell is 0.8–1.0 μm. The optimal growth temperature range is 27–29 ℃ and the optimal pH range is 7.5–8.0. Ketogulonigenium sp. can be used to synthesize vitamin C by converting L-sorbose into 2-keto-L-gulonic acid, the vitamin C precursor.
The nucleotide sequence of the entire genome of Ketogulonigenium sp. WB0104, a vitamin C production strain provided by North China Pharmaceutical Group Corporation, was determined. The complete sequence of WB0104 provides a direct basis for studying genes with important metabolic functions, analyzing metabolic orientation, reconstructing the metabolic network, modifying the vitamin C production strain and improving production.

1 Materials and methods

1.1 Bacterial strain
Ketogulonigenium sp. WB0104 was provided by North China Pharmaceutical Group Corporation.

1.2 Chromosomal DNA extraction
Chromosomal DNA was extracted by standard microbiological methods. For details, see Molecular Cloning: A Laboratory Manual (3rd ed.).

1.3 Genome sequencing and assembly
The nucleotide sequence of the entire genome of Ketogulonigenium sp. was obtained mainly by the shotgun method[2]. Chromosomal DNA was sheared by sonication. DNA fragments of different sizes were recovered by agarose gel electrophoresis according to the libraries to be constructed. The fragment ends were filled in with T4 DNA polymerase, and the fragments were ligated into end-phosphated Bluescript KS vector and electro-transformed into competent E. coli cells. The plasmid template libraries were obtained by plasmid extraction. We constructed three genomic libraries with different insert sizes: 1000–2000 bp, 2000–4000 bp and larger than 4000 bp. A total of 35366 random sequences corresponding to 5.8 genome-equivalents were assembled. The physical gaps were then filled by random PCR. The nucleotide sequence of the entire genome was completed by data alignment and verification. Sequencing was carried out with BigDye fluorescence sequencing kits (Perkin Elmer), and ABI3700 automated sequencers were used for data collection and analysis. The entire genomic sequence of Ketogulonigenium sp. was assembled with the Phred-Phrap software developed by Green and Ewing at the Genome Center of Washington University. During assembly, the initial sequences were screened according to the universal quality guideline of the Human Genome Project (Phred quality value 20). The accuracy of every base in the assembled sequences may reach 99%.

* These authors contributed equally to this work.
www.scichina.com
www.springerlink.com

1.4 ORF prediction and annotation
ORF prediction for the WB0104 genome was carried out with the Glimmer program. The predicted ORFs were compared with the COGs and NR protein sequence databases using the BLASTP program, and each ORF was annotated according to the known functional protein sequences with which it shares the highest similarity. In addition, transfer RNAs were identified with tRNAscan-SE.
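Glimmer scores candidate genes with interpolated Markov models, which is beyond the scope of a short example. To make concrete what an ORF is, the following Python sketch scans both strands and all six reading frames for ATG-to-stop stretches. It is a naive illustration under stated assumptions, not the procedure used in this work: the function names, the ATG-only start rule and the min_len cutoff are all hypothetical choices made for the example.

```python
# Naive six-frame ORF scan: an illustration of what an ORF is, NOT Glimmer's algorithm.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 90) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for each ATG...stop ORF of at least min_len bp.

    Coordinates are 0-based, end-exclusive, and refer to the forward strand.
    For each stop codon only the first in-frame ATG (the longest ORF) is used.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG seen since the previous stop codon
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            # map reverse-strand coordinates back to the forward strand
                            orfs.append((n - end, n - start, strand))
                    start = None
    return sorted(orfs)


toy = "CCATGAAATTTGGGTAGCC"  # hypothetical toy sequence, not from WB0104
print(find_orfs(toy, min_len=9))  # [(2, 17, '+')]
```

A scan like this reports many chance ORFs on a real genome; gene finders such as Glimmer add statistical scoring to separate likely genes from them.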
2 Results and discussion

2.1 General genome features
The genome of Ketogulonigenium sp. WB0104 consists of a circular 2765030 bp chromosome and two circular plasmids of 267968 bp and 242707 bp. The chromosome, with an average G+C content of 61.69%, contains 2727 open reading frames (ORFs), of which 72.4% can be assigned a clear functional role and 17.7% remain without an assigned function. The 2727 ORFs occupy 94.4% of the chromosome, and the average ORF size is 957 bp. Plasmid 1, with an average G+C content of 61.32%, contains 244 ORFs, which occupy 91.77% of the plasmid. Plasmid 2, with an average G+C content of 62.58%, contains 225 ORFs, which occupy 94.73% of the plasmid (Fig. 1, Tables 1 and 2). The ORFs of Ketogulonigenium sp. can be classified into 17 functional groups (COG codes J to R), one unknown function group, one unclassified group and one COG no hits group (Table 2).

Fig. 1. Circular map of the Ketogulonigenium sp. WB0104 genome. The circles, from outer to inner, show: 1, the genes encoded by the positive DNA strand; 2, the genes encoded by the negative DNA strand; 3, the different kinds of IS sequence; 4 and 5, G+C content and GC skew; 5, distribution of rRNA genes; 6, distribution of tRNA genes.

Table 1 General features of the Ketogulonigenium sp. WB0104 genome

| Feature | Chromosome | Plasmid 1 | Plasmid 2 |
|---|---|---|---|
| Length (bp) | 2765030 | 267968 | 242707 |
| G+C ratio | 61.69% | 61.32% | 62.58% |
| ORFs | 2727 | 244 | 225 |
| Average ORF length (bp) | 957 | 1008 | 1022 |
| Coding region (% of genome size) | 94.40% | 91.77% | 94.73% |
| rRNAs | 4 | 1 | 0 |
| tRNAs | 55 | 3 | 0 |
| Proteins matching known proteins | 2128 | 187 | 188 |
| Function unknown proteins | 260 | 21 | 18 |
| Conserved hypothetical proteins | 23 | 4 | 0 |
| Hypothetical proteins | 316 | 32 | 19 |

Table 2 Functional classification of ORFs according to COGs

| COG code | Functional category | Chromosome | Plasmid |
|---|---|---|---|
| J | Translation | 153 | 5 |
| K | Transcription | 165 | 44 |
| L | DNA replication, recombination and repair | 109 | 19 |
| D | Cell cycle control, mitosis and meiosis | 23 | 4 |
| O | Posttranslational modification, protein turnover, chaperones | 87 | 5 |
| M | Cell wall/membrane biogenesis | 126 | 7 |
| N | Cell motility and secretion | 68 | 11 |
| P | Inorganic ion transport and metabolism | 249 | 90 |
| T | Signal transduction mechanisms | 52 | 15 |
| C | Energy production and conversion | 132 | 19 |
| G | Carbohydrate transport and metabolism | 186 | 27 |
| E | Amino acid transport and metabolism | 343 | 114 |
| F | Nucleotide transport and metabolism | 83 | 2 |
| H | Coenzyme transport and metabolism | 97 | 18 |
| I | Lipid transport and metabolism | 48 | 6 |
| Q | Secondary metabolites biosynthesis, transport and catabolism | 79 | 7 |
| R | General function prediction only | 315 | 40 |
| S | Function unknown | 147 | 17 |
| No cogs | | 261 | 55 |
| Cog no hits | | 344 | 58 |

2.2 DNA replication and repair
Ketogulonigenium sp. WB0104 contains a fairly complete DNA replication system. The genes dnaA, dnaB and ssb for initiation of chromosome replication and DNA elongation, and genes encoding DNA polymerase I and III (including subunits of α, β, δσ, τ, χ and εγ), DNA helicase (A and B subunits), ATP-dependent DNA helicase, DNA topoisomerase, etc. are found on the chromosome. However, the repair system of Ketogulonigenium sp. is incomplete. Only recA and lexA, which belong to the SOS repair system, and mutL and mutS, which are two of the three methyl-directed mismatch repair genes (the nuclease gene mutH is absent), are found. The excision repair genes uvrA, uvrB, uvrC and uvrD are not found. ruvA–ruvC and recG, which are involved in DNA recombination, are also found on the chromosome.

2.3 Transcription and translation
The WB0104 genome contains a transcription system similar to the known E. coli system. It comprises 165 genes. Some genes encode the whole RNA polymerase (subunits αββ′σ). The RNA polymerase σ subunit is responsible for recognition of the transcription start site; the σ genes include those for the E factor, sigma70 and sigma32 subunits. Some genes encode the transcription elongation factor AsnC sub-family and the transcription terminator Rho factor, and many others encode transcription regulation proteins (including members of 10 sub-families: MarR, ROK, LysR, GntR, AraC, TetR, IclR, AcrR, MerR and LacI).

There are 153 genes involved in translation, ribosome structure and biogenesis. Genes encoding the aminoacyl-tRNA synthetases for all 20 basic amino acids are found; they participate in aminoacyl-tRNA synthesis. The genome contains 55 tRNA genes carrying 19 basic amino acids; only the tRNA gene for Tyr is not found. In addition, the WB0104 genome contains genes encoding fMet-tRNA-carbamyltransferase and 3 initiation factors (IF-1, IF-2 and IF-3) for initiation of polypeptide chain synthesis; elongation factor T (EF-T, including EF-Ts and EF-Tu) for binding, and elongation factors G (EF-G) and P for translocation, during polypeptide chain elongation; release factors (RF-1, RF-2 and RF-3) for polypeptide chain release; demethylase and acetyltransferase for post-translational modification; and 3 methylases and one methyltransferase for rRNAs. The Ketogulonigenium sp. ribosome consists of two subunits, 50S and 30S. 55 genes are involved, including 22 genes for S-type ribosomal proteins and 27 genes for L-type ribosomal proteins.

2.4 Transport system
About 640 ORFs in Ketogulonigenium sp. (including those on the two large plasmids) were found to encode transporter proteins in the analysis and annotation, accounting for one fourth of the total predicted genes. The total length of the transporter genes is more than 20% of the whole genome length. This is higher than in Agrobacterium tumefaciens (15%), more than twice that of E. coli (10.8%), Haemophilus influenzae (9.8%), Bacillus subtilis (9.7%) and Mycoplasma genitalium (10.2%), and much higher than in Helicobacter pylori (5.4%), Methanococcus jannaschii (3.5%) and Synechocystis sp. strain PCC 6803 (4.7%)[3]. These transporters are classified into several superfamilies[4]: the major facilitator superfamily (MFS), the resistance-nodulation-division superfamily (RND), drug/metabolite transporters (DMT), the multidrug and toxic compound extrusion family (MATE), ATP-binding cassette (ABC)[1], etc.

The transporters of Ketogulonigenium sp. are classified into 5 groups according to the energy-coupling classification guideline of the TC transport commission[3] (Table 2).

(i) Channel protein. Ketogulonigenium sp. contains 3 channel proteins: one large-conductance mechanosensitive channel (MscL) and 2 small-conductance mechanosensitive channels (MscS)[5,6]. All of them belong to the mechanosensitive channel family. The molecular weight of MscS is much higher than that of MscL, and the pressure threshold of MscS is only 50% of that of MscL. The MscL is composed of 163 amino acids and the 2 MscS are composed of 800 and 362 amino acids, respectively. The key role of mechanosensitive channels in bacteria is to respond to high osmotic pressure. They can stretch the cell membrane to depolarize it, thus opening and closing some ion channels.

(ii) Secondary active transporter. Secondary active transporters are the most numerous and the most diverse of the transporters, and the largest group among them is the major facilitator superfamily (MFS). Of its 30 coding regions, 26 are located on the chromosome and only 4 on the plasmids. One of the secondary active transporters is a sulphate-specific transporter; the others are non-substrate-specific transporters (or their specificity is unknown) related to the transport of sugars, amino acids, nucleic acids, drugs, ions, etc. In addition, 6 transporters belong to the resistance-nodulation-division superfamily (RND); 12 transporters belong to the drug/metabolite transporter family (DMT), most of which are involved in ion transport and bacterial drug resistance; and 8 transporters belong to the TRAP family related to C4-dicarboxylic acid transport. All these genes are located on the chromosome. This type of transporter also includes a C4-dicarboxylic acid/H+ symporter, a Na+/H+ antiporter, a K+ transporter, an inorganic phosphate transporter and the resistance protein of Ketogulonigenium sp.

(iii) Primary active transporter. Ketogulonigenium sp. contains a large number of ABC-type transporters, similar to S. meliloti, Mesorhizobium loti, A. tumefaciens and Synechocystis sp. strain PCC 6803. In total, 425 predicted ORFs in the Ketogulonigenium sp. genome encode primary active transporters, accounting for more than 60% of all transporters, a much higher share than in the other sequenced eukaryotes and bacteria. The substrates of these ABC-type transporters include sugars, polysaccharides, amino acids, polypeptides, metal ions, metal ion complexes, anions, multiple drugs, etc. Ketogulonigenium sp. also contains three P-type ATPases related to cation transport.

(iv) Group translocator. Ketogulonigenium sp. contains phosphoenolpyruvate (PEP) synthetase, phosphocarrier protein (HPr), Enzyme I, etc., required by the phosphotransferase system (PTS). In addition, only receptor proteins related to fructose are found, indicating that only fructose is transported through the PTS.

(v) Unclassified transporter. Unclassified transporters include drug efflux pumps, MgtE, etc.

2.5 Carbohydrate metabolism and transport
The many sugar transporters indicate that Ketogulonigenium sp. can use several kinds of carbohydrate as energy sources. Biological experiments show that Ketogulonigenium sp. can use glucose, fructose, sucrose, trehalose, etc. as carbon sources. The genome contains genes encoding all the enzymes involved in glycolysis. The tricarboxylic acid cycle is complete, and the pentose phosphate pathway is the key pathway for supplying energy when L-sorbose is used (Fig. 2).

The cell signal transduction regulation system of Ketogulonigenium sp. contains two-component signal transduction systems, including sensor protein kinases and response regulators. 13 genes encoding response regulators have been identified in the Ketogulonigenium sp. genome. Most of them are adjacent to genes encoding histidine phosphatases. The response regulators possess a fairly conserved N-terminal phosphorylation receiver region and a C-terminal DNA-binding region, and share very high homology with the response regulators of S. meliloti, A. tumefaciens, Caulobacter crescentus and Pseudomonas aeruginosa.

2.6 Genes encoding the enzymes for the synthesis of 2-keto-L-gulonic acid

The Ketogulonigenium sp. genome contains genes encoding L-sorbose dehydrogenase and L-sorbosone dehydrogenase, which convert L-sorbose into 2-keto-L-gulonic acid in two steps. The L-sorbosone dehydrogenase shares 64% amino acid similarity with the known L-sorbosone dehydrogenase of Gluconobacter oxydans. The L-sorbose dehydrogenase shares 35% amino acid similarity with the known L-sorbose dehydrogenase of Gluconobacter oxydans. Ketogulonigenium sp. also contains other enzymes related to the synthesis of 2-keto-L-gulonic acid, such as L-idonate dehydrogenase, L-sorbose reductase, L-sorbosone reductase and 2-keto-L-gulonic acid reductase.
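The text does not state how the 64% and 35% similarity values were computed; they presumably come from BLASTP comparisons like those described in Section 1.4. As a rough illustration of the simplest related measure, the Python sketch below computes percent identity over a pair of already aligned sequences. The function name and the two short sequences are hypothetical and are not the dehydrogenase sequences. BLASTP similarity ("positives") also counts conservative substitutions, so it is generally at least as high as identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # keep only columns where both sequences carry a residue ('-' marks a gap)
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 7 gap-free columns, 6 of them identical
seq_a = "MKT-LLVAG"
seq_b = "MRTALL-AG"
print(round(percent_identity(seq_a, seq_b), 1))  # 85.7
```

Whether gapped columns are counted in the denominator is a convention that varies between tools; this sketch excludes them.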
Fig. 2. Overview of metabolism and transport in Ketogulonigenium sp. WB0104. Grey circle, TCA cycle; black box, pentose phosphate pathway; red arrows, glycolysis, fatty acid synthesis and amino acid synthesis. Membrane transporters: cation (green); anion (red); carbohydrate and carboxylate (yellow); amino acid and amine (brown).
The complete sequence of Ketogulonigenium sp. WB0104 is a new step towards a better understanding of the biological background and the genetic and metabolic features of Ketogulonigenium sp. Ketogulonigenium sp. WB0104 is an important vitamin C production strain in China. The complete sequence of WB0104 provides a direct platform for studying genes with important metabolic functions and for modifying and constructing vitamin C production strains. On the basis of the complete sequence, the strain can be reconstructed, or the fermentation process optimized, by changing metabolic orientation and by altering existing metabolic pathways or constructing new ones. This opens a new approach and a new field in industrial microbiology. Our study will help improve vitamin C production in China, establish its international standing in this field and modernize traditional industry in China.

Acknowledgements This work was supported by the Hi-Tech Research and Development Program of China (Grant No. 2001AA223071).
References

1 Urbance J W, Bratina B J, Stoddard S F, et al. Taxonomic characterization of Ketogulonigenium vulgare gen. nov., sp. nov. and Ketogulonigenium robustum sp. nov., which oxidize L-sorbose to 2-keto-L-gulonic acid. Int J Syst Evol Microbiol, 2001, 51(Pt 3): 1059–1070
2 Hayashi T, Makino K, Ohnishi M, et al. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res, 2001, 8(1): 11–22
3 Saier M H Jr, Paulsen I T. Phylogeny of multidrug transporters. Semin Cell Dev Biol, 2001, 12(3): 205–213
4 Paulsen I T, Sliwinski M K, Saier M H Jr. Microbial genome analyses: Global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol, 1998, 277(3): 573–592
5 Chang G, Spencer R H, Lee A T, et al. Structure of the MscL homolog from Mycobacterium tuberculosis: A gated mechanosensitive ion channel. Science, 1998, 282(5397): 2220–2226
6 Moe P C, Blount P, Kung C. Functional and structural conservation in the mechanosensitive channel MscL implicates elements crucial for mechanosensation. Mol Microbiol, 1998, 28(3): 583–592
7 Wood D W, Setubal J C, Kaul R, et al. The genome of the natural genetic engineer Agrobacterium tumefaciens C58. Science, 2001, 294(5550): 2317–2323

(Received October 31, 2005; accepted January 25, 2006)