RNA secondary structure, an important bioinformatics tool to enhance multiple sequence alignment: a case study (Sordariomycetes, Fungi)


Mycol Progress DOI 10.1007/s11557-012-0836-8

ORIGINAL ARTICLE

RNA secondary structure, an important bioinformatics tool to enhance multiple sequence alignment: a case study (Sordariomycetes, Fungi) Martina Réblová & Kamila Réblová

Received: 7 March 2012 / Revised: 29 May 2012 / Accepted: 1 June 2012 © German Mycological Society and Springer 2012

Abstract In a case study of fungi of the class Sordariomycetes, we evaluated the effect of multiple sequence alignment (MSA) on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades. We compared two main approaches for constructing MSA based on (1) the knowledge of the secondary (2D) structure of ribosomal RNA (rRNA) genes, and (2) automatic construction of MSA by four alignment programs characterized by different algorithms and evaluation methods, CLUSTAL, MAFFT, MUSCLE, and SAM. In the primary fungal sequences of the two functional rRNA genes, the nuclear small and large ribosomal subunits (18S and 28S), we identified four and six, respectively, highly variable regions, which correspond mainly to hairpin loops in the 2D structure. These loops are often positioned in expansion segments, which are missing or are not completely developed in the Archaeal and Eubacterial kingdoms. Proper sorting of these sites was a key for constructing an accurate MSA. We utilized DNA sequences from 28S as an example for one-gene analysis. Five different MSAs were created and analyzed with maximum parsimony and maximum likelihood methods. The phylogenies inferred from the alignments improved with 2D structure with identified homologous segments, and those constructed using the MAFFT alignment program, with all highly variable regions included, provided the most reliable phylograms with higher bootstrap support for the majority of clades. We illustrate and provide examples demonstrating that re-evaluating ambiguous positions in the consensus sequences using 2D structure and covariance is a promising means of improving the quality and reliability of sequence alignments.

Keywords 2D structure · 2D mask · Alignment · Fungal phylogeny · 18S rRNA · 28S rRNA

Electronic supplementary material The online version of this article (doi:10.1007/s11557-012-0836-8) contains supplementary material, which is available to authorized users.

M. Réblová (*) Department of Taxonomy, Institute of Botany, Academy of Sciences, CZ-252 43 Průhonice, Czech Republic e-mail: [email protected]

K. Réblová CEITEC - Central European Institute of Technology, Masaryk University, Campus Bohunice, Kamenice 5, CZ-625 00 Brno, Czech Republic

Introduction

DNA sequences have been employed in phylogenetic reconstruction since the early 1980s in higher fungi (Walker and Doolittle 1982) and prokaryotic organisms (e.g., Woese et al. 1990), and they were soon widely applied in mycology to explore the whole extent of the kingdom Fungi (e.g., White et al. 1990; Berbee and Taylor 1994; Blackwell and Spatafora 1994; Taylor et al. 2004; Lutzoni et al. 2004; James et al. 2006; Hibbett et al. 2007; Schoch et al. 2009). With the aid of molecular data, we may address conflicts among phylogenies inferred from morphological data and test morphology-based hypotheses that have been formulated since the early days of mycology. Today, nearly every taxonomic mycological study is supplemented with sequence data of different genes to provide support for its inferred classification. The best loci known to construct fungal higher-level multigene phylogenies, and to provide enough support and resolution, are three protein-coding genes, viz. the genes of the largest and second largest subunits of RNA polymerase II (RPB1 and RPB2), and a gene of translation elongation factor 1 alpha (TEF). Recently, two other single-copy protein-coding genes were identified and


confirmed to outperform all other single-copy genes used in phylogeny (Aguileta et al. 2008; Schmitt et al. 2009; Raja et al. 2011). These are the Mcm7 gene, which corresponds to a DNA replication licensing factor required for DNA replication initiation and cell proliferation (Moir et al. 1982; Kearsey and Labib 1998), and the Tsr1 gene required for rRNA accumulation during biogenesis of the ribosome (Gelperin et al. 2001). The four most widely used rRNA genes are represented by the genes coding for small and large nuclear ribosomal subunits (nc18 S and nc28 S rRNA), and genes for small and large mitochondrial ribosomal subunits (mr12S and mr16S rRNA). The genes are analyzed individually or in various combinations. The 18 S and 28 S rRNA sequences have been widely applied for fungi and have (apart from the whole ITS rRNA operon) the largest representation in public nucleotide sequence databases (Cochrane et al. 2009). The sequences of protein-coding genes, although still rare, are present at least for representatives of the main fungal orders. The multilevel process, which starts by extracting DNA and ends by creating phylograms to visualize and support phylogenetic relationships, is long. Some of its first steps can entail potential pitfalls that can be avoided by experience, careful treatment of sequence data, and knowledge of principles for manual and automatic assemblage of sequences into a multiple-sequence alignment (MSA). Several step-by-step guides are available (e.g., Harrison and Langdale 2006; Weiss 2010; Hall 2011), but they mostly focus on the part concerning statistical methods to create phylograms from protein or nucleic acid sequence data. The process starts with raw sequenced fragments being edited and assembled into a consensus sequence using different sequential programs. 
Anyone who has worked with sequences downloaded directly from a public nucleotide sequence database, e.g., GenBank (National Center for Biotechnology Information, National Library of Medicine, U.S.A.), EMBL-Bank (European Molecular Biology Laboratory, Great Britain), or DDBJ (DNA Data Bank of Japan), has probably found that some are erroneously edited, contain extra nucleotides even in conserved areas, or lack certain nucleotides. The 5′- and 3′-ends, which often contain poorly read nucleotides, are frequently part of the submitted sequences, although they should be omitted from submissions. In the MSA, such segments can cause several disruptions, which can be eliminated only by insertion of redundant gaps or exclusion of the whole area from the analysis. Consensus sequences with erroneous positions have evidently not been checked again by their authors after a first alignment in an MSA, and their ambiguous positions have not been re-evaluated in the light of closely related aligned sequences. Although MSA is routine in bioinformatics, the range of available approaches used for its construction can

significantly influence the conclusions drawn. In our study, we evaluate two approaches and illustrate how a lack of care for sequence data quality and inappropriate choice of alignment program can adversely affect the outcome of any phylogenetic analysis. Only homologous sequences (i.e. sequences sharing a common ancestor) related over their whole length or most of it can be successfully assembled into an MSA. This can be done quite readily for very similar sequences, but the more divergent the sequences become, the more difficult their alignment is; accurately aligning distantly related sequences remains a challenging problem. With MSA, we can identify evolutionary motifs and conserved areas with a low level of variation, apply this knowledge to phylogenetics, and create reliable phylogenetic hypotheses. Another important use of MSA is secondary (2D) and tertiary (3D) structure prediction. While the use of a single sequence for structure prediction of both protein and RNA provides little accuracy (Garnier et al. 1996; Mathews et al. 1999), predictions based on MSAs have much higher confidence (Schuster et al. 1997; Gutell et al. 1985, 2002; Gutell 1993; Doshi et al. 2004). MSAs of rRNA are also useful in the identification of correlated mutations in a primary sequence, known as covariance (Olsen 1983). A wide range of programs for constructing MSAs is available. These programs apply different algorithms and evaluation methods. However, each is best suited for a limited range of situations, which makes it difficult to pick the right method for a particular dataset (Notredame 2002). From the variety of existing programs used to automatically align protein and nucleic acid sequences into MSA, we selected four, which represent different alignment strategies, i.e. CLUSTAL (Thompson et al. 1994; Larkin et al. 2007), SAM (Hughey and Krogh 1996; Karplus et al. 1998), MUSCLE (Edgar 2004), and MAFFT (Katoh et al. 2002, 2005).
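
The notion of covariance mentioned above can be illustrated with a minimal sketch. It assumes a toy RNA alignment and a hypothetical helper, `pair_support`, neither of which comes from the study; the set of pairs treated as canonical (Watson-Crick plus G-U wobble) is likewise an assumption of the example. Two columns are considered to covary when the nucleotides change between sequences while base pairing is preserved.

```python
# Toy illustration of covariance between two alignment columns.
# The sequences and the helper name are hypothetical, not from the study.

# Pairs treated as canonical here: Watson-Crick plus G-U wobble (an assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(alignment, i, j):
    """Return (fraction of sequences with a canonical pair at columns i and j,
    number of distinct canonical pair types observed at those columns)."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    canonical = [p for p in pairs if p in CANONICAL]
    return len(canonical) / len(pairs), len(set(canonical))


toy = [
    "GGCAAAGCC",
    "GACAAAGUC",
    "GCCAAAGGC",
    "GUCAAAGAC",
]

# Columns 1 and 7 (0-based) change in every sequence but always remain paired.
fraction, kinds = pair_support(toy, 1, 7)
print(fraction, kinds)
```

In this toy case, columns 1 and 7 show several pair types with full pairing, which is the pattern expected for compensatory mutations in a helix, whereas columns 0 and 8 are paired but invariant and therefore carry no covariation signal.
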
Poorly aligned positions and divergent regions may subsequently be excluded from phylogenetic analysis by various approaches (e.g., Gblocks: Castresana 2000; Talavera and Castresana 2007). Only a minority of scientists are concerned with 2D structures of rRNA genes, which are used in the evaluation of phylogenetic problems at higher taxonomic levels. The 2D structure of ribosomal RNA is highly conserved throughout evolution, due to the fact that most of the folding is functionally essential despite obvious divergence in primary sequences (Wheeler and Honeycutt 1988; Michot et al. 1990; Billoud et al. 2000). Therefore, the known 2D structure can principally be applied on any MSA containing homologous sequences to enhance the quality of the phylogenetic signal. In particular, MSA with a 2D mask supports the identification of homologous segments and allows reasonable insertion of gaps (Billoud et al. 2000). The recognition of the 2D structure is also essential in re-evaluating ambiguous positions in the consensus sequences (assembled


from the sequenced fragments). However, utilizing the 2D structure is time-consuming; it builds on the knowledge of structural elements and assumes covariance. Considering the availability of a high number of more or less sophisticated and widely used MSA programs that are based solely on the primary sequences, the benefits of a 2D approach could easily be overlooked. The similarity in RNA 2D structure among different organisms can be further analyzed (e.g., Liu and Wang 2006). A phylogenetic approach based on the 2D elements, termed molecular morphometrics, was successfully applied to fungi, especially when using sequences of the internal transcribed spacers (ITS1 and ITS2 rRNA) of representatives of the Chaetosphaeriaceae (Réblová and Winka 2000) and Erysiphaceae (Takamatsu et al. 1998) of the Ascomycota, and of the Lycoperdaceae (Krüger and Gargas 2004, 2008) of the Basidiomycota. A comparison of 2D structures of selected expansion segments of the 18S rRNA has been applied to insects (Billoud et al. 2000). A novel method demonstrated using ITS2 data of green algae applies a sequence-structure-specific scoring matrix in order to reconstruct phylogenetic relationships (Buchheim et al. 2011). RNA 2D structure is generally not applied to the alignment of protein-coding genes, because mRNAs do not have a stable conserved structure as is typical for rRNA genes. In addition to programs for automatic MSA of rRNA sequences, there are several projects which offer web-based software tools for the alignment and classification of such sequences. These projects offer access to curated rRNA sequences and alignment databases, e.g., SILVA, a comprehensive online resource for rRNA sequence data (Pruesse et al. 2007), the Ribosomal Database Project (Cole et al. 2009), and the Greengenes project (DeSantis et al. 2006). These projects offer sequence data of small subunit rRNA.
Only the SILVA project (which currently substitutes the European rRNA databank that no longer exists as a web project; Van de Peer et al. 1997) also incorporates sequence data of large subunit rRNA from all three domains of life, Bacteria, Archaea, and Eucarya. The alignment mechanism of the SILVA project involves calculations according to the secondary structure models of Gutell et al. (1994). In the last step, the sequence data aligned into MSAs are subjected to phylogenetic analyses. Distance, parsimony, and likelihood methods are employed, but the latter are generally considered state-of-the-art phylogenetic techniques (Whelan et al. 2001). They allow statistical testing of large datasets employing different evolutionary hypotheses as models (Goldman 1993; Posada and Crandall 2001; Goldman and Yang 1994; Sullivan and Joyce 2005; Nylander et al. 2004; Shapiro et al. 2006) and are basically divided into two alternatives, Bayesian and maximum likelihood methods. The use of gene- and/or partition-specific models should reduce the possibility of error and result in

better likelihood scores and more accurate posterior probability estimates (e.g., Brandley et al. 2005). To demonstrate the usefulness of incorporating 2D information into the process of MSA construction, we present a study based on representatives of orders and families of three subclasses of the Sordariomycetes accepted by Eriksson (2006) and Lumbsch and Huhndorf (2010). The choice of this fungal class was prompted by the good representation of their rRNA sequences in GenBank, the availability of several recently published sequences by Zhang et al. (2006), Réblová et al. (2011) and Réblová (2011), and the fact that M. Réblová is familiar with the taxonomy and phylogeny of this fungal class. We focused on the two rRNA genes (18 S and 28 S) and enhancement of their alignments with 2D structure. Because the primary sequence of the 28 S gene is more variable compared to that of the 18 S gene, we used the former as an example of one-gene analysis. We also created a multigene alignment combining these sequences with those of the protein-coding gene, RPB2. We compare the inferred three-gene phylogeny, e.g., reliability of phylogenetic groups and supports of individual clades, with analyses based on a single rRNA gene. The study focuses on the following points: (1) evaluation of one-gene MSAs created manually with the use of 2D structure with identified homologous segments versus alignments created by multiple alignment programs; (2) evaluation of the effect of constructing MSAs following different principles on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades; and (3) exploring similarities and conflicts among the resulting phylogenies; comparing phylogenetic reconstructions of major evolutionary lineages with the generally accepted euascomycete classification.

Materials and methods

Taxon sampling

The analyzed fungi were inoperculate euascomycetes classified in three subclasses of the Sordariomycetes, i.e. Hypocreomycetidae, Sordariomycetidae, and Xylariomycetidae. Altogether, 29 distinct groups representing 14 orders and 15 families were included and analyzed first using 28S phylogenies. In the phylogenies constructed subsequently from MSAs of three genes (18S-28S-RPB2), we focused on just 17 distinct groups. The cutback in the number of groups is due to the fact that not all three genes were always available. In general, if an order was represented in phylogeny by a single family, e.g., the Chaetosphaeriales, Coniochaetales, Coronophorales, Melanosporales, or Xylariales, it was analyzed only at the ordinal level. However, orders like the


Calosphaeriales, Glomerellales, and Microascales, which in some phylogenetic analyses appeared paraphyletic, were concurrently analyzed as individual families. The orders Diaporthales, Hypocreales, and Sordariales, always shown as monophyletic clades, were not further divided into families. We also tested the phylogenetic relationships among three subclasses and four other distinguished clades I−IV that correspond to constant groupings of orders and families, viz. clade I: Boliniales, Chaetosphaeriales, Coniochaetales, and Sordariales; clade II: Calosphaeriales, Diaporthales, Jobellisiaceae, and Togniniaceae; clade III: Hypocreomycetidae and Sordariomycetidae; clade IV: clade III + Xylariomycetidae. Taxon sampling was done to ensure that a minimum of three and a maximum of nine species per order and a minimum of two species per family were included, at least for the phylogeny based on 28S data. The main sources of sequences, which were retrieved from GenBank, were studies published by Wingfield et al. (1999), Réblová and Winka (2000), Réblová et al. (2004), Spatafora et al. (2006), Zhang et al. (2006), Réblová et al. (2011) and Réblová (2011). The species, their higher-level classification, and accession numbers of analyzed sequences are compiled in Supplementary Table 1.

Multiple sequence alignments

Using 18S and 28S rDNA sequences, five different MSAs were created for each gene. In construction of MSAs, we used two different approaches based on (1) application of 2D structure to rDNA sequences, and (2) automatic adjustment of rDNA sequences into MSAs by four different alignment programs. In the MSA improved with 2D structure, sequences were manually aligned in BioEdit v.7.0.9.0 (Hall 1999). All 18S and 28S sequences were enhanced by utilizing the homologous 2D structure of Saccharomyces cerevisiae Meyen ex E.C. Hansen (Gutell 1993; Gutell et al. 1993) obtained from the RNA STRAND database (Andronescu et al. 2008) in the dot-bracket format (2D mask).
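
To make the dot-bracket format (2D mask) concrete, the following minimal sketch reads such a mask into base pairs and locates hairpin loops. The mask string and the function names `parse_dot_bracket` and `hairpin_loops` are hypothetical and serve only as an illustration; the sketch handles simple masks with round brackets only and does not reproduce the manual procedure carried out in BioEdit.

```python
def parse_dot_bracket(mask):
    """Return a dict mapping each paired position to its partner (0-based)."""
    stack, pairs = [], {}
    for pos, symbol in enumerate(mask):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner = stack.pop()
            pairs[partner] = pos
            pairs[pos] = partner
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def hairpin_loops(mask):
    """Return (start, end) index ranges of unpaired runs closed by a single base pair."""
    pairs = parse_dot_bracket(mask)
    loops = []
    for i, j in pairs.items():
        # Keep each pair once (i < j) and require only dots between the partners.
        if i < j and j - i > 1 and all(c == "." for c in mask[i + 1:j]):
            loops.append((i + 1, j - 1))
    return sorted(loops)


mask = "((((....))))..((...))"  # toy mask, not taken from S. cerevisiae rRNA
pairs = parse_dot_bracket(mask)
loops = hairpin_loops(mask)
print(loops)
```

Positions reported as hairpin loops correspond to the kind of unpaired regions that the study identifies as highly variable, while paired positions mark helices in which homologous nucleotides are easier to recognize.
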
Sequences were adjusted according to the 2D mask, which was aligned with the studied sequences and decisions on homologous characters and introductions of gaps were improved. For regions where the 2D structure was poorly determined (such as in the expansion segment ES6 in 18 S), we used the model of 18 S corrected by Wuyts et al. (2002), and also recently determined 3D structures of S. cerevisiae for 18 S and 28 S (Ben-Shem et al. 2010). The 28 S alignment improved with 2D structure is deposited in TreeBASE (Study no. 12367). Automatic MSAs of 18 S and 28 S sequences were prepared using four different programs, i.e. CLUSTAL v.2 (Thompson et al. 1994; Larkin et al. 2007), MAFFT v.6.853 (Katoh et al. 2002, 2005), MUSCLE v.3.8 (Edgar 2004) and

SAM v.3.5 (Hughey and Krogh 1996; Karplus et al. 1998), under their default options. In the case of the MAFFT program, we selected the option of using iterative refinement with WSP scores (weighted sum-of-pairs) and consistency scores to achieve the highest accuracy. Due to errors in several consensus sequences at the 5′-end, we observed a shift of a block of characters in the CLUSTAL alignment at the positions 83−86. These erroneously shifted sequences (belonging to taxa of the Hypocreomycetidae) were checked and the shift was corrected manually. In order to construct the three-gene MSA, the individual 18S and 28S alignments were combined with DNA sequences coding for the second largest subunit of RNA polymerase II (RPB2). Five different multigene MSAs were created. The first MSA contained 18S and 28S sequences enhanced with 2D structure and RPB2 sequence data adjusted based on application of the Blosum62 matrix. In particular, the RPB2 sequences were transformed into protein sequences maintaining a correct reading frame using the BioEdit program. We applied ClustalW implemented in BioEdit and the protein weight matrix Blosum62 to create an MSA. This alignment was improved by taking into account the exchangeability of amino acids (AA) with similar chemical properties at certain positions. The protein alignment was converted back into a DNA alignment. In the second to the fifth MSAs, the concatenated 18S, 28S, and RPB2 sequences were aligned automatically using CLUSTAL, MAFFT, MUSCLE, and SAM, respectively. The alignment of RPB2 sequences was performed as DNA alignment only. For simplification, the alignments are henceforth labeled by the name of the program which created them or reflecting the use of secondary structures and/or protein matrix.

Character sampling

The 2D structure of the first half (5′-terminal domain) of the 28S rRNA of S. cerevisiae (Fig. 1) corresponding to the 1−1,878 nucleotides and the whole 18S rRNA of S. cerevisiae (Supplementary Fig. 1) were numbered and labeled with forward and reverse fungal primers, which are widely used for PCR amplification and sequencing (Supplementary Table 2). The priming sites are in some cases occupied by more than one primer. Sometimes, a blue label (indicating the location of a 5′-end primer) and a green label (indicating the location of a 3′-end primer) refers to several primers as typical for the particular priming site. In the alignment of 28S sequences using the 2D structure of S. cerevisiae, we identified six highly variable areas, five hairpin loops (130−138, 463−467, 541−550, 600−604, 731−738) and one linker segment in a three-way junction (978−980). The numbers in parentheses correspond with the exact positions of nucleotides in the 2D structure of S. cerevisiae. These areas and their respective alignments are


Fig. 1 Predicted model of 2D structure of 28S rRNA (5′ half) of Saccharomyces cerevisiae (according to Gutell et al. 1993). Sites for primers are labeled, with a blue (5′-end → 3′-end) and a green (5′-end ← 3′-end) color, respectively. The exact position of each primer within the primary sequence of S. cerevisiae is given in Supplementary Table 2. Six highly variable sites corresponding to hairpin loops 1−5 (130−138, 463−467, 541−550, 600−604, 731−738) and a linker segment in a three-way junction (978−980) are labeled with a pink color


illustrated in Fig. 2 and Supplementary Fig. 2. We applied this knowledge to the alignments created by the four alignment programs to identify the six highly variable areas and the erroneously aligned character sites comprising nonhomologous nucleotides. The numbers of nucleotides excluded from the six variable areas of 28S were the following: 28 in the 2D, 58 in CLUSTAL, 61 in MAFFT, 66 in MUSCLE, and 78 in the SAM alignments. Only four (and much shorter) variable areas were identified in the 2D structure of 18S rRNA; these were two hairpin loops (132−135, 1056−1057), an internal loop (237−241), and a linker segment in a three-way junction (1092−1093).
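
Because the variable areas are given as positions in the S. cerevisiae sequence, they must be translated into alignment columns before characters can be excluded. The sketch below shows one way this could be done, assuming the reference is one row of the alignment; the function names, the toy sequences, and the toy region are hypothetical and are not the authors' procedure or data.

```python
def reference_to_columns(aligned_reference):
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    mapping, position = {}, 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            position += 1
            mapping[position] = column
    return mapping


def columns_to_exclude(aligned_reference, regions):
    """Collect the columns spanned by each inclusive (start, end) reference region,
    including gap columns of the reference that fall inside the region."""
    mapping = reference_to_columns(aligned_reference)
    excluded = set()
    for start, end in regions:
        excluded.update(range(mapping[start], mapping[end] + 1))
    return excluded


def drop_columns(alignment, excluded):
    """Return the alignment with the excluded columns removed from every row."""
    return ["".join(c for k, c in enumerate(seq) if k not in excluded)
            for seq in alignment]


reference = "ACG--UUAGC"                  # toy aligned reference row
others = ["ACGAAUUAGC", "ACG-AUU-GC"]     # toy aligned sequences
excluded = columns_to_exclude(reference, [(3, 4)])  # toy region, not from the paper
trimmed = drop_columns([reference] + others, excluded)
print(sorted(excluded), trimmed)
```

The number of excluded columns depends on how many gaps a given alignment places inside a region, which is consistent with the different counts reported above for the 2D, CLUSTAL, MAFFT, MUSCLE, and SAM alignments.
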

Fig. 2 Examples of MSAs of three hairpin loops distinguished in the 2D structure of 28S rRNA. The alignment was enhanced with 2D structure or was created by four different alignment programs. The framed part in the 2D alignment corresponds with the framed part of the 2D structure of Saccharomyces cerevisiae. Highly variable sites (130−138, 463−467, 541−550) are labeled with a pink color

Phylogenetic analyses

Phylogenetic relationships among taxa of the Sordariomycetes were examined using two separate datasets, the 28S (94 species; Supplementary Table 3) and the combined set of three genes 18S-28S-RPB2 (56 species; Supplementary Table 4), each represented by five different MSAs. We analyzed the whole 18S region, 5−7 regions of the RPB2 gene, but only the first 1,197 nucleotides in the 28S gene, because the majority of sequences deposited in GenBank terminate here. The length of alignments, numbers of variable and excluded characters, numbers of the most parsimonious trees found, tree length, and other details for maximum parsimony and maximum likelihood analyses were compiled in respective tables. Two outgroup taxa, S. cerevisiae and Vanderwaltozyma polyspora (Van der Walt) Kurtzman (Saccharomycetes), were used to root the phylogenies. Bases 1−75 were excluded from phylogenetic analyses of the 28S and 18S alignments and bases 1−60 were excluded from the analysis of RPB2 alignment because of the


incompleteness of the 5′-end in the majority of the available sequences. An additional 44 bases were excluded from the 3′-end of the 18S alignment, because in the majority of sequences this part was missing. The combined dataset was partitioned into four subsets of nucleotide sites: (1) 28S, (2) 18S genes of the rDNA, (3) first and second codon positions of RPB2, and (4) third codon position of RPB2. The three genes for the combined analysis were tested for heterogeneity before combining them for the total evidence analysis. We used the partition homogeneity/incongruence-length difference test implemented in PAUP v.4.0b10 to determine whether different partitions of the data gave significantly different signals. Because combining data with P>0.01 generally improves phylogenetic accuracy (Cunningham 1997) and our data did not show significant heterogeneity (P=0.01), the sequences were combined for further analysis. Maximum parsimony (MP) and maximum likelihood (ML) were used to estimate phylogenetic relationships; 20 phylogenies were created for each sequence dataset (Supplementary Tables 3, 4). Every alignment, excluding the specified number of characters at the 5′-end and 3′-end, was tested with both phylogeny methods under two hypotheses: (1) all characters were included (labeled with “a” suffix, e.g. MPa), or (2) defined sets of characters corresponding to the six variable areas of 28S were excluded (labeled with “b” suffix). Maximum likelihood analysis was performed with RAxML-HPC v.7.0.3 (Stamatakis et al. 2005; Stamatakis 2006) using a GTRMIX model of evolution, which is a combination of GTRGAMMA and GTRCAT (a RAxML-specific alternative model, in which the alignment sites are pooled into a pre-specified number of rate categories). The GTRCAT model is used for the heuristic search, the final tree topology is optimized, and the stable likelihood values are calculated under the GTRGAMMA model.
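
The four-subset partitioning of the combined dataset described above can be sketched as column lists over a concatenated alignment. The gene order, the toy lengths, and the function name `build_partitions` are assumptions made for illustration, and RPB2 is assumed to begin in frame at codon position 1; none of these details is taken from the study.

```python
def build_partitions(len_28s, len_18s, len_rpb2):
    """Return 1-based column lists for four subsets of a concatenated alignment.

    Assumes the genes are concatenated in the order 28S, 18S, RPB2 and that
    the RPB2 block starts at codon position 1 (both are assumptions).
    """
    start_18s = len_28s + 1
    start_rpb2 = len_28s + len_18s + 1
    end = len_28s + len_18s + len_rpb2
    rpb2 = range(start_rpb2, end + 1)
    return {
        "28S": list(range(1, len_28s + 1)),
        "18S": list(range(start_18s, start_rpb2)),
        # Offsets 0 and 1 within each codon are positions 1 and 2; offset 2 is position 3.
        "RPB2_pos1+2": [c for c in rpb2 if (c - start_rpb2) % 3 != 2],
        "RPB2_pos3": [c for c in rpb2 if (c - start_rpb2) % 3 == 2],
    }


parts = build_partitions(len_28s=6, len_18s=4, len_rpb2=6)  # toy lengths
for name, columns in parts.items():
    print(name, columns)
```

Separating the third codon position from the first two reflects the common expectation that it evolves at a different rate, so each subset can be assigned its own model parameters in a partitioned analysis.
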
The nodal support was verified by nonparametric bootstrapping with 1,000 replicates. Maximum parsimony analysis was conducted with PAUP v.4.0b10 (Swofford 2002). A heuristic search was performed with the stepwise-addition option, with 1,000 random taxon addition replicates and TBR branch swapping. All characters were unordered and given equal weight. Gaps were treated as missing data. Branch support was estimated on the recovered topologies by performing a heuristic search of 1,000 bootstrap replicates consisting of ten random-addition replicates for each bootstrap replicate.
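
The principle of nonparametric bootstrapping used for nodal support can be sketched as follows: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and support is the percentage of replicates that recover a clade. The sketch is an illustration under stated assumptions and not the RAxML or PAUP implementation; tree inference itself is omitted, and the function names and toy data are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_columns = len(alignment[0])
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(seq[k] for k in sampled) for seq in alignment]


def clade_support(replicate_clades, clade):
    """Percentage of replicates whose tree contains the clade (a frozenset of taxa)."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy = ["ACGUACGU", "ACGAACGA", "UCGAUCGA"]
replicate = bootstrap_alignment(toy, rng)

# Clades that a tree-building step would return for four replicates (made up here).
replicate_clades = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
]
support = clade_support(replicate_clades, frozenset({"A", "B"}))
print(replicate, support)
```

Every column of a pseudo-replicate is a column of the original alignment, so poorly aligned or nonhomologous columns are resampled along with the rest, which is one reason alignment quality affects the bootstrap values compared in this study.
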

Results

The first 20 phylogenies (10 MP+10 ML) were restricted to the 28S sequences. The details and differences among

Fig. 3 Phylograms (strict consensus) inferred from the MP analysis of 28 S sequences of taxa of the Sordariomycetes. Phylogenies are based on MSAs enhanced with 2D structure or created by four different alignment programs. The phylogram inferred from the MP/ML phylogeny of three genes (18 S-28 S-RPB2) is drawn for comparison. The framed phylograms represent tree topologies, which are in a major agreement with phylogeny based on three genes. H Hypocreomycetidae, S Sordariomycetidae, X Xylariomycetidae

individual phylogenies based on five different MSAs are illustrated and discussed in detail below (Supplementary Table 3; Figs. 3, 4). Results from the other 20 phylogenies based on five MSAs of the combined sequences of three genes (18S-28S-RPB2) are summarized in Supplementary Table 4 (phylograms not shown). We evaluated and compared all phylograms based on MSAs created by five different methods. We focused on the tree topologies and confidence of major lineages of subclasses, main groupings of orders, families, and clades I−IV distinguished in the Sordariomycetes.

Phylogenetic results based on the 28S rDNA

We observed several features characteristic of phylogenies based solely on the 28S gene. The Xylariomycetidae are always strongly supported (94−99 % in MP and 90−98 % in ML). The Hypocreomycetidae are resolved in MP phylogenies, but did not obtain bootstrap support higher than 50 %, while in ML the subclass is always resolved and supported by steady bootstrap values ranging from 71 to 89 %, with, however, one exception. In phylogenies based on the CLUSTAL alignment (MPa/MLa, MPb), the subclass is not resolved, or is resolved (MLb) but received bootstrap support below 50 %. In all MP analyses, the Sordariomycetidae are not resolved, except in the analyses based on the MUSCLE alignment (MPa), while in the ML analyses, the Sordariomycetidae are resolved but received branch support lower than 50 % except in the analyses based on the CLUSTAL alignment (MLb); in the MUSCLE analysis (MLa), the subclass is not resolved. The Lulworthiales are shown as a member of the subclass Hypocreomycetidae, often as a sister of the Coronophorales/Melanosporales clade, except in the analyses based on the SAM (MPa), CLUSTAL (MPa/MLa, MPb), MAFFT (MLb), and MUSCLE (MPa/MLa) alignments; however, such a relationship remains poorly supported.
Lastly, no support is provided for the monophyly of the Microascales in any of the MP or ML analyses based on 28 S sequence data; instead, its four families form two well-supported but phylogenetically separated clades. The clade containing the Halosphaeriaceae and Microascaceae represents the Microascales s. str. in our phylogenies (see also “Discussion”). Clades I and II were always resolved in ML analyses with mid- to high bootstrap supports, while in MP analyses they were often resolved, but with bootstrap support lower than 50 %.


Of all 28S phylogenies, only those based on the 2D (MPa/MLa) and MAFFT (MPa/MLa) alignments with all characters included produced phylograms with delimitation of orders and families identical to that accepted by Lumbsch and Huhndorf (2010) and Zhang et al. (2006). The alternative analyses, with characters from the six variable positions excluded, did not bring further improvement, i.e. we did not obtain higher bootstrap support values (a greater number of groups always obtained lower values of bootstrap support), and some groups were not resolved compared to analyses with all characters included. Except for the two listed cases, paraphyly of the Calosphaeriales, Boliniales, Coronophorales, and Glomerellales appeared frequently; compare Supplementary Table 3.

Phylogenetic results based on multigene alignment

In phylogenetic analyses based on the three genes, we focused on evaluation of 17 distinct groups (Supplementary Table 4). The inferred relationships among the three subclasses received high bootstrap supports. We identified two other major lineages, clade III (Hypocreomycetidae + Sordariomycetidae) and clade IV (clade III + Xylariomycetidae). Clades III and IV were resolved only in ML analyses based on the 2D (with RPB2 alignment constructed with Blosum62 substitution matrix; clade III: 59 % in MLa, 54 % in MLb) and MAFFT alignments (clade III: 51 % in MLa, 54 % in MLb). However, only phylogenies based on the MAFFT alignment obtained branch support for clade IV higher than 50 % (clade IV: 66 % in MLa, 82 % in MLb). In all MP analyses, the Lulworthiales were placed as a basal group to all ingroup taxa, and the Hypocreomycetidae and the Sordariomycetidae formed a sister clade to the Xylariomycetidae. However, in ML analyses based on the CLUSTAL, MUSCLE, and SAM alignments, the Lulworthiales showed a tendency to appear as a sister group to the Hypocreomycetidae, although this relationship was only poorly supported.

Discussion

Alignment programs

Four different multiple alignment programs, viz. CLUSTAL, MAFFT, MUSCLE, and SAM, were selected to create MSAs of 18S and 28S sequences to be compared to the alignments enhanced by the application of a 2D mask to rDNA sequences. Calculations were run under the default options. Generally, alignment programs become less reliable when sequences show a high degree of divergence, thus giving only an approximate solution. The ClustalX program (the graphical interface of ClustalW) represents a method of

Fig. 4 Phylograms inferred from the ML analysis of 28 S sequences of taxa of the Sordariomycetes. Phylogenies are based on MSAs enhanced with 2D structure or created by four different alignment programs. The phylogram inferred from the MP/ML phylogeny of three genes (18 S-28 S-RPB2) is drawn for comparison. The framed phylograms represent tree topologies, which are in a major agreement with phylogeny based on three genes. Abbreviations as in Fig. 3

progressive alignment strategy using a pair-wise algorithm (Thompson et al. 1994; Larkin et al. 2007). The algorithm is further specified by gap opening, gap extension, and weighting DNA transitions. Clustal starts with the alignment from the preceding round to calculate genetic distances (1); then, a new alignment is built based on a NJ guide tree inferred from these distances (2). While within the step 2 once the sequences are aligned, their alignment will never be re-evaluated or modified by sequences added later; however, sequences may be realigned in step 1 of the next round. We employed three other programs based on different paradigms. The MUSCLE program (Edgar 2004) is based on a progressive alignment, the so-called log-expectation score, and subsequent refinement of the aligned sequences using tree-dependent restricted partitioning. The MAFFT program (Katoh et al. 2002, 2005) uses various strategies that can be chosen by the user; it offers a progressive method and also permits iterative refinement with WSP scores (weighted sum-of-pairs) and consistency scores, which we used in our study. The SAM program (Hughey and Krogh 1996; Karplus et al. 1998) is based on an iterative/stochastic algorithm and uses an implemented linear hidden Markov model. 2D structure and identification of variable sites in the 28 S rRNA Although the 28 S rRNA gene is moderately conserved for phylogenies at the genus and higher taxonomic levels, it contains several areas with significant variability among orders and families of the Sordariomycetes and Euascomycetes in general. In the phylogenies restricted to the 28 S gene, we were limited, to a certain extent, by the length of available sequences. We analyzed only the first 1,197 nucleotides (D1 and almost the whole D2 domain), because the majority of 28 S sequences deposited in GenBank were not sequenced beyond this point. Thus, the 3′-end of many analyzed sequences corresponded to the site of the LR6 primer (Fig. 1). 
Therefore, the length of the final 28 S MSA corresponds to the first 1,197 nucleotides of S. cerevisiae. The two domains generally referred to as D1 (1–635 in S. cerevisiae) and D2 (636–1,450 in S. cerevisiae) are widely used as markers for 28 S-based fungal phylogenies. In the past, phylogenetic studies were often based on 28 S sequences corresponding to the D1 domain only (∼635 nucleotides), and some sequences available in GenBank had a mere ∼400 nucleotides.
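
Restricting an alignment to the first 1,197 nucleotides of S. cerevisiae amounts to translating a reference coordinate into an alignment column and cutting every sequence at that column. The following Python sketch illustrates the idea only; the function names, the gap symbol, and the toy sequences are assumptions made for illustration and are not the procedure or data used in this study.

```python
def reference_to_column(aligned_ref, position, gap="-"):
    """Return the 0-based alignment column of a 1-based reference position."""
    if position < 1:
        raise ValueError("reference positions are 1-based")
    seen = 0  # ungapped reference nucleotides counted so far
    for column, base in enumerate(aligned_ref):
        if base != gap:
            seen += 1
            if seen == position:
                return column
    raise ValueError("reference is shorter than the requested position")


def trim_to_reference(alignment, ref_name, last_position, gap="-"):
    """Keep only the columns up to and including a reference position."""
    end = reference_to_column(alignment[ref_name], last_position, gap) + 1
    return {name: seq[:end] for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, not taken from the study).
toy = {
    "reference": "AC-GU-ACG",
    "taxon_1": "ACAGUUAC-",
}
trimmed = trim_to_reference(toy, "reference", 5)
print(trimmed)
```

With a real alignment, `last_position` would be 1197 and `ref_name` the label of the aligned S. cerevisiae sequence; counting in reference coordinates keeps the cut independent of how many gaps the alignment contains.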


The variable segments detected in the primary sequences of the two rRNA genes and in their 2D structure are not evenly distributed throughout the rRNA molecule; instead, they are often restricted to highly variable regions called expansion segments (Hassouna et al. 1984; Gerbi 1996). The expansion segments (ES) are additional nucleotide clusters with a high evolutionary rate, which are inserted at specific positions in the common conserved rRNA core (Larson and Wilson 1989). The presence and sequence length of ES vary among organisms. In the analyzed part of the 28 S gene, we identified six short diverging sites (Fig. 1), which are the main sources of variability in the D1/D2 domains. The variation is due to insertion/deletion events; the length of these variable regions is typical for certain phylogenetic groups, orders or families. At the same time, they represent sites of possible ambiguity in the alignments (Fig. 2; Supplementary Fig. 2). Five of these regions form hairpin loops, while the sixth one is a linker segment in a three-way junction in the 2D structure. They are positioned in areas well known as expansion segments ES5, ES7, ES9, and ES12 (Taylor et al. 2009: fig. 5), which are missing or not completely developed in the Archaeal and Eubacterial kingdoms of life. The D1 domain contains four of these six critical regions. However, phylogenies based on the first 400 nucleotides possess only one highly variable region. The sequence length of the first diverging site (a hairpin loop; Fig. 2) varies among orders and families of the Sordariomycetes. It corresponds to nucleotides 133−136 of S. cerevisiae. The duplex, which gives rise to the hairpin loop, possesses a diverging upper part with a variable number of base pairs which correspond to the first three pairs of S. cerevisiae (nucleotide pairs 130/138, 131/137, 132/136). 
The shortest hairpin loop consisting of 2(−3) nucleotides is typical of the Calosphaeriales, while the longest occurs, e.g., in the Coronophorales (5−6) or Microascales (3−)4−5. The length of the stem also differs among orders and families. A stem consisting of five base pairs is characteristic of the Coronophorales; four base pairs are typical of the Melanosporales, Microascales, and Plectosphaerellaceae, while the shortest stem of 2(−3) base pairs is present in the majority of studied orders and families. The loop of the second hairpin loop (463−467 in S. cerevisiae) is less variable, usually consisting of four nucleotides, but in the Glomerellales, Plectosphaerellaceae, Microascaceae, and Halosphaeriaceae, it possesses usually five nucleotides. The third hairpin loop (541−550 in S. cerevisiae) represents a region where typically four nucleotides occur in the loop, while the number of base pairs of the duplex varies. The longest duplex is present in the Coronophorales, Diaporthales, and Plectosphaerellaceae, and also the sequence of Ceratocystis fimbriata (AF221009; Wingfield et al. 1999) has an unusually long duplex compared with other taxa belonging to the

Microascales (Fig. 2). The number of the nucleotides in the loop of the fourth hairpin loop (600−604 in S. cerevisiae; Supplementary Fig. 2) varies from three to six; it is shortest in the Calosphaeriales, Chaetosphaeriales, or Glomerellaceae, but longest in the Melanosporales and in Bertia moriformis of the Coronophorales. The loop in the fifth hairpin loop (731−738 in S. cerevisiae) typically possesses two nucleotides in the majority of the studied families and orders; however, 3−4 nucleotides occur in the Gondwanamycetaceae, (2−)4−6 in the Xylariales, and four in the Melanosporales. The variability in the linker segment (Supplementary Fig. 2) is related only to its first half (978−980 in S. cerevisiae), where the sequence length varies between 2−7 nucleotides among members of the Sordariomycetes, while the nucleotides 981−984 are highly conserved in all orders and families. In Eucarya, this linker segment is positioned in the expansion segment ES12 in an area of a three-way junction, while Archaea show a kink-turn motif (Kt-38) in this site (Lescoute et al. 2005), and Eubacteria show a kink-turn-like segment in the same position (Réblová et al. 2010). This region is in the middle of Helix 38, which is also known as A-site finger due to its unusual length, distinct bend, and direct interaction with the A-site tRNA (Yusupov et al. 2001). The A-site finger forms an intersubunit bridge B1a and modulates ribosomal activity (Piekna-Przybylska et al. 2008). Considering Gutell’s 2D structure of 28 S for Eucarya based on S. cerevisiae (Gutell, web: http://www.rna.ccbb.utexas.edu/) with ranked nucleotide conservation, critical segments corresponding to hairpin loops 2−4 are positioned in areas with low conservation (less than 80 %), while primer sites are strongly conserved. The conservation 2D map also reveals in which segments, beyond the point marked by the LR6 primer in Fig. 1, we should expect higher variability.
The segment between nucleotides in the positions 1,615−1,831 of S. cerevisiae refers to a region where we can expect such low conservation. However, this segment, which could enhance 28 S phylogenies, is generally missing in 28 S sequences deposited in GenBank. In the primary structure of the 18 S gene, we identified four shorter and less variable segments (Supplementary Fig. 1) compared to the 28 S gene, and therefore this gene was less useful as an example for a one-gene analysis. Moreover, the phylogenies based solely on the 18 S rRNA gene (not shown; Réblová et al. 2011) provided identical tree topology compared with a three-gene analysis, except for the unresolved backbone of the Sordariomycetes. The 2D structure map of 18 S rRNA of S. cerevisiae (Gutell, loc cit) with ranked nucleotide conservation contains predominantly sites with 98+, 90−98, and 80−90 %. Two of the four variable areas (internal loop 237−241 and hairpin loop 1,056−1,057) are positioned in the expansion segments ES3 and ES7, respectively, which are present only in the

Eucarya (Wuyts et al. 2002) and which belong to the category with conservation lower than 80 %.
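
The conservation classes quoted above (98+, 90−98, 80−90, and below 80 %) can be approximated for any alignment by scoring each column. The Python sketch below is an illustration under stated assumptions: conservation is taken to be the share of the most frequent nucleotide among the non-gap characters of a column, which may differ from the way the ranked maps on Gutell's web site were computed, and the sequences are invented.

```python
from collections import Counter


def column_conservation(sequences, gap="-"):
    """Percentage of the most common nucleotide per column, gaps ignored."""
    scores = []
    for column in zip(*sequences):
        bases = [base for base in column if base != gap]
        if not bases:
            scores.append(0.0)  # an all-gap column carries no signal
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(100.0 * top_count / len(bases))
    return scores


def conservation_class(percent):
    """Assign a score to one of the four classes named in the text."""
    if percent >= 98:
        return "98+"
    if percent >= 90:
        return "90-98"
    if percent >= 80:
        return "80-90"
    return "<80"


# Hypothetical aligned sequences of equal length.
toy_alignment = ["ACGU", "ACGA", "ACCA", "AC-A"]
scores = column_conservation(toy_alignment)
classes = [conservation_class(score) for score in scores]
print(list(zip(scores, classes)))
```

Columns that fall into the lowest class are the natural candidates for the variable regions discussed in the text, although a real analysis would use far more sequences than this toy example.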

Application of the 2D mask on MSA versus MSA created by alignment programs

Figure 2 and Supplementary Fig. 2 illustrate how the four alignment programs cope with the six problematic and critical regions distinguished in the primary and secondary structure of 28 S rRNA compared with the method applying a 2D mask. The numbers of possibly ambiguously aligned characters corresponding to the six variable areas differed among individual alignments. Whereas 28 ambiguous characters were identified in the 2D alignment, the number of nucleotides excluded from MSAs generated by the alignment programs was higher and varied between 58 and 78. Although each program aligned these segments differently, we could detect one common feature: in order to align them, additional gaps were frequently inserted. The disadvantage of the alignment programs clearly lies in the fact that they cannot distinguish 2D structure in the primary sequence, i.e. they cannot separate duplex regions from the loops. Thus, it often happened that non-homologous nucleotides were aligned, e.g., nucleotides of the loop were mixed with nucleotides of the duplex, and vice versa. Such situations result in an alignment of non-homologous sites and the subsequent exclusion of more characters from analysis than necessary. To correct such possible errors, we applied the known 2D structures of 28 S and 18 S rRNA of S. cerevisiae to the sequences in the alignment, while assuming covariance. The 2D structure defines duplexes, hairpin loops, internal loops, bulges, and junctions. The knowledge of their topology helps us to align all sequences in the MSA according to the common structure mask. At the same time, we were able to detect substitutions in the conserved sequences, for if a substitution appears in one strand, a corresponding substitution is also likely to be found in the opposite strand. These substitutions are known as covariations. When we detect them in a strand, we do not introduce gaps in the MSA, because these substitutions are isosteric, i.e. they preserve the 3D structure (Leontis et al. 2002).
The knowledge of rRNA 2D topology and covariance can also be used for correcting and re-evaluating ambiguous positions in the consensus sequence which appear due to sequencing errors in the assembled sequenced fragments. This can be done by determining the 2D region and then comparing the problematic nucleotide reads with those belonging to the opposite strand of the RNA. Because the quality check of submitted sequences by GenBank staff is limited due to the number of sequences submitted on a daily basis, the authors are responsible for the preparation of high-quality sequences. We believe that an additional control of a consensus sequence using the 2D structure mask may considerably decrease the number of errors left in the final sequence.

Phylogenetic analyses of 28 S rDNA sequences

In the analyses based on the 28 S alignment enhanced with 2D structures, the inferred groupings matched the topology of the main lineages of orders and families of the Sordariomycetes constructed from analyses of four genes by Zhang et al. (2006)¹ and from analyses of three genes (this study). However, the resolution of the backbone of the Sordariomycetes proved to be problematic, indicating that this single gene might be insufficient to resolve higher-level phylogenies even with 2D analysis. The situation in the Hypocreomycetidae was often complicated by the position of the Lulworthiales, which formed a sister group to the Melanosporales and Coronophorales clades in all MP analyses. Possibly, the low quality of several 28 S sequences of members of the Lulworthiales deposited in GenBank may also play a role. In phylograms inferred from MP, the 28 S sequence data alone point to para- or polyphyly of the Sordariomycetidae, while in ML, the node of the subclass is resolved, but with low confidence (bootstrap support lower than 50 %). The Microascales are not shown to be monophyletic; instead, two major subclades, each containing two families, are always separated, i.e. Microascaceae/Halosphaeriaceae and Ceratocystidaceae/Gondwanamycetaceae (Réblová et al. 2011). The exclusion of characters of the six variable regions, identified as possibly ambiguously aligned positions, from MP and ML analyses did not give higher bootstrap support. Instead, their exclusion caused paraphyly of three orders, i.e. the Boliniales, Calosphaeriales, and Glomerellales. In phylogenies derived from MSAs created automatically by four different programs, we observed several cases of paraphyly of otherwise phylogenetically and taxonomically well-delimited orders, i.e. Calosphaeriales and Coronophorales. The monophyly of the Glomerellales was disrupted by the inclusion of the family Plectosphaerellaceae.
Similarly to results based on the 2D alignments, the exclusion of possibly ambiguously aligned segments led to paraphyly of several orders, and the confidence for individual clades did not increase but varied, and a majority of clades received lower support (Supplementary Table 3).

¹ The sequences of representatives of four orders and families, the Calosphaeriales, Glomerellales, Plectosphaerellaceae, and Togniniaceae, became available recently. They were not included in the four-gene analysis performed by Zhang et al. (2006). These taxa belong to the subclass Sordariomycetidae and further resolve its phylogenetic structure. Of the other orders and families recently recognized, we did not include members of the Amplistromataceae, Helminthosphaeriaceae, Koralionastetales, Meliolales, Phyllachorales, and Trichosphaeriales, mostly because of the lack of DNA sequence data and representative taxa.
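
The covariation check described in the section on applying the 2D mask can be illustrated with a short Python sketch. It assumes that the structure mask is written in dot-bracket notation over the alignment columns and that Watson-Crick and G-U wobble pairs count as pairing; both are simplifications chosen for illustration (the isostericity matrices of Leontis et al. 2002 are more detailed), and the function names and sequences are hypothetical.

```python
# Pairs treated as "still paired" in this sketch (an assumption).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def parse_mask(mask):
    """Return the (i, j) column pairs encoded by a dot-bracket mask."""
    stack, pairs = [], []
    for column, symbol in enumerate(mask):
        if symbol == "(":
            stack.append(column)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure mask")
            pairs.append((stack.pop(), column))
    if stack:
        raise ValueError("unbalanced structure mask")
    return sorted(pairs)


def classify_pairs(mask, reference, query):
    """Label each masked pair of the query relative to the reference."""
    reference = reference.upper().replace("T", "U")
    query = query.upper().replace("T", "U")
    labels = {}
    for i, j in parse_mask(mask):
        ref_pair = reference[i] + reference[j]
        qry_pair = query[i] + query[j]
        if qry_pair == ref_pair:
            labels[(i, j)] = "identical"
        elif qry_pair not in CANONICAL:
            labels[(i, j)] = "mismatch"  # gaps also end up here
        elif qry_pair[0] != ref_pair[0] and qry_pair[1] != ref_pair[1]:
            labels[(i, j)] = "covariation"  # both strands changed, pair kept
        else:
            labels[(i, j)] = "single change, still paired"
    return labels


# Hypothetical hairpin: three base pairs closing a four-nucleotide loop.
mask = "(((....)))"
reference = "GCAUUCGUGC"
query = "GTATTCGTAC"  # DNA letters are converted to RNA internally
result = classify_pairs(mask, reference, query)
print(result)
```

A pair labelled as a covariation supports the homology of both columns, whereas a mismatch at a masked pair flags a position where the alignment, or an ambiguous read in the consensus sequence, deserves a second look.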


Conclusions

Our results show that application of 2D structure can improve MSA. In the 2D structure of 28 S rRNA, we identified six sites, which carry the main variability in the primary structure. Proper sorting of these sites was a key for subsequent phylogenetic analyses, as it allowed the construction of more reliable and better resolved phylograms. Based on the known 2D conservation map of 28 S rRNA of S. cerevisiae, we can estimate the location of other, albeit rarely sequenced, variable segments. In the 2D structure of 18 S rRNA, we identified four variable sites, which were shorter and less diverging in their primary structure compared to the 28 S rRNA. These variable and critical sites are positioned in areas of expansion segments, which are present and well developed only in Eucarya. Reliable 28 S phylograms with the highest branch support for the main phylogenetic lineages were constructed by maximum parsimony and maximum likelihood based on the 2D alignment (MPa/MLa), when all characters were included. When these two phylograms were compared with phylograms created by the four alignment programs, only those based on the MAFFT alignment (MPa/MLa) were comparable. The resolved topology of phylogenetic lineages of orders and families was in concordance with Eriksson (2006) and Lumbsch and Huhndorf (2010), whilst the backbone of the Sordariomycetes phylogeny remained poorly resolved. The values of bootstrap support for individual nodes are similar, and clades I and II are well-delimited. Phylograms of the Sordariomycetes inferred from five different MSAs of sequences of three genes (18 S-28 S-RPB2), regardless of whether characters were included or excluded, resulted in the same tree topology as presented by Zhang et al. (2006), except for results inferred from the SAM alignment. The groupings are largely consistent with the Euascomycete classification accepted by Eriksson (2006) and Lumbsch and Huhndorf (2010).
Although we commonly saw paraphyly or polyphyly and several groups lacked resolution in analyses based on the single 28 S gene, all differences and discrepancies disappeared when the same data were concatenated into a multigene analysis. We observed a main shift in topology: the backbone of the Sordariomycetes phylogeny was always resolved, and both MP and ML analyses provided higher confidence for several clades of orders and families. Clades III and IV (also recognized by Zhang et al. 2006) were identified only in the three-gene phylogeny and represent high-level phylogenetic lineages among subclasses. Only two phylograms based on the MAFFT alignment (MPa/MLa) and another two based on the 2D alignment (MPa/MLa; with protein data improved with Blosum62 and AA exchangeability), all with all variable regions included, had the highest scores for the lineages of subclasses and resolved clades III and IV in both MP/ML analyses. Of the four alignment programs used to automatically create MSAs, only the MAFFT program, while employing the iterative refinement method with the WSP and consistency scores, provided reliable phylogenies. The MAFFT method we used is the slowest and the most computationally demanding option available to achieve a high accuracy. The knowledge of 2D structures and an understanding of covariance are essential in order to create highly confident MSAs with clearly identified homologous sites. An MSA of precisely aligned sequences is a prerequisite of reliable phylogenetic analyses. Moreover, application of 2D structure proved to have a positive effect on the quality of the consensus sequence, i.e. on editing ambiguously read/interpreted nucleotides. As much as these parts of our research may seem simple and obvious, they are the heart and soul of our analyses and the conclusions drawn.

Acknowledgements This study was supported by the Project of the National Foundation of the Czech Republic (GAP 506/12/0038), as a long-term research development project of the Institute of Botany, Academy of Sciences no. RVO 67985939 and by the Project CEITEC - Central European Institute of Technology from European Regional Development Fund (CZ.1.05/1.1.00/02.0068). Martina Réblová wishes to dedicate this paper to Ove Eriksson from Umeå University for his guidance and supervision during her stay in 1999, teaching her about 2D structures and their application on the alignment of sequences of rRNA genes. We are grateful to Michael Weiss for consultation, careful reading of our manuscript, providing valuable suggestions and pointing out several interesting aspects of methodology. Walter Gams and Ove Eriksson are thanked for reading earlier versions of this manuscript and providing valuable comments and suggestions.

References

Aguileta G, Marthey S, Chiapello H, Lebrun MH, Rodolphe F, Fournier E, Gendrault-Jacquemard A, Giraud T (2008) Assessing the performance of single-copy genes for recovering robust phylogenies. Syst Biol 57:613–627
Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics 9:340
Ben-Shem A, Jenner L, Yusupova G, Yusupov M (2010) Crystal structure of the eukaryotic ribosome. Science 330:1203–1209
Berbee ML, Taylor JW (1994) 18 S ribosomal DNA sequence data and dating, classifying, and ranking the fungi. In: Hawksworth DL (ed) Ascomycete systematics: problems and perspectives in the nineties. NATO ASI Series A, vol 269. Plenum, New York, pp 213–223
Billoud B, Guerrucci MA, Masselot M, Deutsch JS (2000) Cirripede phylogeny using a novel approach: molecular morphometrics. Mol Biol Evol 17:1435–1445
Blackwell M, Spatafora JW (1994) Molecular data sets and broad taxon sampling in detecting morphological convergence. In: Hawksworth DL (ed) Ascomycete systematics: problems and perspectives in the nineties. NATO ASI Series A, vol 269. Plenum, New York, pp 243–248
Brandley MC, Schmitz A, Reeder TW (2005) Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst Biol 54:373–390
Buchheim MA, Keller A, Koetschan C, Forster F, Merget B, Wolf M (2011) Internal transcribed spacer 2 (nu ITS2 rRNA) sequence-structure phylogenetics: towards an automated reconstruction of the green algal tree of life. PLoS One 6:e16931
Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552
Cochrane G, Akhtar R, Bonfield J et al (2009) Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res 37:D19–D25
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM (2009) The Ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37(Database issue):D141–D145
Cunningham CW (1997) Can three incongruence tests predict when data should be combined? Mol Phylogenet Evol 14:733–740
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069–5072, http://greengenes.lbl.gov/
Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR (2004) Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics 5:105
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Eriksson OE (Ed) (2006) Outline of Ascomycota - 2006. Myconet 12:1–82
Garnier J, Gibrat J-F, Robson B (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 266:540–553
Gelperin D, Horton L, Beckman J, Hensold J, Lemmon SK (2001) Bms1p, a novel GTP-binding protein, and the related Tsr1p are required for distinct steps of 40 S ribosome biogenesis in yeast. RNA 7:1268–1283
Gerbi SA (1996) Expansion segments: Regions of variable size that interrupt the universal core secondary structure of ribosomal RNA. In: Zimmermann RA, Dahlberg AE (eds) Ribosomal RNA: Structure, evolution, processing, and function in protein biosynthesis. CRC, Boca Raton, pp 71–87
Goldman N (1993) Statistical tests of models of DNA substitution. J Mol Evol 36:182–198
Goldman N, Yang ZH (1994) A codon-based model of nucleotide substitution for protein coding DNA sequences. Mol Biol Evol 11:725–736
Gutell RR (1993) Collection of small subunit (16 S and 16 S-like) ribosomal RNA structures. Nucleic Acids Res 21:3051–3054
Gutell RR, Weiser B, Woese CR, Noller HF (1985) Comparative anatomy of 16 S-like ribosomal RNA. Prog Nucleic Acid Res Mol Biol 32:155–216
Gutell RR, Gray MW, Schnare MN (1993) A compilation of large subunit (23 S and 23 S-like) ribosomal RNA structures. Nucleic Acids Res 21:3055–3074
Gutell RR, Larsen N, Woese CR (1994) Lessons from an evolving rRNA: 16 S and 23 S rRNA structures from a comparative perspective. Microbiol Rev 58:10–26
Gutell RR, Lee JC, Cannone JJ (2002) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12:301–310
Hall TA (1999) BioEdit 5.0.9: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
Hall BG (2011) Phylogenetic trees made easy. A How-to manual for molecular biologists, 4th edn, Sinauer, Sunderland
Harrison CJ, Langdale JA (2006) A step by step guide to phylogeny reconstruction. Plant J 45:561–572
Hassouna N, Michot B, Bachellerie JP (1984) The complete nucleotide sequence of mouse 28 S rRNA gene. Implications for the process of size increase of the large subunit rRNA in higher eukaryotes. Nucleic Acids Res 12:3563–3583
Hibbett DS, Binder M, Bischoff JF et al (2007) A higher-level phylogenetic classification of the Fungi. Mycol Res 111:509–547
Hughey R, Krogh A (1996) Hidden Markov models for sequence analysis: extension and analysis of the basic method. Cabios 12:95–107
James TY, Kauff F, Schoch C et al (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature 443:818–822
Karplus K, Barrett C, Hughey R (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14:846–856
Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066
Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518
Kearsey SE, Labib K (1998) MCM proteins: evolution, properties, and role in DNA replication. Biochim Biophys Acta 1398:113–136
Krüger D, Gargas A (2004) The basidiomycete genus Polyporus - an emendation based on phylogeny and putative secondary structure of ribosomal RNA molecules. Feddes Repert 115:530–546
Krüger D, Gargas A (2008) Secondary structure of ITS2 rRNA provides taxonomic characters for systematic studies - a case in Lycoperdaceae (Basidiomycota). Mycol Res 112:316–330
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948
Larson A, Wilson AC (1989) Patterns of ribosomal RNA evolution in salamanders. Mol Biol Evol 6:131–154
Leontis NB, Stombaugh J, Westhof E (2002) The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res 30:3497–3531
Lescoute A, Leontis NB, Massire C, Westhof E (2005) Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Res 33:2395–2409
Liu N, Wang T (2006) A computational method for the similarity analysis of RNA secondary structures and its application. J Mol Struct: THEOCHEM 767:185–188
Lumbsch TH, Huhndorf SM (2010) 1. Outline of Ascomycota - 2009. 2. Notes on Ascomycete Systematics. Nos. 4751−5113. Myconet 14:1−64 (Fieldiana: Life and Earth Sciences 1)
Lutzoni F, Kauff F, Cox JC et al (2004) Assembling the fungal tree of life: progress, classification, and evolution of subcellular traits. Am J Bot 91:1446–1480
Mathews DH, Sabina J, Zuker M, Turner DH (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288:911–940
Michot B, Qu LH, Bachellerie JP (1990) Evolution of large-subunit rRNA structure. The diversification of divergent D3 domain among major phylogenetic groups. Eur J Biochem 188:219–229
Moir D, Stewart SE, Osmond BC, Botstein D (1982) Cold-sensitive cell-division-cycle mutants of yeast: isolation, properties, and pseudoreversion studies. Genetics 100:547–563
Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3:131–144
Nylander JA, Ronquist F, Huelsenbeck JP, Nieves-Aldrey JL (2004) Bayesian phylogenetic analysis of combined data. Syst Biol 53:47–67
Olsen GJ (1983) Comparative analysis of nucleotide sequence data. Dissertation, University of Colorado, Health Science Center, Denver
Piekna-Przybylska D, Przybylski P, Baudin-Baillieu AS, Rousset JP, Fournier MJ (2008) Ribosome performance is enhanced by a rich cluster of pseudouridines in the A-site finger region of the large subunit. J Biol Chem 283:26026–26036
Posada D, Crandall KA (2001) Selecting the best-fit model of nucleotide substitution. Syst Biol 50:580–601
Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–7196
Raja HA, Schoch CL, Hustad VP, Shearer CA, Miller AN (2011) Testing the phylogenetic utility of MCM7 in the Ascomycota. MycoKeys 1:63–94
Réblová M (2011) New insights into the systematics and phylogeny of the genus Jattaea and similar fungi of the Calosphaeriales. Fungal Diver 49:167–198
Réblová M, Winka K (2000) Phylogeny of Chaetosphaeria and its anamorphs based on morphological and molecular data. Mycologia 92:939–954
Réblová M, Mostert L, Gams W, Crous PW (2004) New genera in the Calosphaeriales: Togniniella and its anamorph Phaeocrella, and Calosphaeriophora as anamorph of Calosphaeria. Stud Mycol 50:533–550
Réblová K, Rázga F, Li W, Gao H, Frank J, Šponer J (2010) Dynamics of the base of ribosomal A-site finger revealed by molecular dynamics simulations and cryo-EM. Nucleic Acids Res 38:1325–1340
Réblová M, Gams W, Seifert KA (2011) Monilochaetes and allied genera of the Glomerellales, and a reconsideration of families in the Microascales. Stud Mycol 68:163–191
Schmitt I, Crespo A, Divakar PK, Fankhauser JD, Herman-Sackett E, Kalb K, Nelsen MP, Nelson NA, Rivas-Plata E, Shimp AD, Widhelm T, Lumbsch TH (2009) New primers for promising single-copy genes in fungal phylogenetics and systematics. Persoonia 23:35–40
Schoch CL, Sung G-H, López-Giráldez F et al (2009) The Ascomycota tree of life: a phylum wide phylogeny clarifies the origin and evolution of fundamental reproductive and ecological traits. Syst Biol 58:224–239
Schuster P, Fontana W, Stadler PF, Renner A (1997) RNA structures and folding: from conventional to new issues in structure predictions. Curr Opin Struct Biol 7:229–235
Shapiro B, Rambaut A, Drummond AJ (2006) Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol 23:7–9
Spatafora JW, Johnson D, Sung G-H et al (2006) A five-gene phylogenetic analysis of the Pezizomycotina. Mycologia 98:1018–1028
Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690
Stamatakis A, Ludwig T, Meier H (2005) RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463
Sullivan J, Joyce P (2005) Model selection in phylogenetics. Annu Rev Ecol Evol Syst 36:445–466
Swofford DL (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods), 4th version. Sinauer, Sunderland
Takamatsu S, Hirata T, Sato Y (1998) Phylogenetic analysis and predicted secondary structures of the rDNA internal transcribed spacers of the powdery mildew fungi (Erysiphaceae). Mycoscience 39:441–453
Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564–577
Taylor JW, Spatafora J, O’Donnell K, Lutzoni F, James T, Hibbett DS, Geiser D, Bruns TD, Blackwell M (2004) The Fungi. In: Cracraft J, Donoghue MJ (eds) Assembling the tree of life. Oxford University Press, Oxford, pp 171–194
Taylor JD, Devkota B, Huang AD, Topf M, Narayanan E, Sali A, Harvey SC, Frank J (2009) Comprehensive molecular structure of the eukaryotic ribosome. Structure 17:1591–1604
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Van de Peer Y, Jansen J, De Rijk P, De Wachter R (1997) Database on the structure of small ribosomal subunit RNA. Nucleic Acids Res 25:111–116
Walker WF, Doolittle WF (1982) Redividing the basidiomycetes on the basis of 5 S rRNA sequences. Nature 299:723–724
Weiss M (2010) Molecular Phylogenetic Reconstruction. Tübingen University, Tübingen
Wheeler WC, Honeycutt RL (1988) Paired sequence difference in ribosomal RNAs: evolutionary and phylogenetic implications. Mol Biol Evol 5:90–96
Whelan S, Lio P, Goldman N (2001) Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet 17:262–272
White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ (eds) PCR protocols: a guide to methods and applications. Academic, San Diego, pp 315–322
Wingfield BD, Viljoen CD, Wingfield MJ (1999) Phylogenetic relationships of ophiostomatoid fungi associated with Protea infructescences in South Africa. Mycol Res 103:1616–1620
Woese CR, Kandler O, Wheelis ML (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 87:4576–4579
Wuyts J, Van de Peer Y, Winkelmans T, De Wachter R (2002) The European database on small subunit ribosomal RNA. Nucleic Acids Res 30:183–185
Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JHD, Noller HF (2001) Crystal structure of the ribosome at 5.5 Ǻngstrøm resolution. Science 292:883–896
Zhang N, Castlebury LA, Miller AN, Huhndorf SM, Schoch C, Seifert KA, Rossman AM, Rogers JD, Kohlmeyer J, Volkmann-Kohlmeyer B, Sung GH (2006) An overview of the systematics of the Sordariomycetes based on four-gene phylogeny. Mycologia 98:1077–1088
