MLBio+Laboratory Machine Learning in Biomedical Informatics

BMC Genomics

The latest research articles published by BMC Genomics

RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

Sun, 03/24/2013 - 20:00
Background: Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date, along with a comparative analysis of the specific features of the RNAseq and genome-predicted transcriptomes currently available for the freshwater hydrozoan Hydra vulgaris. Results: To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving Illumina reads primacy over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis against the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, and at least 767 of them correspond to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions: We established a simple and powerful strategy to combine Illumina and 454 reads, and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.
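The abstract says only that Illumina reads were given primacy over 454 reads to correct homopolymer errors. One way to picture that rule is a column-by-column merge of a 454 sequence and an Illumina sequence that have already been aligned to each other. The following sketch is hypothetical and is not the authors' pipeline. The function name, the use of "N" for columns without Illumina coverage, and the assumption that an alignment already exists are all illustrative. Wherever Illumina covers a column, its base or gap wins. This removes a 454 homopolymer overcall and fills a 454 undercall.

```python
def merge_with_illumina_primacy(aln_454, aln_illumina, no_coverage="N", gap="-"):
    """Hypothetical column-wise merge of two pre-aligned sequences, preferring Illumina.

    Assumptions (illustrative only): both strings come from an existing alignment,
    gap marks an alignment gap, and no_coverage marks columns Illumina does not cover.
    """
    if len(aln_454) != len(aln_illumina):
        raise ValueError("aligned sequences must have equal length")
    merged = []
    for b454, bill in zip(aln_454.upper(), aln_illumina.upper()):
        # Illumina call wins where it has coverage; otherwise fall back to 454
        call = b454 if bill == no_coverage else bill
        # an Illumina gap opposite an extra 454 base drops a homopolymer overcall
        if call != gap:
            merged.append(call)
    return "".join(merged)


# 454 overcalled the A homopolymer (AAAA vs AAA); the first two columns lack Illumina coverage
print(merge_with_illumina_primacy("TTAAAAC", "NN-AAAC"))
```

In this example the extra A is dropped and the two uncovered T columns are kept from the 454 read, so the call returns "TTAAAC". A real pipeline would also need to handle base-quality scores and cases where an Illumina N is a genuine ambiguity rather than missing coverage.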

Patterns and evolution of ACGT repeat cis-element landscape across four plant genomes

Sun, 03/24/2013 - 20:00
Background: Transcription factor binding is regulated by several interactions, primarily involving cis-element binding. These binding sites maintain specificity through their sequence and through additional factors such as inter-motif distance and spacer specificity. The ACGT core sequence has been established as a functionally important cis-element that frequently regulates gene expression in synergy with other cis-elements. In this study, we used two monocotyledonous species, Oryza sativa and Sorghum bicolor, and two dicotyledonous species, Arabidopsis thaliana and Glycine max, to analyze the conservation of co-occurring ACGT core elements in plant promoters with respect to the spacer distance between them. Using data generated from Arabidopsis thaliana and Oryza sativa, we also identified conserved regions across all spacers and possible conditions regulating gene promoters with multiple ACGT cis-elements. Results: Our data indicated specific predominant spacer lengths between co-occurring ACGT elements, but these lengths were not universally conserved across all species under analysis. However, the frequency distribution indicated local regions of high correlation among monocots and dicots. Sequence specificity data clearly revealed a preference for G at the first position and C at the terminal position of a spacer sequence, suggesting that the G-box motif is the most prevalent for the ACGT class of promoters. Using gene expression databases, we also observed trends suggesting that co-occurring ACGT elements are responsible for gene regulation in response to exogenous stress. Conservation in patterns of ACGT (N) ACGT among orthologous genes also suggested that the functional significance of these cis-elements may have emerged across species through parallel evolution. 
Conclusions: Although the importance of ACGT elements has been acknowledged for several plant species, ours is the first study that attempts to compare their occurrence across four species and analyze conservation among them. The apparent preference for particular spacer distances suggests that these motifs might be implicated in important physiological functions that are yet to be identified. Variations in correlation patterns between monocots and dicots might arise from differences in transcriptional regulation in the two classes. In accordance with the literature, we established the involvement of co-occurring ACGT elements in stress responses and showed how this regulation differs with variation in the ACGT (N) ACGT motif. We believe that our study will be an essential resource for determining the optimum spacer length and spacer sequence between ACGT elements for promoter design in the future.
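The ACGT (N) ACGT analysis depends on locating co-occurring ACGT cores and measuring the spacer N between them. The sketch below is a minimal version of that step. It assumes only consecutive cores on one strand are paired. The abstract does not say whether the authors paired all cores or only adjacent ones. The function names are hypothetical. ACGT is its own reverse complement, so scanning one strand finds the same cores as scanning both. ACGT also cannot overlap itself, so plain non-overlapping regex matches find every core.

```python
import re
from collections import Counter

CORE = "ACGT"  # palindromic: the reverse complement of ACGT is ACGT


def acgt_spacers(promoter):
    """Return (length, sequence) of the spacer N between consecutive ACGT cores."""
    seq = promoter.upper()
    # ACGT cannot overlap itself, so non-overlapping matches find every core
    starts = [m.start() for m in re.finditer(CORE, seq)]
    spacers = []
    for left, right in zip(starts, starts[1:]):
        spacer = seq[left + len(CORE):right]
        spacers.append((len(spacer), spacer))
    return spacers


def spacer_length_histogram(promoters):
    """Count how often each spacer length occurs across a set of promoter sequences."""
    counts = Counter()
    for promoter in promoters:
        counts.update(length for length, _ in acgt_spacers(promoter))
    return counts


def terminal_base_counts(promoters):
    """Tally the first and last base of every non-empty spacer (e.g. to look for G...C)."""
    first, last = Counter(), Counter()
    for promoter in promoters:
        for length, spacer in acgt_spacers(promoter):
            if length:
                first[spacer[0]] += 1
                last[spacer[-1]] += 1
    return first, last


print(acgt_spacers("TTACGTGAAACACGTTT"))
```

The example finds one spacer of length 5, "GAAAC". It begins with G and ends with C, which is the pattern the abstract reports as preferred. Summed over many promoters, spacer_length_histogram gives the kind of spacer-length frequency distribution the study compares across species.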

The genome and transcriptome of perennial ryegrass mitochondria

Fri, 03/22/2013 - 20:00
Background: Perennial ryegrass (Lolium perenne L.) is one of the most important forage and turf grass species of temperate regions worldwide. Its mitochondrial genome is inherited maternally and contains genes that can influence traits of agricultural importance. Moreover, the DNA sequences of mitochondrial genomes have been established and compared for a large number of species in order to characterize evolutionary relationships. Therefore, it is crucial to understand the organization of the mitochondrial genome and how it varies between and within species. Here, we report the first de novo assembly and annotation of the complete mitochondrial genome of perennial ryegrass. Results: Intact mitochondria were isolated from perennial ryegrass leaves and used for mtDNA extraction. The mitochondrial genome was sequenced to 167-fold coverage using the Roche 454 GS-FLX Titanium platform and assembled into a circular master molecule of 678,580 bp. A total of 34 proteins, 14 tRNAs and 3 rRNAs are encoded by the mitochondrial genome, giving a total gene space of 48,723 bp (7.2%). Moreover, we identified 149 open reading frames larger than 300 bp and covering 67,410 bp (9.93%), 250 SSRs, 29 tandem repeats, 5 pairs of large repeats, and 96 pairs of short inverted repeats. The genes encoding subunits of the respiratory complexes (nad1 to nad9, cob, cox1 to cox3 and atp1 to atp9) all showed high expression levels, both in absolute numbers and after normalization. Conclusions: The circular master molecule of the perennial ryegrass mitochondrial genome presented here constitutes an important tool for future attempts to compare mitochondrial genomes within and between grass species. Our results also demonstrate that perennial ryegrass mitochondria contain genes crucial for energy production that are well conserved in the mitochondrial genomes of monocotyledonous species. 
The expression analysis gave us first insights into the transcriptome of these mitochondrial genes in perennial ryegrass.
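The count of 149 open reading frames larger than 300 bp comes from scanning all six reading frames for open reading frames. The sketch below is a simple ATG-to-stop scanner in all six frames using the standard stop codons. It assumes that an ORF runs from the first ATG to the next in-frame stop, that length includes the stop codon, and that the sequence is linear. The authors' exact ORF definition is not given in the abstract. A circular genome would also need wrap-around handling, which this sketch omits. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, start, end) for ATG...stop ORFs of at least min_len bp.

    Coordinates for the "-" strand refer to the reverse-complemented sequence.
    Assumes a linear sequence; wrap-around on a circular genome is ignored.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon in the length
                    if end - start >= min_len:
                        orfs.append((strand, start, end))
                    start = None
    return orfs


toy = "ATG" + "GCT" * 98 + "TAA"  # a 300 bp ORF on the forward strand
print(find_orfs(toy))
```

The toy sequence yields a single forward-strand ORF spanning positions 0 to 300. The reported proportions follow directly from the genome size: 48,723 / 678,580 ≈ 7.2% for the gene space and 67,410 / 678,580 ≈ 9.93% for the ORF space.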

Xylem transcription profiles indicate potential metabolic responses for economically relevant characteristics of Eucalyptus species

Thu, 03/21/2013 - 20:00
Background: Eucalyptus is one of the most important sources of industrial cellulose. Three species of this botanical group are intensively used in breeding programs: E. globulus, E. grandis and E. urophylla. E. globulus is adapted to subtropical/temperate areas and is considered a source of high-quality cellulose; E. grandis grows rapidly and is adapted to tropical/subtropical climates; and E. urophylla, though less productive, is considered a source of genes related to robustness. Wood, or secondary xylem, results from vascular cambium differentiation and is mostly composed of cellulose, lignin and hemicelluloses. In this study, the xylem transcriptomes of the three Eucalyptus species were investigated to provide insights into the particularities of each species. Results: Data analysis showed that (1) most Eucalyptus genes are expressed in xylem; (2) most genes expressed in a species-specific way have unknown functions and are interesting targets for future studies; (3) relevant differences were observed in the phenylpropanoid pathway: E. grandis xylem shows higher expression of genes involved in lignin formation, whereas E. urophylla seems to divert the pathway towards flavonoid formation; and (4) stress-related genes are considerably more highly expressed in E. urophylla, suggesting that these genes may contribute to its robustness. Conclusions: The comparison of these three transcriptomes reveals molecular signatures underlying some of their distinct wood characteristics. This information may contribute to the understanding of xylogenesis, thus increasing the potential of genetic engineering approaches aimed at improving the productivity of Eucalyptus forest plantations.

Comparative genome analysis of Streptococcus infantarius subsp. infantarius CJ18, an African fermented camel milk isolate with adaptations to dairy environment

Thu, 03/21/2013 - 20:00
Background: Streptococcus infantarius subsp. infantarius (Sii) belongs to the Streptococcus bovis/Streptococcus equinus complex, which is associated with several human and animal infections. Sii is a predominant bacterium in spontaneously fermented milk products in Africa. The genome sequence of Sii strain CJ18 was compared with those of other Streptococcus species for three purposes: to identify dairy adaptations, including genome decay as seen in Streptococcus thermophilus; to identify traits underlying its competitiveness in spontaneous milk fermentation; and to assess potential health risks for consumers. Results: Compared with Sii ATCC BAA-102T, the genome of Sii CJ18 harbors several unique regions. These include an enlarged exo- and capsular polysaccharide operon; Streptococcus thermophilus-associated genes; a region containing metabolic and hypothetical genes mostly unique to CJ18 and the dairy isolate Streptococcus gallolyticus subsp. macedonicus; and a second oligopeptide transport operon. Dairy adaptations in CJ18 are reflected by a high percentage of pseudogenes (4.9%) representing genome decay, which includes the inactivation of the lactose phosphotransferase system (lacIIABC) by the integration of multiple transposases. The presence of the lacS and lacZ genes is the major dairy adaptation affecting lactose metabolism, and it matters all the more because lacIIABC is disrupted. We constructed lacS, lacZ and lacIIABC mutants of CJ18 and analyzed them to confirm the redirection of lactose metabolism via LacS and LacZ. Natural competence genes are conserved in both Sii strains, but CJ18 contains fewer CRISPR spacers, which indicates a reduced defense capability against foreign DNA. No classical streptococcal virulence factors were detected in either Sii strain, apart from adhesion factors, which should be considered niche factors. No Sii-specific virulence factors have been described. 
Several Sii-specific regions encoding uncharacterized proteins provide new leads for virulence analyses and for investigating the unclear association of dairy and clinical Sii with human diseases. Conclusions: The genome of the African dairy isolate Sii CJ18 clearly differs from that of the human isolate ATCC BAA-102T. CJ18 has a high predisposition to natural competence, which likely explains its enlarged genome. Metabolic adaptations to the dairy environment are evident, and lactose uptake in particular corresponds to that of S. thermophilus. Genome decay is not as advanced as in S. thermophilus (10-19% pseudogenes), possibly due to a shorter history in dairy fermentations.

Intron retention and transcript chimerism conserved across mammals: Ly6g5b and Csnk2b-Ly6g5b as examples

Thu, 03/21/2013 - 20:00
Background: Alternative splicing (AS) is a major mechanism for modulating gene expression, allowing the synthesis of several structurally and functionally distinct mRNAs and protein isoforms from a single gene. Transcription Induced Chimerism (TIC), or Tandem Chimerism, is related to AS. Through it, chimeric RNAs spanning adjacent genes can be found, which increases the combinatorial complexity of the proteome. The Ly6g5b gene shows particular expression behaviours. It undergoes an intron retention event and can form chimeric RNA transcripts with the upstream gene Csnk2b. We aimed to characterise these events in more depth in four tissues from six different mammals and to analyse their protein products. Results: The canonical Csnk2b isoform was widely expressed, whereas the canonical Ly6g5b isoform was less ubiquitous. Even so, the Ly6g5b transcript retaining the first intron was present in all the tissues and species analysed. Csnk2b-Ly6g5b chimeras were present in all the samples analysed, but with restricted expression patterns. Some of these chimeric transcripts maintained correct structural domains from Csnk2b and Ly6g5b. Moreover, we found Csnk2b, Ly6g5b, and Csnk2b-Ly6g5b transcripts that present exon skipping, alternative 5' and 3' splice sites, and intron retention events. These would generate truncated or aberrant proteins whose role remains unknown. Some chimeric transcripts would encode CSNK2B proteins with an altered C-terminus, which could affect their biological function, possibly broadening their substrate specificity. Over-expressed human CSNK2B, LY6G5B, and CSNK2B-LY6G5B proteins showed different patterns of post-translational modification and cellular distribution. Conclusions: Ly6g5b intron retention and Csnk2b-Ly6g5b transcript chimerism are broadly distributed in tissues of different mammals.

Antennal transcriptome analysis of the chemosensory gene families in the tree killing bark beetles, Ips typographus and Dendroctonus ponderosae (Coleoptera: Curculionidae: Scolytinae)

Wed, 03/20/2013 - 20:00
Background: The European spruce bark beetle, Ips typographus, and the North American mountain pine beetle, Dendroctonus ponderosae (Coleoptera: Curculionidae: Scolytinae), are severe pests of coniferous forests. Both bark beetle species utilize aggregation pheromones to coordinate mass-attacks on host trees, while odorants from host and non-host trees modulate the pheromone response. Thus, the bark beetle olfactory sense is of utmost importance for fitness. However, information on the genes underlying olfactory detection has been lacking in bark beetles and is limited in Coleoptera. We assembled antennal transcriptomes from next-generation sequencing of I. typographus and D. ponderosae to identify members of the major chemosensory multi-gene families. Results: Gene ontology (GO) annotation indicated that the relative abundance of transcripts associated with specific GO terms was highly similar in the two species. Transcripts with terms related to olfactory function were found in both species. Focusing on the chemosensory gene families, we identified 15 putative odorant binding proteins (OBP), 6 chemosensory proteins (CSP), 3 sensory neuron membrane proteins (SNMP), 43 odorant receptors (OR), 6 gustatory receptors (GR), and 7 ionotropic receptors (IR) in I. typographus; and 31 putative OBPs, 11 CSPs, 3 SNMPs, 49 ORs, 2 GRs, and 15 IRs in D. ponderosae. Predicted protein sequences were compared with counterparts in the flour beetle, Tribolium castaneum, the cerambycid beetle, Megacyllene caryae, and the fruit fly, Drosophila melanogaster. The most notable result was found among the ORs, for which large bark beetle-specific expansions were found. However, some clades contained receptors from all four beetle species, indicating a degree of conservation among some coleopteran OR lineages. Putative GRs for carbon dioxide and orthologues for the conserved antennal IRs were included in the identified receptor sets. 
Conclusions: The protein families important for chemoreception have now been identified in three coleopteran species (four species for the ORs). Thus, this study allows for improved evolutionary analyses of coleopteran olfaction. Identification of these proteins in two of the most destructive forest pests, sharing many semiochemicals, is especially important as they might represent novel targets for population control.

Transcriptome-based discovery of pathways and genes related to resistance against Fusarium head blight in wheat landrace Wangshuibai

Wed, 03/20/2013 - 20:00
Background: Fusarium head blight (FHB), caused mainly by Fusarium graminearum (Fg) Schwabe (teleomorph: Gibberella zeae Schwble), seriously damages wheat production. The Chinese wheat landrace Wangshuibai is one of the most important sources of FHB resistance in the world, but knowledge of the mechanism underlying its resistance is still limited. Results: To obtain an overview of the transcriptome characteristics of Wangshuibai during infection by Fg, high-throughput RNA sequencing was performed on a next-generation sequencing (NGS) platform (Illumina). In total, 165,499 unigenes were generated. Blastx searches assigned them to known protein databases as follows: the NCBI non-redundant protein database (nr) (82,721, 50.0%), Gene Ontology (GO) (38,184, 23.1%), Swiss-Prot (50,702, 30.6%), Clusters of Orthologous Groups (COG) (51,566, 31.2%) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (30,657, 18.5%). With another NGS-based platform, a digital gene expression (DGE) system, gene expression in Wangshuibai and its FHB-susceptible mutant NAUH117 was profiled and compared at two infection stages, 24 and 48 hours after inoculation with Fg, with the aim of identifying genes involved in FHB resistance. Conclusion: Pathogen-related proteins such as PR5, PR14 and ABC transporter, together with the JA signaling pathway, were crucial for FHB resistance, especially the resistance mediated by Fhb1. The ET pathway and the ROS/NO pathway were not activated in Wangshuibai and may not be pivotal in defense against FHB. NAUH117 carries a chromosome fragment deletion that led to its increased FHB susceptibility. Consistent with this, twenty out of eighty-nine genes in Wangshuibai showed changed expression patterns upon Fg infection. The up-regulation of eight of these genes was confirmed by qRT-PCR, suggesting that they may be candidate genes for Fhb1. Further functional analysis is needed to confirm their roles in FHB resistance.
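Each reported annotation rate is simply the number of unigenes with a Blastx hit in a database divided by the 165,499 unigenes. The short check below recomputes those rates from the counts in the abstract. The dictionary layout and function name are illustrative. Because one unigene can match several databases, the rates overlap and are not expected to sum to 100%.

```python
TOTAL_UNIGENES = 165_499

# unigenes with a Blastx hit in each database, as reported above
HITS = {
    "nr": 82_721,
    "GO": 38_184,
    "Swiss-Prot": 50_702,
    "COG": 51_566,
    "KEGG": 30_657,
}


def annotation_rates(hits, total):
    """Percentage of all unigenes annotated by each database, rounded to 0.1%."""
    return {db: round(100 * n / total, 1) for db, n in hits.items()}


print(annotation_rates(HITS, TOTAL_UNIGENES))
```

The recomputed values (nr 50.0, GO 23.1, Swiss-Prot 30.6, COG 31.2, KEGG 18.5) match the percentages given in the abstract.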

Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules

Wed, 03/20/2013 - 20:00
Background: Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not exploit the available structural information about the transcriptome, such as gene networks that reveal correlations and potentially regulatory relationships among the study genes. We employ GFlasso, a new eQTL mapping algorithm for sparse structured regression that we developed previously, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal-to-noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes. This gives us more power to detect previously neglected signals that are marginally weak but pleiotropically significant. Results: eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes. Our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results of in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. 
Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, but in the case of the ribosome biogenesis (Ribi) group we also provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso. Conclusions: Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits arising from correlation, regulation, or other forms of biological dependency.
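The abstract does not spell out the GFlasso objective. In the graph-guided fused lasso formulation it is commonly associated with, the objective has three parts. The first is a squared-error loss summed over all K expression traits. The second is an L1 penalty (weight lam) on every locus-by-gene coefficient. The third is a fusion penalty (weight gamma). For each edge (m, l) in the gene network, the fusion penalty pulls the coefficients of genes m and l at the same locus toward each other, or toward opposite values when their correlation r_ml is negative. The sketch below only evaluates this objective for a given coefficient matrix; it does not optimize it. It assumes edge weights f(r) = |r|. The function and argument names are hypothetical, and the optimizer and exact weighting used by the authors are not shown here.

```python
def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate a graph-guided fused lasso objective (assumed form, pure Python).

    X: n x J genotype matrix, Y: n x K expression matrix,
    B: J x K coefficients (locus j, gene k),
    edges: (m, l, r_ml) gene pairs with correlation r_ml; edge weight assumed |r_ml|.
    """
    n, J, K = len(X), len(B), len(B[0])
    # 0.5 * squared-error loss summed over all K expression traits
    loss = 0.0
    for i in range(n):
        for k in range(K):
            pred = sum(X[i][j] * B[j][k] for j in range(J))
            loss += (Y[i][k] - pred) ** 2
    loss *= 0.5
    # L1 penalty encouraging sparse locus-gene associations
    lasso = lam * sum(abs(B[j][k]) for j in range(J) for k in range(K))
    # fusion penalty: correlated genes should share (or mirror) effects at each locus
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l]) for j in range(J))
    return loss + lasso + gamma * fusion


X = [[1, 0], [0, 1]]
Y = [[1, 1], [0, 0]]
shared = [[1, 1], [0, 0]]    # locus 0 affects both correlated genes
separate = [[1, 0], [0, 0]]  # locus 0 affects gene 0 only
print(gflasso_objective(X, Y, shared, [(0, 1, 0.8)], lam=0.5, gamma=1.0))
print(gflasso_objective(X, Y, separate, [(0, 1, 0.8)], lam=0.5, gamma=1.0))
```

The shared-effect matrix scores about 1.0 and the single-gene matrix about 1.8. Much of that gap comes from the fusion term. A larger gamma makes it costlier for correlated genes to have different coefficients at the same locus, and this is how the penalty favors loci with pleiotropic effects on a gene module.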

5'-Serial Analysis of Gene Expression studies reveal a transcriptomic switch during fruiting body development in Coprinopsis cinerea

Tue, 03/19/2013 - 20:00
Background: The transition from the vegetative mycelium to the primordium during fruiting body development is the most complex and critical developmental event in the life cycle of many basidiomycete fungi. Understanding the molecular mechanisms underlying this process has long been a goal of research on basidiomycetes. Large-scale assessment of the transcriptomes expressed at these developmental stages will help build a more comprehensive picture of the mushroom fruiting process. In this study, we coupled 5'-Serial Analysis of Gene Expression (5'-SAGE) to high-throughput pyrosequencing from 454 Life Sciences to analyze the transcriptomes of two stages of Coprinopsis cinerea fruiting body development, the vegetative mycelium (Myc) and the stage 1 primordium (S1-Pri), and to identify genes up-regulated in each. Results: We evaluated the expression of >3,000 genes in the two growth stages and discovered that almost one-third of these genes were preferentially expressed in one stage or the other. This revealed a significant turnover of the transcriptome during the course of fruiting body development. Additionally, we annotated more than 79,000 transcription start sites (TSSs) based on the mycelium and stage 1 primordium transcriptomes. Patterns of enrichment based on gene annotations from the GO and KEGG databases indicated that various structural and functional protein families were uniquely employed in one stage or the other and that cellular metabolism is highly up-regulated during primordial growth. Various signaling pathways, such as the cAMP-PKA, MAPK and TOR pathways, were also identified as up-regulated, consistent with the model that sensing of nutrient levels and the environment is important in this developmental transition. More than 100 up-regulated genes were also found to be unique to mushroom-forming basidiomycetes, highlighting the novelty of fruiting body development in the fungal kingdom. 
Conclusions: We implicated a wealth of new candidate genes important to early stages of mushroom fruiting development, though their precise molecular functions and biological roles are not yet fully known. This study serves to advance our understanding of the molecular mechanisms of fruiting body development in the model mushroom C. cinerea.
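Calling a gene "preferentially expressed" in one stage requires comparing 5'-SAGE tag counts across libraries of different sizes. The sketch below shows one simple way to do this. It first scales each library to tags per million, then flags genes whose normalized abundance differs by at least a chosen fold between Myc and S1-Pri. The abstract does not state the authors' criterion. The 2-fold threshold, the pseudocount, and the function names are assumptions for illustration. A real analysis would normally add a count-based significance test rather than rely on fold change alone.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million tags."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def preferential_stage(myc_counts, pri_counts, min_fold=2.0, pseudocount=1.0):
    """Label genes whose normalized abundance differs >= min_fold between stages.

    Returns {gene: "Myc" or "S1-Pri"}; genes below the threshold are omitted.
    The pseudocount (in tags per million) avoids dividing by zero for genes
    seen in only one library. Threshold and pseudocount are illustrative choices.
    """
    myc = tags_per_million(myc_counts)
    pri = tags_per_million(pri_counts)
    threshold = math.log2(min_fold)
    calls = {}
    for gene in set(myc) | set(pri):
        log2_fc = math.log2((pri.get(gene, 0.0) + pseudocount)
                            / (myc.get(gene, 0.0) + pseudocount))
        if log2_fc >= threshold:
            calls[gene] = "S1-Pri"
        elif log2_fc <= -threshold:
            calls[gene] = "Myc"
    return calls


print(preferential_stage({"a": 50, "b": 50}, {"a": 10, "b": 80, "c": 10}))
```

Here gene "a" is labeled Myc (about 5-fold lower in the primordium), gene "c" is labeled S1-Pri because it appears only there, and gene "b" (about 1.6-fold) falls below the threshold and is omitted.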
