Bioinformatics Journal

Bioinformatics - RSS feed of current issue
  • MSIsensor: microsatellite instability detection using paired tumor-normal sequence data
    [Mar 2014]

    Motivation: Microsatellite instability (MSI) is an important indicator of larger genome instability and has been linked to many genetic diseases, including Lynch syndrome. MSI status is also an independent prognostic factor for favorable survival in multiple cancer types, such as colorectal and endometrial. It also informs the choice of chemotherapeutic agents. However, the current PCR–electrophoresis-based detection procedure is laborious and time-consuming, often requiring visual inspection to categorize samples. We developed MSIsensor, a C++ program for automatically detecting somatic microsatellite changes. It computes length distributions of microsatellites per site in paired tumor and normal sequence data, subsequently using these to statistically compare observed distributions in both samples. Comprehensive testing indicates MSIsensor is an efficient and effective tool for deriving MSI status from standard tumor-normal paired sequence data.

    Availability and implementation: https://github.com/ding-lab/msisensor

    Contact: kye@genome.wustl.edu or lding@genome.wustl.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
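
    The per-site comparison described for MSIsensor can be illustrated with a minimal sketch. It is not MSIsensor's code: the function names are hypothetical, and the choice of a Pearson chi-square statistic is an assumption, since the abstract only says the tumor and normal length distributions are compared statistically.

```python
from collections import Counter


def length_distribution(repeat_lengths):
    """Count how many reads support each repeat length at one microsatellite site."""
    return Counter(repeat_lengths)


def chi_square_statistic(normal_counts, tumor_counts):
    """Pearson chi-square statistic for a 2 x K table (sample x repeat length).

    Assumption: a chi-square comparison stands in for the unspecified
    statistical test. Returns (statistic, degrees_of_freedom).
    """
    n_total = sum(normal_counts.values())
    t_total = sum(tumor_counts.values())
    if n_total == 0 or t_total == 0:
        raise ValueError("both samples need at least one read at the site")
    lengths = sorted(set(normal_counts) | set(tumor_counts))
    grand_total = n_total + t_total
    statistic = 0.0
    for length in lengths:
        column_total = normal_counts.get(length, 0) + tumor_counts.get(length, 0)
        for observed, row_total in (
            (normal_counts.get(length, 0), n_total),
            (tumor_counts.get(length, 0), t_total),
        ):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(lengths) - 1


# Made-up read-level repeat lengths at a single site.
normal = length_distribution([12] * 18 + [11] * 2)
tumor = length_distribution([12] * 8 + [10] * 7 + [9] * 5)
stat, dof = chi_square_statistic(normal, tumor)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```

    A large statistic relative to its degrees of freedom suggests that the tumor length distribution has shifted at that site; a real pipeline would convert it to a P-value and correct for testing many sites.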
  • ClockstaR: choosing the number of relaxed-clock models in molecular phylogenetic analysis
    [Mar 2014]

    Summary: Relaxed molecular clocks allow the phylogenetic estimation of evolutionary timescales even when substitution rates vary among branches. In analyses of large multigene datasets, it is often appropriate to use multiple relaxed-clock models to accommodate differing patterns of rate variation among genes. We present ClockstaR, a method for selecting the number of relaxed clocks for multigene datasets.

    Availability: ClockstaR is freely available for download at http://sydney.edu.au/science/biology/meep/software/.

    Contact: sebastian.duchene@sydney.edu.au

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package
    [Mar 2014]

    Motivation: In recent years, there has been an increasing interest in the potential of codon substitution models for a variety of applications. However, the computational demands of these models have sometimes led to the adoption of oversimplified assumptions, questionable statistical methods or a limited focus on small data sets.

    Results: Here, we offer a scalable, message-passing-interface-based Bayesian implementation of site-heterogeneous codon models in the mutation-selection framework. Our software jointly infers the global mutational parameters at the nucleotide level, the branch lengths of the tree and a Dirichlet process governing across-site variation at the amino acid level. We focus on an example estimation of the distribution of selection coefficients from an alignment of several hundred sequences of the influenza PB2 gene, and highlight the site-specific characterization enabled by such a modeling approach. Finally, we discuss future potential applications of the software for conducting evolutionary inferences.

    Availability and implementation: The models are implemented within the PhyloBayes-MPI package (available at phylobayes.org), along with usage details in the accompanying manual.

    Contact: nicolas.rodrigue@ucalgary.ca

    Categories: Journal Articles
  • ASSIST: a fast versatile local structural comparison tool
    [Mar 2014]

    Motivation: Structural genomics initiatives are increasingly leading to the determination of the 3D structure of target proteins whose catalytic function is not known. The aim of this work was to develop a novel, versatile tool for structural similarity searching that makes it possible to predict the catalytic function, if any, of these proteins.

    Results: The algorithm implemented by the tool is based on local structural comparison to find the largest subset of similar residues between an input protein and known functional sites. The method uses a geometric hashing approach where information related to residue pairs from the input structures is stored in a hash table and then is quickly retrieved during the comparison step. Tests on proteins belonging to different functional classes, done using the Catalytic Site Atlas entries as targets, indicate that the algorithm is able to identify the correct functional class of the input protein in the vast majority of the cases.

    Availability and implementation: The application was developed in Java SE 6, with a Java Swing Graphical User Interface (GUI). The system can be run locally on any operating system (OS) equipped with a suitable Java Virtual Machine, and is available at the following URL: http://www.computationalbiology.it/software/ASSISTv1.zip.

    Contact: polticel@uniroma3.it

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
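
    The geometric hashing idea in the ASSIST abstract (store residue-pair information in a hash table, then retrieve it quickly during comparison) can be sketched as follows. This is an illustration only and not ASSIST's implementation: the key layout, the 1 Å bin width, the coordinates and all names are assumptions, and the sketch only counts matching pairs rather than finding the largest subset of similar residues.

```python
import math
from collections import defaultdict
from itertools import combinations

BIN_WIDTH = 1.0  # angstroms; hypothetical distance tolerance


def pair_key(res_a, res_b):
    """Hash key for a residue pair: sorted residue types plus binned distance."""
    (type_a, xyz_a), (type_b, xyz_b) = res_a, res_b
    distance = math.dist(xyz_a, xyz_b)
    return tuple(sorted((type_a, type_b))) + (int(distance // BIN_WIDTH),)


def build_table(sites):
    """Index every residue pair of every known functional site."""
    table = defaultdict(set)
    for site_id, residues in sites.items():
        for a, b in combinations(residues, 2):
            table[pair_key(a, b)].add(site_id)
    return table


def vote(table, protein):
    """Count, per known site, how many residue pairs of the input protein match."""
    votes = defaultdict(int)
    for a, b in combinations(protein, 2):
        for site_id in table.get(pair_key(a, b), ()):
            votes[site_id] += 1
    return dict(votes)


# Made-up coordinates: a three-residue site and a query containing a
# translated copy of it plus one unrelated residue.
sites = {"triad": [("SER", (0.0, 0.0, 0.0)),
                   ("HIS", (3.2, 0.0, 0.0)),
                   ("ASP", (3.2, 4.5, 0.0))]}
query = [("SER", (10.0, 10.0, 10.0)),
         ("HIS", (13.2, 10.0, 10.0)),
         ("ASP", (13.2, 14.5, 10.0)),
         ("GLY", (0.0, 0.0, 0.0))]
print(vote(build_table(sites), query))
```

    Because the keys depend only on residue types and inter-residue distances, the lookup is unaffected by translating or rotating the query. Hard bin edges can split near-identical distances into different bins, so a practical version would also probe neighboring bins.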
  • SplicePlot: a utility for visualizing splicing quantitative trait loci
    [Mar 2014]

    Summary: RNA sequencing has provided unprecedented resolution of alternative splicing and splicing quantitative trait loci (sQTL). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command line utility that produces intuitive visualizations of sQTLs and their effects. SplicePlot takes mapped RNA sequencing reads in BAM format and genotype data in VCF format as input and outputs publication-quality Sashimi plots, hive plots and structure plots, enabling better investigation and understanding of the role of genetics in alternative splicing and transcript structure.

    Availability and implementation: Source code and detailed documentation are available at http://montgomerylab.stanford.edu/spliceplot/index.html under Resources and at Github. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available.

    Contact: wu.eric.g@gmail.com or smontgom@stanford.edu

    Categories: Journal Articles
  • RelateAdmix: a software tool for estimating relatedness between admixed individuals
    [Mar 2014]

    Motivation: Pairwise relatedness plays an important role in a range of genetic research fields. However, only a few estimators currently exist for individuals that are admixed, i.e. have ancestry from more than one population, and these estimators fail in some situations.

    Results: We present a new software tool, RelateAdmix, for obtaining maximum likelihood estimates of pairwise relatedness from genetic data between admixed individuals. We show using simulated data that it gives rise to better estimates than three state-of-the-art software tools, REAP, KING and Plink, while still being fast enough to be applicable to large datasets.

    Availability and implementation: The software tool, implemented in C and R, is freely available from www.popgen.dk/software.

    Contact: albrecht@binf.ku.dk

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • JEPETTO: a Cytoscape plugin for gene set enrichment and topological analysis based on interaction networks
    [Mar 2014]

    Summary: JEPETTO (Java Enrichment of Pathways Extended To TOpology) is a Cytoscape 3.x plugin performing integrative human gene set analysis. It identifies functional associations between genes and known cellular pathways and processes, using protein interaction networks and topological analysis. The plugin integrates information from three separate web servers we published previously, specializing in enrichment analysis, pathways expansion and topological matching. This integration substantially simplifies the analysis of user gene sets and the interpretation of the results. We demonstrate the utility of the JEPETTO plugin on a set of misregulated genes associated with Alzheimer’s disease.

    Availability: Source code and binaries are freely available for download at http://apps.cytoscape.org/apps/jepetto. The plugin is implemented in Java, runs on multiple platforms and can be installed directly via the Cytoscape plugin manager. It is released under the GNU General Public Licence.

    Contact: jepetto.plugin@gmail.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • GPU-Meta-Storms: computing the structure similarities among massive amount of microbial community samples using GPU
    [Mar 2014]

    Motivation: The number of microbial community samples is increasing exponentially. Data mining among microbial community samples could facilitate the discovery of valuable biological information that is still hidden in the massive data. However, current methods for comparing microbial communities are limited in their ability to process large numbers of samples, each with a complex community structure.

    Summary: We have developed an optimized GPU-based software tool, GPU-Meta-Storms, to efficiently measure the quantitative phylogenetic similarity among massive numbers of microbial community samples. Our results show that GPU-Meta-Storms can compute the pairwise similarity scores for 10 240 samples within 20 min, a speed-up of >17 000 times compared with a single-core CPU and >2600 times compared with a 16-core CPU. The high performance of GPU-Meta-Storms could therefore facilitate in-depth data mining among massive microbial community samples and make real-time analysis and monitoring of temporal or conditional changes in microbial communities possible.

    Availability and implementation: GPU-Meta-Storms is implemented in CUDA (Compute Unified Device Architecture) and C++. Source code is available at http://www.computationalbioenergy.org/meta-storms.html.

    Contact: ningkang@qibebt.ac.cn

    Categories: Journal Articles
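
    To put the GPU-Meta-Storms figures in perspective, the short calculation below works out the size of the all-against-all comparison. It assumes that every unordered pair of distinct samples is scored exactly once and that the GPU run takes the full 20 min; the abstract states neither, and the function name is hypothetical.

```python
def pairwise_workload(samples, minutes):
    """Return (number of unordered sample pairs, pairs scored per second)."""
    pairs = samples * (samples - 1) // 2
    return pairs, pairs / (minutes * 60)


pairs, rate = pairwise_workload(10_240, 20)
print(f"{pairs:,} pairs, about {rate:,.0f} pairs per second")

# CPU run times implied by the quoted speed-ups, assuming a full 20 min GPU run.
single_core_days = 20 * 17_000 / (60 * 24)
sixteen_core_days = 20 * 2_600 / (60 * 24)
print(f"single core: about {single_core_days:.0f} days; "
      f"16 cores: about {sixteen_core_days:.0f} days")
```

    Under these assumptions the run covers 52,423,680 pairs at roughly 43,686 pairs per second, and the same job would occupy a single CPU core for the better part of a year.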
  • Ondex Web: web-based visualization and exploration of heterogeneous biological networks
    [Mar 2014]

    Summary: Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java and can be easily embedded into Web sites as an applet. Ondex Web supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL.

    Availability and implementation: http://ondex.rothamsted.ac.uk/OndexWeb.

    Contact: keywan.hassani-pak@rothamsted.ac.uk

    Categories: Journal Articles
  • Assimilating genome-scale metabolic reconstructions with modelBorgifier
    [Mar 2014]

    Motivation: Genome-scale reconstructions and models, as collections of genomic and metabolic information, provide a useful means to compare organisms. Comparison requires that models are similarly notated to pair shared components.

    Result: Matching and comparison of genome-scale reconstructions and models are facilitated by modelBorgifier. It reconciles models in light of different annotation schemes, allowing diverse models to become useful for synchronous investigation.

    Availability and implementation: The modelBorgifier toolbox is freely available at http://www.brain-biotech.de/downloads/modelBorgifier.zip.

    Contact: jrb@brain-biotech.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • CheNER: chemical named entity recognizer
    [Mar 2014]

    Motivation: Chemical named entity recognition is used to automatically identify mentions of chemical compounds in text and is the basis for more elaborate information extraction. However, only a small number of applications are freely available to identify such mentions. Particularly challenging and useful is the identification of International Union of Pure and Applied Chemistry (IUPAC) chemical compounds, which, owing to the complex morphology of IUPAC names, requires more advanced techniques than the identification of brand names.

    Results: We present CheNER, a tool for automated identification of systematic IUPAC chemical mentions. We evaluated different systems using an established literature corpus to show that CheNER has a superior performance in identifying IUPAC names specifically, and that it makes better use of computational resources.

    Availability and implementation: http://metres.udl.cat/index.php/9-download/4-chener, http://chener.bioinfo.cnio.es/

    Contact: miguel.vazquez@cnio.es

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • SurpriseMe: an integrated tool for network community structure characterization using Surprise maximization
    [Mar 2014]

    Summary: Detecting communities and densely connected groups may contribute to unraveling the underlying relationships among the units present in diverse biological networks (e.g. interactomes, coexpression networks, ecological networks). We recently showed that communities can be precisely characterized by maximizing Surprise, a global network parameter. Here, we present SurpriseMe, a tool that integrates the outputs of seven of the best algorithms available to estimate the maximum Surprise value. SurpriseMe also generates distance matrices that allow visualizing the relationships among the solutions generated by the algorithms. We show that the communities present in small- and medium-sized networks, with up to 10 000 nodes, can be easily characterized: on standard PCs, these analyses take less than an hour. Also, four of the algorithms may rapidly analyze networks with up to 100 000 nodes, given enough memory resources. Because of its performance and simplicity, SurpriseMe is a reference tool for community structure characterization.

    Availability and implementation: SurpriseMe is implemented in Perl and C/C++. It compiles and runs on any UNIX-based operating system, including Linux and Mac OS/X, using standard libraries. The source code is freely and publicly available under the GPL 3.0 license at http://github.com/raldecoa/SurpriseMe/releases.

    Contact: imarin@ibv.csic.es

    Categories: Journal Articles
  • LipidGO: database for lipid-related GO terms and applications
    [Mar 2014]

    Motivation: Lipids, an essential class of biomolecules, are receiving increasing attention in the research community, especially with the development of analytical techniques such as mass spectrometry. Gene Ontology (GO) is the de facto standard function annotation scheme for gene products. Identification of both explicit and implicit lipid-related GO terms will help lipid research in many ways, e.g. assigning lipid function in protein function prediction.

    Results: We have constructed a Web site ‘LipidGO’ that facilitates browsing and searching lipid-related GO terms. An expandable hierarchical GO tree is constructed that allows users to find lipid-related GO terms easily. To support large-scale analysis, a user is able to upload a list of gene products or a list of GO terms to find out which of them are lipid related. Finally, we demonstrate the usefulness of ‘LipidGO’ with two applications: (i) identifying lipid-related gene products in model organisms and (ii) discovering potential novel lipid-related molecular functions.

    Availability and implementation: LipidGO is available at http://compbio.ddns.comp.nus.edu.sg/%7elipidgo/index.php.

    Contact: wongls@comp.nus.edu.sg

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • tasiRNAdb: a database of ta-siRNA regulatory pathways
    [Mar 2014]

    Summary: In plants, many trans-acting small interfering RNA (ta-siRNA) regulatory pathways have been identified as significant components of the gene networks involved in development, metabolism, responses to biotic and abiotic stresses and DNA methylation at the TAS locus. To obtain a more comprehensive understanding of the nature of ta-siRNA regulatory pathways, we developed a freely accessible resource, tasiRNAdb, to serve as a repository for the sequences of ta-siRNA regulatory pathway-related microRNAs, TASs, ta-siRNAs and ta-siRNA targets, and for the cascading relations among them. With 583 pathways from 18 species, tasiRNAdb is the largest resource for known ta-siRNA regulatory pathways currently available. tasiRNAdb also provides a tool named TasExpAnalysis that was developed to map user-submitted small RNA and degradome libraries to a stored/input TAS and to perform sRNA phasing analysis and TAS cleavage analysis.

    Availability: The database of plant ta-siRNA regulatory pathways is available at http://bioinfo.jit.edu.cn/tasiRNADatabase/.

    Contact: zhang_chq2002@sohu.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Don't use a cannon to kill the ... miRNA mosquito
    [Mar 2014]

    Contact: poirazi@imbb.forth.gr

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • A divergent calponin homology (NN-CH) domain defines a novel family: implications for evolution of ciliary IFT complex B proteins
    [Mar 2014]

    Motivation: Microtubules are dynamic polymers of tubulin dimers that undergo continuous assembly and disassembly. A mounting number of microtubule-associated proteins (MAPs) regulate the dynamic behavior of microtubules and hence the assembly and disassembly of disparate microtubule structures within the cell. Despite recent advances in identification and functional characterization of MAPs, a substantial number of microtubule accessory factors have not been functionally annotated. Here, using profile-to-profile comparisons and structure modeling, we show that the yeast outer kinetochore components NDC80 and NUF2 share evolutionary ancestry with a novel protein family in mammals comprising, besides NDC80/HEC1 and NUF2, three Intraflagellar Transport (IFT) complex B subunits (IFT81, IFT57, CLUAP1) as well as six proteins with poorly defined function (FAM98A-C, CCDC22, CCDC93 and C14orf166). We show that these proteins consist of a divergent N-terminal calponin homology (CH)-like domain adjoined to an array of C-terminal heptad repeats predicted to form a coiled-coil arrangement. We have named the divergent CH-like domain NN–CH after the founding members NDC80 and NUF2.

    Contact: kbschou@bio.ku.dk or lbpedersen@bio.ku.dk

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • SBML and CellML translation in Antimony and JSim
    [Mar 2014]

    Motivation: The creation and exchange of biologically relevant models is of great interest to many researchers. When multiple standards are in use, models are more readily used and re-used if there exist robust translators between the various accepted formats.

    Summary: Antimony 2.4 and JSim 2.10 provide translation capabilities from their own formats to SBML and CellML. Each translation posed unique challenges, stemming from differences in each format’s inherent design as well as differences in functionality.

    Availability and implementation: Both programs are available under BSD licenses; Antimony from http://antimony.sourceforge.net/ and JSim from http://physiome.org/jsim/.

    Contact: lpsmith@u.washington.edu

    Categories: Journal Articles
  • A wavelet-based method to exploit epigenomic language in the regulatory region
    [Mar 2014]

    Motivation: Epigenetic landscapes in regulatory regions reflect the binding conditions of transcription factors and their co-factors. Identifying epigenetic conditions and their variation is important in understanding condition-specific gene regulation. Computational approaches to explore complex multi-dimensional landscapes are needed.

    Results: To study epigenomic conditions for gene regulation, we developed a method, AWNFR, to classify epigenomic landscapes based on the detected epigenomic landscapes. Assuming a mixture of Gaussians for a nucleosome, the proposed method captures the shape of histone modification and identifies potential regulatory regions in the wavelet domain. For accuracy estimation as well as enhanced computational speed, we developed a novel algorithm based on a down-sampling operation and a footprint in the wavelet domain. We showed the algorithmic advantages of AWNFR using simulated data. AWNFR identified regulatory regions more effectively and accurately than previous approaches on epigenome data from mouse embryonic stem cells and human lung fibroblast cells (IMR90). Based on the detected epigenomic landscapes, AWNFR classified epigenomic status and studied epigenomic codes. We studied co-occurring histone marks and showed that AWNFR captures the epigenomic variation across time.

    Availability and implementation: The source code and supplemental document of AWNFR are available at http://wonk.med.upenn.edu/AWNFR.

    Contact: wonk@mail.med.upenn.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Efficient clustering of identity-by-descent between multiple individuals
    [Mar 2014]

    Motivation: Most existing identity-by-descent (IBD) detection methods only consider haplotype pairs; less attention has been paid to considering multiple haplotypes simultaneously, even though IBD is an equivalence relation on haplotypes that partitions a set of haplotypes into IBD clusters. Multiple-haplotype IBD clusters may have advantages over pairwise IBD in some applications, such as IBD mapping. Existing methods for detecting multiple-haplotype IBD clusters are often computationally expensive and unable to handle large samples with thousands of haplotypes.

    Results: We present a clustering method, efficient multiple-IBD, which uses pairwise IBD segments to infer multiple-haplotype IBD clusters. It expands clusters from seed haplotypes by adding qualified neighbors and extends clusters across sliding windows in the genome. Our method is an order of magnitude faster than existing methods and has comparable performance with respect to the quality of clusters it uncovers. We further investigate the potential application of multiple-haplotype IBD clusters in association studies by testing for association between multiple-haplotype IBD clusters and low-density lipoprotein cholesterol in the Northern Finland Birth Cohort. Using our multiple-haplotype IBD cluster approach, we found an association with a genomic interval covering the PCSK9 gene in these data that is missed by standard single-marker association tests. Previously published studies confirm association of PCSK9 with low-density lipoprotein.

    Availability and implementation: Source code is available under the GNU Public License at http://cs.au.dk/~qianyuxx/EMI/.

    Contact: qianyuxx@gmail.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
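
    The seed-and-expand step described in the efficient multiple-IBD abstract can be sketched for a single genomic window. This is a simplified illustration, not the authors' algorithm: the rule that a neighbor "qualifies" when it shares pairwise IBD with at least a fraction `min_fraction` of the current cluster is an assumption, all names are hypothetical, and extension across sliding windows is omitted.

```python
from collections import defaultdict


def build_graph(pairs):
    """Adjacency sets: haplotypes joined by a pairwise IBD segment in this window."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def cluster_window(pairs, min_fraction=0.5):
    """Greedily grow clusters from seed haplotypes within one window."""
    graph = build_graph(pairs)
    unassigned = set(graph)
    clusters = []
    while unassigned:
        # Seed with the unassigned haplotype that has the most unassigned partners.
        seed = max(sorted(unassigned), key=lambda h: len(graph[h] & unassigned))
        cluster = {seed}
        grew = True
        while grew:
            grew = False
            for node in sorted(unassigned - cluster):
                # Assumed qualification rule: enough IBD links into the cluster.
                if len(graph[node] & cluster) / len(cluster) >= min_fraction:
                    cluster.add(node)
                    grew = True
        clusters.append(cluster)
        unassigned -= cluster
    return clusters


# Made-up pairwise IBD calls for one window.
ibd_pairs = [("h1", "h2"), ("h2", "h3"), ("h1", "h3"), ("h4", "h5")]
print(cluster_window(ibd_pairs))
```

    Raising `min_fraction` toward 1 demands nearly complete pairwise IBD within a cluster, which matches the view of IBD as an equivalence relation but is less tolerant of missed pairwise calls.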
  • featureCounts: an efficient general purpose program for assigning sequence reads to genomic features
    [Mar 2014]

    Motivation: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature.

    Results: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications.

    Availability and implementation: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

    Contact: shi@wehi.edu.au

    Categories: Journal Articles
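
    Read summarization as defined in the featureCounts abstract can be illustrated with a small sketch. It is not featureCounts' implementation: a dictionary keyed by chromosome stands in for chromosome hashing, a binary search over sorted feature starts is swapped in for feature blocking, and the policy of skipping reads that overlap more than one gene is an assumption of this sketch. All names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict


def index_features(features):
    """features: iterable of (gene_id, chrom, start, end), 1-based inclusive."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in features:
        by_chrom[chrom].append((start, end, gene_id))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts = [start for start, _, _ in items]
        longest = max(end - start for start, end, _ in items)
        index[chrom] = (starts, items, longest)
    return index


def count_reads(index, reads):
    """reads: iterable of (chrom, start, end). Returns reads counted per gene."""
    counts = Counter()
    for chrom, read_start, read_end in reads:
        if chrom not in index:
            continue
        starts, items, longest = index[chrom]
        # Only features starting in [read_start - longest, read_end] can overlap.
        lo = bisect_left(starts, read_start - longest)
        hi = bisect_right(starts, read_end)
        hits = {gene for _, end, gene in items[lo:hi] if end >= read_start}
        if len(hits) == 1:  # assumed policy: skip ambiguous reads
            counts[hits.pop()] += 1
    return counts


# Made-up exons and aligned reads.
exons = [("geneA", "chr1", 100, 200), ("geneA", "chr1", 300, 400),
         ("geneB", "chr1", 1000, 1100), ("geneC", "chr2", 50, 150)]
aligned = [("chr1", 150, 199), ("chr1", 390, 420), ("chr1", 500, 549),
           ("chr2", 100, 149), ("chrM", 1, 50)]
print(count_reads(index_features(exons), aligned))
```

    Exon-level hits are collapsed to gene identifiers before counting, so a read that touches two exons of the same gene is still counted once at the gene level.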