Bioinformatics Journal

Bioinformatics - RSS feed of current issue
  • Big data and other challenges in the quest for orthologs
    [Oct 2014]

    Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application.

    Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third ‘Quest for Orthologs’ meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes.
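    As a rough illustration of the quadratic-scaling point above, assume an all-against-all design in which every pair of proteomes is compared. This is common among ortholog inference methods but not universal. The number of pairwise comparisons is then n(n - 1)/2, so doubling the number of proteomes roughly quadruples the work. The function name is hypothetical.

```python
def pairwise_comparisons(n_proteomes: int) -> int:
    """Unordered proteome pairs that an all-against-all method must compare."""
    return n_proteomes * (n_proteomes - 1) // 2


for n in (100, 1000, 2000):
    # Doubling n from 1000 to 2000 multiplies the count by about four.
    print(n, pairwise_comparisons(n))
```

    This prints 4950, 499500 and 1999000 comparisons, respectively.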

    The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking.

    Availability and implementation: All such materials are available at http://questfororthologs.org.

    Contact: erik.sonnhammer@scilifelab.se or c.dessimoz@ucl.ac.uk

    Categories: Journal Articles
  • Comparison of the mammalian insulin signalling pathway to invertebrates in the context of FOXO-mediated ageing
    [Oct 2014]

    Motivation: A large number of experimental studies on ageing focus on the effects of genetic perturbations of the insulin/insulin-like growth factor signalling pathway (IIS) on lifespan. Short-lived invertebrate laboratory model organisms are extensively used to quickly identify ageing-related genes and pathways. It is important to extrapolate this knowledge to longer-lived mammalian organisms, such as mouse and eventually human, where such analyses are difficult or impossible to perform. Computational tools are needed to integrate and manipulate pathway knowledge across species.

    Results: We performed a literature review and curation of the IIS and target of rapamycin signalling pathways in Mus musculus. We compare this pathway model with the equivalent models in Drosophila melanogaster and Caenorhabditis elegans. Although generally well conserved, the pathways exhibit important differences. In general, the worm and mouse pathways include a larger number of feedback loops and interactions than the fly pathway. We identify ‘functional orthologues’ that share similar molecular interactions but have moderate sequence similarity. We then incorporate the mouse model into the web service NetEffects and perform in silico gene perturbations of IIS components and analyses of experimental results. We identify sub-paths that, given a mutation in an IIS component, could potentially antagonize the primary effects on ageing via FOXO in mouse and via SKN-1 in worm. Finally, we explore the effects of FOXO knockouts in three different mouse tissues.

    Availability and implementation: http://www.ebi.ac.uk/thornton-srv/software/NetEffects

    Contact: ip8@sanger.ac.uk or thornton@ebi.ac.uk

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • proovread: large-scale high-accuracy PacBio correction through iterative short read consensus
    [Oct 2014]

    Motivation: Today, DNA sequences are mostly determined through sequencing by synthesis, as provided by Illumina sequencers. Although highly accurate, the resulting reads are short, which makes their analysis challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. However, its broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations place heavy demands on hardware, work only in well-defined computing infrastructures and reject a substantial number of reads. This limits their usability considerably, especially for large sequencing projects.

    Results: Here we present proovread, a hybrid correction pipeline for SMRT reads that can be flexibly adapted to existing hardware and infrastructure, from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies of up to 99.9% and outperformed existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and throughput was higher. Thus, proovread combines the most accurate correction results with excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing.
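    The core idea of short-read consensus correction can be sketched as a per-position majority vote. This is a deliberately simplified sketch, not proovread's implementation. It assumes the short reads have already been mapped to the long read without gaps, so it ignores the insertions and deletions that dominate SMRT errors. It also performs a single round instead of proovread's iterative mapping and correction. The function name and the min_cov threshold are hypothetical.

```python
from collections import Counter


def consensus_correct(long_read: str,
                      placements: list[tuple[int, str]],
                      min_cov: int = 3) -> str:
    """One round of majority-vote correction of a long read.

    placements: (offset, short_read) pairs, assumed to be gap-free
    alignments onto long-read coordinates. Positions covered by fewer
    than min_cov short reads keep their original base.
    """
    # One base counter per long-read position
    columns = [Counter() for _ in long_read]
    for offset, read in placements:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1
    corrected = []
    for original, counts in zip(long_read, columns):
        if sum(counts.values()) >= min_cov:
            corrected.append(counts.most_common(1)[0][0])  # majority base
        else:
            corrected.append(original)  # too little coverage to decide
    return "".join(corrected)


# The long read carries a substitution error at index 4 (T instead of A).
print(consensus_correct("ACGTTGCA", [(0, "ACGTAG"), (2, "GTAGCA"), (1, "CGTAGC")]))
```

    This prints ACGTAGCA because all three short reads covering index 4 vote for A. Positions covered by fewer than three reads are left untouched.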

    Availability and implementation: proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de

    Contact: frank.foerster@biozentrum.uni-wuerzburg.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • A new approach for detecting riboswitches in DNA sequences
    [Oct 2014]

    Motivation: Riboswitches are short sequences of messenger RNA that can change their structural conformation to regulate the expression of adjacent genes. Computational prediction of putative riboswitches can provide direction to molecular biologists studying riboswitch-mediated gene expression.

    Results: The Denison Riboswitch Detector (DRD) is a new computational tool with a Web interface that can quickly identify putative riboswitches in DNA sequences on the scale of bacterial genomes. Riboswitch descriptions are easily modifiable and new ones are easily created. The underlying algorithm converts the problem to a ‘heaviest path’ problem on a multipartite graph, which is then solved using efficient dynamic programming. We show that DRD can achieve ~88–99% sensitivity and >99.99% specificity on 13 riboswitch families.
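    The ‘heaviest path’ formulation can be illustrated with a small dynamic program. The encoding below is an assumption, not DRD's actual graph construction:

    • Each layer of the multipartite graph holds candidate matches (position, score) for one element of a riboswitch description.
    • Edges connect candidates in consecutive layers only when the positions appear in order and, optionally, lie within max_gap bases of each other.
    • The heaviest path picks one candidate per layer.

    All names are hypothetical.

```python
from typing import Optional


def heaviest_path(layers: list[list[tuple[int, float]]],
                  max_gap: Optional[int] = None) -> Optional[tuple[float, list[int]]]:
    """Heaviest path choosing one (position, score) candidate per layer."""
    NEG = float("-inf")
    # score[k][i]: weight of the heaviest valid path ending at candidate i of layer k
    score = [[w for _, w in layers[0]]]
    back: list[list[int]] = [[-1] * len(layers[0])]
    for k in range(1, len(layers)):
        row_score, row_back = [], []
        for pos, w in layers[k]:
            best_j, best_val = -1, NEG
            for j, (prev_pos, _) in enumerate(layers[k - 1]):
                gap = pos - prev_pos
                if gap <= 0 or (max_gap is not None and gap > max_gap):
                    continue  # no edge: elements must appear in order (and close enough)
                if score[k - 1][j] > best_val:
                    best_j, best_val = j, score[k - 1][j]
            row_score.append(best_val + w if best_j >= 0 else NEG)
            row_back.append(best_j)
        score.append(row_score)
        back.append(row_back)
    last = score[-1]
    if not last or max(last) == NEG:
        return None  # no ordered placement of all elements exists
    i = max(range(len(last)), key=last.__getitem__)
    total, path = last[i], []
    for k in range(len(layers) - 1, -1, -1):  # follow back-pointers to layer 0
        path.append(layers[k][i][0])
        i = back[k][i]
    return total, path[::-1]


layers = [[(10, 2.0), (50, 5.0)], [(30, 3.0), (60, 1.0)], [(70, 4.0)]]
print(heaviest_path(layers))  # (10.0, [50, 60, 70])
```

    Each candidate is compared only with the candidates of the previous layer, so the cost grows with the number of edges rather than with the number of possible paths.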

    Availability and implementation: DRD is available at http://drd.denison.edu.

    Contact: havill@denison.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Monte Carlo algorithms for Brownian phylogenetic models
    [Oct 2014]

    Motivation: Brownian models have been introduced in phylogenetics for describing variation in substitution rates through time, with applications to molecular dating or to the comparative analysis of variation in substitution patterns among lineages. Thus far, however, the Monte Carlo implementations of these models have relied on crude approximations, in which the Brownian process is sampled only at the internal nodes of the phylogeny or at the midpoints along each branch, and the unknown trajectory between these sampled points is summarized by simple branchwise average substitution rates.

    Results: A more accurate Monte Carlo approach is introduced, explicitly sampling a fine-grained discretization of the trajectory of the (potentially multivariate) Brownian process along the phylogeny. Generic Monte Carlo resampling algorithms are proposed for updating the Brownian paths along and across branches. Specific computational strategies are developed for efficient integration of the finite-time substitution probabilities across branches induced by the Brownian trajectory. The mixing properties and the computational complexity of the resulting Markov chain Monte Carlo sampler scale reasonably with the discretization level, allowing practical applications with up to a few hundred discretization points along the entire depth of the tree. The method can be generalized to other Markovian stochastic processes, making it possible to implement a wide range of time-dependent substitution models with well-controlled computational precision.
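    The idea of sampling a fine-grained trajectory, instead of summarizing each branch by a single average rate, can be sketched as follows. The sketch assumes the log substitution rate follows a univariate Brownian motion with per-unit-time standard deviation sigma. It then integrates the rate along the branch with the trapezoidal rule. This illustrates the discretization only, not the PhyloBayes sampler, and the function names are hypothetical.

```python
import math
import random


def sample_log_rate_path(start: float, branch_length: float, sigma: float,
                         n_steps: int, rng: random.Random) -> list[float]:
    """Sample a Brownian log-rate at n_steps + 1 evenly spaced points on a branch."""
    dt = branch_length / n_steps
    path = [start]
    for _ in range(n_steps):
        # Brownian increments are independent Gaussians with variance sigma^2 * dt
        path.append(path[-1] + rng.gauss(0.0, sigma * math.sqrt(dt)))
    return path


def expected_substitutions(path: list[float], branch_length: float) -> float:
    """Integrate the rate exp(log_rate) along the branch (trapezoidal rule)."""
    dt = branch_length / (len(path) - 1)
    rates = [math.exp(x) for x in path]
    return dt * (sum(rates) - 0.5 * (rates[0] + rates[-1]))


rng = random.Random(1)
path = sample_log_rate_path(start=0.0, branch_length=1.0, sigma=0.5, n_steps=100, rng=rng)
print(len(path), expected_substitutions(path, branch_length=1.0))
```

    Increasing n_steps refines the trajectory at a cost that grows linearly per branch. This is the discretization-level trade-off the abstract refers to.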

    Availability: The program is freely available at www.phylobayes.org

    Contact: nicolas.lartillot@univ-lyon1.fr

    Categories: Journal Articles
  • CCBuilder: an interactive web-based tool for building, designing and assessing coiled-coil protein assemblies
    [Oct 2014]

    Motivation: The ability to accurately model protein structures at the atomistic level underpins efforts to understand protein folding, to engineer natural proteins predictably and to design proteins de novo. Homology-based methods are well established and produce impressive results. However, these are limited to structures presented by and resolved for natural proteins. Addressing this problem more widely and deriving truly ab initio models requires mathematical descriptions for protein folds; the means to decorate these with natural, engineered or de novo sequences; and methods to score the resulting models.

    Results: We present CCBuilder, a web-based application that tackles the problem for a defined but large class of protein structures, the α-helical coiled coils. CCBuilder generates coiled-coil backbones, builds side chains onto these frameworks and provides a range of metrics to measure the quality of the models. Its straightforward graphical user interface provides broad functionality that allows users to build and assess models, in which helix geometry, coiled-coil architecture and topology and protein sequence can be varied rapidly. We demonstrate the utility of CCBuilder by assembling models for 653 coiled-coil structures from the PDB, which cover >96% of the known coiled-coil types, and by generating models for rarer and de novo coiled-coil structures.

    Availability and implementation: CCBuilder is freely available, without registration, at http://coiledcoils.chm.bris.ac.uk/app/cc_builder/

    Contact: D.N.Woolfson@bristol.ac.uk or Chris.Wood@bristol.ac.uk

    Categories: Journal Articles
  • Modeling time-dependent transcription effects of HER2 oncogene and discovery of a role for E2F2 in breast cancer cell-matrix adhesion
    [Oct 2014]

    Motivation: Oncogenes are known drivers of cancer phenotypes and targets of molecular therapies; however, the complex and diverse signaling mechanisms regulated by oncogenes and potential routes to targeted therapy resistance remain to be fully understood. To this end, we present an approach to infer regulatory mechanisms downstream of the HER2 driver oncogene in SUM-225 metastatic breast cancer cells from dynamic gene expression patterns using a succession of analytical techniques, including a novel MP grammars method to mathematically model putative regulatory interactions among sets of clustered genes.

    Results: Our method highlighted regulatory interactions previously identified in the cell line, as well as a novel finding: the HER2 oncogene, as opposed to the proto-oncogene, upregulates expression of the E2F2 transcription factor. Using targeted gene knockdown, we show the significance of this finding: cancer cell-matrix adhesion and outgrowth were markedly inhibited when E2F2 levels were reduced. This validates, in this context, that upregulation of E2F2 is a key intermediate event in a HER2 oncogene-directed, gene expression-based signaling circuit. This work demonstrates how predictive modeling of longitudinal gene expression data combined with multiple systems-level analyses can be used to accurately predict downstream signaling pathways. Here, our integrated method was applied to reveal how the HER2 oncogene drives a specific cancer cell phenotype, but it can be adapted to investigate other oncogenes and model systems.

    Availability and implementation: Accessibility of various tools is listed in methods; the Log-Gain Stoichiometric Stepwise algorithm is accessible at http://www.cbmc.it/software/Software.php.

    Contact: bollig@karmanos.org

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • An improved method for computing q-values when the distribution of effect sizes is asymmetric
    [Oct 2014]

    Motivation: Asymmetry is frequently observed in the empirical distribution of test statistics that results from the analysis of gene expression experiments. Such asymmetry indicates that the distribution of effect sizes is itself asymmetric. A common method for identifying differentially expressed (DE) genes in a gene expression experiment while controlling the false discovery rate (FDR) is Storey’s q-value method. This method ranks genes based solely on the P-value of each gene.

    Results: We propose a method that alters and improves upon the q-value method by taking the sign of the test statistics, in addition to the P-values, into account. Through two simulation studies (one involving independent normal data and one involving microarray data), we show that the proposed method, when compared with the traditional q-value method, generally provides a better ranking of genes and a higher number of truly DE genes declared DE, while still adequately controlling the FDR. We illustrate the proposed method by analyzing two microarray datasets, one from an experiment on thale cress seedlings and the other from an experiment on maize leaves.
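    For reference, the baseline procedure that the proposed method builds on can be sketched as below. It is Storey’s q-value computation, which uses the P-values alone. This minimal version estimates the proportion of true null hypotheses (pi0) from a single fixed tuning parameter, lam. The widely used qvalue R package instead estimates pi0 from a grid of lam values by default. The sign-aware modification proposed in the abstract is not shown, and the function name is hypothetical.

```python
def storey_qvalues(pvalues: list[float], lam: float = 0.5) -> list[float]:
    """q-values from P-values with a fixed-lambda estimate of pi0."""
    m = len(pvalues)
    # P-values above lam are assumed to come mostly from true nulls (uniform on [0, 1])
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that q-values are monotone in P
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


print(storey_qvalues([0.01, 0.04, 0.03, 0.8, 0.9, 0.6]))
```

    Here pi0 is estimated as 1 and the q-values come out at approximately 0.06, 0.08, 0.08, 0.9, 0.9 and 0.9. The 0.03 gene inherits the smaller q-value of the 0.04 gene through the running minimum.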

    Availability and implementation: The R code and data files for the proposed method and examples are available at Bioinformatics online.

    Contact: megan.orr@ndsu.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Network-based analysis identifies epigenetic biomarkers of esophageal squamous cell carcinoma progression
    [Oct 2014]

    Motivation: The rapid progression of esophageal squamous cell carcinoma (ESCC) causes a high mortality rate because of its propensity for metastasis, which is driven by genetic and epigenetic alterations. The identification of prognostic biomarkers would help prevent or control metastatic progression. Expression analyses have been used to find such markers, but their results do not always validate in independent cohorts. Epigenetic marks, such as DNA methylation, are a potential source of more reliable and stable biomarkers. Importantly, integrating both expression and epigenetic alterations is more likely to identify relevant biomarkers.

    Results: We present a new analysis framework that uses an ESCC progression-associated gene regulatory network (GRNescc) to identify differentially methylated CpG sites prognostic of ESCC progression. From the CpG loci differentially methylated in 50 tumor–normal pairs, we selected 44 CpG loci that were most highly associated with survival and located in the promoters of genes more likely to belong to GRNescc. Using an independent ESCC cohort, we confirmed that 8/10 CpG loci in the promoters of GRNescc genes significantly correlated with patient survival. In contrast, 0/10 CpG loci in the promoters of genes outside GRNescc correlated with patient survival. We further characterized the GRNescc network topology and observed that genes with methylated CpG loci associated with survival deviated from the network's center of mass and were less likely to be hubs in GRNescc. We postulate that our analysis framework improves the identification of bona fide prognostic biomarkers from DNA methylation studies, especially those with partial genome coverage.

    Contact: tsengsm@mail.ncku.edu.tw or ycw5798@mail.ncku.edu.tw

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Mas-o-menos: a simple sign averaging method for discrimination in genomic data analysis
    [Oct 2014]

    Motivation: The successful translation of genomic signatures into clinical settings relies on good discrimination between patient subgroups. Many sophisticated algorithms have been proposed in the statistics and machine learning literature, but in practice simpler algorithms are often used. However, few simple algorithms have been formally described or systematically investigated.

    Results: We give a precise definition of a popular simple method we refer to as más-o-menos, which calculates prognostic scores for discrimination by summing standardized predictors, weighted by the signs of their marginal associations with the outcome. We study its behavior theoretically, in simulations and in an extensive analysis of 27 independent gene expression studies of bladder, breast and ovarian cancer, altogether totaling 3833 patients with survival outcomes. We find that despite its simplicity, más-o-menos can achieve good discrimination performance. It performs no worse, and sometimes better, than popular and much more CPU-intensive methods for discrimination, including lasso and ridge regression.
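    The más-o-menos score as described above can be sketched in a few lines. The sketch makes two assumptions:

    • The outcome is continuous, and the sign of each predictor’s covariance with it serves as the marginal association.
    • For survival outcomes, one would presumably use instead the sign of a univariate survival model coefficient (e.g., Cox regression).

    Dividing by the square root of the number of predictors is an optional rescaling that does not change how samples are ranked. Function names are hypothetical.

```python
import math
import statistics


def standardize(values: list[float]) -> list[float]:
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def mas_o_menos(X: list[list[float]], y: list[float]) -> list[float]:
    """Sign-averaged score per sample; X is samples x predictors, y a continuous outcome."""
    n, p = len(X), len(X[0])
    columns = [standardize([row[j] for row in X]) for j in range(p)]
    y_mean = statistics.fmean(y)
    # Marginal association sign of each standardized predictor with the outcome
    signs = [1.0 if sum(z * (t - y_mean) for z, t in zip(col, y)) >= 0 else -1.0
             for col in columns]
    # Sum the standardized predictors, each weighted only by its sign
    return [sum(s * col[i] for s, col in zip(signs, columns)) / math.sqrt(p)
            for i in range(n)]


X = [[1.0, 10.0], [2.0, 8.0], [3.0, 6.0], [4.0, 4.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(mas_o_menos(X, y))
```

    The second predictor is negatively associated with y, so it enters with a minus sign. As a result, both predictors push the scores upward as y increases. No coefficients are fitted beyond the signs, which is why the method is cheap compared with lasso or ridge regression.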

    Availability and implementation: Más-o-menos is implemented for survival analysis as an option in the survHD package, available from http://www.bitbucket.org/lwaldron/survhd and submitted to Bioconductor.

    Contact: sdzhao@illinois.edu

    Categories: Journal Articles
  • Inferring condition-specific miRNA activity from matched miRNA and mRNA expression data
    [Oct 2014]

    Motivation: MicroRNAs (miRNAs) play crucial roles in complex cellular networks by binding to the messenger RNAs (mRNAs) of protein coding genes. It has been found that miRNA regulation is often condition-specific. A number of computational approaches have been developed to identify miRNA activity specific to a condition of interest using gene expression data. However, most of the methods only use the data in a single condition, and thus, the activity discovered may not be unique to the condition of interest. Additionally, these methods are based on statistical associations between the gene expression levels of miRNAs and mRNAs, so they may not be able to reveal real gene regulatory relationships, which are causal relationships.

    Results: We propose a novel method to infer condition-specific miRNA activity by considering (i) the difference between the regulatory behavior of an miRNA in the condition of interest and its behavior in the other conditions and (ii) the causal semantics of miRNA–mRNA relationships. The method is applied to the epithelial–mesenchymal transition (EMT) and multi-class cancer (MCC) datasets. Validation against the results of transfection experiments shows that our approach is effective in discovering significant miRNA–mRNA interactions. Functional and pathway analyses and literature validation indicate that the identified active miRNAs are closely associated with specific biological processes, diseases and pathways. A more detailed analysis of the activity of these miRNAs suggests two patterns. Some show different regulation types in different conditions. Others keep the same regulation type and differ between conditions only in the strength of regulation.

    Availability and implementation: The R and Matlab scripts are in the Supplementary materials.

    Contact: jiuyong.li@unisa.edu.au

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Compression and fast retrieval of SNP data
    [Oct 2014]

    Motivation: The increasing interest in rare genetic variants and epistatic genetic effects on complex phenotypic traits is currently pushing genome-wide association study design towards datasets of increasing size, both in the number of studied subjects and in the number of genotyped single nucleotide polymorphisms (SNPs). This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data.

    Results: We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. Our algorithm is based on two main ideas: (i) compress linkage disequilibrium blocks in terms of differences with a reference SNP and (ii) compress reference SNPs exploiting information on their call rate and minor allele frequency. Tested on two SNP datasets and compared with several state-of-the-art software tools, our compression algorithm is shown to be competitive in terms of compression rate and to outperform all tools in terms of time to load compressed data.
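    Idea (i), encoding an SNP within a linkage disequilibrium block by its differences from a reference SNP, can be illustrated as follows. The sketch assumes genotypes coded 0/1/2 per subject. It stores only the (subject, genotype) pairs where an SNP differs from the reference. The published library's actual file format and bit-level encoding are not reproduced here, and the names are hypothetical.

```python
def encode_block(reference: list[int], snps: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Store each SNP in an LD block only where it differs from the reference SNP."""
    return [[(subject, g) for subject, (r, g) in enumerate(zip(reference, snp)) if g != r]
            for snp in snps]


def decode_block(reference: list[int], encoded: list[list[tuple[int, int]]]) -> list[list[int]]:
    """Rebuild full genotype vectors by patching a copy of the reference."""
    decoded = []
    for diffs in encoded:
        snp = list(reference)
        for subject, g in diffs:
            snp[subject] = g
        decoded.append(snp)
    return decoded


reference = [0, 1, 2, 0, 0, 1]
block = [[0, 1, 2, 0, 1, 1], [0, 1, 2, 0, 0, 1], [1, 1, 2, 0, 0, 1]]
encoded = encode_block(reference, block)
print(encoded)  # [[(4, 1)], [], [(0, 1)]]
assert decode_block(reference, encoded) == block
```

    SNPs in strong linkage disequilibrium with the reference differ from it in few subjects, so their difference lists are short. This is where the compression comes from.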

    Availability and implementation: Our compression and decompression algorithms are implemented in a C++ library, are released under the GNU General Public License and are freely downloadable from http://www.dei.unipd.it/~sambofra/snpack.html.

    Contact: sambofra@dei.unipd.it or cobelli@dei.unipd.it

    Categories: Journal Articles
  • ASP-G: an ASP-based method for finding attractors in genetic regulatory networks
    [Oct 2014]

    Motivation: Boolean network models are suitable for simulating gene regulatory networks (GRNs) in the absence of detailed kinetic information. However, simplifying biological reality requires assumptions about how genes interact (interaction rules) and how their states are updated during the simulation (update scheme). The exact choice of assumptions largely determines the outcome of the simulations. In most cases, however, the biologically correct assumptions are unknown. An ideal simulation thus involves testing different rules and schemes to determine those that best capture an observed biological phenomenon. This is not trivial, because most current methods to simulate Boolean network models of GRNs and to compute their attractors impose specific assumptions that cannot easily be altered, as they are built into the system.

    Results: To allow for a more flexible simulation framework, we developed ASP-G, a method based on answer set programming (ASP). We show the correctness of ASP-G in simulating Boolean network models and obtaining attractors under different assumptions by successfully recapitulating the detection of attractors in previously published studies. We also provide an example of how simulating network models under different settings helps determine the assumptions under which a certain conclusion holds. The main added value of ASP-G lies in its modularity and declarativity, which make it more flexible and less error-prone than traditional approaches. The declarative nature of ASP-G makes it slower than more dedicated systems, but it still achieves good computational efficiency.
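    To make the role of the update scheme concrete, here is a brute-force sketch that finds all attractors of a small Boolean network under synchronous updating. It is a swapped-in technique for illustration, not ASP-G's answer set programming approach. It enumerates all 2^n states, so it only works for tiny networks. Switching to another update scheme (e.g., asynchronous) would require rewriting the step function; this is exactly the kind of built-in assumption described above. The names and the example network are hypothetical.

```python
from itertools import product
from typing import Callable

Rule = Callable[[dict[str, bool]], bool]


def synchronous_attractors(rules: dict[str, Rule]) -> set[tuple[tuple[bool, ...], ...]]:
    """All attractors (fixed points and cycles) under synchronous updating.

    States are tuples of gene values in sorted gene-name order.
    """
    genes = sorted(rules)

    def step(state: tuple[bool, ...]) -> tuple[bool, ...]:
        # Synchronous scheme: every gene is updated from the same previous state
        env = dict(zip(genes, state))
        return tuple(rules[g](env) for g in genes)

    attractors = set()
    for start in product((False, True), repeat=len(genes)):
        seen = set()
        state = start
        while state not in seen:  # follow the trajectory until a state repeats
            seen.add(state)
            state = step(state)
        cycle = [state]  # the first repeated state lies on the attractor
        nxt = step(state)
        while nxt != state:
            cycle.append(nxt)
            nxt = step(nxt)
        k = cycle.index(min(cycle))  # canonical rotation, so each cycle is stored once
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


# Toggle switch: A represses B and B represses A
toggle = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
print(synchronous_attractors(toggle))
```

    For the toggle switch, this finds two fixed points, (False, True) and (True, False). It also finds a two-state cycle between (False, False) and (True, True), which arises only because both genes flip at the same time.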

    Availability and implementation: The source code of ASP-G is available at http://bioinformatics.intec.ugent.be/kmarchal/Supplementary_Information_Musthofa_2014/asp-g.zip.

    Contact: Kathleen.Marchal@UGent.be or Martine.DeCock@UGent.be

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Systematic analysis of gene properties influencing organ system phenotypes in mammalian perturbations
    [Oct 2014]

    Motivation: Diseases and adverse drug reactions are frequently caused by disruptions in gene function. Gaining insight into the global system properties that govern the relationships between genotype and phenotype is thus crucial for understanding and interfering with perturbations, such as disease states, in complex organisms.

    Results: We present a systematic analysis of phenotypic information on 5047 perturbations of single genes in mice, 4766 human diseases and 1666 drugs. The analysis examines the relationships between different gene properties and the phenotypic impact at the organ system level in mammalian organisms. Some alterations often show organ-specific effects: single gene perturbations, and alterations of nonessential or tissue-specific genes or of genes with low betweenness centrality in protein–protein interaction networks. In contrast, multiple gene alterations, resulting, e.g., from complex disorders and drug treatments, have a more widespread impact. Interestingly, certain cellular localizations are distinctly associated with systemic effects. The lumen of intracellular organelles is associated with such effects in monogenic disease genes, and transcription factor complexes in mouse gene perturbations. In summary, we show that the broadness of the phenotypic effect is clearly related to certain gene properties and is an indicator of the severity of perturbations. This work contributes to the understanding of gene properties influencing the systemic effects of diseases and drugs.

    Contact: monica.campillos@helmholtz-muenchen.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Biocellion: accelerating computer simulation of multicellular biological system models
    [Oct 2014]

    Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming.

    Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling in the bodies of predefined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in a soil aggregate as case studies.
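    The ‘fill in the routine bodies’ design can be sketched as an abstract interface. The modeler implements the interface, while the framework owns the simulation loop. This is a hypothetical Python illustration of the pattern, not Biocellion’s actual API. The class and method names are invented. The driver here runs sequentially, whereas a real framework would distribute cells across processors.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CellModel(ABC):
    """Hypothetical model interface: the modeler fills in these routine bodies."""

    @abstractmethod
    def update_cell(self, cell: dict) -> None:
        """Advance one cell's internal state by one time step."""

    @abstractmethod
    def divide(self, cell: dict) -> Optional[dict]:
        """Return a daughter cell if the cell divides, else None."""


def run(model: CellModel, cells: list[dict], steps: int) -> list[dict]:
    """Framework-owned loop (sequential here; a parallel framework would
    partition the cells across processors behind this same interface)."""
    for _ in range(steps):
        for cell in cells:
            model.update_cell(cell)
        daughters = [d for d in (model.divide(cell) for cell in cells) if d is not None]
        cells.extend(daughters)
    return cells


class GrowthModel(CellModel):
    """Toy model: cells grow by one unit per step and split in half at size 2."""

    def update_cell(self, cell: dict) -> None:
        cell["size"] += 1.0

    def divide(self, cell: dict) -> Optional[dict]:
        if cell["size"] >= 2.0:
            cell["size"] /= 2.0
            return dict(cell)
        return None


print(len(run(GrowthModel(), [{"size": 1.0}], steps=3)))  # 8
```

    The modeler's code (GrowthModel) never touches the loop or the data distribution. This separation is what would let a framework parallelize the loop without changes to the model.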

    Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information.

    Contact: seunghwa.kang@pnnl.gov

    Categories: Journal Articles
  • e-Driver: a novel method to identify protein regions driving cancer
    [Oct 2014]

    Motivation: Most approaches used to identify cancer driver genes focus, true to their name, on entire genes and assume that a gene, treated as one entity, has a specific role in cancer. This approach may be correct for describing the effects of gene loss or changes in gene expression. However, mutations may have different effects, including different relevance to cancer, depending on which region of the gene they affect. Apart from a few well-known cases, there are not enough data for reliable statistics on individual positions. An intermediate level of analysis, between an individual position and the entire gene, may give better statistics than the former and better resolution than the latter.

    Results: We have developed e-Driver, a method that exploits the internal distribution of somatic missense mutations between the protein’s functional regions (domains or intrinsically disordered regions) to find those that show a bias in their mutation rate as compared with other regions of the same protein, providing evidence of positive selection and suggesting that these proteins may be actual cancer drivers. We have applied e-Driver to a large cancer genome dataset from The Cancer Genome Atlas and compared its performance with that of four other methods, showing that e-Driver identifies novel candidate cancer drivers and, because of its increased resolution, provides deeper insights into the potential mechanism of cancer driver genes identified by other methods.
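    One simple way to test whether a region carries more missense mutations than its length would predict is a one-sided binomial test, sketched below. Under the null hypothesis, the sketch assumes mutations fall uniformly along the protein, so the expected fraction in a region equals region_length / protein_length. This is an illustrative formulation, not necessarily e-Driver's exact statistic. It also omits the multiple-testing correction needed when many regions are tested, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def region_enrichment_pvalue(region_mutations: int, total_mutations: int,
                             region_length: int, protein_length: int) -> float:
    """One-sided test: does the region hold more mutations than its length predicts?"""
    expected_fraction = region_length / protein_length  # uniform-mutation null
    return binomial_upper_tail(region_mutations, total_mutations, expected_fraction)


# 8 of 10 missense mutations fall in a domain covering 20% of the protein
print(region_enrichment_pvalue(8, 10, 60, 300))
```

    This gives a P-value of roughly 7.8e-5, so the domain would stand out against the rest of the protein.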

    Availability and implementation: A Perl script with e-Driver and the files to reproduce the results described here can be downloaded from https://github.com/eduardporta/e-Driver.git

    Contact: adam@godziklab.org or eppardo@sanfordburnham.org

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing
    [Oct 2014]

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines that accurately measure heteroplasmy and easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data.

    Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, and also includes a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy, as well as annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets.
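    Allele-specific heteroplasmy at a single mitochondrial position can be expressed as the fraction of reads supporting each allele. The sketch below assumes a pileup string of read bases at one position and a hypothetical minimum depth. MToolBox’s own heteroplasmy estimate may differ in detail, for example in quality filtering or in reporting confidence intervals. The function name is hypothetical.

```python
from collections import Counter


def allele_fractions(pileup_bases: str, min_depth: int = 10) -> dict[str, float]:
    """Fraction of reads supporting each allele at one position (empty if depth is too low)."""
    counts = Counter(pileup_bases.upper())
    depth = sum(counts.values())
    if depth < min_depth:
        return {}  # too few reads for a meaningful heteroplasmy estimate
    return {allele: n / depth for allele, n in counts.most_common()}


# 18 reads carry A and 2 carry G: a 10% heteroplasmic G allele
print(allele_fractions("A" * 18 + "G" * 2))  # {'A': 0.9, 'G': 0.1}
```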

    Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.

    Contact: marcella.attimonelli@uniba.it

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • AffyPipe: an open-source pipeline for Affymetrix Axiom genotyping workflow
    [Oct 2014]

    The Affymetrix Axiom genotyping standard and ‘best practice’ workflow for Linux and Mac users consists of three stand-alone executable programs (Affymetrix Power Tools) and an R package (SNPolisher). Currently, SNP analysis has to be performed in a step-by-step procedure. Manual intervention and/or programming skills are required at each intermediate step, as Affymetrix Power Tools programs do not produce input files for the next program in line. An additional problem is that the output format of genotypes is not compatible with most currently available analysis software. AffyPipe solves all of these problems by automating both the standard and ‘best practice’ workflows for any species genotyped with the Axiom technology. AffyPipe does not require programming skills and performs all the steps necessary to obtain a final genotype file. Furthermore, users can directly edit SNP probes and export genotypes in PLINK format.
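    As an illustration of the final export step, the sketch below writes one sample as a PLINK PED line. It makes three assumptions, and the mapping and function name are hypothetical, not AffyPipe’s code:

    • Genotype calls are coded -1 (no call), 0 (AA), 1 (AB) and 2 (BB).
    • The A/B allele labels are kept as-is instead of being mapped to nucleotides.
    • The family, parent, sex and phenotype columns are filled with PLINK’s unknown/missing codes.

```python
# Hypothetical mapping from call codes to PLINK allele pairs; "0" is PLINK's missing allele
CALL_TO_ALLELES = {-1: ("0", "0"), 0: ("A", "A"), 1: ("A", "B"), 2: ("B", "B")}


def ped_line(sample_id: str, calls: list[int]) -> str:
    """One PLINK .ped line: FID IID father mother sex phenotype, then two alleles per SNP."""
    fields = [sample_id, sample_id, "0", "0", "0", "-9"]  # unknown parents/sex, missing phenotype
    for call in calls:
        fields.extend(CALL_TO_ALLELES[call])
    return " ".join(fields)


print(ped_line("S1", [0, 1, -1, 2]))  # S1 S1 0 0 0 -9 A A A B 0 0 B B
```

    A matching .map file listing the SNPs in the same order is also needed; it is not shown.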

    Availability and implementation: https://github.com/nicolazzie/AffyPipe.git.

    Contact: ezequiel.nicolazzi@tecnoparco.org

    Categories: Journal Articles
  • FisHiCal: an R package for iterative FISH-based calibration of Hi-C data
    [Oct 2014]

    Summary: The fluorescence in situ hybridization (FISH) method has provided valuable information on physical distances between loci (via image analysis) for several decades. Recently, high-throughput data on nearby chemical contacts between and within chromosomes became available with the Hi-C method. Here, we present FisHiCal, an R package for iterative FISH-based Hi-C calibration that fully exploits the information coming from these methods. We describe our calibration model and present 3D inference methods that we have developed to increase its usability, namely, 3D reconstruction through local stress minimization and detection of spatial inconsistencies. We then confirm our calibration across three human cell lines and explain how the output of our methods could inform our model, defining an iterative calibration pipeline with applications in quality assessment and meta-analysis.
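    ‘Local’ stress minimization can be sketched as gradient descent on the squared mismatch between model and target distances. The sum runs only over the locus pairs for which a target distance is available. This is a generic illustration of the idea, not FisHiCal’s implementation. The learning rate, step count and random initialization are arbitrary assumptions and may need tuning, since too large a learning rate diverges. The function names are hypothetical.

```python
import math
import random


def local_stress(coords: list[list[float]], targets: dict[tuple[int, int], float]) -> float:
    """Sum of squared distance mismatches over pairs with a known target distance."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2 for (i, j), d in targets.items())


def minimize_local_stress(n: int, targets: dict[tuple[int, int], float],
                          steps: int = 2000, lr: float = 0.01, seed: int = 0) -> list[list[float]]:
    """Place n loci in 3D by plain gradient descent on the local stress."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for (i, j), d in targets.items():
            dist = math.dist(coords[i], coords[j])
            if dist == 0.0:
                continue  # gradient undefined for coincident points
            # d/dx_i of (dist - d)^2 is 2 (dist - d) (x_i - x_j) / dist
            coef = 2.0 * (dist - d) / dist
            for a in range(3):
                g = coef * (coords[i][a] - coords[j][a])
                grad[i][a] += g
                grad[j][a] -= g
        for i in range(n):
            for a in range(3):
                coords[i][a] -= lr * grad[i][a]
    return coords


targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.5}
coords = minimize_local_stress(3, targets)
print(local_stress(coords, targets))
```

    Pairs absent from targets do not contribute to the stress, which is what makes the minimization local.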

    Availability and implementation: FisHiCal v1.1 is available from http://cran.r-project.org/.

    Contact: ys388@cam.ac.uk

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • STAMP: statistical analysis of taxonomic and functional profiles
    [Oct 2014]

    Summary: STAMP is a graphical software package that provides statistical hypothesis tests and exploratory plots for analysing taxonomic and functional profiles. It supports tests for comparing pairs of samples or samples organized into two or more treatment groups. Effect sizes and confidence intervals are provided to allow critical assessment of the biological relevance of test results. A user-friendly graphical interface permits easy exploration of statistical results and generation of publication-quality plots.
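    As an example of the kind of effect size and confidence interval mentioned above, the sketch computes the difference between two proportions, for example the relative abundance of one taxon in two samples. It uses a normal-approximation (Wald) interval, which is unreliable for small counts. STAMP offers several effect-size and confidence-interval methods; this one is chosen only for illustration, and the function name is hypothetical.

```python
import math


def diff_proportions_ci(count1: int, total1: int, count2: int, total2: int,
                        z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Difference between two proportions with a Wald (normal-approximation) interval."""
    p1, p2 = count1 / total1, count2 / total2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / total1 + p2 * (1 - p2) / total2)
    return diff, (diff - z * se, diff + z * se)


# 50 of 100 reads vs 30 of 100 reads assigned to a taxon; z = 1.96 gives ~95% coverage
print(diff_proportions_ci(50, 100, 30, 100))
```

    This gives a difference of about 0.2 with an interval of roughly 0.067 to 0.333. An interval that excludes zero suggests the difference is not just sampling noise, and its width shows how precisely the effect is estimated.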

    Availability and implementation: STAMP is licensed under the GNU GPL. Python source code and binaries are available from our website at: http://kiwi.cs.dal.ca/Software/STAMP

    Contact: donovan.parks@gmail.com

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles