Bioinformatics Journal

Bioinformatics - RSS feed of current issue

URL: http://bioinformatics.oxfordjournals.org

Updated: 8 years 20 weeks ago

Mango: a bias-correcting ChIA-PET analysis pipeline

Mon, 09/21/2015 - 07:37

Motivation: Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) is an established method for detecting genome-wide looping interactions at high resolution. Current ChIA-PET analysis software packages either fail to correct for non-specific interactions due to genomic proximity or only address a fraction of the steps required for data processing. We present Mango, a complete ChIA-PET data analysis pipeline that provides statistical confidence estimates for interactions and corrects for major sources of bias including differential peak enrichment and genomic proximity.

Results: Comparison to the existing software packages ChIA-PET Tool and ChiaSig revealed that Mango interactions exhibit much better agreement with high-resolution Hi-C data. Importantly, Mango executes all steps required for processing ChIA-PET datasets, whereas ChiaSig only completes 20% of the required steps. Application of Mango to multiple available ChIA-PET datasets permitted the independent rediscovery of known trends in chromatin loops, including enrichment of CTCF, RAD21, SMC3 and ZNF143 at the anchor regions of interactions and a strong bias for convergent CTCF motifs.

Availability and implementation: Mango is open source and distributed through github at https://github.com/dphansti/mango.

Contact: mpsnyder@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

Mon, 09/21/2015 - 07:37

Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources.

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: dlee4@vcu.edu

Supplementary information: Supplementary Data are available at Bioinformatics online.

Categories: Journal Articles

SoloDel: a probabilistic model for detecting low-frequent somatic deletions from unmatched sequencing data

Mon, 09/21/2015 - 07:37

Motivation: Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. There are a number of robust methods developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in cell population and absence of the matched control samples.

Results: We developed a novel computational method, SoloDel, that accurately classifies low-frequent somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without control, SoloDel maintained a comparable performance across a wide range of mutated subpopulation sizes (10–70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data.
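The mixture-model step can be illustrated with a minimal sketch of expectation-maximization for a two-component, one-dimensional Gaussian mixture. This is not SoloDel's implementation: the function names, the initialization and the toy read-depth ratios (clusters near 0.5 and 0.9) are assumptions chosen only to show how EM separates two populations of deletion calls.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, sds), each a list of length 2.
    """
    lo, hi = min(values), max(values)
    # Hypothetical initialization: means at the quartiles of the observed range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [(hi - lo) / 4.0 or 1.0] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds


# Toy data (illustrative numbers only): 300 calls with read-depth ratio
# near 0.5 and 200 calls with ratio near 0.9.
rng = random.Random(1)
ratios = [rng.gauss(0.5, 0.05) for _ in range(300)]
ratios += [rng.gauss(0.9, 0.05) for _ in range(200)]

weights, means, sds = fit_two_gaussians(ratios)
```

After fitting, each call can be assigned to the component with the higher responsibility. How SoloDel maps components to the somatic and germline classes, and how it couples the mixture to its progression model, is described in the article itself.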

Availability and implementation: Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/

Contact: swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

antaRNA: ant colony-based RNA sequence design

Mon, 09/21/2015 - 07:37

Motivation: RNA sequence design has been studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components.

Results: In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on biological datasets.
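Two of the constraints named above, a target structure and a GC-content objective, can be made concrete with a small sketch. It is not part of antaRNA: the function names are hypothetical, the structure is assumed to be in dot-bracket notation without pseudoknots, and the allowed pairs are the usual Watson-Crick and G-U wobble pairs.

```python
# Base pairs assumed to be allowed: Watson-Crick pairs plus G-U wobble.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def base_pairs(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position holds an allowed base pair."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in base_pairs(structure)
    )


def gc_content(sequence):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


hairpin = "(((...)))"
candidate = "GGGAAACCC"
compatible = is_compatible(candidate, hairpin)
gc = gc_content(candidate)
```

Compatibility is only a necessary condition. Whether a candidate actually folds into the target structure has to be checked with a folding program, and the search over candidates is where an optimization heuristic such as ant colony optimization comes in.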

Availability and implementation: http://www.bioinf.uni-freiburg.de/Software/antaRNA

Contact: backofen@informatik.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

QVZ: lossy compression of quality values

Mon, 09/21/2015 - 07:37

Motivation: Recent advancements in sequencing technology have led to a drastic reduction in the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values presented in genomic data files (e.g. FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit.

Results: The proposed algorithm QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms. Finally, we show that QVZ-compressed data exhibit better performance in the genotyping than data compressed with previously proposed algorithms, in the sense that for a similar rate, a genotyping closer to that achieved with the original quality values is obtained.

Availability and implementation: QVZ is written in C and can be downloaded from https://github.com/mikelhernaez/qvz.

Contact: mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A parallel and sensitive software tool for methylation analysis on multicore platforms

Mon, 09/21/2015 - 07:37

Motivation: DNA methylation analysis suffers from very long processing time, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently either with the size of the dataset or with the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed.

Results: We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows–Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith–Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms in both execution time and sensitivity state-of-the-art software such as Bismark, BS-Seeker or BSMAP, particularly for long bisulphite reads.
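For readers unfamiliar with the Burrows–Wheeler Transform mentioned above, the following is a naive sketch of the transform and of backward search for exact matches. It is a textbook illustration under simplifying assumptions (quadratic-time construction, no sampled occurrence tables, no bisulphite conversion), not the HPG-Methyl code; the function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations (naive construction)."""
    text = text + "$"  # sentinel, sorts before every other character
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first[c]: index of the first row starting with c in the sorted rotations.
    first = {}
    for i, c in enumerate(sorted(bwt_string)):
        first.setdefault(c, i)

    def occ(c, i):
        # Number of occurrences of c in bwt_string[:i]; real indexes
        # precompute this instead of rescanning the string.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference_bwt = bwt("ACGTACGT")
hits = count_occurrences(reference_bwt, "CGT")
```

Backward search only finds exact matches, which is why mappers pair it with a dynamic-programming aligner such as Smith–Waterman for the reads that cannot be placed unambiguously.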

Availability and implementation: Software in the form of C libraries and functions, together with instructions to compile and execute this software. Available by sftp to anonymous@clariano.uv.es (password ‘anonymous’).

Contact: juan.orduna@uv.es or jdopazo@cipf.es

Categories: Journal Articles

UniAlign: protein structure alignment meets evolution

Mon, 09/21/2015 - 07:37

Motivation: During evolution, functional sites on the surface of the protein as well as the hydrophobic core maintaining the structural integrity are well-conserved. However, available protein structure alignment methods align protein structures based solely on the 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins.

Results: In this article, we propose a new protein pairwise structure alignment algorithm (UniAlign) that incorporates additional evolutionary information captured in the form of sequence similarity, sequence profiles and residue conservation. We define a per-residue score (UniScore) as a weighted sum of these and other features and develop an iterative optimization procedure to search for an alignment with the best overall UniScore. Our extensive experiments on CDD, HOMSTRAD and BAliBASE benchmark datasets show that UniAlign outperforms commonly used structure alignment methods. We further demonstrate UniAlign's ability to develop family-specific models to drastically improve the quality of the alignments.

Availability and implementation: UniAlign is available as a web service at: http://sacan.biomed.drexel.edu/unialign

Contact: ahmet.sacan@drexel.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

xHeinz: an algorithm for mining cross-species network modules under a flexible conservation model

Mon, 09/21/2015 - 07:37

Motivation: Integrative network analysis methods provide robust interpretations of differential high-throughput molecular profile measurements. They are often used in a biomedical context—to generate novel hypotheses about the underlying cellular processes or to derive biomarkers for classification and subtyping. The underlying molecular profiles are frequently measured and validated on animal or cellular models. Therefore the results are not immediately transferable to human. In particular, this is also the case in a study of the recently discovered interleukin-17 producing helper T cells (Th17), which are fundamental for anti-microbial immunity but also known to contribute to autoimmune diseases.

Results: We propose a mathematical model for finding active subnetwork modules that are conserved between two species. These are sets of genes, one for each species, which (i) induce a connected subnetwork in a species-specific interaction network, (ii) show overall differential behavior and (iii) contain a large number of orthologous genes. We propose a flexible notion of conservation, which turns out to be crucial for the quality of the resulting modules in terms of biological interpretability. We propose an algorithm that finds provably optimal or near-optimal conserved active modules in our model. We apply our algorithm to understand the mechanisms underlying Th17 T cell differentiation in both mouse and human. As a main biological result, we find that the key regulation of Th17 differentiation is conserved between human and mouse.

Availability and implementation: xHeinz, an implementation of our algorithm, as well as all input data and results, are available at http://software.cwi.nl/xheinz and as a Galaxy service at http://services.cbib.u-bordeaux2.fr/galaxy in CBiB Tools.

Contact: gunnar.klau@cwi.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis

Mon, 09/21/2015 - 07:37

Motivation: Proteomic mass spectrometry analysis is becoming routine in clinical diagnostics, for example to monitor cancer biomarkers using blood samples. However, differential proteomics and identification of peaks relevant for class separation remains challenging.

Results: Here, we introduce a simple yet effective approach for identifying differentially expressed proteins using binary discriminant analysis. This approach works by data-adaptive thresholding of protein expression values and subsequent ranking of the dichotomized features using a relative entropy measure. Our framework may be viewed as a generalization of the ‘peak probability contrast’ approach of Tibshirani et al. (2004) and can be applied both in the two-group and the multi-group setting. Our approach is computationally inexpensive and, in the analysis of a large-scale drug discovery test dataset, shows prediction accuracy equivalent to that of a random forest. Furthermore, in the analysis of mass spectrometry data from a pancreas cancer study, we were able to identify biologically relevant and statistically predictive marker peaks unrecognized in the original study.

Availability and implementation: The methodology for binary discriminant analysis is implemented in the R package binda, which is freely available under the GNU General Public License (version 3 or later) from CRAN at URL http://cran.r-project.org/web/packages/binda/. R scripts reproducing all described analyses are available from the web page http://strimmerlab.org/software/binda/.
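The two steps named above, thresholding and entropy-based ranking, can be sketched for the two-group case as follows. This is an illustration only and does not reproduce the binda package: the midpoint threshold, the pseudocount and the symmetric Kullback-Leibler score are assumptions, as are all names and the toy intensities.

```python
import math


def dichotomize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def feature_score(group_a, group_b, pseudo=0.5):
    """Score one feature by how differently it is 'on' in the two groups."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    # Assumed thresholding rule: midpoint between the two group means.
    threshold = (mean_a + mean_b) / 2.0
    # Pseudocounts keep the estimated frequencies away from 0 and 1.
    p_a = (sum(dichotomize(group_a, threshold)) + pseudo) / (len(group_a) + 2 * pseudo)
    p_b = (sum(dichotomize(group_b, threshold)) + pseudo) / (len(group_b) + 2 * pseudo)
    return bernoulli_kl(p_a, p_b) + bernoulli_kl(p_b, p_a)


# Toy peak intensities (made-up numbers): (group A samples, group B samples).
peaks = {
    "peak_A": ([5.1, 4.8, 5.3, 5.0], [1.0, 1.2, 0.9, 1.1]),
    "peak_B": ([2.0, 3.0, 2.0, 3.0], [3.0, 2.0, 3.0, 2.0]),
}

scores = {name: feature_score(a, b) for name, (a, b) in peaks.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A peak that is present in one group and absent in the other gets a high score, while a peak with the same on/off frequency in both groups scores zero.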

Contact: k.strimmer@imperial.ac.uk

Categories: Journal Articles

Sparse multi-view matrix factorization: a multivariate approach to multiple tissue comparisons

Mon, 09/21/2015 - 07:37

Motivation: Within any given tissue, gene expression levels can vary extensively among individuals. Such heterogeneity can be caused by genetic and epigenetic variability and may contribute to disease. The abundance of experimental data now enables the identification of features of gene expression profiles that are shared across tissues and those that are tissue-specific. While most current research is concerned with characterizing differential expression by comparing mean expression profiles across tissues, it is believed that a significant difference in a gene expression’s variance across tissues may also be associated with molecular mechanisms that are important for tissue development and function.

Results: We propose a sparse multi-view matrix factorization (sMVMF) algorithm to jointly analyse gene expression measurements in multiple tissues, where each tissue provides a different ‘view’ of the underlying organism. The proposed methodology can be interpreted as an extension of principal component analysis in that it provides the means to decompose the total sample variance in each tissue into the sum of two components: one capturing the variance that is shared across tissues and one isolating the tissue-specific variances. sMVMF has been used to jointly model mRNA expression profiles in three tissues obtained from a large and well-phenotyped twins cohort, TwinsUK. Using sMVMF, we are able to prioritize genes based on whether their variation patterns are specific to each tissue. Furthermore, using DNA methylation profiles available, we provide supporting evidence that adipose-specific gene expression patterns may be driven by epigenetic effects.

Availability and implementation: Python code is available at http://wwwf.imperial.ac.uk/~gmontana/.

Contact: giovanni.montana@kcl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

CCLasso: correlation inference for compositional data through Lasso

Mon, 09/21/2015 - 07:37

Motivation: Direct analysis of microbial communities in the environment and human body has become more convenient and reliable owing to the advancements of high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlation relationship among members of microbial communities is of fundamental importance for genomic survey study. Traditional Pearson correlation analysis treating the observed data as absolute abundances of the microbes may lead to spurious results because the data only represent relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data.
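The spurious-correlation problem can be reproduced with a short simulation. The sketch below is not taken from the article: the three taxa, their abundance ranges and the random seed are arbitrary assumptions. Two taxa are generated independently, yet their relative abundances appear strongly correlated because both shrink whenever a third, dominant taxon grows.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
abs_a, abs_b, rel_a, rel_b = [], [], [], []
for _ in range(2000):
    a = rng.uniform(80, 120)      # taxon A, absolute abundance
    b = rng.uniform(80, 120)      # taxon B, drawn independently of A
    c = rng.uniform(100, 10000)   # dominant taxon with a wide range
    total = a + b + c
    abs_a.append(a)
    abs_b.append(b)
    rel_a.append(a / total)       # what sequencing-based profiling reports
    rel_b.append(b / total)

corr_absolute = pearson(abs_a, abs_b)
corr_relative = pearson(rel_a, rel_b)
```

In this setup `corr_absolute` stays close to zero while `corr_relative` is strongly positive, which is the artefact that methods designed for compositional data aim to avoid.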

Results: In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso based on least squares with an ℓ1 penalty to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from the augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbe species from the Human Microbiome Project.

Availability and implementation: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.

Contact: dengmh@pku.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

The pervasiveness and plasticity of circadian oscillations: the coupled circadian-oscillators framework

Mon, 09/21/2015 - 07:37

Motivation: Circadian oscillations have been observed in animals, plants, fungi and cyanobacteria and play a fundamental role in coordinating the homeostasis and behavior of biological systems. Genetically encoded molecular clocks found in nearly every cell, based on negative transcription/translation feedback loops and involving only a dozen genes, play a central role in maintaining these oscillations. However, high-throughput gene expression experiments reveal that in a typical tissue, a much larger fraction (~10%) of all transcripts oscillate with the day–night cycle and the oscillating species vary with tissue type suggesting that perhaps a much larger fraction of all transcripts, and perhaps also other molecular species, may bear the potential for circadian oscillations.

Results: To better quantify the pervasiveness and plasticity of circadian oscillations, we conduct the first large-scale analysis aggregating the results of 18 circadian transcriptomic studies and 10 circadian metabolomic studies conducted in mice using different tissues and under different conditions. We find that over half of protein coding genes in the cell can produce transcripts that are circadian in at least one set of conditions and similarly for measured metabolites. Genetic or environmental perturbations can disrupt existing oscillations by changing their amplitudes and phases, suppressing them or giving rise to novel circadian oscillations. The oscillating species and their oscillations provide a characteristic signature of the physiological state of the corresponding cell/tissue. Molecular networks comprise many oscillator loops that have been sculpted by evolution over two trillion day–night cycles to have intrinsic circadian frequency. These oscillating loops are coupled by shared nodes in a large network of coupled circadian oscillators where the clock genes form a major hub. Cells can program and re-program their circadian repertoire through epigenetic and other mechanisms.

Availability and implementation: High-resolution and tissue/condition specific circadian data and networks available at http://circadiomics.igb.uci.edu.

Contact: pfbaldi@ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Automated profiling of individual cell-cell interactions from high-throughput time-lapse imaging microscopy in nanowell grids (TIMING)

Mon, 09/21/2015 - 07:37

Motivation: There is a need for effective automated methods for profiling dynamic cell–cell interactions with single-cell resolution from high-throughput time-lapse imaging data, especially the interactions between immune effector cells and tumor cells in adoptive immunotherapy.

Results: Fluorescently labeled human T cells, natural killer cells (NK), and various target cells (NALM6, K562, EL4) were co-incubated on polydimethylsiloxane arrays of sub-nanoliter wells (nanowells), and imaged using multi-channel time-lapse microscopy. The proposed cell segmentation and tracking algorithms account for cell variability and exploit the nanowell confinement property to increase the yield of correctly analyzed nanowells from 45% (existing algorithms) to 98% for wells containing one effector and a single target, enabling automated quantification of cell locations, morphologies, movements, interactions, and deaths without the need for manual proofreading. Automated analysis of recordings from 12 different experiments demonstrated automated nanowell delineation accuracy >99%, automated cell segmentation accuracy >95%, and automated cell tracking accuracy of 90%, with default parameters, despite variations in illumination, staining, imaging noise, cell morphology, and cell clustering. An example analysis revealed that NK cells efficiently discriminate between live and dead targets by altering the duration of conjugation. The data also demonstrated that cytotoxic cells display higher motility than non-killers, both before and during contact.

Contact: broysam@central.uh.edu or nvaradar@central.uh.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

EXIMS: an improved data analysis pipeline based on a new peak picking method for EXploring Imaging Mass Spectrometry data

Mon, 09/21/2015 - 07:37

Motivation: Matrix Assisted Laser Desorption Ionization-Imaging Mass Spectrometry (MALDI-IMS) in ‘omics’ data acquisition generates detailed information about the spatial distribution of molecules in a given biological sample. Various data processing methods have been developed for exploring the resultant high volume data. However, most of these methods process data in the spectral domain and do not make the most of the important spatial information available through this technology. Therefore, we propose a novel streamlined data analysis pipeline specifically developed for MALDI-IMS data utilizing significant spatial information for identifying hidden significant molecular distribution patterns in these complex datasets.

Methods: The proposed unsupervised algorithm uses Sliding Window Normalization (SWN) and a new spatial distribution based peak picking method developed based on Gray level Co-Occurrence (GCO) matrices followed by clustering of biomolecules. We also use gist descriptors and an improved version of GCO matrices to extract features from molecular images and minimum medoid distance to automatically estimate the number of possible groups.

Results: We evaluated our algorithm using a new MALDI-IMS metabolomics dataset of a plant (Eucalypt) leaf. The algorithm revealed hidden significant molecular distribution patterns in the dataset, which the current Component Analysis and Segmentation Map based approaches failed to extract. We further demonstrate the performance of our peak picking method over other traditional approaches by using a publicly available MALDI-IMS proteomics dataset of a rat brain. Although SWN did not show any significant improvement as compared with using no normalization, the visual assessment showed an improvement as compared to using the median normalization.

Availability and implementation: The source code and sample data are freely available at http://exims.sourceforge.net/.

Contact: awgcdw@student.unimelb.edu.au or chalini_w@live.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads

Mon, 09/21/2015 - 07:37

We introduce FinisherSC, a repeat-aware and scalable tool for upgrading de novo assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher quality contigs than existing tools while maintaining high concordance.

Availability and implementation: The tool and data are available and will be maintained at http://kakitone.github.io/finishingTool/

Contact: dntse@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Mon, 09/21/2015 - 07:37

Motivation: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50.
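For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the computation (the function name and example lengths are illustrative):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


contig_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
assembly_n50 = n50(contig_lengths)
```

N50 depends only on the length distribution, so it says nothing about whether the expected genes are present in the assembly.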

Results: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO.

Availability and implementation: Software implemented in Python and datasets available for download from http://busco.ezlab.org.

Contact: evgeny.zdobnov@unige.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

IgSimulator: a versatile immunosequencing simulator

Mon, 09/21/2015 - 07:37

Motivation: The recent introduction of next-generation sequencing technologies to antibody studies has resulted in a growing number of immunoinformatics tools for antibody repertoire analysis. However, benchmarking these newly emerging tools remains problematic since the gold standard datasets that are needed to validate these tools are typically not available.

Results: Since simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools, we developed the IgSimulator tool that addresses various complications in generating realistic antibody repertoires. IgSimulator’s code has a modular structure and can be easily adapted to new simulation requirements.

Availability and implementation: IgSimulator is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from yana-safonova.github.io/ig_simulator.

Contact: safonova.yana@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

ACE: accurate correction of errors using K-mer tries

Mon, 09/21/2015 - 07:37

Summary: The quality of high-throughput next-generation sequencing data significantly influences the performance and memory consumption of assembly and mapping algorithms. The most ubiquitous platform, Illumina, mainly suffers from substitution errors. We have developed a tool, ACE, based on K-mer tries to correct such errors. On real MiSeq and HiSeq Illumina archives, ACE yields higher gains in terms of coverage depth, outperforming state-of-the-art competitors in the majority of cases.
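The idea of storing K-mers in a trie and using their counts to spot substitution errors can be sketched as follows. This is not the ACE algorithm: the class, the `min_count` cutoff and the brute-force neighbour enumeration are assumptions made for illustration, and the reads are toy data.

```python
class KmerTrie:
    """Trie keyed by nucleotides; each complete k-mer stores a count."""

    def __init__(self):
        self.root = {}

    def add(self, kmer):
        node = self.root
        for base in kmer:
            node = node.setdefault(base, {})
        node["count"] = node.get("count", 0) + 1

    def count(self, kmer):
        node = self.root
        for base in kmer:
            node = node.get(base)
            if node is None:
                return 0
        return node.get("count", 0)


def correct_kmer(trie, kmer, min_count=2):
    """Replace a rare k-mer by its most frequent one-substitution neighbour.

    min_count is a hypothetical cutoff: k-mers seen at least this often
    are trusted and returned unchanged.
    """
    if trie.count(kmer) >= min_count:
        return kmer
    best, best_count = kmer, trie.count(kmer)
    for i, original in enumerate(kmer):
        for base in "ACGT":
            if base == original:
                continue
            candidate = kmer[:i] + base + kmer[i + 1:]
            candidate_count = trie.count(candidate)
            if candidate_count >= min_count and candidate_count > best_count:
                best, best_count = candidate, candidate_count
    return best


# Toy reads: five correct copies and one read with a single substitution.
reads = ["ACGTAC"] * 5 + ["ACGTTC"]
k = 4
trie = KmerTrie()
for read in reads:
    for start in range(len(read) - k + 1):
        trie.add(read[start:start + k])

fixed = correct_kmer(trie, "CGTT")
```

A real corrector would walk the trie to find neighbours rather than enumerate every substitution, and would correct whole reads rather than isolated K-mers.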

Availability and implementation: ACE is licensed under the GPL license and can be freely obtained at https://github.com/sheikhizadeh/ACE/. The program is implemented in C++ and runs on most Unix-derived operating systems.

Contact: siavash.sheikhizadehanari@wur.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

GeIST: a pipeline for mapping integrated DNA elements

Mon, 09/21/2015 - 07:37

Summary: There are several experimental contexts in which it is important to identify DNA integration sites, such as insertional mutagenesis screens, gene and enhancer trap applications, and gene therapy. We previously developed an assay to identify millions of integrations in multiplexed barcoded samples at base-pair resolution. The sheer amount of data produced by this approach makes the mapping of individual sites non-trivial without bioinformatics support. This article presents the Genomic Integration Site Tracker (GeIST), a command-line pipeline designed to map the integration sites produced by this assay and identify the samples from which they came. GeIST version 2.1.0, a more adaptable version of our original pipeline, can identify integrations of murine leukemia virus, adeno-associated virus, Tol2 transposons or Ac/Ds transposons, and can be adapted for other inserted elements. It has been tested on experimental data for each of these delivery vectors and fine-tuned to account for sequencing and cloning artifacts.

Availability and implementation: GeIST uses a combination of Bash shell scripting and Perl. GeIST is available at http://research.nhgri.nih.gov/software/GeIST/.

Contact: burgess@mail.nih.gov

Categories: Journal Articles

DisVis: quantifying and visualizing accessible interaction space of distance-restrained biomolecular complexes

Mon, 09/21/2015 - 07:37

Summary: We present DisVis, a Python package and command line tool to calculate the reduced accessible interaction space of distance-restrained binary protein complexes, allowing for direct visualization and quantification of the information content of the distance restraints. The approach is general and can also be used as a knowledge-based distance energy term in FFT-based docking directly during the sampling stage.

Availability and implementation: The source code with documentation is freely available from https://github.com/haddocking/disvis.

Contact: a.m.j.j.bonvin@uu.nl

Supplementary information: Supplementary data are available at Bioinformatics online.
