Bioinformatics Journal

Bioinformatics - RSS feed of current issue

URL: http://bioinformatics.oxfordjournals.org

Updated: 8 years 21 weeks ago

Application of learning to rank to protein remote homology detection

Tue, 10/20/2015 - 09:50

Motivation: Protein remote homology detection is one of the fundamental problems in computational biology, aiming to find protein sequences in a database of known structures that are evolutionarily related to a given query protein. Some computational methods, such as PSI-BLAST, HHblits and ProtEmbed, treat this problem as a ranking problem and achieve state-of-the-art performance. This raises the possibility of combining these methods to improve predictive performance. In this regard, we propose a new computational method called ProtDec-LTR for protein remote homology detection, which combines various ranking methods in a supervised manner using the Learning to Rank (LTR) algorithm derived from natural language processing.

Results: Experimental results on a widely used benchmark dataset showed that ProtDec-LTR can achieve an ROC1 score of 0.8442 and an ROC50 score of 0.9023, outperforming all the individual predictors and some state-of-the-art methods. These results indicate that it is appropriate to treat protein remote homology detection as a ranking problem, and that predictive performance can be improved by combining different ranking approaches in a supervised manner using LTR.
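The ROC1 and ROC50 scores quoted above are instances of the ROCn measure commonly used for ranked homology search results: the area under the ROC curve up to the n-th false positive, normalized to lie between 0 and 1. The sketch below computes it for a single query. The function name `roc_n` and the handling of lists with fewer than n false positives are assumptions made for illustration; benchmark scores are typically averaged over many queries, which is not shown here.

```python
def roc_n(ranked_labels, n):
    """ROC_n for one query: normalized ROC area up to the n-th false positive.

    ranked_labels: booleans ordered from best to worst score;
                   True marks a true homolog of the query.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0 or n <= 0:
        raise ValueError("need at least one positive and n >= 1")
    true_pos = 0
    false_pos = 0
    area = 0
    for is_homolog in ranked_labels:
        if is_homolog:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # positives ranked above this false positive
            if false_pos == n:
                break
    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranked below every positive.
    area += (n - false_pos) * total_pos
    return area / (n * total_pos)


ranking = [True, True, False, True, False]
print(roc_n(ranking, 1))  # 2 of 3 homologs rank above the first false positive
print(roc_n(ranking, 2))
```

A perfect ranking, with every homolog above every non-homolog, scores 1.0 for any n.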

Availability and implementation: For users’ convenience, the software tools of the three basic ranking predictors and the Learning to Rank algorithm are provided at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR/home/

Contact: bliu@insun.hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function

Tue, 10/20/2015 - 09:50

Motivation: To date, only a few distinct successful approaches have been introduced to reconstruct a protein 3D structure from a map of contacts between its amino acid residues (a 2D contact map). Current algorithms can infer structures from information-rich contact maps that contain a limited fraction of erroneous predictions. However, it is difficult to reconstruct 3D structures from predicted contact maps that usually contain a high fraction of false contacts.

Results: We describe a new, multi-step protocol that predicts protein 3D structures from predicted contact maps. The method is based on a novel distance function acting on a fuzzy residue proximity graph, which predicts a 2D distance map from a 2D predicted contact map. The application of a Multi-Dimensional Scaling algorithm transforms that predicted 2D distance map into a coarse 3D model, which is further refined by typical modeling programs into an all-atom representation. We tested our approach on contact maps predicted de novo by MULTICOM, the top contact map predictor according to CASP10. We show that our method outperforms FT-COMAR, the state-of-the-art method for 3D structure reconstruction from 2D maps. For all predicted 2D contact maps of relatively low sensitivity (60–84%), GDFuzz3D generates more accurate 3D models, with an average improvement of 4.87 Å in terms of RMSD.
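To illustrate the step that turns a distance map into coarse 3D coordinates, here is a minimal sketch of classical (Torgerson) multi-dimensional scaling. It is not the GDFuzz3D implementation: the fuzzy-graph distance function and the all-atom refinement are not shown, the function name `classical_mds` is hypothetical, and a simple power iteration stands in for a library eigensolver. The sketch assumes the input distances are close to Euclidean, so that the leading eigenvalues are positive.

```python
import math
import random


def classical_mds(dist, dim=3, iters=200, seed=0):
    """Embed points in `dim` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix of
    # inner products between centered coordinates.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for axis in range(dim):
        # Power iteration for the current leading eigenpair.
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        eigval = 0.0
        for _ in range(iters):
            prod = [sum(gram[i][j] * vec[j] for j in range(n))
                    for i in range(n)]
            norm = math.sqrt(sum(x * x for x in prod))
            if norm < 1e-12:  # nothing left to embed along this axis
                eigval = 0.0
                break
            vec = [x / norm for x in prod]
            eigval = norm
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigval * vec[i] * vec[j]
    return coords


# Toy check: distances between six known 3D points are reproduced.
points = [(3, 0, 0), (-3, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 1), (0, 0, -1)]
dmap = [[math.dist(p, q) for q in points] for p in points]
model = classical_mds(dmap)
print(math.dist(model[0], model[2]), dmap[0][2])
```

Distances alone fix a structure only up to rotation, translation and mirror reflection, so the recovered coordinates generally differ from the originals even when every pairwise distance matches.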

Availability and implementation: GDFuzz3D server and standalone version are freely available at http://iimcb.genesilico.pl/gdserver/GDFuzz3D/.

Contact: iamb@genesilico.pl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning

Tue, 10/20/2015 - 09:50

Motivation: Protein contact prediction is important for the study of protein structure and function. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging, especially for proteins without a large number of sequence homologs.

Results: This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Unlike existing single-family EC analysis, which uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts.

Availability and implementation: See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method.

Contact: j3xu@ttic.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data

Tue, 10/20/2015 - 09:50

Motivation: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status—an epigenetic mark describing chromatin accessibility—from population-scale haplotype data.

Results: We fit the MVB model to real data from 59 individuals for whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data.

Availability and implementation: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/.

Contact: shihuwenbo@ucla.edu or pasaniuc@ucla.edu

Categories: Journal Articles

LayerCake: a tool for the visual comparison of viral deep sequencing data

Tue, 10/20/2015 - 09:50

Motivation: The advent of next-generation sequencing (NGS) has created unprecedented opportunities to examine viral populations within individual hosts, among infected individuals and over time. Comparing sequence variability across viral genomes allows for the construction of complex population structures, the analysis of which can yield powerful biological insights. However, the simultaneous display of sequence variation, coverage depth and quality scores across thousands of bases presents a unique visualization challenge that has not been fully met by current NGS analysis tools.

Results: Here, we present LayerCake, a self-contained visualization tool that allows for the rapid analysis of variation in viral NGS data. LayerCake enables the user to simultaneously visualize variations in multiple viral populations across entire genomes within a highly customizable framework, drawing attention to pertinent and interesting patterns of variation. We have successfully deployed LayerCake to assist with a variety of different genomics datasets.

Availability and implementation: Program downloads and detailed instructions are available at http://graphics.cs.wisc.edu/WP/layercake under a modified MIT license. LayerCake is a cross-platform tool written in the Processing framework for Java.

Contact: mcorrell@cs.wisc.edu

Categories: Journal Articles

Integrating full spectrum of sequence features into predicting functional microRNA-mRNA interactions

Tue, 10/20/2015 - 09:50

Motivation: MicroRNAs (miRNAs) play important roles in general biological processes and disease pathogenesis. Identifying miRNA target genes is an essential step to fully understand the regulatory effects of miRNAs. Many computational methods based on sequence complementarity rules and on miRNA and mRNA expression profiles have been developed for this purpose. Many sequence features of miRNA targets are available, including the context features of the target sites, the thermodynamic stability and the accessibility energy for miRNA–mRNA interaction. However, most current computational methods that combine sequence and expression information do not effectively integrate the full spectrum of these features; instead, they treat putative miRNA–mRNA interactions from sequence-based prediction as equally meaningful. Therefore, these sequence features have not been fully utilized for improving miRNA target prediction.

Results: We propose a novel regularized regression approach that is based on the adaptive Lasso procedure for detecting functional miRNA–mRNA interactions. Our method fully takes into account the gene sequence features and the miRNA and mRNA expression profiles. Given a set of sequence features for each putative miRNA–mRNA interaction and their expression values, our model quantifies the down-regulation effect of each miRNA on its targets while simultaneously estimating the contribution of each sequence feature to predicting functional miRNA–mRNA interactions. By applying our model to the expression datasets from two cancer studies, we demonstrate that our predictions achieve better sensitivity and specificity and are more biologically meaningful than those based on other methods.

Availability and implementation: The source code is available at: http://nba.uth.tmc.edu/homepage/liu/miRNALasso.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: Yin.Liu@uth.tmc.edu

Categories: Journal Articles

C-It-Loci: a knowledge database for tissue-enriched loci

Tue, 10/20/2015 - 09:50

Motivation: Increasing evidence suggests that most of the genome is transcribed into RNAs, but many of them are not translated into proteins. All those RNAs that do not become proteins are called ‘non-coding RNAs (ncRNAs)’, which outnumber protein-coding genes. Interestingly, these ncRNAs are shown to be more tissue-specifically expressed than protein-coding genes. Given that tissue-specific expression of transcripts suggests their importance in the expressed tissue, researchers are conducting biological experiments to elucidate the function of such ncRNAs. Owing greatly to the advancement of next-generation techniques, especially RNA-seq, the amount of high-throughput data is increasing rapidly. However, due to the complexity of the data as well as its high volume, it is not easy to re-analyze such data to extract tissue-specific expression of ncRNAs from published datasets.

Results: Here, we introduce a new knowledge database called ‘C-It-Loci’, which allows a user to screen for tissue-specific transcripts across three organisms: human, mouse and zebrafish. C-It-Loci is intuitive and easy to use to identify not only protein-coding genes but also ncRNAs from various tissues. C-It-Loci defines homology through sequence and positional conservation to allow for the extraction of species-conserved loci. C-It-Loci can be used as a starting point for further biological experiments.

Availability and implementation: C-It-Loci is freely available online without registration at http://c-it-loci.uni-frankfurt.de.

Contact: uchida@med.uni-frankfurt.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Nsite, NsiteH and NsiteM computer tools for studying transcription regulatory elements

Tue, 10/20/2015 - 09:50

Summary: Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively.

Availability and implementation: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com.

Contact: solovictor@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

nextflu: real-time tracking of seasonal influenza virus evolution in humans

Tue, 10/20/2015 - 09:50

Summary: Seasonal influenza viruses evolve rapidly, allowing them to evade immunity in their human hosts and reinfect previously infected individuals. Similarly, vaccines against seasonal influenza need to be updated frequently to protect against an evolving virus population. We have thus developed a processing pipeline and browser-based visualization that allows convenient exploration and analysis of the most recent influenza virus sequence data. This web-application displays a phylogenetic tree that can be decorated with additional information such as the viral genotype at specific sites, sampling location and derived statistics that have been shown to be predictive of future virus dynamics. In addition, mutation, genotype and clade frequency trajectories are calculated and displayed.

Availability and implementation: Python and Javascript source code is freely available from https://github.com/blab/nextflu, while the web-application is live at http://nextflu.org.

Contact: tbedford@fredhutch.org

Categories: Journal Articles

al3c: high-performance software for parameter inference using Approximate Bayesian Computation

Tue, 10/20/2015 - 09:50

Motivation: The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference which are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While the development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) helps address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled on multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments.
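The trade-off described above can be seen in a minimal sketch of plain ABC rejection sampling, written in Python rather than with the al3c C++ framework. The toy problem (inferring a coin's bias from a count of heads), the uniform prior, and all names such as `abc_rejection` are assumptions made for illustration; this is not ABC-SMC and does not use any al3c interface.

```python
import random


def abc_rejection(observed_heads, n_flips, n_draws=20000, tolerance=0, seed=1):
    """Return prior draws whose simulated data fall within `tolerance`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw a candidate bias from Uniform(0, 1)
        # Simulate the experiment under the candidate parameter.
        heads = sum(rng.random() < theta for _ in range(n_flips))
        # Keep the candidate only if the simulation matches the observation.
        if abs(heads - observed_heads) <= tolerance:
            accepted.append(theta)
    return accepted


posterior = abc_rejection(observed_heads=14, n_flips=20)
print(len(posterior), sum(posterior) / len(posterior))
```

Every draw is independent of the others, which is why the loop splits trivially across processors. With a uniform prior and zero tolerance, however, only about one draw in 21 is accepted in this toy setting, which shows the inefficiency that sequential schemes such as ABC-SMC aim to reduce.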

Results: We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead.

Availability and implementation: al3c is offered as a static binary for Linux and OS-X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c.

Contact: astram@usc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS

Tue, 10/20/2015 - 09:50

Summary: Genome-wide association studies are an invaluable tool for identifying genotypic loci linked with agriculturally important traits or certain diseases. The signal on which such studies rely can, however, be obscured by population stratification, making it necessary to account for it in some way. Population stratification depends on when admixture happened and thus can occur at various levels. To aid in its inference at the genome level, we recently introduced psiko, and comparison with leading methods indicates that it has attractive properties. However, until now, it could not be used for local ancestry inference, which is preferable in cases of recent admixture because the genome level tends to be too coarse to properly account for processes acting on small segments of a genome. To also bring the powerful ideas underpinning psiko to bear in such studies, we extended it to psiko2, which we introduce here.

Availability and implementation: Source code, binaries and user manual are freely available at https://www.uea.ac.uk/computing/psiko.

Contact: Andrei-Alin.Popescu@uea.ac.uk or Katharina.Huber@cmp.uea.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants

Tue, 10/20/2015 - 09:50

Summary: Assessing linkage disequilibrium (LD) across ancestral populations is a powerful approach for investigating population-specific genetic structure as well as functionally mapping regions of disease susceptibility. Here, we present LDlink, a web-based collection of bioinformatic modules that query single nucleotide polymorphisms (SNPs) in population groups of interest to generate haplotype tables and interactive plots. Modules are designed with an emphasis on ease of use, query flexibility, and interactive visualization of results. Phase 3 haplotype data from the 1000 Genomes Project are referenced for calculating pairwise metrics of LD, searching for proxies in high LD, and enumerating all observed haplotypes. LDlink is tailored for investigators interested in mapping common and uncommon disease susceptibility loci by focusing on output linking correlated alleles and highlighting putative functional variants.
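As a rough illustration of the pairwise LD metrics mentioned above, the sketch below computes D′ and r² for two biallelic SNPs from a list of phased haplotypes. It does not call LDlink or read 1000 Genomes data; the function name `ld_stats` and the 0/1 allele encoding are assumptions made for the example.

```python
def ld_stats(haplotypes):
    """Return (D', r^2) for two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased chromosome, where
                1 marks the allele of interest at SNP A or SNP B and 0
                marks the other allele.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")
    # D measures how far the joint frequency departs from independence.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


d_prime, r2 = ld_stats([(1, 1), (1, 0), (0, 0), (0, 0)])
print(d_prime, r2)
```

In this toy sample D′ is 1 while r² is only about 0.33, because one of the four possible haplotypes is never observed but the two SNPs have different allele frequencies. This is one reason proxy searches usually report both metrics.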

Availability and implementation: LDlink is a free and publicly available web tool which can be accessed at http://analysistools.nci.nih.gov/LDlink/.

Contact: mitchell.machiela@nih.gov

Categories: Journal Articles

Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems

Tue, 10/20/2015 - 09:50

Summary: Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of systems biology. Two of the most critical steps in this approach are to construct dynamical models of biochemical reaction networks for large datasets and complex experimental conditions and to perform efficient and reliable parameter estimation for model fitting. We present a modeling environment for MATLAB that addresses these challenges. The numerically expensive parts of the calculations, such as the solving of the differential equations and of the associated sensitivity system, are parallelized and automatically compiled into efficient C code. A variety of parameter estimation algorithms as well as frequentist and Bayesian methods for uncertainty analysis have been implemented and used on a range of applications that led to publications.

Availability and implementation: The Data2Dynamics modeling environment is MATLAB based, open source and freely available at http://www.data2dynamics.org.

Contact: andreas.raue@fdm.uni-freiburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Predicting tumor purity from methylation microarray data

Tue, 10/20/2015 - 09:50

Motivation: In cancer genomics research, one important problem is that the solid tissue sample obtained from clinical settings is always a mixture of cancer and normal cells. The sample mixture complicates data analysis and results in biased findings if not correctly accounted for. Estimating tumor purity is of great interest, and a number of methods have been developed using gene expression, copy number variation or point mutation data.

Results: We discover that in cancer samples, the distributions of data from the Illumina Infinium 450k methylation microarray are highly correlated with tumor purities. We develop a simple but effective method to estimate purities from the microarray data. Analyses of the Cancer Genome Atlas lung cancer data demonstrate favorable performance of the proposed method.

Availability and implementation: The method is implemented in InfiniumPurify, which is freely available at https://bitbucket.org/zhengxiaoqi/infiniumpurify.

Contact: xqzheng@shnu.edu.cn or hao.wu@emory.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

ISQuest: finding insertion sequences in prokaryotic sequence fragment data

Tue, 10/20/2015 - 09:50

Motivation: Insertion sequences (ISs) are transposable elements present in most bacterial and archaeal genomes that play an important role in genomic evolution. The increasing availability of sequenced prokaryotic genomes offers the opportunity to study ISs comprehensively, but development of efficient and accurate tools is required for discovery and annotation. Additionally, prokaryotic genomes are frequently deposited in an incomplete or draft stage because of the substantial cost and effort required to finish genome assembly projects. Development of methods to identify ISs directly from raw sequence reads or draft genomes is therefore desirable. Software tools such as Optimized Annotation System for Insertion Sequences and IScan currently identify IS elements in completely assembled and annotated genomes; however, to our knowledge no methods have been developed to identify ISs from raw fragment data or partially assembled genomes. We have developed novel methods to solve this computationally challenging problem, and implemented these methods in the software package ISQuest. This software identifies bacterial ISs and their sequence elements (inverted and direct repeats) in raw read data or contigs using flexible search parameters. ISQuest is capable of finding ISs in hundreds of partially assembled genomes within hours, making it a valuable high-throughput tool for a global search of IS elements. We tested ISQuest on simulated read libraries of 3810 complete bacterial genomes and plasmids in GenBank and were able to detect 82% of the ISs and transposases annotated in GenBank with 80% sequence identity.

Contact: abiswas@cs.odu.edu

Categories: Journal Articles

DINGO: differential network analysis in genomics

Tue, 10/20/2015 - 09:50

Motivation: Cancer progression and development are initiated by aberrations in various molecular networks through coordinated changes across multiple genes and pathways. It is important to understand how these networks change under different stress conditions and/or patient-specific groups to infer differential patterns of activation and inhibition. Existing methods are limited to correlation networks that are independently estimated from separate group-specific data and without due consideration of relationships that are conserved across multiple groups.

Method: We propose a pathway-based differential network analysis in genomics (DINGO) model for estimating group-specific networks and making inference on the differential networks. DINGO jointly estimates the group-specific conditional dependencies by decomposing them into global and group-specific components. The delineation of these components allows for a more refined picture of the major driver and passenger events in the elucidation of cancer progression and development.

Results: Simulation studies demonstrate that DINGO provides more accurate group-specific conditional dependencies than achieved by using separate estimation approaches. We apply DINGO to key signaling pathways in glioblastoma to build differential networks for long-term survivors and short-term survivors in The Cancer Genome Atlas. The hub genes found by mRNA expression, DNA copy number, methylation and microRNA expression reveal several important roles in glioblastoma progression.

Availability and implementation: R Package at: odin.mdacc.tmc.edu/~vbaladan.

Contact: veera@mdanderson.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

Tue, 10/20/2015 - 09:50

Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low.

Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.

Availability and implementation: Karect is available at: http://aminallam.github.io/karect.

Contact: amin.allam@kaust.edu.sa

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

ProFET: Feature engineering captures high-level protein functions

Tue, 10/20/2015 - 09:50

Motivation: The number of sequenced genomes and proteins is growing at an unprecedented pace. Unfortunately, manual curation and functional knowledge lag behind. Homologous inference often fails at labeling proteins with diverse functions and broad classes. Thus, identifying high-level protein functionality remains challenging. We hypothesize that a universal feature engineering approach can yield classification of high-level functions and unified properties when combined with machine learning approaches, without requiring external databases or alignment.

Results: In this study, we present a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of features covering the elementary biophysical and sequence-derived attributes. Most features capture statistically informative patterns. In addition, different representations of sequences and the amino acid alphabet provide a compact, compressed set of features. The results from ProFET were incorporated in data analysis pipelines, implemented in Python and adapted for multi-genome scale analysis. ProFET was applied to 17 established and novel protein benchmark datasets involving classification for a variety of binary and multi-class tasks. The results show state-of-the-art performance. The extracted features show excellent biological interpretability. The success of ProFET applies to a wide range of high-level functions such as subcellular localization, structural classes and proteins with unique functional properties (e.g. neuropeptide precursors, thermophilic and nucleic acid binding). ProFET allows easy, universal discovery of new target proteins, as well as understanding the features underlying different high-level protein functions.

Availability and implementation: ProFET source code and the datasets used are freely available at https://github.com/ddofer/ProFET.

Contact: michall@cc.huji.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

GC3-biased gene domains in mammalian genomes

Mon, 09/21/2015 - 07:37

Motivation: Synonymous codon usage bias has been shown to be correlated with many genomic features among different organisms. However, the biological significance of codon bias with respect to gene function and genome organization remains unclear.

Results: Guanine and cytosine content at the third codon position (GC3) could be used as a good indicator of codon bias. Here, we used relative GC3 bias values to compare the strength of GC3 bias of genes in human and mouse. We reported, for the first time, that GC3-rich and GC3-poor gene products might have distinct sub-cellular spatial distributions. Moreover, we extended the view of genomic gene domains and identified conserved GC3 biased gene domains along chromosomes. Our results indicated that similar GC3 biased genes might be co-translated in specific spatial regions to share local translational machineries, and that GC3 could be involved in the organization of genome architecture.
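For readers unfamiliar with the GC3 measure, the following sketch computes the fraction of G or C bases at the third position of each codon in a coding sequence. It shows only the raw GC3 content, not the relative GC3 bias values used in the article, whose definition is not given in the abstract. The function name `gc3_content` and the choice to count every codon supplied, including a stop codon if present, are assumptions.

```python
def gc3_content(cds):
    """Fraction of codons whose third base is G or C.

    cds: in-frame coding sequence whose length is a multiple of three.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Codons ATG GCC AAG TAA have third bases G, C, G, A.
print(gc3_content("ATGGCCAAGTAA"))  # 0.75
```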

Availability and implementation: Source code is available upon request from the authors.

Contact: zhaozh@nic.bmi.ac.cn or zany1983@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

FourCSeq: analysis of 4C sequencing data

Mon, 09/21/2015 - 07:37

Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the ‘viewpoint’ with the rest of the genome, either in a single condition or when comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific peaks and detecting changes between different biological conditions.

Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples, and call differential contact frequencies using the statistical method DESeq2 adapted from RNA-Seq analysis.
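The idea of scoring peaks against a monotone distance trend can be sketched in a few lines of Python. This is not the FourCSeq implementation, which is an R package: here a least-squares non-increasing step fit (pool-adjacent-violators) stands in for the smooth monotone function, the input is assumed to be already transformed signal ordered by increasing distance from the viewpoint, and the residual spread is estimated with a plain standard deviation. The names `fit_decreasing` and `peak_z_scores` are hypothetical.

```python
import statistics


def fit_decreasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while an earlier block has a smaller mean than the
        # block after it, which would violate the decreasing trend.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def peak_z_scores(values):
    """z-scores of the residuals around the monotone trend."""
    fitted = fit_decreasing(values)
    residuals = [v - f for v, f in zip(values, fitted)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return [0.0] * len(values)
    return [r / spread for r in residuals]


# Toy signal for fragments at increasing distance from the viewpoint,
# with one bump at index 4.
signal = [10, 9, 8, 7, 12, 5, 4, 3]
scores = peak_z_scores(signal)
print(scores.index(max(scores)))  # index of the strongest candidate peak
```

A step fit absorbs part of each peak into the trend, so residuals near a strong peak are pulled toward zero. A smoother fit and a more robust spread estimate would reduce that effect.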

Availability and implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org.

Contact: felix.klein@embl.de or whuber@embl.de

Supplementary information: Supplementary data are available at Bioinformatics online.
