Bioinformatics Journal

Bioinformatics - RSS feed of current issue
  • A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data
    [May 2013]

    Motivation: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants.

    Results: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from .BAM or .SAM formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data.
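
    The classification step can be illustrated with a minimal sketch, assuming a linear SVM trained by Pegasos-style stochastic subgradient descent on two hypothetical per-site features (alternate-allele fraction and mean alternate base quality scaled to [0, 1]). The features, training data and solver are illustrative assumptions, not SNPSVM's actual model, which may use a different kernel, feature set and training routine.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, lam=0.01, epochs=1000, seed=0):
    """Pegasos-style stochastic subgradient training of a linear SVM.

    samples: feature tuples; labels: +1 (true variant) or -1 (artifact).
    A constant 1.0 is appended to each sample so the bias is learned as an
    ordinary (regularised) weight. Returns the weight vector.
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            x, y = data[i], labels[i]
            violated = y * dot(w, x) < 1.0            # is the hinge loss active?
            w = [(1.0 - eta * lam) * wi for wi in w]  # L2 shrinkage step
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, features):
    """Decision value; a positive score means 'call this site as a SNP'."""
    return dot(w, list(features) + [1.0])


# Hypothetical training sites: (alt-allele fraction, mean alt base quality / 40)
train_x = [(0.50, 0.90), (1.00, 0.95), (0.45, 0.85), (0.55, 0.90),
           (0.05, 0.40), (0.10, 0.35), (0.08, 0.50), (0.12, 0.30)]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]
weights = train_linear_svm(train_x, train_y)
calls = [svm_score(weights, x) > 0 for x in train_x]
```

    Unlike a binomial caller, the decision here depends on a learned combination of several site features, which is the property the abstract credits for better discrimination of artifacts.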

    Availability: Source code and executables are available from github.com/brendanofallon/SNPSVM/

    Contact: brendan.d.ofallon@aruplab.com or david.crockett@aruplab.com

    Categories: Journal Articles
  • Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes
    [May 2013]

    Motivation: Diversification rates and patterns may be inferred from reconstructed phylogenies. Both time-dependent and diversity-dependent birth-death processes can produce the same observed patterns of diversity over time. To develop and test new models describing the macro-evolutionary process of diversification, generic and fast algorithms for simulating under these models are necessary. Simulations are not only important for testing and developing models but also play an influential role in assessing model fit.

    Results: In the present article, I consider as the model a global time-dependent birth-death process in which every species has the same rates but the rates may vary over time. For this model, I derive the likelihood of the speciation times from a reconstructed phylogenetic tree and show that the speciation times are independent and identically distributed. This fact can be used to efficiently simulate reconstructed phylogenetic trees when conditioning on the number of species, the time of the process or both. I show the usability of the simulation by approximating the posterior predictive distribution of a birth-death process with decreasing diversification rates applied to a published bird phylogeny (family Cettiidae).
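
    Because the speciation times are i.i.d., simulating them reduces to drawing n - 1 independent values. The sketch below assumes the constant-rate special case (speciation rate lam, extinction rate mu, lam != mu), complete sampling, and conditioning on n species and a process that started T time units ago. Each time is drawn from a density proportional to the probability p_one(s) that a lineage leaves exactly one descendant, using rejection sampling. It is not TESS code; the time-varying case would replace p_one with its time-dependent counterpart, and assembling the topology is not shown.

```python
import math
import random


def p_one(s, lam, mu):
    """Probability that a lineage alive s time units before the present has
    exactly one descendant today under a constant-rate birth-death process
    (requires lam != mu)."""
    r = lam - mu
    e = math.exp(-r * s)
    return r * r * e / (lam - mu * e) ** 2


def simulate_branching_times(n, T, lam, mu, seed=None):
    """Sample the n - 1 speciation times of a reconstructed tree with n extant
    species whose process started T time units ago.

    Each time is drawn i.i.d. from a density proportional to p_one(s) on
    [0, T) by rejection sampling: propose s uniformly and accept it with
    probability p_one(s), which is valid because p_one(s) <= 1.
    """
    rng = random.Random(seed)
    times = []
    while len(times) < n - 1:
        s = rng.uniform(0.0, T)
        if rng.random() < p_one(s, lam, mu):
            times.append(s)
    return sorted(times, reverse=True)  # oldest speciation event first


branching_times = simulate_branching_times(n=10, T=5.0, lam=1.0, mu=0.5, seed=1)
```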

    Availability: The methods described in this manuscript are implemented in the R package TESS, available from the repository CRAN (http://cran.r-project.org/web/packages/TESS/).

    Contact: hoehna@math.su.se

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • A novel web server predicts amino acid residue protection against hydrogen-deuterium exchange
    [May 2013]

    Motivation: To clarify the relationship between structural elements and polypeptide chain mobility, a set of statistical analyses of structures is necessary. Because far fewer proteins have determined spatial structures than have known amino acid sequences, it is important to be able to predict the extent of proton protection from hydrogen–deuterium (HD) exchange based solely on the protein primary structure.

    Results: Here we present a novel web server that predicts the degree of amino acid residue protection against HD exchange solely from the primary structure of the protein chain under study. Given the amino acid sequence, the server offers the user a choice of three predictors. The first predicts the number of contacts occurring in the protein, which is shown to be helpful in estimating the number of protons protected against HD exchange (sensitivity 0.71). The second predicts the probability of H-bonding in the protein, which is useful for finding the number of unprotected protons (specificity 0.71). The third is an artificial predictor. We also report a mass spectrometry analysis of HD exchange that was first applied to free amino acids. Its results showed good agreement with theoretical data (number of protons) for 10 globular proteins (correlation coefficient 0.73). We are the first to compile two datasets of experimental HD exchange data for 35 proteins.
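
    The sensitivity and specificity figures quoted above are standard binary-classification metrics. As a minimal sketch, assuming per-residue True/False labels for "protected from HD exchange" (the abstract does not state exactly how the server counts cases), they can be computed as follows; the example labels are hypothetical.

```python
def sensitivity_specificity(predicted, observed):
    """predicted/observed: per-residue booleans, True = protected from HD exchange."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # protected, predicted protected
    fn = sum(1 for p, o in pairs if not p and o)      # protected, missed
    tn = sum(1 for p, o in pairs if not p and not o)  # unprotected, predicted unprotected
    fp = sum(1 for p, o in pairs if p and not o)      # unprotected, wrongly called protected
    return tp / (tp + fn), tn / (tn + fp)


predicted = [True, True, False, False, True, False, True, False]
observed = [True, False, False, True, True, False, True, False]
sens, spec = sensitivity_specificity(predicted, observed)
```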

    Availability: The H-Protection server is available for users at http://bioinfo.protres.ru/ogp/

    Contact: ogalzit@vega.protres.ru

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Detecting regulatory gene-environment interactions with unmeasured environmental factors
    [May 2013]

    Motivation: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing environment-specific genetic effects to be tested directly. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as humans. Moreover, even under seemingly tightly controlled experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits.

    Results: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model on a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method accurately reconstructs the true underlying environmental factor even when it is not given as an input, allowing genuine genotype–environment interactions to be detected. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype–environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability.

    Availability and implementation: Software available at http://pmbio.github.io/envGPLVM/.

    Contact: oliver.stegle@ebi.ac.uk or nicolo.fusi@sheffield.ac.uk

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Bayesian hierarchical model of protein-binding microarray k-mer data reduces noise and identifies transcription factor subclasses and preferred k-mers
    [May 2013]

    Motivation: Sequence-specific transcription factors (TFs) regulate the expression of their target genes through interactions with specific DNA-binding sites in the genome. Data on TF-DNA binding specificities are essential for understanding how regulatory specificity is achieved.

    Results: Numerous studies have used universal protein-binding microarray (PBM) technology to determine the in vitro binding specificities of hundreds of TFs for all possible 8 bp sequences (8mers). We have developed a Bayesian analysis of variance (ANOVA) model that decomposes these 8mer data into background noise, TF familywise effects and effects due to the particular TF. Adjusting for background noise improves PBM data quality and concordance with in vivo TF binding data. Moreover, our model provides simultaneous identification of TF subclasses and their shared sequence preferences, and also of 8mers bound preferentially by individual members of TF subclasses. Such results may aid in deciphering cis-regulatory codes and determinants of protein–DNA binding specificity.
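
    The idea of splitting each k-mer score into a shared background, a TF-family effect and a TF-specific effect can be shown with a plain (non-Bayesian) additive decomposition. This is a simplified stand-in for the hierarchical Bayesian ANOVA described above: it uses simple means, no priors and no noise model, and the TF names, family labels and scores are hypothetical.

```python
from statistics import mean


def decompose_kmer_scores(scores, family_of):
    """scores: dict TF -> list of scores over the same ordered k-mers.
    family_of: dict TF -> family name.
    Returns (background, family_effect, tf_effect) such that
    scores[tf][k] == background[k] + family_effect[family_of[tf]][k] + tf_effect[tf][k].
    """
    tfs = list(scores)
    n_kmers = len(scores[tfs[0]])
    # Background: average score of each k-mer across all TFs.
    background = [mean(scores[tf][k] for tf in tfs) for k in range(n_kmers)]
    families = {}
    for tf in tfs:
        families.setdefault(family_of[tf], []).append(tf)
    # Family effect: family mean minus background.
    family_effect = {
        fam: [mean(scores[tf][k] for tf in members) - background[k] for k in range(n_kmers)]
        for fam, members in families.items()
    }
    # TF effect: whatever the background and family do not explain.
    tf_effect = {
        tf: [scores[tf][k] - background[k] - family_effect[family_of[tf]][k]
             for k in range(n_kmers)]
        for tf in tfs
    }
    return background, family_effect, tf_effect


scores = {"A1": [1.0, 3.0, 0.0], "A2": [3.0, 1.0, 0.0], "B1": [2.0, 2.0, 3.0]}
family_of = {"A1": "homeodomain", "A2": "homeodomain", "B1": "bHLH"}
background, family_effect, tf_effect = decompose_kmer_scores(scores, family_of)
```

    With real PBM data there would be one column per possible 8mer (4^8 = 65,536 sequences before collapsing reverse complements) rather than the three used here.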

    Availability and implementation: Source code, compiled code and R and Python scripts are available from http://thebrain.bwh.harvard.edu/hierarchicalANOVA.

    Contact: bojiang83@gmail.com or mlbulyk@receptor.med.harvard.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Improved ancestry inference using weights from external reference panels
    [May 2013]

    Motivation: Inference of ancestry using genetic data is motivated by applications in genetic association studies, population genetics and personal genomics. Here, we provide methods and software for improved ancestry inference using genome-wide single nucleotide polymorphism (SNP) weights from external reference panels. This approach makes it possible to leverage the rich ancestry information that is available from large external reference panels, without the administrative and computational complexities of re-analyzing the raw genotype data from the reference panel in subsequent studies.

    Results: We extensively validate our approach in multiple African American, Latino American and European American datasets, making use of genome-wide SNP weights derived from large reference panels, including HapMap 3 populations and 6546 European Americans from the Framingham Heart Study. We show empirically that our approach provides much greater accuracy than either the prevailing ancestry-informative marker (AIM) approach or the analysis of genome-wide target genotypes without a reference panel. For example, in an independent set of 1636 European American genome-wide association study samples, we attained prediction accuracy (R²) of 1.000 and 0.994 for the first two principal components using our method, compared with 0.418 and 0.407 using 150 published AIMs or 0.955 and 0.003 by applying principal component analysis directly to the target samples. Finally, we show that the higher accuracy in inferring ancestry using our method leads to more effective correction for population stratification in association studies.
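
    Scoring a target sample with precomputed SNP weights amounts to a weighted sum of normalized genotypes, so the reference genotypes never need to be reanalyzed. The sketch below assumes an EIGENSTRAT-style normalization (g - 2p) / sqrt(2p(1 - p)) using reference-panel allele frequencies and simply skips missing SNPs; the SNP identifiers, weights and frequencies are hypothetical, and SNPweights itself may normalize or handle missing data differently.

```python
import math


def pc_scores(genotypes, ref_freqs, snp_weights):
    """genotypes: dict SNP id -> alt-allele count (0, 1 or 2); absent SNPs are skipped.
    ref_freqs: dict SNP id -> alt-allele frequency in the reference panel.
    snp_weights: dict SNP id -> tuple of per-principal-component weights.
    Returns the projected principal component scores of the target sample.
    """
    n_pcs = len(next(iter(snp_weights.values())))
    scores = [0.0] * n_pcs
    for snp, weights in snp_weights.items():
        g = genotypes.get(snp)
        if g is None:
            continue
        p = ref_freqs[snp]
        z = (g - 2 * p) / math.sqrt(2 * p * (1 - p))  # normalized genotype
        for k in range(n_pcs):
            scores[k] += weights[k] * z
    return scores


snp_weights = {"rs1": (0.5, 0.1), "rs2": (-0.2, 0.4)}
ref_freqs = {"rs1": 0.5, "rs2": 0.5}
target = {"rs1": 2, "rs2": 0}
projected = pc_scores(target, ref_freqs, snp_weights)
```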

    Availability: The SNPweights software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.

    Contact: aprice@hsph.harvard.edu or cychen@mail.harvard.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation
    [May 2013]

    Motivation: Local ancestry analysis of genotype data from recently admixed populations (e.g. Latinos, African Americans) provides key insights into population history and disease genetics. Although methods for local ancestry inference have been extensively validated in simulations (under many unrealistic assumptions), no empirical study of local ancestry accuracy in Latinos exists to date. Hence, interpreting findings that rely on local ancestry in Latinos is challenging.

    Results: Here, we use 489 nuclear families from the mainland USA, Puerto Rico and Mexico in conjunction with 3204 unrelated Latinos from the Multiethnic Cohort study to provide the first empirical characterization of local ancestry inference accuracy in Latinos. Our approach for identifying errors does not rely on simulations but on the observation that local ancestry in families follows Mendelian inheritance. We measure the rate of local ancestry assignments that lead to Mendelian inconsistencies in local ancestry in trios (MILANC), which provides a lower bound on errors in the local ancestry estimates. We show that MILANC rates observed in simulations underestimate the rate observed in real data, and that MILANC varies substantially across the genome. Second, across a wide range of methods, we observe that loci with large deviations in local ancestry also show enrichment in MILANC rates. Therefore, local ancestry estimates at such loci should be interpreted with caution. Finally, we reconstruct ancestral haplotype panels to be used as reference panels in local ancestry inference and show that ancestry inference is significantly improved by incorporating these reference panels.
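
    The Mendelian check behind MILANC can be sketched directly: at a locus, a child's pair of ancestry labels must be formed by one label from the mother and one from the father. This minimal sketch assumes unordered diploid ancestry calls per individual and hypothetical population labels. As the abstract notes, such a rate is only a lower bound, because some errors still produce consistent trios.

```python
from collections import Counter
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Each argument is a pair of ancestry labels at one locus, e.g. ("EUR", "NAT")."""
    target = Counter(child)
    return any(Counter((m, f)) == target for m, f in product(mother, father))


def milanc_rate(trios):
    """Fraction of (child, mother, father) local ancestry calls that are inconsistent."""
    inconsistent = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return inconsistent / len(trios)


trios = [
    (("EUR", "NAT"), ("EUR", "EUR"), ("NAT", "AFR")),  # consistent
    (("AFR", "AFR"), ("EUR", "NAT"), ("AFR", "AFR")),  # mother carries no AFR
    (("NAT", "NAT"), ("NAT", "EUR"), ("EUR", "NAT")),  # consistent
    (("EUR", "AFR"), ("EUR", "AFR"), ("NAT", "NAT")),  # father must pass on NAT
]
rate = milanc_rate(trios)
```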

    Availability and implementation: We provide the reconstructed reference panels together with the maps of MILANC rates as a public resource for researchers analyzing local ancestry in Latinos at http://bogdanlab.pathology.ucla.edu.

    Contact: bpasaniuc@mednet.ucla.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks
    [May 2013]

    Motivation: Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have helped calibrate the performance of inference methods. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters as well as the network structure.

    Results: A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model’s Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches.

    Availability: The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip.

    Contact: florence.dalche@ibisc.univ-evry.fr

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • Measuring gene functional similarity based on group-wise comparison of GO terms
    [May 2013]

    Motivation: Compared with sequence and structure similarity, functional similarity is more informative for understanding the biological roles and functions of genes. Many important applications in computational molecular biology require functional similarity, such as gene clustering, protein function prediction, protein interaction evaluation and disease gene prioritization. Gene Ontology (GO) is now widely used as the basis for measuring gene functional similarity. Some existing methods combine semantic similarity scores of single term pairs to estimate gene functional similarity, whereas others compare terms in groups. However, these methods may make error-prone judgments about gene functional similarity, and measuring gene functional similarity reliably remains a challenge.

    Results: We propose a novel method called SORA to measure gene functional similarity in the GO context. First, SORA computes the information content (IC) of a term using its semantic specificity and coverage. Second, SORA measures the IC of a term set by combining the inherited and extended IC of its terms based on the structure of GO. Finally, SORA estimates gene functional similarity using the IC overlap ratio of term sets. SORA is evaluated against five state-of-the-art methods in the field on the public platform for collaborative evaluation of GO-based semantic similarity measures. Careful comparisons show that SORA is generally superior to the other methods. Further analysis suggests that it benefits primarily from the structure of GO, which implicitly encodes information about gene function. SORA offers an effective and reliable way to compare gene function.

    Availability: The web service of SORA is freely available at http://nclab.hit.edu.cn/SORA/.

    Contact: maozuguo@hit.edu.cn

    Categories: Journal Articles
  • tmVar: a text mining approach for extracting sequence variants in biomedical literature
    [May 2013]

    Motivation: Text-mining mutation information from the literature has become a critical part of the bioinformatics approach to analyzing and interpreting sequence variations in complex diseases in the post-genomic era. It has also been used to assist the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variation, such as protein point mutations. Thus, extending their extraction scope requires significant manual effort in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy.

    Results: Here, we report tmVar, a text-mining approach based on conditional random field (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4 versus 78.1% in F-measure) and their own gold standard (93.9 versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature.

    Availability: tmVar software and its corpus of 500 manually curated abstracts are available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/pub/tmVar.

    Contact: zhiyong.lu@nih.gov

    Categories: Journal Articles
  • Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review
    [May 2013]

    Motivation: Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of biomedical abstracts. However, is it realistic for full articles given their high linguistic and informational complexity? We introduce and release a novel corpus of 50 biomedical articles annotated according to the Argumentative Zoning (AZ) scheme, and investigate active learning with one of the most widely used ML models—Support Vector Machines (SVM)—on this corpus. Additionally, we introduce two novel applications that use AZ to support real-life literature review in biomedicine via question answering and summarization.

    Results: We show that active learning with SVM trained on 500 labeled sentences (6% of the corpus) performs surprisingly well, with an accuracy of 82%, just 2% lower than fully supervised learning. In our question answering task, biomedical researchers find relevant information significantly faster from AZ-annotated than from unannotated articles. In the summarization task, sentences extracted from particular zones are significantly more similar to gold standard summaries than those extracted from particular sections of full articles. These results demonstrate that active learning of full articles’ information structure is indeed realistic and the accuracy is high enough to support real-life literature review in biomedicine.
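
    A common way to choose which sentences to label next in SVM-based active learning is uncertainty sampling: query the unlabeled examples closest to the decision boundary. The sketch below assumes a binary linear SVM with already-trained weights and hypothetical feature vectors; the abstract does not specify the query strategy, and AZ is a multi-class scheme, so a real setup would apply this per class or with a multi-class margin.

```python
def select_most_uncertain(unlabeled, weights, bias, batch_size):
    """Return indices of the unlabeled feature vectors with the smallest
    absolute SVM decision value, i.e. those closest to the hyperplane."""
    def margin(x):
        return abs(sum(w * xi for w, xi in zip(weights, x)) + bias)

    ranked = sorted(range(len(unlabeled)), key=lambda i: margin(unlabeled[i]))
    return ranked[:batch_size]


unlabeled = [(2.0, 0.0), (0.6, 0.5), (0.0, 3.0), (1.0, 1.2)]
to_label = select_most_uncertain(unlabeled, weights=(1.0, -1.0), bias=0.0, batch_size=2)
```

    After the selected sentences are annotated, the SVM would be retrained and the loop repeated until the labeling budget is used up.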

    Availability: The annotated corpus, our AZ classifier and the two novel applications are available at http://www.cl.cam.ac.uk/~yg244/12bioinfo.html.

    Contact: yg244@cam.ac.uk

    Categories: Journal Articles
  • APP2: automatic tracing of 3D neuron morphology based on hierarchical pruning of a gray-weighted image distance-tree
    [May 2013]

    Motivation: Tracing of neuron morphology is an essential technique in computational neuroscience. However, despite a number of existing methods, few open-source techniques are completely or sufficiently automated and at the same time are able to generate robust results for real 3D microscopy images.

    Results: We developed all-path-pruning 2.0 (APP2) for 3D neuron tracing. The most important idea is to prune an initial reconstruction tree of a neuron’s morphology using a long-segment-first hierarchical procedure instead of the original termini-first-search process in APP. To further enhance the robustness of APP2, we compute the distance transform of all image voxels directly for a gray-scale image, without the need to binarize the image before invoking the conventional distance transform. We also design a fast-marching algorithm-based method to compute the initial reconstruction trees without pre-computing a large graph. This method allows us to trace large images. We bench-tested APP2 on ~700 3D microscopic images and found that APP2 can generate more satisfactory results in most cases than several previous methods.
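
    The initial reconstruction tree can be pictured as shortest paths from a seed (e.g. the soma) through a gray-scale image, where bright foreground pixels are cheap to traverse. The sketch below uses Dijkstra's algorithm on an 8-connected 2D grid as a discrete stand-in for the fast-marching method named above; the cost function (step length divided by intensity) and the toy image are assumptions, not APP2's actual gray-weighted distance, and the hierarchical pruning step is not shown.

```python
import heapq
import math


def gray_weighted_tree(image, seed):
    """Dijkstra shortest paths on an 8-connected 2D grid starting at `seed`.

    Entering a pixel costs (step length / intensity), so paths prefer bright
    pixels; zero-intensity pixels count as background and are never entered.
    Returns (dist, parent); following `parent` from any pixel reaches seed.
    """
    rows, cols = len(image), len(image[0])
    dist = {seed: 0.0}
    parent = {seed: None}
    heap = [(0.0, seed)]
    finished = set()
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) in finished:
            continue  # stale heap entry
        finished.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                intensity = image[nr][nc]
                if intensity <= 0:
                    continue
                nd = d + math.hypot(dr, dc) / intensity
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist, parent


def path_to_seed(parent, pixel):
    path = [pixel]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


image = [
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
]
dist, parent = gray_weighted_tree(image, seed=(1, 0))
branch = path_to_seed(parent, (3, 2))
```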

    Availability: The software has been implemented as an open-source Vaa3D plugin. The source code is available in the Vaa3D code repository http://vaa3d.org.

    Contact: hanchuanp@alleninstitute.org

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • ELOPER: elongation of paired-end reads as a pre-processing tool for improved de novo genome assembly
    [May 2013]

    Motivation: Paired-end sequencing resulting in gapped short reads is commonly used for de novo genome assembly. Assembly methods use paired-end sequences in a two-step process, first treating each read-end independently, only later invoking the pairing to join the contiguous assemblies (contigs) into gapped scaffolds. Here, we present ELOPER, a pre-processing tool for paired-end sequences that produces a better read library for assembly programs.

    Results: ELOPER simultaneously considers both ends of paired reads to generate elongated reads. We show that ELOPER theoretically doubles read lengths while halving the number of reads. We provide evidence that pre-processing read libraries using ELOPER leads to considerably improved assemblies, as predicted by the Lander–Waterman model.
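
    The Lander–Waterman argument can be checked with a short calculation. With N reads of length L on a genome of length G and a minimum detectable overlap T, the expected number of contigs is N·exp(-c(1 - T/L)) with coverage c = NL/G. Doubling L while halving N keeps c fixed but lowers T/L, so fewer contigs are expected. The genome size, read counts and overlap threshold below are hypothetical.

```python
import math


def expected_contigs(n_reads, read_len, genome_len, min_overlap):
    """Lander-Waterman expected contig count N * exp(-c * (1 - T / L))."""
    coverage = n_reads * read_len / genome_len
    theta = min_overlap / read_len
    return n_reads * math.exp(-coverage * (1 - theta))


before = expected_contigs(10_000, 100, 1_000_000, 20)  # original read library
after = expected_contigs(5_000, 200, 1_000_000, 20)    # half the reads, twice as long
```

    Both libraries have 1x coverage, yet the expected contig count drops from about 4,493 to about 2,033, since each elongated read loses a smaller fraction of its length to the overlap threshold.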

    Availability: http://sourceforge.net/projects/eloper.

    Contact: yanai@technion.ac.il

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans
    [May 2013]

    Summary: Gene duplication occurs via different modes such as segmental and single-gene duplications. Transposed gene duplication, a specific form of single-gene duplication, ‘copies’ a gene from an ancestral chromosomal location to a novel location. MCScanX is a toolkit for detection and evolutionary analysis of gene colinearity. We have developed MCScanX-transposed, a software package to detect transposed gene duplications that occurred within different epochs, based on execution of MCScanX within and between related genomes. MCScanX-transposed can also be used for integrative analysis of gene duplication modes for a genome and to annotate a gene family of interest with gene duplication modes.

    Availability: MCScanX-transposed is freely available at http://chibba.pgml.uga.edu/mcscan2/transposed/.

    Contact: wyp1125@gmail.com or paterson@plantbio.uga.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • FishingCNV: a graphical software package for detecting rare copy number variations in exome-sequencing data
    [May 2013]

    Summary: Rare copy number variations (CNVs) are frequent causes of genetic diseases. We developed a graphical software package based on a novel approach that can consistently identify CNVs of all types (homozygous deletions, heterozygous deletions, heterozygous duplications) from exome-sequencing data without the need for a paired control. The algorithm compares coverage depth in a test sample against a background distribution of control samples and uses principal component analysis to remove batch effects. The software is user-friendly and can be run on a personal computer.
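
    Comparing a test sample's coverage against a background of controls can be sketched as a per-exon z-score. This minimal version assumes median-based depth normalization and a fixed |z| threshold of 3, both hypothetical choices, and omits the principal component batch-effect removal that the abstract describes. The depths are made up to contain one heterozygous deletion.

```python
from statistics import mean, median, stdev


def normalise(depths):
    """Scale a sample's per-exon depths by its median exon depth."""
    m = median(depths)
    return [d / m for d in depths]


def call_cnvs(test_depths, control_depths, z_cut=3.0):
    """Flag exons whose normalised depth deviates strongly from the controls.

    test_depths: per-exon read depths of the test sample.
    control_depths: one list of per-exon depths per control sample.
    Returns (exon index, 'deletion' | 'duplication', z-score) tuples.
    """
    test = normalise(test_depths)
    controls = [normalise(c) for c in control_depths]
    calls = []
    for i, value in enumerate(test):
        background = [c[i] for c in controls]
        sd = stdev(background)
        if sd == 0:
            continue  # no spread among controls; z-score undefined
        z = (value - mean(background)) / sd
        if z <= -z_cut:
            calls.append((i, "deletion", z))
        elif z >= z_cut:
            calls.append((i, "duplication", z))
    return calls


controls = [[100, 100, 100, 100], [102, 98, 101, 99],
            [98, 102, 99, 101], [100, 100, 100, 100]]
test_sample = [100, 100, 50, 100]  # hypothetical heterozygous deletion of exon 2
cnv_calls = call_cnvs(test_sample, controls)
```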

    Availability and implementation: The main scripts are implemented in R (2.15), and the GUI is created using Java 1.6. It can be run on all major operating systems. A non-GUI version for pipeline implementation is also available. The program is freely available online: https://sourceforge.net/projects/fishingcnv/

    Contact: yuhao.shi@mail.mcgill.ca

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • A tool for RNA sequencing sample identity check
    [May 2013]

    Summary: RNA sequencing is becoming a major method of choice for studying transcriptomes, including the mapping of gene expression quantitative trait loci (eQTLs). RNA sample contamination or swapping is a serious problem for downstream analysis and may result in false discoveries and loss of power to detect true biological relationships. When genetic data are available, for example in eQTL studies or when samples have previously been genotyped or DNA sequenced, it is possible to combine genetic data and RNA-seq data to detect sample contamination and resolve sample swapping. In this article, we introduce a tool (IDCheck) that allows easy assessment of concordance between genotype (from SNP arrays or DNA sequencing) and gene expression (RNA-seq) samples. IDCheck compares the identity of RNA-seq reads and SNP genotypes using a likelihood-based method. Based on maximum likelihood estimates of relevant parameters, we can detect sample contamination and identify correct sample pairs when swapping occurs. Our tool provides an efficient and convenient way to evaluate and resolve these problems.
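
    The likelihood-based matching idea can be sketched as follows. Each RNA-seq sample's reference/alternate read counts at known SNPs are scored against every genotype sample, and the genotype sample with the highest log-likelihood is taken as its match. The binomial read model, the error rate of 0.01 and all sample data are assumptions for illustration, not IDCheck's actual model, which also estimates contamination (not shown).

```python
import math


def read_log_likelihood(ref_count, alt_count, genotype, error=0.01):
    """Binomial log-likelihood (up to a constant) of RNA-seq allele counts at one
    SNP given a DNA genotype coded as 0, 1 or 2 copies of the alternate allele."""
    expected_alt = genotype / 2
    p_alt = expected_alt * (1 - error) + (1 - expected_alt) * error
    return alt_count * math.log(p_alt) + ref_count * math.log(1 - p_alt)


def pair_log_likelihood(rna_counts, genotypes, error=0.01):
    """Sum over SNPs observed in both the RNA-seq sample and the genotype sample."""
    return sum(read_log_likelihood(ref, alt, genotypes[snp], error)
               for snp, (ref, alt) in rna_counts.items() if snp in genotypes)


def best_matches(rna_samples, dna_samples, error=0.01):
    """For each RNA-seq sample, the genotype sample with the highest likelihood."""
    return {
        rna_id: max(dna_samples,
                    key=lambda dna_id: pair_log_likelihood(counts, dna_samples[dna_id], error))
        for rna_id, counts in rna_samples.items()
    }


dna_samples = {"D1": {"s1": 0, "s2": 2, "s3": 1}, "D2": {"s1": 2, "s2": 0, "s3": 0}}
rna_samples = {  # (ref reads, alt reads) per SNP
    "R1": {"s1": (20, 0), "s2": (1, 15), "s3": (10, 9)},
    "R2": {"s1": (0, 18), "s2": (12, 0), "s3": (14, 0)},
}
matches = best_matches(rna_samples, dna_samples)
```

    A swap shows up when a sample's best match is not its labelled partner.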

    Availability: A complete description of the software is included on the application home page. The software is freely available in the public domain at http://eqtl.rc.fas.harvard.edu/idcheck/.

    Contact: lliang@hsph.harvard.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Categories: Journal Articles
  • PathVisio-Faceted Search: an exploration tool for multi-dimensional navigation of large pathways
    [May 2013]

    Purpose: The PathVisio-Faceted Search plugin helps users explore and understand complex pathways by overlaying experimental data and data from webservices, such as Ensembl BioMart, onto diagrams drawn using formalized notations in PathVisio. The plugin then provides a filtering mechanism, known as a faceted search, to find and highlight diagram nodes (e.g. genes and proteins) of interest based on imported data. The tool additionally provides a flexible scripting mechanism to handle complex queries.

    Availability: The PathVisio-Faceted Search plugin is compatible with PathVisio 3.0 and above. PathVisio is compatible with Windows, Mac OS X and Linux. The plugin, documentation, example diagrams and Groovy scripts are available at http://PathVisio.org/wiki/PathVisioFacetedSearchHelp. The plugin is free, open-source and licensed under the Apache 2.0 License.

    Contact: augustin@mail.nih.gov or jakeyfried@gmail.com

    Categories: Journal Articles
  • Biographer: web-based editing and rendering of SBGN compliant biochemical networks
    [May 2013]

    Motivation: The rapid accumulation of knowledge in the field of Systems Biology during the past years requires advanced, but simple-to-use, methods for the visualization of information in a structured and easily comprehensible manner.

    Results: We have developed biographer, a web-based renderer and editor for reaction networks, which can be integrated as a library into tools dealing with network-related information. Our software enables visualizations based on the emerging standard Systems Biology Graphical Notation. It is able to import networks encoded in various formats such as SBML, SBGN-ML and jSBGN, a custom lightweight exchange format. The core package is implemented in HTML5, CSS and JavaScript and can be used within any kind of web-based project. It features interactive graph-editing tools and automatic graph layout algorithms. In addition, we provide a standalone graph editor and a web server, which contains enhanced features like web services for the import and export of models and visualizations in different formats.

    Availability: The biographer tool can be used at and downloaded from the web page http://biographer.biologie.hu-berlin.de/. The different software packages, including a server-independent version as well as a web server for Windows- and Linux-based systems, are available at http://code.google.com/p/biographer/ under the open-source license LGPL.

    Contact: edda.klipp@biologie.hu-berlin.de or handorf@physik.hu-berlin.de

    Categories: Journal Articles
  • MonaLisa--visualization and analysis of functional modules in biochemical networks
    [May 2013]

    Summary: Structural modeling of biochemical networks enables qualitative as well as quantitative analysis of those networks. Automated network decomposition into functional modules is a crucial point in network analysis. Although there exist approaches for the analysis of networks, there is no open source tool available that combines editing, visualization and the computation of steady-state functional modules. We introduce a new tool called MonaLisa, which combines computation and visualization of functional modules as well as an editor for biochemical Petri nets. The analysis techniques allow for network decomposition into functional modules, for example t-invariants (elementary modes), maximal common transition sets, minimal cut sets and t-clusters. The graphical user interface provides various functionalities to construct and modify networks as well as to visualize the results of the analysis.
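
    A t-invariant of a Petri net is a non-negative, non-zero vector x of transition firing counts with C·x = 0, where C is the place-by-transition incidence matrix; firing the transitions that many times returns the network to its original marking. The minimal sketch below only checks that property for a hypothetical linear pathway; it does not enumerate invariants the way MonaLisa does.

```python
def is_t_invariant(incidence, x):
    """incidence[p][t]: net tokens added to place p by firing transition t once.
    x: firing count per transition. True if x >= 0, x != 0 and C x == 0."""
    if any(v < 0 for v in x) or not any(x):
        return False
    return all(sum(row[t] * x[t] for t in range(len(x))) == 0 for row in incidence)


# Hypothetical pathway: t0 produces A, t1 converts A to B, t2 consumes B.
incidence = [
    [1, -1, 0],   # place A
    [0, 1, -1],   # place B
]
```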

    Availability and implementation: MonaLisa is licensed under the Artistic License 2.0. It is freely available at http://www.bioinformatik.uni-frankfurt.de/software.html. MonaLisa requires at least Java 6 and runs under Linux, Microsoft Windows and Mac OS.

    Contact: ina.koch@bioinformatik.uni-frankfurt.de

    Categories: Journal Articles
  • NetworkPrioritizer: a versatile tool for network-based prioritization of candidate disease genes or other molecules
    [May 2013]

    Summary: The prioritization of candidate disease genes is often based on integrated datasets and their network representation, with genes as nodes connected by edges representing biological relationships. However, most prioritization methods do not allow straightforward integration of the user’s own input data. Therefore, we developed the Cytoscape plugin NetworkPrioritizer, which specifically supports the integrative network-based prioritization of candidate disease genes or other molecules. Our versatile software tool computes a number of important centrality measures to rank nodes based on their relevance for network connectivity and provides different methods to aggregate and compare rankings.
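
    Ranking nodes by several centralities and then aggregating the rankings can be sketched with two simple measures. This example uses degree and BFS-based closeness on an unweighted, undirected graph and aggregates them by mean rank, breaking ties by node name. The choice of measures, the aggregation rule and the toy graph are assumptions, not NetworkPrioritizer's implementation.

```python
from collections import deque


def closeness(graph, node):
    """(reachable nodes - 1) / sum of BFS distances; 0.0 for isolated nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank(scores):
    """Map node -> rank (1 = best); higher scores rank first, ties broken by name."""
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return {n: i + 1 for i, n in enumerate(ordered)}


def aggregate_ranks(graph):
    degree = {n: len(neighbours) for n, neighbours in graph.items()}
    close = {n: closeness(graph, n) for n in graph}
    rank_lists = [rank(degree), rank(close)]
    mean_rank = {n: sum(r[n] for r in rank_lists) / len(rank_lists) for n in graph}
    return sorted(graph, key=lambda n: (mean_rank[n], n))


graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A", "E"], "E": ["D"]}
prioritized = aggregate_ranks(graph)
```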

    Availability: NetworkPrioritizer and the online documentation are freely available at http://www.networkprioritizer.de.

    Contact: mario.albrecht@mpi-inf.mpg.de

    Categories: Journal Articles