Journal Articles

Automated profiling of individual cell-cell interactions from high-throughput time-lapse imaging microscopy in nanowell grids (TIMING)

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: There is a need for effective automated methods for profiling dynamic cell–cell interactions with single-cell resolution from high-throughput time-lapse imaging data, especially the interactions between immune effector cells and tumor cells in adoptive immunotherapy.

Results: Fluorescently labeled human T cells, natural killer cells (NK), and various target cells (NALM6, K562, EL4) were co-incubated on polydimethylsiloxane arrays of sub-nanoliter wells (nanowells), and imaged using multi-channel time-lapse microscopy. The proposed cell segmentation and tracking algorithms account for cell variability and exploit the nanowell confinement property to increase the yield of correctly analyzed nanowells from 45% (existing algorithms) to 98% for wells containing one effector and a single target, enabling automated quantification of cell locations, morphologies, movements, interactions, and deaths without the need for manual proofreading. Automated analysis of recordings from 12 different experiments demonstrated automated nanowell delineation accuracy >99%, automated cell segmentation accuracy >95%, and automated cell tracking accuracy of 90%, with default parameters, despite variations in illumination, staining, imaging noise, cell morphology, and cell clustering. An example analysis revealed that NK cells efficiently discriminate between live and dead targets by altering the duration of conjugation. The data also demonstrated that cytotoxic cells display higher motility than non-killers, both before and during contact.

Contact: broysam@central.uh.edu or nvaradar@central.uh.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

EXIMS: an improved data analysis pipeline based on a new peak picking method for EXploring Imaging Mass Spectrometry data

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Matrix Assisted Laser Desorption Ionization-Imaging Mass Spectrometry (MALDI-IMS) in ‘omics’ data acquisition generates detailed information about the spatial distribution of molecules in a given biological sample. Various data processing methods have been developed for exploring the resultant high volume data. However, most of these methods process data in the spectral domain and do not make the most of the important spatial information available through this technology. Therefore, we propose a novel streamlined data analysis pipeline specifically developed for MALDI-IMS data utilizing significant spatial information for identifying hidden significant molecular distribution patterns in these complex datasets.

Methods: The proposed unsupervised algorithm uses Sliding Window Normalization (SWN) and a new spatial distribution based peak picking method developed based on Gray level Co-Occurrence (GCO) matrices followed by clustering of biomolecules. We also use gist descriptors and an improved version of GCO matrices to extract features from molecular images and minimum medoid distance to automatically estimate the number of possible groups.
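For readers unfamiliar with gray level co-occurrence matrices, the following is a minimal sketch of the textbook construction, not the improved variant or the peak picking criterion used by EXIMS, neither of which is specified in this abstract. The function name `glcm`, the offset parameters `dx` and `dy`, and the tiny example image are assumptions made for illustration. The image is assumed to be already quantized to integer gray levels in the range 0 to `levels - 1`.

```python
def glcm(image, levels, dx=1, dy=0):
    """Build a gray level co-occurrence matrix for one pixel offset.

    Entry [i][j] counts how often a pixel with gray level i has a
    neighbor with gray level j at offset (dy rows, dx columns).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


# Hypothetical 3x3 image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [2, 2, 0],
]

for row in glcm(image, levels=3):
    print(row)
# prints:
# [1, 1, 0]
# [0, 0, 1]
# [1, 0, 2]
```

A spatially structured molecular image tends to concentrate its counts in a few entries of this matrix, while a noisy image spreads them out, which is the general reason co-occurrence statistics are useful for judging spatial distribution.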

Results: We evaluated our algorithm using a new MALDI-IMS metabolomics dataset of a plant (Eucalypt) leaf. The algorithm revealed hidden significant molecular distribution patterns in the dataset, which the current Component Analysis and Segmentation Map based approaches failed to extract. We further demonstrate the performance of our peak picking method over other traditional approaches by using a publicly available MALDI-IMS proteomics dataset of a rat brain. Although SWN did not show any significant improvement as compared with using no normalization, the visual assessment showed an improvement as compared to using the median normalization.

Availability and implementation: The source code and sample data are freely available at http://exims.sourceforge.net/.

Contact: awgcdw@student.unimelb.edu.au or chalini_w@live.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

We introduce FinisherSC, a repeat-aware and scalable tool for upgrading de novo assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher quality contigs than existing tools while maintaining high concordance.

Availability and implementation: The tool and data are available and will be maintained at http://kakitone.github.io/finishingTool/

Contact: dntse@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50.
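As a point of reference for the N50 measure mentioned above, here is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name `n50` and the example lengths are illustrative assumptions, not part of BUSCO.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because the calculation depends only on contig lengths, it says nothing about whether the expected genes are actually present in the assembly.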

Results: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO.

Availability and implementation: Software implemented in Python and datasets available for download from http://busco.ezlab.org.

Contact: evgeny.zdobnov@unige.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

IgSimulator: a versatile immunosequencing simulator

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: The recent introduction of next-generation sequencing technologies to antibody studies has resulted in a growing number of immunoinformatics tools for antibody repertoire analysis. However, benchmarking these newly emerging tools remains problematic since the gold standard datasets that are needed to validate these tools are typically not available.

Results: Since simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools, we developed the IgSimulator tool that addresses various complications in generating realistic antibody repertoires. IgSimulator’s code has a modular structure and can be easily adapted to new simulation requirements.

Availability and implementation: IgSimulator is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from yana-safonova.github.io/ig_simulator.

Contact: safonova.yana@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

ACE: accurate correction of errors using K-mer tries

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Summary: The quality of high-throughput next-generation sequencing data significantly influences the performance and memory consumption of assembly and mapping algorithms. The most ubiquitous platform, Illumina, mainly suffers from substitution errors. We have developed a tool, ACE, based on K-mer tries to correct such errors. On real MiSeq and HiSeq Illumina archives, ACE yields higher gains in terms of coverage depth, outperforming state-of-the-art competitors in the majority of cases.
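To illustrate the data structure named in the summary, the sketch below stores the k-mers of a few reads in a trie and counts their occurrences. It is not ACE's correction procedure, which this abstract does not describe. The class `KmerTrie`, the helper `build_trie`, and the example reads are all hypothetical.

```python
class KmerTrie:
    """A trie over nucleotide strings that keeps a count per stored k-mer."""

    def __init__(self):
        self.root = {}

    def add(self, kmer):
        node = self.root
        for base in kmer:
            node = node.setdefault(base, {})
        # The multi-character key "count" cannot collide with a single base.
        node["count"] = node.get("count", 0) + 1

    def count(self, kmer):
        node = self.root
        for base in kmer:
            node = node.get(base)
            if node is None:
                return 0
        return node.get("count", 0)


def build_trie(reads, k):
    """Insert every k-mer of every read into a new trie."""
    trie = KmerTrie()
    for read in reads:
        for i in range(len(read) - k + 1):
            trie.add(read[i:i + k])
    return trie


# Hypothetical reads: the second differs from the others by one substitution.
reads = ["ACGTAC", "ACGTTC", "ACGTAC"]
trie = build_trie(reads, k=4)
print(trie.count("ACGT"), trie.count("CGTA"), trie.count("CGTT"))  # prints 3 2 1
```

In this toy example the k-mers that overlap the substituted base occur only once, which hints at why k-mer frequencies are informative for locating substitution errors. Treating rare k-mers as suspect is an assumption of this sketch, not a statement about ACE.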

Availability and implementation: ACE is licensed under the GPL and can be freely obtained at https://github.com/sheikhizadeh/ACE/. The program is implemented in C++ and runs on most Unix-derived operating systems.

Contact: siavash.sheikhizadehanari@wur.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

GeIST: a pipeline for mapping integrated DNA elements

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Summary: There are several experimental contexts in which it is important to identify DNA integration sites, such as insertional mutagenesis screens, gene and enhancer trap applications, and gene therapy. We previously developed an assay to identify millions of integrations in multiplexed barcoded samples at base-pair resolution. The sheer amount of data produced by this approach makes the mapping of individual sites non-trivial without bioinformatics support. This article presents the Genomic Integration Site Tracker (GeIST), a command-line pipeline designed to map the integration sites produced by this assay and identify the samples from which they came. GeIST version 2.1.0, a more adaptable version of our original pipeline, can identify integrations of murine leukemia virus, adeno-associated virus, Tol2 transposons or Ac/Ds transposons, and can be adapted for other inserted elements. It has been tested on experimental data for each of these delivery vectors and fine-tuned to account for sequencing and cloning artifacts.

Availability and implementation: GeIST uses a combination of Bash shell scripting and Perl. GeIST is available at http://research.nhgri.nih.gov/software/GeIST/.

Contact: burgess@mail.nih.gov

Categories: Journal Articles

DisVis: quantifying and visualizing accessible interaction space of distance-restrained biomolecular complexes

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Summary: We present DisVis, a Python package and command line tool to calculate the reduced accessible interaction space of distance-restrained binary protein complexes, allowing for direct visualization and quantification of the information content of the distance restraints. The approach is general and can also be used as a knowledge-based distance energy term in FFT-based docking directly during the sampling stage.

Availability and implementation: The source code with documentation is freely available from https://github.com/haddocking/disvis.

Contact: a.m.j.j.bonvin@uu.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: Comprehensive analysis of genome-wide molecular data challenges bioinformatics methodology in terms of intuitive visualization with single-sample resolution, biomarker selection, functional information mining and highly granular stratification of sample classes. oposSOM combines those functionalities making use of a comprehensive analysis and visualization strategy based on self-organizing maps (SOM) machine learning which we call ‘high-dimensional data portraying’. The method was successfully applied in a series of studies using mostly transcriptome data but also data of other OMICs realms.

Availability and implementation: oposSOM is now publicly available as a Bioconductor R package.

Contact: wirth@izbi.uni-leipzig.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A web application for the unspecific detection of differentially expressed DNA regions in strand-specific expression data

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Genomic technologies allow laboratories to produce large-scale data sets, either through the use of next-generation sequencing or microarray platforms. To explore these data sets and obtain maximum value from the data, researchers view their results alongside all the known features of a given reference genome. To study transcriptional changes that occur under a given condition, researchers search for regions of the genome that are differentially expressed between different experimental conditions. To identify these regions, several algorithms have been developed over the years, along with some bioinformatic platforms that enable their use. However, currently available applications for comparative microarray analysis focus exclusively on changes in gene expression within known transcribed regions of predicted protein-coding genes, omitting the changes that occur in non-predictable genetic elements, such as non-coding RNAs. Here, we present a web application for the visualization of strand-specific tiling microarray or next-generation sequencing data that allows customized detection of differentially expressed regions all along the genome in an unspecific manner, and thus the identification of all RNA sequences, predictable or not.

Availability and implementation: The web application is freely accessible at http://tilingscan.uv.es/. TilingScan is implemented in PHP and JavaScript.

Contact: vicente.arnau@uv.es

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

lpNet: a linear programming approach to reconstruct signal transduction networks

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Summary: With the widespread availability of high-throughput experimental technologies it has become possible to study hundreds to thousands of cellular factors simultaneously, such as coding- or non-coding mRNA or protein concentrations. Still, extracting information about the underlying regulatory or signaling interactions from these data remains a difficult challenge. We present a flexible approach towards network inference based on linear programming. Our method reconstructs the interactions of factors from a combination of perturbation/non-perturbation and steady-state/time-series data. We show both on simulated and real data that our methods are able to reconstruct the underlying networks quickly and efficiently, thus shedding new light on biological processes and, in particular, on disease mechanisms of action. We have implemented the approach as an R package available through Bioconductor.

Availability and implementation: This R package is freely available under the GNU General Public License (GPL-3) from bioconductor.org (http://bioconductor.org/packages/release/bioc/html/lpNet.html) and is compatible with most operating systems (Windows, Linux, Mac OS) and hardware architectures.

Contact: bettina.knapp@helmholtz-muenchen.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

TiQuant: software for tissue analysis, quantification and surface reconstruction

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Motivation: TiQuant is a modular software tool for efficient quantification of biological tissues based on volume data obtained by biomedical image modalities. It includes a number of versatile image and volume processing chains tailored to the analysis of different tissue types which have been experimentally verified. TiQuant implements a novel method for the reconstruction of three-dimensional surfaces of biological systems, data that often cannot be obtained experimentally but which is of utmost importance for tissue modelling in systems biology.

Availability and implementation: TiQuant is freely available for non-commercial use at msysbio.com/tiquant. Windows, OSX and Linux are supported.

Contact: hoehme@uni-leipzig.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

RPdb: a database of experimentally verified cellular reprogramming records

Bioinformatics Journal - Mon, 09/21/2015 - 07:37

Summary: Many cell lines can be reprogrammed to other cell lines by forced expression of a few transcription factors or by specifically designed culture methods, which has attracted great interest in the field of regenerative medicine and stem cell research. Plenty of cell lines have been used to generate induced pluripotent stem cells (IPSCs) by expressing a group of genes and microRNAs. These IPSCs can differentiate into somatic cells to promote tissue regeneration. Similarly, many somatic cells can be directly reprogrammed to other cells without a stem cell state. All these findings are helpful in searching for new reprogramming methods and understanding the underlying biological mechanisms. However, to the best of our knowledge, there is still no database dedicated to integrating the reprogramming records. We built RPdb (cellular reprogramming database) to collect cellular reprogramming information and make it easy to access. All entries in RPdb are manually extracted from more than 2000 published articles, which is helpful for researchers in regenerative medicine and cell biology.

Availability and implementation: RPdb is freely available on the web at http://bioinformatics.ustc.edu.cn/rpdb with all major browsers supported.

Contact: aoli@ustc.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

The loss of inhibitory C-terminal conformations in disease associated P123H β-synuclein

Protein Science - Mon, 09/21/2015 - 00:05
Abstract

β-synuclein (βS) is a homologue of α-synuclein (αS), the major protein component of Lewy bodies in patients with Parkinson's disease. In contrast to αS, βS does not form fibrils, mitigates αS toxicity in vivo and inhibits αS fibril formation in vitro. Previously a missense mutation of βS, P123H, was identified in patients with Dementia with Lewy Body disease. The single P123H mutation at the C-terminus of βS is able to convert βS from a nontoxic to a toxic protein that is also able to accelerate formation of inclusions when it is in the presence of αS in vivo. To elucidate the molecular mechanisms of these processes, we compare the conformational properties of the monomer forms of αS, βS and P123H-βS, and the effects on fibril formation of coincubation of αS with βS, and with P123H-βS. NMR residual dipolar couplings and secondary structure propensities show that the P123H mutation of βS renders it more flexible C-terminal to the mutation site and more αS-like. In vitro Thioflavin T fluorescence experiments show that P123H-βS accelerates αS fibril formation upon coincubation, as opposed to wild type βS that acts as an inhibitor of αS aggregation. When P123H-βS becomes more αS-like it is unable to perform the protective function of βS, which suggests that the extended polyproline II motif of βS in the C-terminus is critical to its nontoxic nature and to inhibition of αS upon coincubation. These studies may provide a basis for understanding which regions to target for therapeutic intervention in Parkinson's disease.

Categories: Journal Articles

The value of protein structure classification information—Surveying the scientific literature

ABSTRACT

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015; 83:2025–2038. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

Categories: Journal Articles

Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade

ABSTRACT

We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS-based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER-based fragment reassembly simulations, followed by the fragment-guided molecular dynamics simulation and the MQAP-based model selection. It was found that the inclusion of QUARK-TBM simulations as an intermediate modeling step could help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, for nearly 18% of TBM domains the templates are deteriorated by the structure assembly pipeline, which may be attributed to errors in secondary structure and domain orientation predictions that propagate through and degrade the procedures of template identification and final model selection. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST, but obvious progress on structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributable to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2015. © 2015 Wiley Periodicals, Inc.

Categories: Journal Articles

Structure of HIV-1 reverse transcriptase bound to a novel 38-mer hairpin template-primer DNA aptamer

Protein Science - Fri, 09/18/2015 - 00:54
Abstract

The development of a modified DNA aptamer that binds HIV-1 reverse transcriptase (RT) with ultra-high affinity has enabled the X-ray structure determination of an HIV-1 RT-DNA complex to 2.3 Å resolution without the need for an antibody Fab fragment or RT-DNA cross-linking. The 38-mer hairpin-DNA aptamer has a 15 base-pair duplex, a three-deoxythymidine hairpin loop, and a five-nucleotide 5′-overhang. The aptamer binds RT in a template-primer configuration with the 3′-end positioned at the polymerase active site and has 2′-O-methyl modifications at the second and fourth duplex template nucleotides that interact with the p66 fingers and palm subdomains. This structure represents the highest resolution RT-nucleic acid structure to date. The RT-aptamer complex is catalytically active and can serve as a platform for studying fundamental RT mechanisms and for development of anti-HIV inhibitors through fragment screening and other approaches. Additionally, the structure allows for a detailed look at a unique aptamer design and provides the molecular basis for its remarkably high affinity for RT.

Categories: Journal Articles

Aromatic residues in RNase T stack with nucleobases to guide the sequence-specific recognition and cleavage of nucleic acids

Protein Science - Fri, 09/18/2015 - 00:51
Abstract

RNase T is a classical member of the DEDDh family of exonucleases with a unique sequence preference in that its 3′-to-5′ exonuclease activity is blocked by a 3′-terminal dinucleotide CC in digesting both single-stranded RNA and DNA. Our previous crystal structure analysis of RNase T-DNA complexes showed that four phenylalanine residues, F29, F77, F124, and F146, stack with the two 3′-terminal nucleobases. To elucidate whether the π–π stacking interactions between aromatic residues and nucleobases play a critical role in sequence-specific protein–nucleic acid recognition, here we mutated two to four of the phenylalanine residues in RNase T to tryptophan (W mutants) and tyrosine (Y mutants). The Escherichia coli strains expressing either the W mutants or the Y mutants had slow growth phenotypes, suggesting that none of these mutants could fully substitute for the function of the wild-type RNase T in vivo. DNA digestion assays revealed that the W mutants shared similar sequence specificity with wild-type RNase T. However, the Y mutants exhibited altered sequence-dependent activity, digesting ssDNA with both 3′-end CC and GG sequences. Moreover, the W and Y mutants had reduced DNA-binding activity and lower thermal stability as compared to wild-type RNase T. Taken together, our results suggest that the four phenylalanine residues in RNase T play critical roles not only in sequence-specific recognition, but also in overall protein stability. Our results provide the first evidence showing that the π−π stacking interactions between nucleobases and protein aromatic residues may guide the sequence-specific activity for DNA and RNA enzymes.

Categories: Journal Articles

A call to deal with the data deluge

Nature - Thu, 09/17/2015 - 23:00

Nature 525, 7570 (2015). doi:10.1038/525429f

Author: Chris Woolston

Researchers debate whether an ‘overflow’ of data is straining biomedical science.

Categories: Journal Articles