Journal Articles

A disintegrating minor planet transiting a white dwarf

Nature - Tue, 10/20/2015 - 23:00

Nature 526, 7574 (2015). doi:10.1038/nature15527

Authors: Andrew Vanderburg, John Asher Johnson, Saul Rappaport, Allyson Bieryla, Jonathan Irwin, John Arban Lewis, David Kipping, Warren R. Brown, Patrick Dufour, David R. Ciardi, Ruth Angus, Laura Schaefer, David W. Latham, David Charbonneau, Charles Beichman, Jason Eastman, Nate McCrady, Robert A. Wittenmyer & Jason T. Wright

Most stars become white dwarfs after they have exhausted their nuclear fuel (the Sun will be one such). Between one-quarter and one-half of white dwarfs have elements heavier than helium in their atmospheres, even though these elements ought to sink rapidly into the stellar interiors (unless they are occasionally replenished). The abundance ratios of heavy elements in the atmospheres of white dwarfs are similar to the ratios in rocky bodies in the Solar System. This fact, together with the existence of warm, dusty debris disks surrounding about four per cent of white dwarfs, suggests that rocky debris from the planetary systems of white-dwarf progenitors occasionally pollutes the atmospheres of the stars. The total accreted mass of this debris is sometimes comparable to the mass of large asteroids in the Solar System. However, rocky, disintegrating bodies around a white dwarf have not yet been observed. Here we report observations of a white dwarf—WD 1145+017—being transited by at least one, and probably several, disintegrating planetesimals, with periods ranging from 4.5 hours to 4.9 hours. The strongest transit signals occur every 4.5 hours and exhibit varying depths (blocking up to 40 per cent of the star’s brightness) and asymmetric profiles, indicative of a small object with a cometary tail of dusty effluent material. The star has a dusty debris disk, and the star’s spectrum shows prominent lines from heavy elements such as magnesium, aluminium, silicon, calcium, iron, and nickel. This system provides further evidence that the pollution of white dwarfs by heavy elements might originate from disrupted rocky bodies such as asteroids and minor planets.

Categories: Journal Articles

The rise of fully turbulent flow

Nature - Tue, 10/20/2015 - 23:00

Nature 526, 7574 (2015). doi:10.1038/nature15701

Authors: Dwight Barkley, Baofang Song, Vasudevan Mukund, Grégoire Lemoult, Marc Avila & Björn Hof

Over a century of research into the origin of turbulence in wall-bounded shear flows has resulted in a puzzling picture in which turbulence appears in a variety of different states competing with laminar background flow. At moderate flow speeds, turbulence is confined to localized patches; it is only at higher speeds that the entire flow becomes turbulent. The origin of the different states encountered during this transition, the front dynamics of the turbulent regions and the transformation to full turbulence have yet to be explained. By combining experiments, theory and computer simulations, here we uncover a bifurcation scenario that explains the transformation to fully turbulent pipe flow and describe the front dynamics of the different states encountered in the process. Key to resolving this problem is the interpretation of the flow as a bistable system with nonlinear propagation (advection) of turbulent fronts. These findings bridge the gap between our understanding of the onset of turbulence and fully turbulent flows.

Categories: Journal Articles

Intercellular wiring enables electron transfer between methanotrophic archaea and bacteria

Nature - Tue, 10/20/2015 - 23:00

Nature 526, 7574 (2015). doi:10.1038/nature15733

Authors: Gunter Wegener, Viola Krukenberg, Dietmar Riedel, Halina E. Tegetmeyer & Antje Boetius

The anaerobic oxidation of methane (AOM) with sulfate controls the emission of the greenhouse gas methane from the ocean floor. In marine sediments, AOM is performed by dual-species consortia of anaerobic methanotrophic archaea (ANME) and sulfate-reducing bacteria (SRB) inhabiting the methane–sulfate transition zone. The biochemical pathways and biological adaptations enabling this globally relevant process are not fully understood. Here we study the syntrophic interaction in thermophilic AOM (TAOM) between ANME-1 archaea and their consortium partner SRB HotSeep-1 (ref. 6) at 60 °C to test the hypothesis of a direct interspecies exchange of electrons. The activity of TAOM consortia was compared to the first ANME-free culture of an AOM partner bacterium that grows using hydrogen as the sole electron donor. The thermophilic ANME-1 do not produce sufficient hydrogen to sustain the observed growth of the HotSeep-1 partner. Enhancing the growth of the HotSeep-1 partner by hydrogen addition represses methane oxidation and the metabolic activity of ANME-1. Further supporting the hypothesis of direct electron transfer between the partners, we observe that under TAOM conditions, both ANME and the HotSeep-1 bacteria overexpress genes for extracellular cytochrome production and form cell-to-cell connections that resemble the nanowire structures responsible for interspecies electron transfer between syntrophic consortia of Geobacter. HotSeep-1 highly expresses genes for pili production only during consortial growth using methane, and the nanowire-like structures are absent in HotSeep-1 cells isolated with hydrogen. These observations suggest that direct electron transfer is a principal mechanism in TAOM, which may also explain the enigmatic functioning and specificity of other methanotrophic ANME–SRB consortia.

Categories: Journal Articles

Relative Order of Sulfuric Acid, Bisulfate, Hydronium, and Cations at the Air–Water Interface

Journal of American Chemical Society - Tue, 10/20/2015 - 16:11

Journal of the American Chemical Society. DOI: 10.1021/jacs.5b08636
Categories: Journal Articles

Surface and Bulk Effects in Photochemical Reactions and Photomechanical Effects in Dynamic Molecular Crystals

Journal of American Chemical Society - Tue, 10/20/2015 - 16:10

Journal of the American Chemical Society. DOI: 10.1021/jacs.5b07806
Categories: Journal Articles

Fluorescein Derivatives as Bifunctional Molecules for the Simultaneous Inhibiting and Labeling of FTO Protein

Journal of American Chemical Society - Tue, 10/20/2015 - 16:08

Journal of the American Chemical Society. DOI: 10.1021/jacs.5b06690
Categories: Journal Articles

Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

PLoS Computational Biology - Tue, 10/20/2015 - 16:00

by Robin van der Lee, Qian Feng, Martijn A. Langereis, Rob ter Horst, Radek Szklarczyk, Mihai G. Netea, Arno C. Andeweg, Frank J. M. van Kuppeveld, Martijn A. Huynen

The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of known RLR pathway components that collectively predict novel members. We demonstrate that RLR pathway genes, among others, tend to evolve rapidly, interact with viral proteins, contain a limited set of protein domains, are regulated by specific transcription factors, and form a tightly connected interaction network. Using a Bayesian approach to integrate these signatures, we propose likely novel RLR regulators. RNAi knockdown experiments revealed a high prediction accuracy, identifying 94 genes among 187 candidates tested (~50%) that affected viral RNA-induced production of IFNβ. The discovered antiviral regulators may participate in a wide range of processes that highlight the complexity of antiviral defense (e.g. MAP3K11, CDK11B, PSMA3, TRIM14, HSPA9B, CDC37, NUP98, G3BP1), and include uncharacterized factors (DDX17, C6orf58, C16orf57, PKN2, SNW1). Our validated RLR pathway list (http://rlr.cmbi.umcn.nl/), obtained using a combination of integrative genomics and experiments, is a new resource for innate antiviral immunity research.
Categories: Journal Articles

Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Background: With more and more genomes being sequenced, detecting synteny between genomes becomes more and more important. However, for microorganisms the genomic divergence quickly becomes large, resulting in different codon usage and shuffling of gene order and gene elements such as exons.

Results: We present Proteny, a methodology to detect synteny between diverged genomes. It operates on the amino acid sequence level to be insensitive to codon usage adaptations and clusters groups of exons disregarding order to handle diversity in genomic ordering between genomes. Furthermore, Proteny assigns significance levels to the syntenic clusters such that they can be selected on statistical grounds. Finally, Proteny provides novel ways to visualize results at different scales, facilitating the exploration and interpretation of syntenic regions. We test the performance of Proteny on a standard ground truth dataset, and we illustrate the use of Proteny on two closely related genomes (two different strains of Aspergillus niger) and on two distant genomes (two species of Basidiomycota). In comparison to other tools, we find that Proteny finds clusters with more true homologies in fewer clusters that contain more genes, i.e. Proteny is able to identify a more consistent synteny. Further, we show how genome rearrangements, assembly errors, gene duplications and the conservation of specific genes can be easily studied with Proteny.

Availability and implementation: Proteny is freely available at the Delft Bioinformatics Lab website http://bioinformatics.tudelft.nl/dbl/software.

Contact: t.gehrmann@tudelft.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure (‘shape’) is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites.
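The PWM-only baseline described above can be sketched as follows. This is a minimal illustration, not the regshape package: the function names, the pseudocount of 1 and the uniform background frequency of 0.25 are assumptions made for the example. Note that the window score is a plain sum over columns, which is exactly the positional-independence assumption the abstract refers to.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log-odds PWM from equal-length example binding sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of the observed base frequency to the background frequency.
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def score(pwm, window):
    """Sum per-position log-odds; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))
```

Calling `best_hit(build_pwm(sites), sequence)` scans every window of the sequence; a shape-aware model would add further features to this score rather than replace it.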

Results: We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM + shape model was more accurate than the PWM-only model, for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs.

Availability and implementation: The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/.

Contact: stephen.ramsey@oregonstate.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Estimating beta diversity for under-sampled communities using the variably weighted Odum dissimilarity index and OTUshuff

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: In profiling the composition and structure of complex microbial communities via high throughput amplicon sequencing, a very low proportion of community members are typically sampled. As a result of this incomplete sampling, estimates of dissimilarity between communities are often inflated, an issue we term pseudo β-diversity.

Results: We present a set of tools to identify and correct for the presence of pseudo β-diversity in contrasts between microbial communities. The variably weighted Odum dissimilarity (DwOdum) allows for down-weighting the influence of either abundant or rare taxa in calculating a measure of similarity between two communities. We show that down-weighting the influence of rare taxa can be used to minimize pseudo β-diversity arising from incomplete sampling. Down-weighting the influence of abundant taxa can increase the sensitivity of hypothesis testing. OTUshuff is an associated test for identifying the presence of pseudo β-diversity in pairwise community contrasts.

Availability and implementation: A Perl script for calculating the DwOdum score from a taxon abundance table and performing pairwise contrasts with OTUshuff can be obtained at http://www.ars.usda.gov/services/software/software.htm?modecode=30-12-10-00.

Contact: daniel.manter@ars.usda.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.

Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences.

Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam.

Contact: sayoni.das.12@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

ERGC: an efficient referential genome compression algorithm

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Genome sequencing has become faster and more affordable. Consequently, the number of available complete genomic sequences is increasing rapidly. As a result, the cost to store, process, analyze and transmit the data is becoming a bottleneck for research and future medical applications. The need for efficient data compression and data reduction techniques for biological sequencing data is therefore growing by the day. Although a number of standard data compression algorithms exist, they are not efficient at compressing biological data, because these generic algorithms do not exploit inherent properties of the sequencing data. To exploit statistical and information-theoretic properties of genomic sequences, we need specialized compression algorithms. Five different next-generation sequencing data compression problems have been identified and studied in the literature. We propose a novel algorithm for one of these problems, known as reference-based genome compression.
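The general idea of reference-based compression can be illustrated with a toy encoder that stores a target genome as "copy from the reference" and "literal" operations. This is a generic greedy sketch under simplifying assumptions (a k-mer index keeping only the first occurrence, no entropy coding); it is not the ERGC algorithm, and the function names and the default `k=4` are made up for the example.

```python
def compress(reference, target, k=4):
    """Encode target as ('copy', start, length) and ('lit', text) operations."""
    # Index the first occurrence of every k-mer in the reference.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)

    ops, literal, pos = [], [], 0
    while pos < len(target):
        start = index.get(target[pos:pos + k])
        if start is None:
            # No k-mer match here: emit this base as a literal.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the match as far as reference and target agree.
        length = k
        while (pos + length < len(target)
               and start + length < len(reference)
               and target[pos + length] == reference[start + length]):
            length += 1
        if literal:
            ops.append(("lit", "".join(literal)))
            literal = []
        ops.append(("copy", start, length))
        pos += length
    if literal:
        ops.append(("lit", "".join(literal)))
    return ops

def decompress(reference, ops):
    """Rebuild the target from the reference and the operation list."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)
```

When the target differs from the reference by only a few variants, most of it collapses into a handful of long copy operations, which is where the savings over generic compressors come from.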

Results: We have done extensive experiments using five real sequencing datasets. The results on real genomes show that our proposed algorithm is indeed competitive and performs better than the best known algorithms for this problem. It achieves compression ratios that are better than those of the currently best performing algorithms. The time to compress and decompress the whole genome is also very promising.

Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/~rajasek/ERGC.zip.

Contact: rajasek@engr.uconn.edu

Categories: Journal Articles

Error filtering, pair assembly and error correction for next-generation sequencing reads

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low.

Results: We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, by exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores.
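Step (i), filtering by expected number of errors, can be sketched as below. The sketch relies on the standard Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10), so the expected number of errors in a read is the sum of those probabilities. The function names, the ASCII offset of 33 and the threshold of 1.0 are assumptions for illustration; this is not the USEARCH implementation.

```python
def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_string)

def filter_reads(reads, max_expected_errors=1.0):
    """Keep (sequence, quality) pairs whose expected error count is low enough."""
    return [(seq, qual) for seq, qual in reads
            if expected_errors(qual) <= max_expected_errors]
```

Because the expected error count sums over all bases, a long read with uniformly mediocre quality can be rejected even when no single base looks bad, which a minimum-quality or average-quality filter would miss.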

Availability and implementation: These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch.

Contact: robert@drive5.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

JASSA: a comprehensive tool for prediction of SUMOylation sites and SIMs

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Post-translational modification by the Small Ubiquitin-like Modifier (SUMO) proteins, a process termed SUMOylation, is involved in many fundamental cellular processes. SUMO proteins are conjugated to a protein substrate, creating an interface for the recruitment of cofactors harboring SUMO-interacting motifs (SIMs). Mapping both SUMO-conjugation sites and SIMs is required to study the functional consequence of SUMOylation. To define the best candidate sites for experimental validation we designed JASSA, a Joint Analyzer of SUMOylation site and SIMs.

Results: JASSA is a predictor that uses a scoring system based on a Position Frequency Matrix derived from the alignment of experimental SUMOylation sites or SIMs. Compared with existing web tools, JASSA performs on par or better. Novel features were implemented to support a better evaluation of the predictions, including identification of database hits matching the query sequence and representation of candidate sites within the secondary structural elements and/or the 3D fold of the protein of interest, retrievable from deposited PDB files.

Availability and Implementation: JASSA is freely accessible at http://www.jassa.fr/. The website is implemented in PHP and MySQL, with all major browsers supported.

Contact: guillaume.beauclair@inserm.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Application of learning to rank to protein remote homology detection

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Protein remote homology detection is one of the fundamental problems in computational biology, aiming to find protein sequences in a database of known structures that are evolutionarily related to a given query protein. Some computational methods, such as PSI-BLAST, HHblits and ProtEmbed, treat this problem as a ranking problem and achieve state-of-the-art performance. This raises the possibility of combining these methods to improve predictive performance. We therefore propose a new computational method called ProtDec-LTR for protein remote homology detection, which combines various ranking methods in a supervised manner using the Learning to Rank (LTR) algorithm derived from natural language processing.

Results: Experimental results on a widely used benchmark dataset showed that ProtDec-LTR can achieve an ROC1 score of 0.8442 and an ROC50 score of 0.9023, outperforming all the individual predictors and some state-of-the-art methods. These results indicate that it is appropriate to treat protein remote homology detection as a ranking problem, and that predictive performance can be improved by combining different ranking approaches in a supervised manner using LTR.

Availability and implementation: For users’ convenience, the software tools for the three basic ranking predictors and the Learning to Rank algorithm are provided at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR/home/

Contact: bliu@insun.hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: To date, only a few distinct successful approaches have been introduced to reconstruct a protein 3D structure from a map of contacts between its amino acid residues (a 2D contact map). Current algorithms can infer structures from information-rich contact maps that contain a limited fraction of erroneous predictions. However, it is difficult to reconstruct 3D structures from predicted contact maps that usually contain a high fraction of false contacts.

Results: We describe a new, multi-step protocol that predicts protein 3D structures from the predicted contact maps. The method is based on a novel distance function acting on a fuzzy residue proximity graph, which predicts a 2D distance map from a 2D predicted contact map. The application of a Multi-Dimensional Scaling algorithm transforms that predicted 2D distance map into a coarse 3D model, which is further refined by typical modeling programs into an all-atom representation. We tested our approach on contact maps predicted de novo by MULTICOM, the top contact map predictor according to CASP10. We show that our method outperforms FT-COMAR, the state-of-the-art method for 3D structure reconstruction from 2D maps. For all predicted 2D contact maps of relatively low sensitivity (60–84%), GDFuzz3D generates more accurate 3D models, with the average improvement of 4.87 Å in terms of RMSD.
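The multi-dimensional scaling step, which turns a pairwise distance map into 3D coordinates, can be sketched with classical (Torgerson) MDS. This is an assumption-laden illustration in pure Python: it uses power iteration with deflation to find the leading eigenvectors, takes an exact Euclidean distance matrix as input, and does not reproduce the paper's fuzzy-graph distance function or any later refinement. Which MDS variant GDFuzz3D uses is not stated in the abstract.

```python
import math
import random

def classical_mds(dist, dims=3, iters=500):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix B = X X^T.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(0)  # fixed seed keeps the result reproducible
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Power iteration converges to the dominant eigenvector of B.
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate: remove the found component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

The recovered coordinates match the original structure only up to rotation, reflection and translation, so the result is best checked by comparing pairwise distances rather than raw coordinates. With noisy predicted distances the Gram matrix may have negative eigenvalues, which this sketch simply clamps to zero.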

Availability and implementation: GDFuzz3D server and standalone version are freely available at http://iimcb.genesilico.pl/gdserver/GDFuzz3D/.

Contact: iamb@genesilico.pl

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Protein contact prediction is important for protein structure and functional study. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed, making use of different information sources. However, contact prediction is still challenging especially for proteins without a large number of sequence homologs.

Results: This article presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Different from existing single-family EC analysis that uses residue coevolution information in only the target protein family, our joint EC analysis uses residue coevolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement this, we model a set of related protein families using Gaussian graphical models and then coestimate their parameters by maximum-likelihood, subject to the constraint that these parameters shall be similar to some degree. Our GGL method can also integrate supervised learning methods to further improve accuracy. Experiments show that our method outperforms existing methods on proteins without thousands of sequence homologs, and that our method performs better on both conserved and family-specific contacts.

Availability and implementation: See http://raptorx.uchicago.edu/ContactMap/ for a web server implementing the method.

Contact: j3xu@ttic.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Categories: Journal Articles

A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status—an epigenetic mark describing chromatin accessibility—from population-scale haplotype data.

Results: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data.

Availability and implementation: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/.

Contact: shihuwenbo@ucla.edu or pasaniuc@ucla.edu

Categories: Journal Articles

LayerCake: a tool for the visual comparison of viral deep sequencing data

Bioinformatics Journal - Tue, 10/20/2015 - 09:50

Motivation: The advent of next-generation sequencing (NGS) has created unprecedented opportunities to examine viral populations within individual hosts, among infected individuals and over time. Comparing sequence variability across viral genomes allows for the construction of complex population structures, the analysis of which can yield powerful biological insights. However, the simultaneous display of sequence variation, coverage depth and quality scores across thousands of bases presents a unique visualization challenge that has not been fully met by current NGS analysis tools.

Results: Here, we present LayerCake, a self-contained visualization tool that allows for the rapid analysis of variation in viral NGS data. LayerCake enables the user to simultaneously visualize variations in multiple viral populations across entire genomes within a highly customizable framework, drawing attention to pertinent and interesting patterns of variation. We have successfully deployed LayerCake to assist with a variety of different genomics datasets.

Availability and implementation: Program downloads and detailed instructions are available at http://graphics.cs.wisc.edu/WP/layercake under a modified MIT license. LayerCake is a cross-platform tool written in the Processing framework for Java.

Contact: mcorrell@cs.wisc.edu

Categories: Journal Articles