PLoS Computational Biology
Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model
by Andries J. van Tonder, Shilan Mistry, James E. Bray, Dorothea M. C. Hill, Alison J. Cody, Chris L. Farmer, Keith P. Klugman, Anne von Gottberg, Stephen D. Bentley, Julian Parkhill, Keith A. Jolley, Martin C. J. Maiden, Angela B. Brueggemann
The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
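The p-distance mentioned in the abstract above is conventionally the proportion of aligned nucleotide sites at which two sequences differ. The following is a minimal sketch of that standard definition and of a per-gene median, not the authors' pipeline: the function names, the toy alleles, and the choice to skip gaps and ambiguous bases are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import median

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of comparable aligned sites at which two sequences differ.

    Assumption: sites with a gap or ambiguous base in either sequence
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles):
    """Median p-distance over all pairs of aligned alleles of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


# Hypothetical aligned alleles of a single core gene (toy data).
toy_alleles = ["ATGCATGC", "ATGCATGA", "ATGTATGA"]
print(median_p_distance(toy_alleles))
```

Plotting this median for each core gene against the gene's position in a reference genome would give a diversity profile of the kind the abstract describes.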
by Jonathan Laudanski, Benjamin Torben-Nielsen, Idan Segev, Shihab Shamma
An important task performed by a neuron is the selection of relevant inputs from among thousands of synapses impinging on the dendritic tree. Synaptic plasticity enables this by strengthening a subset of synapses that are, presumably, functionally relevant to the neuron. A different selection mechanism exploits the resonance of the dendritic membranes to preferentially filter synaptic inputs based on their temporal rates. A widely held view is that a neuron has one resonant frequency and thus can pass through one rate. Here we demonstrate through mathematical analyses and numerical simulations that dendritic resonance is inevitably a spatially distributed property. The resonance frequency therefore varies along the dendrites, endowing neurons with a powerful spatiotemporal selection mechanism that is sensitive both to the dendritic location and the temporal structure of the incoming synaptic inputs.
by Sean Ekins, Ethan O. Perlstein
by Victor J. Barranca, Gregor Kovačič, Douglas Zhou, David Cai
Considering that many natural stimuli are sparse, can a sensory system evolve to take advantage of this sparsity? We explore this question and show that significant downstream reductions in the numbers of neurons transmitting stimuli observed in early sensory pathways might be a consequence of this sparsity. First, we model an early sensory pathway using an idealized neuronal network composed of receptors and downstream sensory neurons. Then, by revealing a linear structure intrinsic to neuronal network dynamics, our work points to a potential mechanism for transmitting sparse stimuli, related to compressed-sensing (CS) type data acquisition. Through simulation, we examine the characteristics of networks that are optimal in sparsity encoding, and the impact of localized receptive fields beyond conventional CS theory. The results of this work suggest a new network framework of signal sparsity, freeing the notion from any dependence on specific component-space representations. We expect our CS network mechanism to provide guidance for studying sparse stimulus transmission along realistic sensory pathways as well as engineering network designs that utilize sparsity encoding.
by Cyril F. Reboul, James C. Whisstock, Michelle A. Dunstone
Cholesterol Dependent Cytolysins (CDCs) are important bacterial virulence factors that form large (200–300 Å) membrane embedded pores in target cells. Currently, insights from X-ray crystallography, biophysical and single particle cryo-Electron Microscopy (cryo-EM) experiments suggest that soluble monomers first interact with the membrane surface via a C-terminal Immunoglobulin-like domain (Ig; Domain 4). Membrane bound oligomers then assemble into a prepore oligomeric form, following which the prepore assembly collapses towards the membrane surface, with concomitant release and insertion of the membrane spanning subunits. During this rearrangement it is proposed that Domain 2, a region comprising three β-strands that links the pore forming region (Domains 1 and 3) and the Ig domain, must undergo a significant, yet currently undetermined, conformational change. Here we address this problem through a systematic molecular modeling and structural bioinformatics approach. Our work shows that simple rigid body rotations may account for the observed collapse of the prepore towards the membrane surface. Support for this idea comes from analysis of published cryo-EM maps of the pneumolysin pore, available crystal structures and molecular dynamics simulations. The latter data in particular reveal that Domains 1, 2 and 4 are able to undergo significant rotational movements with respect to each other. Together, our data provide new and testable insights into the mechanism of pore formation by CDCs.
Specificity and Affinity Quantification of Flexible Recognition from Underlying Energy Landscape Topography
by Xiakun Chu, Jin Wang
Flexibility in biomolecular recognition is essential and critical for many cellular activities. Flexible recognition often leads to moderate affinity but high specificity, in contradiction with the conventional wisdom that high affinity and high specificity are coupled. Furthermore, quantitative understanding of the role of flexibility in biomolecular recognition is still challenging. Here, we meet the challenge by quantifying the intrinsic biomolecular recognition energy landscapes with and without flexibility through the underlying density of states. We quantified the thermodynamic intrinsic specificity by the topography of the intrinsic binding energy landscape and the kinetic specificity by association rate. We found that the thermodynamic and kinetic specificity are strongly correlated. Furthermore, we found that flexibility decreases binding affinity on one hand, but increases binding specificity on the other hand, and the decreasing or increasing proportions of affinity and specificity are strongly correlated with the degree of flexibility. This shows that more (less) flexibility leads to weaker (stronger) coupling between affinity and specificity. Our work provides a theoretical foundation and quantitative explanation of the previous qualitative studies on the relationship among flexibility, affinity and specificity. In addition, we found that the folding energy landscapes are more funneled with binding, indicating that binding helps folding during the recognition. Finally, we demonstrated that the whole binding-folding energy landscapes can be integrated by the rigid binding and isolated folding energy landscapes under weak flexibility. Our results provide a novel way to quantify the affinity and specificity in flexible biomolecular recognition.
Tracing the Evolution of Lineage-Specific Transcription Factor Binding Sites in a Birth-Death Framework
by Ken Daigoro Yokoyama, Yang Zhang, Jian Ma
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to study regulatory elements focus primarily upon highly conserved sites, and rely heavily upon multiple sequence alignments. However, sequence conservation based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that utilizes a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. Our model was applied to analyze the evolution of binding sites based on the ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) along the lineage toward human after the human-mouse common ancestor. We estimate that a substantial fraction of binding sites (∼58–79% for each factor) in humans have origins since the divergence with mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help understand their roles in shaping variation in gene regulation across species.
by Eizaburo Doi, Michael S. Lewicki
A fundamental task of a sensory system is to infer information about the environment. It has long been suggested that an important goal of the first stage of this process is to encode the raw sensory signal efficiently by reducing its redundancy in the neural representation. Some redundancy, however, would be expected because it can provide robustness to noise inherent in the system. Encoding the raw sensory signal itself is also problematic, because it contains distortion and noise. The optimal solution would be constrained further by limited biological resources. Here, we analyze a simple theoretical model that incorporates these key aspects of sensory coding, and apply it to conditions in the retina. The model specifies the optimal way to incorporate redundancy in a population of noisy neurons, while also optimally compensating for sensory distortion and noise. Importantly, it allows an arbitrary input-to-output cell ratio between sensory units (photoreceptors) and encoding units (retinal ganglion cells), providing predictions of retinal codes at different eccentricities. Compared to earlier models based on redundancy reduction, the proposed model conveys more information about the original signal. Interestingly, redundancy reduction can be near-optimal when the number of encoding units is limited, such as in the peripheral retina. We show that there exist multiple, equally optimal solutions whose receptive field structure and organization vary significantly. Among these, the one which maximizes the spatial locality of the computation, but not the sparsity of either synaptic weights or neural responses, is consistent with known basic properties of retinal receptive fields. The model further predicts that receptive field structure changes less with light adaptation at higher input-to-output cell ratios, such as in the periphery.
A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data
by Yuan Zhang, Yanni Sun, James R. Cole
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have a high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and a lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.
Epigenetic Landscapes Explain Partially Reprogrammed Cells and Identify Key Reprogramming Genes
by Alex H. Lang, Hu Li, James J. Collins, Pankaj Mehta
A common metaphor for describing development is a rugged “epigenetic landscape” where cell fates are represented as attracting valleys resulting from a complex regulatory network. Here, we introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. Each cell fate is a dynamic attractor, yet cells can change fate in response to external signals. Our model suggests that partially reprogrammed cells are a natural consequence of high-dimensional landscapes, and predicts that partially reprogrammed cells should be hybrids that co-express genes from multiple cell fates. We verify this prediction by reanalyzing existing datasets. Our model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity.
by Tetsuhiro S. Hatakeyama, Kunihiko Kaneko
Cellular memory, which allows cells to retain information from their environment, is important for a variety of cellular functions, such as adaptation to external stimuli, cell differentiation, and synaptic plasticity. Although posttranslational modifications have received much attention as a source of cellular memory, the mechanisms directing such alterations have not been fully uncovered. It may be possible to embed memory in multiple stable states in dynamical systems governing modifications. However, several experiments on modifications of proteins suggest long-term relaxation depending on experienced external conditions, without explicit switches over multi-stable states. As an alternative to a multistability memory scheme, we propose “kinetic memory” for epigenetic cellular memory, in which memory is stored as a slow-relaxation process far from a stable fixed state. Information from previous environmental exposure is retained as the long-term maintenance of a cellular state, rather than switches over fixed states. To demonstrate this kinetic memory, we study several models in which multimeric proteins undergo catalytic modifications (e.g., phosphorylation and methylation), and find that a slow relaxation process of the modification state, logarithmic in time, appears when the concentration of a catalyst (enzyme) involved in the modification reactions is lower than that of the substrates. Sharp transitions from a normal fast-relaxation phase into this slow-relaxation phase are revealed, and explained by enzyme-limited competition among modification reactions. The slow-relaxation process is confirmed by simulations of several models of catalytic reactions of protein modifications, and it enables the memorization of external stimuli, as its time course depends crucially on the history of the stimuli. This kinetic memory provides novel insight into a broad class of cellular memory and functions. In particular, applications for long-term potentiation are discussed, including dynamic modifications of calcium-calmodulin kinase II and cAMP-response element-binding protein essential for synaptic plasticity.
Mechanical Cell-Matrix Feedback Explains Pairwise and Collective Endothelial Cell Behavior In Vitro
by René F. M. van Oers, Elisabeth G. Rens, Danielle J. LaValley, Cynthia A. Reinhart-King, Roeland M. H. Merks
In vitro cultures of endothelial cells are a widely used model system of the collective behavior of endothelial cells during vasculogenesis and angiogenesis. When seeded in an extracellular matrix, endothelial cells can form blood vessel-like structures, including vascular networks and sprouts. Endothelial morphogenesis depends on a large number of chemical and mechanical factors, including the compliancy of the extracellular matrix, the available growth factors, the adhesion of cells to the extracellular matrix, cell-cell signaling, etc. Although various computational models have been proposed to explain the role of each of these biochemical and biomechanical effects, the understanding of the mechanisms underlying in vitro angiogenesis is still incomplete. Most explanations focus on predicting the whole vascular network or sprout from the underlying cell behavior, and do not check if the same model also correctly captures the intermediate scale: the pairwise cell-cell interactions or single cell responses to ECM mechanics. Here we show, using a hybrid cellular Potts and finite element computational model, that a single set of biologically plausible rules describing (a) the contractile forces that endothelial cells exert on the ECM, (b) the resulting strains in the extracellular matrix, and (c) the cellular response to the strains, suffices for reproducing the behavior of individual endothelial cells and the interactions of endothelial cell pairs in compliant matrices. With the same set of rules, the model also reproduces network formation from scattered cells, and sprouting from endothelial spheroids. Combining the present mechanical model with aspects of previously proposed mechanical and chemical models may lead to a more complete understanding of in vitro angiogenesis.
Optimal Behavioral Hierarchy
by Alec Solway, Carlos Diuk, Natalia Córdova, Debbie Yee, Andrew G. Barto, Yael Niv, Matthew M. Botvinick
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well-established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, the specific way in which tasks are carved up into subtasks. We provide a mathematical account for what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments, suggesting that human learners spontaneously discover optimal action hierarchies.
Construction and Validation of a Regulatory Network for Pluripotency and Self-Renewal of Mouse Embryonic Stem Cells
by Huilei Xu, Yen-Sin Ang, Ana Sevilla, Ihor R. Lemischka, Avi Ma'ayan
A 30-node signed and directed network responsible for self-renewal and pluripotency of mouse embryonic stem cells (mESCs) was extracted from several prior ChIP-Seq and knockdown-followed-by-expression studies. The underlying regulatory logic among network components was then learned using the initial network topology and single cell gene expression measurements from mESCs cultured in serum/LIF or serum-free 2i/LIF conditions. Comparing the learned network regulatory logic derived from cells cultured in serum/LIF vs. 2i/LIF revealed differential roles for Nanog, Oct4/Pou5f1, Sox2, Esrrb and Tcf3. Overall, gene expression in the serum/LIF condition was more variable than in the 2i/LIF but mostly consistent across the two conditions. Expression levels for most genes in single cells were bimodal across the entire population and this motivated a Boolean modeling approach. In silico predictions derived from removal of nodes from the Boolean dynamical model were validated with experimental single and combinatorial RNA interference (RNAi) knockdowns of selected network components. Quantitative post-RNAi expression level measurements of remaining network components showed good agreement with the in silico predictions. Computational removal of nodes from the Boolean network model was also used to predict lineage specification outcomes. In summary, data integration, modeling, and targeted experiments were used to improve our understanding of the regulatory topology that controls mESC fate decisions as well as to develop robust directed lineage specification protocols.
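The idea of removing nodes from a Boolean dynamical model, as described in the abstract above, can be illustrated with a toy example. This is only a sketch under stated assumptions: the three nodes A, B and C and their update rules are hypothetical and are not taken from the 30-node mESC network, and synchronous updating is an assumed scheme rather than the one the authors necessarily used.

```python
def step(state, rules, removed=frozenset()):
    """One synchronous update; removed (knocked-down) nodes are held OFF."""
    return {
        node: False if node in removed else rule(state)
        for node, rule in rules.items()
    }


def steady_state(initial, rules, removed=frozenset(), max_steps=100):
    """Iterate from `initial` until the state stops changing."""
    state = {n: (v and n not in removed) for n, v in initial.items()}
    for _ in range(max_steps):
        nxt = step(state, rules, removed)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (the network may cycle)")


# Hypothetical toy logic: A sustains itself, A activates B, B represses C.
toy_rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: not s["B"],
}
toy_initial = {"A": True, "B": True, "C": False}

wild_type = steady_state(toy_initial, toy_rules)
knockdown_b = steady_state(toy_initial, toy_rules, removed={"B"})
print(wild_type)
print(knockdown_b)
```

In this toy network, holding B off releases C from repression, so the knockdown steady state differs from the unperturbed one. Comparing such predicted states against measured post-knockdown expression is the kind of validation the abstract describes.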
by Anne-Florence Bitbol, David J. Schwab
Natural selection drives populations towards higher fitness, but crossing fitness valleys or plateaus may facilitate progress up a rugged fitness landscape involving epistasis. We investigate quantitatively the effect of subdividing an asexual population on the time it takes to cross a fitness valley or plateau. We focus on a generic and minimal model that includes only population subdivision into equivalent demes connected by global migration, and does not require significant size changes of the demes, environmental heterogeneity or specific geographic structure. We determine the optimal speedup of valley or plateau crossing that can be gained by subdivision, if the process is driven by the deme that crosses fastest. We show that isolated demes have to be in the sequential fixation regime for subdivision to significantly accelerate crossing. Using Markov chain theory, we obtain analytical expressions for the conditions under which optimal speedup is achieved: valley or plateau crossing by the subdivided population is then as fast as that of its fastest deme. We verify our analytical predictions through stochastic simulations. We demonstrate that subdivision can substantially accelerate the crossing of fitness valleys and plateaus in a wide range of parameters extending beyond the optimal window. We study the effect of varying the degree of subdivision of a population, and investigate the trade-off between the magnitude of the optimal speedup and the width of the parameter range over which it occurs. Our results, obtained for fitness valleys and plateaus, also hold for weakly beneficial intermediate mutations. Finally, we extend our work to the case of a population connected by migration to one or several smaller islands. Our results demonstrate that subdivision with migration alone can significantly accelerate the crossing of fitness valleys and plateaus, and shed light onto the quantitative conditions necessary for this to occur.
Top-Down Inputs Enhance Orientation Selectivity in Neurons of the Primary Visual Cortex during Perceptual Learning
by Samat Moldakarimov, Maxim Bazhenov, Terrence J. Sejnowski
Perceptual learning has been used to probe the mechanisms of cortical plasticity in the adult brain. Feedback projections are ubiquitous in the cortex, but little is known about their role in cortical plasticity. Here we explore the hypothesis that learning visual orientation discrimination involves learning-dependent plasticity of top-down feedback inputs from higher cortical areas, serving a different function from plasticity due to changes in recurrent connections within a cortical area. In a Hodgkin-Huxley-based spiking neural network model of visual cortex, we show that modulation of feedback inputs to V1 from higher cortical areas results in shunting inhibition in V1 neurons, which changes the response properties of V1 neurons. The orientation selectivity of V1 neurons is enhanced without changing orientation preference, preserving the topographic organizations in V1. These results provide new insights into the mechanisms of plasticity in the adult brain, reconciling apparently inconsistent experiments and providing a new hypothesis for a functional role of the feedback connections.
by Seungyeul Yoo, Tao Huang, Joshua D. Campbell, Eunjee Lee, Zhidong Tu, Mark W. Geraci, Charles A. Powell, Eric E. Schadt, Avrum Spira, Jun Zhu
Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
The Protective Role of Symmetric Stem Cell Division on the Accumulation of Heritable Damage
by Peter T. McHale, Arthur D. Lander
Stem cell divisions are either asymmetric (one daughter cell remains a stem cell and one does not) or symmetric (both daughter cells adopt the same fate, either stem or non-stem). Recent studies show that in many tissues operating under homeostatic conditions stem cell division patterns are strongly biased toward the symmetric outcome, raising the question of whether symmetry confers some benefit. Here, we show that symmetry, via extinction of damaged stem-cell clones, reduces the lifetime risk of accumulating phenotypically silent heritable damage (mutations or aberrant epigenetic changes) in individual stem cells. This effect is greatest in rapidly cycling tissues subject to accelerating rates of damage accumulation over time, a scenario that describes the progression of many cancers. A decrease in the rate of cellular damage accumulation may be an important factor favoring symmetric patterns of stem cell division.
by Yujiang Wang, Marc Goodfellow, Peter Neal Taylor, Gerold Baier
Recent experimental and clinical studies have provided diverse insight into the mechanisms of human focal seizure initiation and propagation. Often these findings exist at different scales of observation, and are not reconciled into a common understanding. Here we develop a new, multiscale mathematical model of cortical electric activity with realistic mesoscopic connectivity. Relating the model dynamics to experimental and clinical findings leads us to propose three classes of dynamical mechanisms for the onset of focal seizures in a unified framework. These three classes are: (i) globally induced focal seizures; (ii) globally supported focal seizures; (iii) locally induced focal seizures. Using model simulations we illustrate these onset mechanisms and show how the three classes can be distinguished. Specifically, we find that although all focal seizures typically appear to arise from localised tissue, the mechanisms of onset could be due to either localised processes or processes on a larger spatial scale. We conclude that although focal seizures might have different patient-specific aetiologies and electrographic signatures, our model suggests that dynamically they can still be classified in a clinically useful way. Additionally, this novel classification according to the dynamical mechanisms is able to resolve some of the previously conflicting experimental and clinical findings.
by Qiang Cui, Ruth Nussinov