PLoS Computational Biology
Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time
by David J. McIver, John S. Brownstein
Circulating levels of both seasonal and pandemic influenza require constant surveillance to ensure the health and safety of the population. While up-to-date information is critical, traditional surveillance systems can have data availability lags of up to two weeks. We introduce a novel method of estimating, in near-real time, the level of influenza-like illness (ILI) in the United States (US) by monitoring the rate of particular Wikipedia article views on a daily basis. We calculated the number of times certain influenza- or health-related Wikipedia articles were accessed each day between December 2007 and August 2013 and compared these data to official ILI activity levels provided by the Centers for Disease Control and Prevention (CDC). We developed a Poisson model that accurately estimates the level of ILI activity in the American population, up to two weeks ahead of the CDC, with an absolute average difference between the two estimates of just 0.27% over 294 weeks of data. Wikipedia-derived ILI models performed well through both abnormally high media coverage events (such as the 2009 H1N1 pandemic) and unusually severe influenza seasons (such as the 2012–2013 influenza season). Wikipedia usage accurately estimated the week of peak ILI activity 17% more often than Google Flu Trends data and was often more accurate in its measure of ILI intensity. With further study, this method could potentially be implemented for continuous monitoring of ILI activity in the US and to provide support for traditional influenza surveillance tools.
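The shape of such a Poisson (log-linear) view-count model can be sketched as follows; the article names and coefficients here are hypothetical placeholders for illustration, not the paper's fitted values:

```python
import math

# Hypothetical coefficients for illustration only -- not the paper's fitted values.
INTERCEPT = -2.0
COEFS = {"Influenza": 0.8, "Fever": 0.4, "Oseltamivir": 0.3}

def predicted_ili(view_counts):
    """Log-linear Poisson mean: expected ILI% = exp(b0 + sum_i b_i * x_i),
    where x_i is a normalized daily view count for Wikipedia article i."""
    eta = INTERCEPT + sum(c * view_counts.get(a, 0.0) for a, c in COEFS.items())
    return math.exp(eta)

# More flu-related page views should map to higher estimated ILI activity.
quiet_week = predicted_ili({"Influenza": 0.1, "Fever": 0.1})
busy_week = predicted_ili({"Influenza": 1.0, "Fever": 0.8, "Oseltamivir": 0.5})
```

The exponential link keeps the estimated ILI percentage positive regardless of the linear predictor's sign, which is the usual motivation for a Poisson rather than a linear model here.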
How the Brain Decides When to Work and When to Rest: Dissociation of Implicit-Reactive from Explicit-Predictive Computational Processes
by Florent Meyniel, Lou Safra, Mathias Pessiglione
A pervasive cost-benefit problem is how to allocate effort over time, i.e. deciding when to work and when to rest. An economic decision perspective would suggest that duration of effort is determined beforehand, depending on expected costs and benefits. However, the literature on exercise performance emphasizes that decisions are made on the fly, depending on physiological variables. Here, we propose and validate a general model of effort allocation that integrates these two views. In this model, a single variable, termed cost evidence, accumulates during effort and dissipates during rest, triggering effort cessation and resumption when reaching bounds. We assumed that such a basic mechanism could explain implicit adaptation, whereas the latent parameters (slopes and bounds) could be amenable to explicit anticipation. A series of behavioral experiments manipulating effort duration and difficulty was conducted in a total of 121 healthy humans to dissociate implicit-reactive from explicit-predictive computations. Results show 1) that effort and rest durations are adapted on the fly to variations in cost-evidence level, 2) that the cost-evidence fluctuations driving the behavior do not match explicit ratings of exhaustion, and 3) that actual difficulty impacts effort duration whereas expected difficulty impacts rest duration. Taken together, our findings suggest that cost evidence is implicitly monitored online, with an accumulation rate proportional to actual task difficulty. In contrast, cost-evidence bounds and dissipation rate might be adjusted in anticipation, depending on explicit task difficulty.
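A minimal simulation of the single-variable cost-evidence mechanism described above might look like this; the parameter values are arbitrary illustrations, not the fitted values from the experiments:

```python
def simulate_effort_allocation(accum_rate, dissip_rate, upper, lower, t_max, dt=0.01):
    """Cost evidence accumulates during effort and dissipates during rest;
    hitting the upper bound triggers rest, hitting the lower bound resumes
    effort. Returns total time spent in effort and in rest."""
    cost, working = lower, True
    effort_time = rest_time = 0.0
    t = 0.0
    while t < t_max:
        if working:
            cost += accum_rate * dt
            effort_time += dt
            if cost >= upper:
                working = False
        else:
            cost -= dissip_rate * dt
            rest_time += dt
            if cost <= lower:
                working = True
        t += dt
    return effort_time, rest_time
```

A harder task (larger accumulation rate) yields shorter effort bouts and a smaller total share of time spent working, mirroring the on-the-fly adaptation the authors report.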
Computational Prediction of Alanine Scanning and Ligand Binding Energetics in G-Protein Coupled Receptors
by Lars Boukharta, Hugo Gutiérrez-de-Terán, Johan Åqvist
Site-directed mutagenesis combined with binding affinity measurements is widely used to probe the nature of ligand interactions with GPCRs. Such experiments, as well as structure-activity relationships for series of ligands, are usually interpreted with computationally derived models of ligand binding modes. However, systematic approaches for accurate calculations of the corresponding binding free energies are still lacking. Here, we report a computational strategy to quantitatively predict the effects of alanine scanning and ligand modifications based on molecular dynamics free energy simulations. A smooth stepwise scheme for free energy perturbation calculations is derived and applied to a series of thirteen alanine mutations of the human neuropeptide Y1 receptor and a series of eight analogous antagonists. The robustness and accuracy of the method enables unambiguous interpretation of existing mutagenesis and binding data. We show how these calculations can be used to validate structural models and demonstrate their ability to discriminate against suboptimal ones.
by Robert Brown, Bogdan Pasaniuc
Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring substantial computational time even for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers local ancestries comparable to the 1000 Genomes consensus local ancestry calls in real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs.
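The core idea of calling local ancestry from continent-specific variants can be illustrated with a toy window classifier; the variant panel below is entirely hypothetical, standing in for a CSV panel derived from reference data such as 1000 Genomes:

```python
def window_ancestry(observed_variants, csv_panel):
    """observed_variants: set of variant IDs seen in a genomic window.
    csv_panel: dict mapping an ancestry label to its set of
    continent-specific variants (hypothetical panel for illustration).
    Assigns the window to the ancestry with the most matching CSVs."""
    counts = {anc: len(observed_variants & panel)
              for anc, panel in csv_panel.items()}
    return max(counts, key=counts.get)
```

Because each window only needs set intersections rather than haplotype-model likelihoods, this style of inference scales far better than haplotype-based approaches, at the cost of ignoring linkage information.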
Continuous Attractor Network Model for Conjunctive Position-by-Velocity Tuning of Grid Cells
by Bailu Si, Sandro Romani, Misha Tsodyks
The spatial responses of many of the cells recorded in layer II of rodent medial entorhinal cortex (MEC) show a triangular grid pattern, which appears to provide an accurate population code for animal spatial position. In layers III, V and VI of the rat MEC, grid cells are also selective to head direction and are modulated by the speed of the animal. Several putative mechanisms for grid-like maps have been proposed, including attractor network dynamics, interactions with theta oscillations, or single-unit mechanisms such as firing rate adaptation. In this paper, we present a new attractor network model that accounts for the conjunctive position-by-velocity selectivity of grid cells. Our network model is able to perform robust path integration even when the recurrent connections are subject to random perturbations.
by Lander Willem, Sean Stijven, Ekaterina Vladislavleva, Jan Broeckhove, Philippe Beutels, Niel Hens
Modeling plays a major role in policy making, especially for infectious disease interventions, but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach that combines iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. The insight provided is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as an emulator to improve rapid policy making in various settings.
Timing of Neuropeptide Coupling Determines Synchrony and Entrainment in the Mammalian Circadian Clock
by Bharath Ananthasubramaniam, Erik D. Herzog, Hanspeter Herzel
Robust synchronization is a critical feature of several systems including the mammalian circadian clock. The master circadian clock in mammals consists of about 20,000 ‘sloppy’ neuronal oscillators within the hypothalamus that keep robust time by synchronization driven by inter-neuronal coupling. A complete understanding of this synchronization in the mammalian circadian clock, and of the mechanisms underlying it, remains an open question. Experiments and computational studies have shown that coupling individual oscillators can achieve robust synchrony, despite heterogeneity and different network topologies. However, much less is known regarding the mechanisms and circuits involved in achieving this coupling, due to both system complexity and experimental limitations. Here, we computationally study the coupling mediated by the primary coupling neuropeptide, vasoactive intestinal peptide (VIP), and its canonical receptor, VPAC2R, using the transcriptional elements and generic mode of VIP-VPAC2R signaling. We find that synchrony is only possible if VIP (an inducer of Per expression) is released in-phase with activators of Per expression. Moreover, anti-phasic VIP release suppresses coherent rhythms by moving the network into a desynchronous state. Importantly, experimentally observed rhythms in VPAC2R have little effect on network synchronization, but can improve the amplitude of the SCN network rhythms while narrowing the network entrainment range. We further show that these findings are valid across several computational network models. Thus, we identified a general design principle to achieve robust synchronization: an activating coupling agent, such as VIP, must act in-phase with the activity of core-clock promoters. More generally, the phase of coupling is as critical as the strength of coupling from the viewpoint of synchrony and entrainment.
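The role of coupling phase can be illustrated with two abstract phase oscillators whose interaction carries a phase offset phi, a toy stand-in for the timing of VIP release relative to Per activation; all parameters are arbitrary and unrelated to the paper's transcriptional model:

```python
import math

def coupled_phase_diff(phi, k=0.5, steps=20000, dt=0.01):
    """Two identical phase oscillators coupled through sin(delta + phi).
    phi = 0 models in-phase release of the coupling agent; phi = pi models
    anti-phasic release. Returns the final phase difference in [0, 2*pi)."""
    th1, th2 = 0.0, 2.0           # arbitrary initial phases
    w = 2 * math.pi / 24          # ~24 h intrinsic period (arbitrary units)
    for _ in range(steps):
        d1 = k * math.sin(th2 - th1 + phi)
        d2 = k * math.sin(th1 - th2 + phi)
        th1 += (w + d1) * dt
        th2 += (w + d2) * dt
    return (th2 - th1) % (2 * math.pi)
```

With phi = 0 the pair synchronizes (phase difference near 0); with phi = pi the same coupling strength locks them in anti-phase, echoing the finding that the phase of coupling matters as much as its strength.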
by Christiana N. Fogg, Diane E. Kovats
Impact of Different Oseltamivir Regimens on Treating Influenza A Virus Infection and Resistance Emergence: Insights from a Modelling Study
by Laetitia Canini, Jessica M. Conway, Alan S. Perelson, Fabrice Carrat
Several studies have proven oseltamivir to be effective in reducing influenza viral titer and symptom intensity. However, the usefulness of oseltamivir can be compromised by the emergence and spread of drug-resistant virus. The selective pressure exerted by different oseltamivir therapy regimens has received little attention. Combining models of drug pharmacokinetics, pharmacodynamics, viral kinetics and symptom dynamics, we explored the efficacy of oseltamivir in reducing both symptoms (symptom efficacy) and viral load (virological efficacy). We simulated samples of 1000 subjects using previously estimated between-subject variability in viral and symptom dynamic parameters to describe the observed heterogeneity in a patient population. We simulated random mutations conferring resistance to oseltamivir. We explored the effect of therapy initiation time, dose, intake frequency and therapy duration on influenza infection, illness dynamics, and emergence of viral resistance. Symptom and virological efficacies were strongly associated with therapy initiation time. The proportion of subjects shedding resistant virus was 27-fold higher when prophylaxis was initiated during the incubation period compared with no treatment. It fell to below 1% when treatment was initiated after symptom onset for twice-a-day intakes. Lower doses and prophylaxis regimens led to lower efficacies and increased risk of resistance emergence. We conclude that prophylaxis initiated during the incubation period is the main factor leading to resistance emergence.
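The interplay between dosing regimen and antiviral pressure can be sketched with a toy one-compartment pharmacokinetic model plus Emax pharmacodynamics; the elimination rate, doses and IC50 below are arbitrary illustrative values, not the study's estimates:

```python
import math

def trough_conc(dose, interval, ke=0.3, n_doses=10):
    """Drug concentration just before the next dose after n_doses repeated
    bolus-like doses with first-order elimination (arbitrary units)."""
    t = n_doses * interval
    return sum(dose * math.exp(-ke * (t - i * interval)) for i in range(n_doses))

def efficacy(conc, ic50=1.0):
    """Emax pharmacodynamics: fraction of viral production blocked."""
    return conc / (conc + ic50)

# Same daily dose, split differently: twice-daily intake keeps troughs higher,
# sustaining antiviral pressure between doses.
twice_daily = efficacy(trough_conc(75, 12))
once_daily = efficacy(trough_conc(150, 24))
```

Sustained trough efficacy is one mechanistic reason intake frequency, and not only total dose, shapes both virological efficacy and the window in which resistant mutants can be selected.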
by David A. Rasmussen, Erik M. Volz, Katia Koelle
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.
by Andrea Insabato, Laura Dempere-Marco, Mario Pannunzi, Gustavo Deco, Ranulfo Romo
Decision making is a process of utmost importance in our daily lives, the study of which has been receiving notable attention for decades. Nevertheless, the neural mechanisms underlying decision making are still not fully understood. Computational modeling has revealed itself as a valuable asset to address some of the fundamental questions. Biophysically plausible models, in particular, are useful in bridging the different levels of description that experimental studies provide, from the neural spiking activity recorded at the cellular level to the performance reported at the behavioral level. In this article, we review some of the recent progress made in the understanding of the neural mechanisms that underlie decision making. We perform a critical evaluation of the available results and address, from a computational perspective, aspects of both experimentation and modeling that so far have eluded comprehension. To guide the discussion, we have selected a central theme which revolves around the following question: how does the spatiotemporal structure of sensory stimuli affect the perceptual decision-making process? This question is a timely one as several issues that still remain unresolved stem from this central theme. These include: (i) the role of spatiotemporal input fluctuations in perceptual decision making, (ii) how to extend the current results and models derived from two-alternative choice studies to scenarios with multiple competing sources of evidence, and (iii) whether different types of spatiotemporal input fluctuations affect decision-making outcomes in distinctive ways. Although we restrict our discussion mostly to visual decisions, our main conclusions are arguably generalizable; hence, their possible extension to other sensory modalities is one of the points in our discussion.
by Roland F. Schwarz, Anne Trinh, Botond Sipos, James D. Brenton, Nick Goldman, Florian Markowetz
Intra-tumour genetic heterogeneity is the result of ongoing evolutionary change within each cancer. The expansion of genetically distinct sub-clonal populations may explain the emergence of drug resistance, and if so, would have prognostic and predictive utility. However, methods for objectively quantifying tumour heterogeneity have been missing and are particularly difficult to establish in cancers where predominant copy number variation prevents accurate phylogenetic reconstruction owing to horizontal dependencies caused by long and cascading genomic rearrangements. To address these challenges, we present MEDICC, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons. Using a transducer-based pairwise comparison function, we determine optimal phasing of major and minor alleles, as well as evolutionary distances between samples, and are able to reconstruct ancestral genomes. Rigorous simulations and an extensive clinical study show the power of our method, which outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity. Accurate quantification and evolutionary inference are essential to understand the functional consequences of tumour heterogeneity. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data.
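A drastically simplified flavor of a minimum-event copy-number distance can be computed by counting the segmental +1/-1 events needed to turn one integer profile into another; unlike MEDICC proper, this sketch ignores allele phasing and the fact that fully deleted segments cannot be regained:

```python
def min_events(src, dst):
    """Minimum number of events transforming copy-number profile src into dst,
    where each event adds or removes one copy over a contiguous segment.
    Simplified illustration only; not the MEDICC transducer distance."""
    diff = [d - s for s, d in zip(src, dst)]
    gains = [max(d, 0) for d in diff]    # copies still to add per position
    losses = [max(-d, 0) for d in diff]  # copies still to remove per position

    def ascents(xs):
        # Each unit step up opens a new segmental event.
        total, prev = 0, 0
        for x in xs:
            if x > prev:
                total += x - prev
            prev = x
        return total

    return ascents(gains) + ascents(losses)
```

For example, turning a flat diploid-like profile [1,1,1,1,1] into [1,1,2,2,1] needs a single amplification spanning the middle two segments, and the count reflects that.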
Prediction and Prioritization of Rare Oncogenic Mutations in the Cancer Kinome Using Novel Features and Multiple Classifiers
by ManChon U, Eric Talevich, Samiksha Katiyar, Khaled Rasheed, Natarajan Kannan
Cancer is a genetic disease that develops through a series of somatic mutations, a subset of which drive cancer progression. Although cancer genome sequencing studies are beginning to reveal the mutational patterns of genes in various cancers, identifying the small subset of “causative” mutations from the large subset of “non-causative” mutations, which accumulate as a consequence of the disease, is a challenge. In this article, we present an effective machine learning approach for identifying cancer-associated mutations in human protein kinases, a class of signaling proteins known to be frequently mutated in human cancers. We evaluate the performance of 11 well-known supervised learners and show that a multiple-classifier approach, which combines the performances of individual learners, significantly improves the classification of known cancer-associated mutations. We introduce several novel features related specifically to structural and functional characteristics of protein kinases and find that the level of conservation of the mutated residue at specific evolutionary depths is an important predictor of oncogenic effect. We consolidate the novel features and the multiple-classifier approach to prioritize and experimentally test a set of rare unconfirmed mutations in the epidermal growth factor receptor tyrosine kinase (EGFR). Our studies identify T725M and L861R as rare cancer-associated mutations inasmuch as these mutations increase EGFR activity in the absence of the activating EGF ligand in cell-based assays.
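One common way to combine multiple base learners is an accuracy-weighted vote, sketched below; the labels and weights are hypothetical, and the paper's actual combination scheme may differ:

```python
from collections import Counter

def weighted_vote(predictions, weights):
    """predictions: one label per base classifier (e.g. 'driver'/'passenger');
    weights: per-classifier reliability, e.g. cross-validated accuracy.
    Returns the label with the largest total weight."""
    score = Counter()
    for label, w in zip(predictions, weights):
        score[label] += w
    return score.most_common(1)[0][0]
```

Note that a single highly reliable classifier can outvote two weak dissenters, which is one reason weighted combination tends to beat a plain majority vote when learner quality varies.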
A Synergism between Adaptive Effects and Evolvability Drives Whole Genome Duplication to Fixation
by Thomas D. Cuypers, Paulien Hogeweg
Whole genome duplication (WGD) has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and, more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern post-WGD by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions, re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30%) of lineages, yet these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.
by Hamed Nili, Cai Wingfield, Alexander Walther, Li Su, William Marslen-Wilson, Nikolaus Kriegeskorte
Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement. The toolbox is freely available to the community under an open-source license agreement (http://www.mrc-cbu.cam.ac.uk/methods-and-resources/toolboxes/license/).
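The central object of RSA, the representational distance matrix, is easy to sketch from first principles; this toy version uses 1 minus Pearson correlation as the pattern distance, one common choice (the toolbox itself is in Matlab and offers several distance measures):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rdm(patterns):
    """patterns: one response vector per stimulus (e.g. voxel activities).
    Returns the upper triangle of the representational distance matrix,
    using correlation distance 1 - r for each stimulus pair."""
    n = len(patterns)
    return [1 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]
```

Comparing a model's RDM to a brain region's RDM (for instance by rank correlation of their upper triangles) is then what lets RSA test models without requiring a voxel-to-unit correspondence.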
by The PLOS Computational Biology Staff
by Jeffrey P. Perley, Judith Mikolajczak, Marietta L. Harrison, Gregery T. Buzzard, Ann E. Rundell
Computational approaches to tune the activation of intracellular signal transduction pathways both predictably and selectively will enable researchers to explore and interrogate cell biology with unprecedented precision. Techniques to control complex nonlinear systems typically involve the application of control theory to a descriptive mathematical model. For cellular processes, however, measurement assays tend to be too time consuming for real-time feedback control and models offer rough approximations of the biological reality, thus limiting their utility when considered in isolation. We overcome these problems by combining nonlinear model predictive control with a novel adaptive weighting algorithm that blends predictions from multiple models to derive a compromise open-loop control sequence. The proposed strategy uses weight maps to inform the controller of the tendency for models to differ in their ability to accurately reproduce the system dynamics under different experimental perturbations (i.e. control inputs). These maps, which characterize the changing model likelihoods over the admissible control input space, are constructed using preexisting experimental data and used to produce a model-based open-loop control framework. In effect, the proposed method designs a sequence of control inputs that force the signaling dynamics along a predefined temporal response without measurement feedback while mitigating the effects of model uncertainty. We demonstrate this technique on the well-known Erk/MAPK signaling pathway in T cells. In silico assessment demonstrates that this approach successfully reduces target tracking error by 52% or better when compared with single model-based controllers and non-adaptive multiple model-based controllers. In vitro implementation of the proposed approach in Jurkat cells confirms a 63% reduction in tracking error when compared with the best of the single-model controllers. This study provides an experimentally-corroborated control methodology that utilizes the knowledge encoded within multiple mathematical models of intracellular signaling to design control inputs that effectively direct cell behavior in open-loop.
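The model-blending idea can be reduced to a few lines: predictions from several candidate models are mixed with input-dependent weights, and the open-loop input is chosen to track a target. The linear toy models and uniform weight map below are illustrative only, not the paper's Erk/MAPK models or fitted weight maps:

```python
def blended_prediction(u, models, weight_map):
    """models: list of functions mapping a control input u to a predicted
    response. weight_map: function u -> list of model weights (summing to 1),
    standing in for the precomputed weight maps built from experimental data."""
    return sum(w * m(u) for w, m in zip(weight_map(u), models))

def best_input(candidates, target, models, weight_map):
    """Pick the open-loop control input whose blended prediction best
    tracks the target response."""
    return min(candidates,
               key=lambda u: abs(blended_prediction(u, models, weight_map) - target))
```

Because the weights can shift across the input space, the controller leans on whichever model is historically most accurate under each perturbation, which is the mechanism behind the reported robustness to model uncertainty.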
by Michelle D. Brazas, Fran Lewitter, Maria Victoria Schneider, Celia W. G. van Gelder, Patricia M. Palagi
rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids
by Sergio Ruiz-Carmona, Daniel Alvarez-Garcia, Nicolas Foloppe, A. Beatriz Garmendia-Doval, Szilveszter Juhos, Peter Schmidtke, Xavier Barril, Roderick E. Hubbard, S. David Morley
Identification of chemical compounds with specific biological activities is an important step in both chemical biology and drug discovery. When the structure of the intended target is available, one approach is to use molecular docking programs to assess the chemical complementarity of small molecules with the target; such calculations provide a qualitative measure of affinity that can be used in virtual screening (VS) to rank-order a list of compounds according to their potential to be active. rDock is a molecular docking program developed at Vernalis for high-throughput VS (HTVS) applications. Evolved from RiboDock, the program can be used against proteins and nucleic acids, is designed to be computationally very efficient and allows the user to incorporate additional constraints and information as a bias to guide docking. This article provides an overview of the program structure and features and compares rDock to two reference programs, AutoDock Vina (open source) and Schrödinger's Glide (commercial). In terms of computational speed for VS, rDock is faster than Vina and comparable to Glide. For binding mode prediction, rDock and Vina are superior to Glide. The VS performance of rDock is significantly better than that of Vina, but inferior to that of Glide for most systems unless pharmacophore constraints are used; in that case rDock and Glide are of equal performance. The program is released under the Lesser General Public License and is freely available for download, together with the manuals, example files and the complete test sets, at http://rdock.sourceforge.net/
Atomistic Picture for the Folding Pathway of a Hybrid-1 Type Human Telomeric DNA G-quadruplex
by Yunqiang Bian, Cheng Tan, Jun Wang, Yuebiao Sheng, Jian Zhang, Wei Wang
In this work we studied the folding process of the hybrid-1 type human telomeric DNA G-quadruplex with solvent and ions explicitly modeled. Enabled by the powerful bias-exchange metadynamics and large-scale conventional molecular dynamics simulations, the free energy landscape of this G-DNA was obtained for the first time and four folding intermediates were identified, including a triplex and a basically formed quadruplex. The simulations also provided atomistic pictures for the structures and cation binding patterns of the intermediates. The results showed that structure formation and cation binding are cooperative and mutually reinforcing. The syn/anti reorientation dynamics of the intermediates was also investigated. It was found that the nucleotides usually take correct syn/anti configurations when they form native and stable hydrogen bonds with the others, while fluctuating between the two configurations when they do not. Misfolded species with wrong syn/anti configurations were observed among the early intermediates but not the later ones. Based on the simulations, we also discussed the roles of the non-native interactions. In addition, the formation process of the parallel conformation in the first two G-repeats and the associated reversal loop were studied. Based on the above results, we proposed a folding pathway for the hybrid-1 type G-quadruplex with atomistic details, which is more complete than previously proposed ones. The knowledge gained for this type of G-DNA may provide a general insight into the folding of the other G-quadruplexes.