PLoS Computational Biology
A Data-Driven Mathematical Model of CA-MRSA Transmission among Age Groups: Evaluating the Effect of Control Interventions
by Xiaoxia Wang, Sarada Panchanathan, Gerardo Chowell
Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has become a major cause of skin and soft tissue infections (SSTIs) in the US. We developed an age-structured compartmental model to study the spread of CA-MRSA at the population level and assess the effect of control intervention strategies. We used Markov chain Monte Carlo (MCMC) techniques to parameterize our model using monthly time series data on SSTI incidence in children (≤19 years) during January 2004–December 2006 in Maricopa County, Arizona. Our model-based forecast for the period January 2007–December 2008 also provided a good fit to data. We also carried out an uncertainty and sensitivity analysis on the control reproduction number, which we estimated at 1.3 (95% CI [1.2,1.4]) based on the model fit to data. Using our calibrated model, we evaluated the effect of typical intervention strategies, namely reducing the contact rate of infected individuals owing to awareness of infection and decolonization strategies targeting symptomatic infected individuals, on both the short- and long-term disease dynamics. We also evaluated the impact of hypothetical decolonization strategies targeting asymptomatic colonized individuals. We found that strategies focused on infected individuals were not capable of achieving disease control when implemented alone or in combination. In contrast, our results suggest that decolonization strategies targeting the pediatric population colonized with CA-MRSA have the potential of achieving disease elimination.
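For readers unfamiliar with reproduction numbers in age-structured models, the following is a minimal sketch, not the authors' model. It assumes a simplified two-group setting (children and adults) in which the reproduction number is the largest eigenvalue of a 2x2 next-generation matrix. The function names, transmission rates, and recovery rates are all hypothetical and chosen only for illustration.

```python
import math


def next_generation_matrix(beta, recovery_rate):
    """K[i][j]: expected new infections in group i caused by one
    infected individual in group j over its infectious period."""
    return [[beta[i][j] / recovery_rate[j] for j in range(2)] for i in range(2)]


def spectral_radius_2x2(k):
    """Largest eigenvalue of a 2x2 matrix with non-negative entries."""
    (a, b), (c, d) = k
    trace = a + d
    det = a * d - b * c
    # The discriminant equals (a - d)^2 + 4*b*c, so it is never negative here.
    disc = trace * trace - 4.0 * det
    return (trace + math.sqrt(disc)) / 2.0


# Hypothetical per-month transmission rates: index 0 = children, 1 = adults.
beta = [[0.9, 0.2], [0.2, 0.4]]
recovery_rate = [0.8, 0.8]  # hypothetical per-month removal rates

K = next_generation_matrix(beta, recovery_rate)
R_c = spectral_radius_2x2(K)
print(f"illustrative reproduction number: {R_c:.2f}")
```

In this toy setting, an intervention that scales down an entry of `beta` or raises a removal rate lowers `R_c`; control corresponds to pushing it below 1.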
Dread and the Disvalue of Future Pain
by Giles W. Story, Ivaylo Vlaev, Ben Seymour, Joel S. Winston, Ara Darzi, Raymond J. Dolan
Standard theories of decision-making involving delayed outcomes predict that people should defer a punishment, whilst advancing a reward. In some cases, such as pain, people seem to prefer to expedite punishment, implying that its anticipation carries a cost, often conceptualized as ‘dread’. Despite empirical support for the existence of dread, whether and how it depends on prospective delay is unknown. Furthermore, it is unclear whether dread represents a stable component of value, or is modulated by biases such as framing effects. Here, we examine choices made between different numbers of painful shocks to be delivered faithfully at different time points up to 15 minutes in the future, as well as choices between hypothetical painful dental appointments at time points of up to approximately eight months in the future, to test alternative models for how future pain is disvalued. We show that future pain initially becomes increasingly aversive with increasing delay, but does so at a decreasing rate. This is consistent with a value model in which moment-by-moment dread increases up to the time of expected pain, such that dread becomes equivalent to the discounted expectation of pain. For a minority of individuals pain has maximum negative value at intermediate delay, suggesting that the dread function may itself be prospectively discounted in time. Framing an outcome as relief reduces the overall preference to expedite pain, which can be parameterized by reducing the rate of the dread-discounting function. Our data support an account of disvaluation for primary punishments such as pain, which differs fundamentally from existing models applied to financial punishments, in which dread exerts a powerful but time-dependent influence over choice.
by Avinash Kumar Shanmugam, Geoff Macintyre, Magali Michaut, Thomas Abeel
Cell-Based Multi-Parametric Model of Cleft Progression during Submandibular Salivary Gland Branching Morphogenesis
by Shayoni Ray, Daniel Yuan, Nimit Dhulekar, Basak Oztan, Bülent Yener, Melinda Larsen
Cleft formation during submandibular salivary gland branching morphogenesis is the critical step initiating the growth and development of the complex adult organ. Previous experimental studies indicated requirements for several epithelial cellular processes, such as proliferation, migration, cell-cell adhesion, cell-extracellular matrix (matrix) adhesion, and cellular contraction in cleft formation; however, the relative contribution of each of these processes is not fully understood since it is not possible to experimentally manipulate each factor independently. We present here a comprehensive analysis of several cellular parameters regulating cleft progression during branching morphogenesis in the epithelial tissue of an early embryonic salivary gland at a local scale using an on-lattice Monte Carlo simulation model, the Glazier-Graner-Hogeweg model. We utilized measurements from time-lapse images of mouse submandibular gland organ explants to construct a temporally and spatially relevant cell-based 2D model. Our model simulates the effect of cellular proliferation, actomyosin contractility, cell-cell and cell-matrix adhesions on cleft progression, and it was used to test specific hypotheses regarding the function of these parameters in branching morphogenesis. We use innovative features capturing several aspects of cleft morphology and quantitatively analyze clefts formed during functional modification of the cellular parameters. Our simulations predict that a low epithelial mitosis rate and moderate level of actomyosin contractility in the cleft cells promote cleft progression. Raising or lowering levels of contractility and mitosis rate resulted in non-progressive clefts. We also show that lowered cell-cell adhesion in the cleft region and increased cleft cell-matrix adhesions are required for cleft progression. Using a classifier-based analysis, the relative importance of these four contributing cellular factors for effective cleft progression was determined as follows: cleft cell contractility, cleft region cell-cell adhesion strength, epithelial cell mitosis rate, and cell-matrix adhesion strength.
by Nickolay A. Khazanov, Heather A. Carlson
The residue composition of a ligand binding site determines the interactions available for diffusion-mediated ligand binding, and understanding the general composition of these sites is of great importance if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods utilize such heuristic information for improving prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes, and provides a source of diverse, high-quality data for establishing general trends of residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues over-represented in binding regions versus the rest of the protein surface. The Binding MOAD database delineates biologically-relevant “valid” ligands from “invalid” small-molecule ligands bound to the protein. Invalids are present in the crystallization medium and serve no known biological function. Contacts are found to differ between these classes of ligands, indicating that residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface, but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of the variation of residue propensity with respect to the size of the dataset and the content bias inherent in structure sets obtained from a large protein structure database. The optimal size of the dataset for establishing general trends of residue propensities, as well as strategies for assessing the significance of such trends, are suggested for future studies of binding-site composition.
Assessing Computational Methods for Transcription Factor Target Gene Identification Based on ChIP-seq Data
by Weronika Sikora-Wohlfeld, Marit Ackermann, Eleni G. Christodoulou, Kalaimathy Singaravelu, Andreas Beyer
Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) has great potential for elucidating transcriptional networks, by measuring genome-wide binding of transcription factors (TFs) at high resolution. Despite the precision of these experiments, identification of genes directly regulated by a TF (target genes) is not trivial. Numerous target gene scoring methods have been used in the past. However, their suitability for the task and their performance remain unclear, because a thorough comparative assessment of these methods is still lacking. Here we present a systematic evaluation of computational methods for defining TF targets based on ChIP-seq data. We validated predictions based on 68 ChIP-seq studies using a wide range of genomic expression data and functional information. We demonstrate that peak-to-gene assignment is the most crucial step for correct target gene prediction and propose a parameter-free method performing most consistently across the evaluation tests.
by Antoine Frénoy, François Taddei, Dusan Misevic
When cooperation has a direct cost and an indirect benefit, a selfish behavior is more likely to be selected for than an altruistic one. Kin and group selection do provide evolutionary explanations for the stability of cooperation in nature, but we still lack the full understanding of the genomic mechanisms that can prevent cheater invasion. In our study we used Aevol, an agent-based, in silico genomic platform to evolve populations of digital organisms that compete, reproduce, and cooperate by secreting a public good for tens of thousands of generations. We found that cooperating individuals may share a phenotype, defined as the amount of public good produced, but have very different abilities to resist cheater invasion. To understand the underlying genetic differences between cooperator types, we performed bio-inspired genomics analyses of our digital organisms by recording and comparing the locations of metabolic and secretion genes, as well as the relevant promoters and terminators. Association between metabolic and secretion genes (promoter sharing, overlap via frame shift or sense-antisense encoding) was characteristic for populations with robust cooperation and was more likely to evolve when secretion was costly. In mutational analysis experiments, we demonstrated the potential evolutionary consequences of the genetic association by performing a large number of mutations and measuring their phenotypic and fitness effects. The non-cooperating mutants arising from the individuals with genetic association were more likely to have metabolic deleterious mutations that eventually lead to selection eliminating such mutants from the population due to the accompanying fitness decrease. Effectively, cooperation evolved to be protected and robust to mutations through entangled genetic architecture. Our results confirm the importance of second-order selection on evolutionary outcomes, uncover an important genetic mechanism for the evolution and maintenance of cooperation, and suggest promising methods for preventing gene loss in synthetically engineered organisms.
A Petri Net Model of Granulomatous Inflammation: Implications for IL-10 Mediated Control of Leishmania donovani Infection
by Luca Albergante, Jon Timmis, Lynette Beattie, Paul M. Kaye
Experimental visceral leishmaniasis, caused by infection of mice with the protozoan parasite Leishmania donovani, is characterized by focal accumulation of inflammatory cells in the liver, forming discrete “granulomas” within which the parasite is eventually eliminated. To shed new light on fundamental aspects of granuloma formation and function, we have developed an in silico Petri net model that simulates hepatic granuloma development throughout the course of infection. The model was extensively validated by comparison with data derived from experimental studies in mice, and the model robustness was assessed by a sensitivity analysis. The model recapitulated the progression of disease as seen during experimental infection and also faithfully predicted many of the changes in cellular composition seen within granulomas over time. By conducting in silico experiments, we have identified a previously unappreciated level of inter-granuloma diversity in terms of the development of anti-leishmanial activity. Furthermore, by simulating the impact of IL-10 gene deficiency in a variety of lymphocyte and myeloid cell populations, our data suggest a dominant local regulatory role for IL-10 produced by infected Kupffer cells at the core of the granuloma.
Identification of Key Hinge Residues Important for Nucleotide-Dependent Allostery in E. coli Hsp70/DnaK
by Peter Man-Un Ung, Andrea D. Thompson, Lyra Chang, Jason E. Gestwicki, Heather A. Carlson
DnaK is a molecular chaperone that has important roles in protein folding. The hydrolysis of ATP is essential to this activity, and the effects of nucleotides on the structure and function of DnaK have been extensively studied. However, the key residues that govern the conformational motions that define the apo, ATP-bound, and ADP-bound states are not entirely clear. Here, we used molecular dynamics simulations, mutagenesis, and enzymatic assays to explore the molecular basis of this process. Simulations of DnaK's nucleotide-binding domain (NBD) in the apo, ATP-bound, and ADP/Pi-bound states suggested that each state has a distinct conformation, consistent with available biochemical and structural information. The simulations further suggested that large shearing motions between subdomains I-A and II-A dominated the conversion between these conformations. We found that several evolutionarily conserved residues, especially G228 and G229, appeared to function as a hinge for these motions, because they predominantly populated two distinct states depending on whether ATP or ADP/Pi was bound. Consistent with the importance of these “hinge” residues, alanine point mutations caused DnaK to have reduced chaperone activities in vitro and in vivo. Together, these results clarify how sub-domain motions communicate allostery in DnaK.
Electrostatically Accelerated Encounter and Folding for Facile Recognition of Intrinsically Disordered Proteins
by Debabani Ganguly, Weihong Zhang, Jianhan Chen
Achieving facile specific recognition is essential for intrinsically disordered proteins (IDPs) that are involved in cellular signaling and regulation. Consideration of the physical time scales of protein folding and diffusion-limited protein-protein encounter has suggested that the frequent requirement of protein folding for specific IDP recognition could lead to kinetic bottlenecks. How IDPs overcome such potential kinetic bottlenecks to viably function in signaling and regulation in general is poorly understood. Our recent computational and experimental study of cell-cycle regulator p27 (Ganguly et al., J. Mol. Biol. (2012)) demonstrated that long-range electrostatic forces exerted on enriched charges of IDPs could accelerate protein-protein encounter via “electrostatic steering” and at the same time promote “folding-competent” encounter topologies to enhance the efficiency of IDP folding upon encounter. Here, we further investigated the coupled binding and folding mechanisms and the roles of electrostatic forces in the formation of three IDP complexes with more complex folded topologies. The surface electrostatic potentials of these complexes lack prominent features like those observed for the p27/Cdk2/cyclin A complex to directly suggest the ability of electrostatic forces to facilitate folding upon encounter. Nonetheless, similar electrostatically accelerated encounter and folding mechanisms were consistently predicted for all three complexes using topology-based coarse-grained simulations. Together with our previous analysis of charge distributions in known IDP complexes, our results support a prevalent role of electrostatic interactions in promoting efficient coupled binding and folding for facile specific recognition. These results also suggest that there is likely a co-evolution of IDP folded topology, charge characteristics, and coupled binding and folding mechanisms, driven at least partially by the need to achieve fast association kinetics for cellular signaling and regulation.
by Lucas Theis, Andrè Maia Chagas, Daniel Arnstein, Cornelius Schwarz, Matthias Bethge
Generalized linear models (GLMs) represent a popular choice for the probabilistic characterization of neural spike responses. While GLMs are attractive for their computational tractability, they also impose strong assumptions and thus only allow for a limited range of stimulus-response relationships to be discovered. Alternative approaches exist that make only very weak assumptions but scale poorly to high-dimensional stimulus spaces. Here we seek an approach which can gracefully interpolate between the two extremes. We extend two frequently used special cases of the GLM, a linear and a quadratic model, by assuming that the spike-triggered and non-spike-triggered distributions can be adequately represented using Gaussian mixtures. Because we derive the model from a generative perspective, its components are easy to interpret as they correspond to, for example, the spike-triggered distribution and the interspike interval distribution. The model is able to capture complex dependencies on high-dimensional stimuli with far fewer parameters than other approaches such as histogram-based methods. The added flexibility comes at the cost of a non-concave log-likelihood. We show that in practice this does not have to be an issue and the mixture-based model is able to outperform generalized linear and quadratic models.
Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution
by Erez Persi, David Horn
We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear much more often than expected by chance in protein sequences. The method captures all types of proteomic compositional order including single amino-acid runs, tandem repeats, periodic structure of motifs and otherwise low complexity amino-acid regions. We introduce new order measures, distinguishing between ‘regularity’, ‘periodicity’ and ‘vocabulary’, to quantify these phenomena and to facilitate the identification of evolutionary effects. Detailed analysis of representative species across the tree-of-life demonstrates that CO proteins exhibit numerous functional enrichments, including a wide repertoire of particular patterns of dependencies on regularity and periodicity. Comparison between human and mouse proteomes further reveals the interplay of CO with evolutionary trends, such as faster substitution rate in mouse leading to decrease of periodicity, while innovation along the human lineage leads to larger regularity. Large-scale analysis of 94 proteomes leads to systematic ordering of all major taxonomic groups according to FT-vocabulary size. This is measured by the count of Different Frequent Triplets (DFT) in proteomes. The latter provides a clear hierarchical delineation of vertebrates, invertebrates, plants, fungi and prokaryotes, with thermophiles showing the lowest level of FT-vocabulary. Among eukaryotes, this ordering correlates with phylogenetic proximity. Interestingly, in all kingdoms CO accumulation in the proteome has universal characteristics. We suggest that CO is a genomic-information correlate of both macroevolution and various protein functions. The results indicate a mechanism of genomic ‘innovation’ at the peptide level, involved in protein elongation, shaped in a universal manner by mutational and selective forces.
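To make the idea of counting amino-acid triplets concrete, here is a minimal sketch that tallies overlapping triplets in a single sequence and keeps those above a cutoff. It is not the authors' method: the abstract does not state how "frequent" is defined statistically, so the fixed `min_count` threshold, the function names, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter


def triplet_counts(sequence):
    """Count every overlapping amino-acid triplet in a protein sequence."""
    return Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))


def frequent_triplets(sequence, min_count):
    """Keep triplets seen at least min_count times.

    min_count is a hypothetical fixed cutoff; a real analysis would
    compare counts against a random-expectation baseline instead.
    """
    return {t: n for t, n in triplet_counts(sequence).items() if n >= min_count}


example = "MQQQQQPAPAPAPA"  # made-up sequence with a run and a tandem repeat
result = frequent_triplets(example, min_count=3)
print(result)
```

The number of distinct keys in `result` plays the role of a vocabulary size for this one toy sequence.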
Multiscale Modeling of Influenza A Virus Infection Supports the Development of Direct-Acting Antivirals
by Frank S. Heldt, Timo Frensing, Antje Pflugmacher, Robin Gröpler, Britta Peschel, Udo Reichl
Influenza A viruses are respiratory pathogens that cause seasonal epidemics with up to 500,000 deaths each year. Yet there are currently only two classes of antivirals licensed for treatment and drug-resistant strains are on the rise. A major challenge for the discovery of new anti-influenza agents is the identification of drug targets that efficiently interfere with viral replication. To support this step, we developed a multiscale model of influenza A virus infection which comprises both the intracellular level where the virus synthesizes its proteins, replicates its genome, and assembles new virions and the extracellular level where it spreads to new host cells. This integrated modeling approach recapitulates a wide range of experimental data across both scales including the time course of all three viral RNA species inside an infected cell and the infection dynamics in a cell population. It also allowed us to systematically study how interfering with specific steps of the viral life cycle affects virus production. We find that inhibitors of viral transcription, replication, protein synthesis, nuclear export, and assembly/release are most effective in decreasing virus titers whereas targeting virus entry primarily delays infection. In addition, our results suggest that for some antivirals therapy success strongly depends on the lifespan of infected cells and, thus, on the dynamics of virus-induced apoptosis or the host's immune response. Hence, the proposed model provides a systems-level understanding of influenza A virus infection and therapy as well as an ideal platform to include further levels of complexity toward a comprehensive description of infectious diseases.
by Rotem Ben-Hamo, Sol Efroni
The transcriptional networks that regulate gene expression and modifications to this network are at the core of the cancer phenotype. MicroRNAs, a well-studied species of small non-coding RNA molecules, have been shown to have a central role in regulating gene expression as part of this transcriptional network. Further, microRNA deregulation is associated with cancer development and with tumor progression. Glioblastoma multiforme (GBM) is the most common, aggressive and malignant primary tumor of the brain and is associated with one of the worst 5-year survival rates among all human cancers. To study the transcriptional network and its modifications in GBM, we utilized gene expression, microRNA sequencing, whole genome sequencing and clinical data from hundreds of patients from different datasets. Using these data and a novel microRNA-gene association approach we introduce, we have identified unique microRNAs and their associated genes. What makes them unique is that the quantifiable association between the microRNA and the gene expression levels stratifies patients into clinical subgroups with high statistical significance. Importantly, this stratification goes unobserved by other methods and is not associated with other subsets or phenotypes within the data. To investigate the robustness of the introduced approach, we demonstrate that the findings hold in unrelated datasets. Among the set of identified microRNA-gene associations, we closely study the example of MAF and hsa-miR-330-3p, and show how their co-behavior stratifies patients into prognostic clinical groups and how whole genome sequences tell us more about a specific genomic variation as a possible basis for patient variances. We argue that these identified associations may indicate previously unexplored specific disease control mechanisms and may be used as a basis for further study and for possible therapeutic intervention.
by Takeshi Hase, Samik Ghosh, Ryota Yamanaka, Hiroaki Kitano
Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have emerged as potentially promising strategies to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network is similar to that associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.
Understanding the Connection between Epigenetic DNA Methylation and Nucleosome Positioning from Computer Simulations
by Guillem Portella, Federica Battistini, Modesto Orozco
Cytosine methylation is one of the most important epigenetic marks that regulate the process of gene expression. Here, we have examined the effect of epigenetic DNA methylation on nucleosomal stability using molecular dynamics simulations and elastic deformation models. We found that methylation of CpG steps destabilizes nucleosomes, especially when these are placed in sites where the DNA minor groove faces the histone core. The larger stiffness of methylated CpG steps is a crucial factor behind the decrease in nucleosome stability. Methylation changes the positioning and phasing of the nucleosomal DNA, altering the accessibility of DNA to regulatory proteins, and accordingly gene functionality. Our theoretical calculations highlight a simple, physically based explanation of the foundations of epigenetic signaling.
by Miranda Stobbe, Tarun Mishra, Geoff Macintyre
When meeting someone for the first time, whether another PhD student or the Founding Editor-in-chief of PLOS Computational Biology, nothing breaks the ice like eating pancakes or having drinks together. A social atmosphere provides a relaxed, informal environment where people can connect, share ideas, and form collaborations. Being able to build a network and thrive in a social environment is crucial to a successful scientific career. This article highlights the importance of bringing people together who speak the same scientific language in an informal setting. Using examples of events held by Regional Student Groups of the ISCB's Student Council, this article shows that socializing is much more than simply sharing a drink.
Conserved Substitution Patterns around Nucleosome Footprints in Eukaryotes and Archaea Derive from Frequent Nucleosome Repositioning through Evolution
by Tobias Warnecke, Erin A. Becker, Marc T. Facciotti, Corey Nislow, Ben Lehner
Nucleosomes, the basic repeat units of eukaryotic chromatin, have been suggested to influence the evolution of eukaryotic genomes, both by altering the propensity of DNA to mutate and by selection acting to maintain or exclude nucleosomes in particular locations. Contrary to the popular idea that nucleosomes are unique to eukaryotes, histone proteins have also been discovered in some archaeal genomes. Archaeal nucleosomes, however, are quite unlike their eukaryotic counterparts in many respects, including their assembly into tetramers (rather than octamers) from histone proteins that lack N- and C-terminal tails. Here, we show that despite these fundamental differences the association between nucleosome footprints and sequence evolution is strikingly conserved between humans and the model archaeon Haloferax volcanii. In light of this finding we examine whether selection or mutation can explain concordant substitution patterns in the two kingdoms. Unexpectedly, we find that neither the mutation nor the selection model is sufficient to explain the observed association between nucleosomes and sequence divergence. Instead, we demonstrate that nucleosome-associated substitution patterns are more consistent with a third model where sequence divergence results in frequent repositioning of nucleosomes during evolution. Indeed, we show that nucleosome repositioning is both necessary and largely sufficient to explain the association between current nucleosome positions and biased substitution patterns. This finding highlights the importance of considering the direction of causality between genetic and epigenetic change.
by Martin Boerlin, Christian K. Machens, Sophie Denève
Two observations about the cortex have puzzled neuroscientists for a long time. First, neural responses are highly variable. Second, the level of excitation and inhibition received by each neuron is tightly balanced at all times. Here, we demonstrate that both properties are necessary consequences of neural networks that represent information efficiently in their spikes. We illustrate this insight with spiking networks that represent dynamical variables. Our approach is based on two assumptions: We assume that information about dynamical variables can be read out linearly from neural spike trains, and we assume that neurons only fire a spike if that improves the representation of the dynamical variables. Based on these assumptions, we derive a network of leaky integrate-and-fire neurons that is able to implement arbitrary linear dynamical systems. We show that the membrane voltage of the neurons is equivalent to a prediction error about a common population-level signal. Among other things, our approach allows us to construct an integrator network of spiking neurons that is robust against many perturbations. Most importantly, neural variability in our networks cannot be equated to noise. Despite exhibiting the same single unit properties as widely used population code models (e.g. tuning curves, Poisson distributed spike trains), balanced networks are orders of magnitude more reliable. Our approach suggests that spikes do matter when considering how the brain computes, and that the reliability of cortical representations could have been strongly underestimated.
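For readers who have not met the leaky integrate-and-fire neuron named in the abstract above, the sketch below simulates a single generic unit with forward-Euler integration. It is a textbook illustration only, not the derived network from the paper: the function name, time step, time constant, threshold, and input values are all hypothetical.

```python
def simulate_lif(input_current, dt=0.001, tau=0.02, threshold=1.0, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron.

    The voltage leaks toward zero with time constant tau, integrates the
    input, and emits a spike (then resets) when it reaches the threshold.
    Returns the list of spike times in seconds.
    """
    v = reset
    spike_times = []
    for step, current in enumerate(input_current):
        # Forward-Euler update of dv/dt = -v / tau + current
        v += dt * (-v / tau + current)
        if v >= threshold:
            spike_times.append(step * dt)
            v = reset
    return spike_times


# Hypothetical constant inputs over one simulated second (1000 steps).
weak = simulate_lif([10.0] * 1000)     # steady-state voltage stays below threshold
strong = simulate_lif([100.0] * 1000)  # steady-state voltage exceeds threshold
print(len(weak), len(strong))
```

With a constant input the voltage settles near `tau * current`, so the weak input never reaches threshold while the strong input fires repeatedly.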
Structure-Based Function Prediction of Uncharacterized Protein Using Binding Sites Comparison
by Janez Konc, Milan Hodošček, Mitja Ogrizek, Joanna Trykowska Konc, Dušanka Janežič
A challenge in structural genomics is prediction of the function of uncharacterized proteins. When proteins cannot be related to other proteins of known activity, identification of function based on sequence or structural homology is impossible and in such cases it would be useful to assess structurally conserved binding sites in connection with the protein's function. In this paper, we propose the function of a protein of unknown activity, the Tm1631 protein from Thermotoga maritima, by comparing its predicted binding site to a library containing thousands of candidate structures. The comparison revealed numerous similarities with nucleotide binding sites including, specifically, a DNA-binding site of endonuclease IV. We constructed a model of this Tm1631 protein with a DNA-ligand from the newly found similar binding site using ProBiS, and validated this model by molecular dynamics. The interactions predicted by the Tm1631-DNA model corresponded to those known to be important in the endonuclease IV-DNA complex model, and the corresponding binding free energies calculated from these models were in close agreement. We thus propose that Tm1631 is a DNA binding enzyme with endonuclease activity that recognizes DNA lesions in which at least two consecutive nucleotides are unpaired. Our approach is general, and can be applied to any protein of unknown function. It might also be useful to guide experimental determination of function of uncharacterized proteins.