Proteins: Structure, Function, Bioinformatics
Normal mode analysis (NMA) has been a powerful tool for studying protein dynamics. Elastic network models (ENMs), through their simplicity, have made normal mode computations accessible to a much broader research community and for many more biomolecular systems. The drawback of ENMs, however, is that they are less accurate than NMA. In this work, through steps of simplification that start with NMA and end with elastic network models, we build a tight connection between NMA and elastic network models. In the process of bridging between the two, we have also discovered several high-quality simplified models. Our best simplified model has a mean correlation with the original NMA that is as high as 0.88. In addition, the model is force-field independent and does not require energy minimization, and thus can be applied directly to experimental structures. Another benefit of drawing the connection is a clearer understanding of why elastic network models work well and how they can be further improved. We discover that the anisotropic network model (ANM) can be greatly enhanced by including an additional torsional term and a geometry term. Proteins 2014. © 2014 Wiley Periodicals, Inc.
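For readers unfamiliar with the ANM baseline that this abstract starts from, the following is a minimal sketch of the standard ANM Hessian, in which every pair of nodes within a cutoff is joined by a spring of uniform stiffness. It is not the simplified model or the torsional and geometry terms proposed by the authors. The function name `anm_hessian`, the 15 Å cutoff, the unit spring constant, and the toy coordinates are all assumptions made for illustration.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian as nested lists.

    cutoff (in Angstrom) and gamma (spring constant) are assumed values.
    """
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(x * x for x in d)
            if dist2 == 0.0 or dist2 > cutoff ** 2:
                continue  # skip coincident nodes and pairs beyond the cutoff
            for a in range(3):
                for b in range(3):
                    # Off-diagonal super-element: -gamma * d_a * d_b / |d|^2
                    off = -gamma * d[a] * d[b] / dist2
                    hessian[3 * i + a][3 * j + b] += off
                    hessian[3 * j + a][3 * i + b] += off
                    # Diagonal blocks are minus the sum of the off-diagonal blocks
                    hessian[3 * i + a][3 * i + b] -= off
                    hessian[3 * j + a][3 * j + b] -= off
    return hessian


# Hypothetical toy coordinates: three nodes in contact and one isolated node.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (30.0, 0.0, 0.0)]
H = anm_hessian(toy_coords)
```

The normal modes are the eigenvectors of this matrix. Because each diagonal block is the negative sum of its off-diagonal blocks, every row sums to zero, which reflects the translational invariance of the network.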
To assess the state of the art in antibody structure modeling, a blinded study was conducted. Eleven unpublished Fab crystal structures were used as a benchmark to compare Fv models generated by seven structure prediction methodologies. In the first round, each participant submitted three non-ranked complete Fv models for each target. In the second round, CDR-H3 modeling was performed in the context of the correct environment provided by the crystal structures with CDR-H3 removed. In this report we describe the reference structures and present our assessment of the models. Some of the essential sources of errors in the predictions were traced to the selection of the structure template, both in terms of the CDR canonical structures and VL/VH packing. In addition, errors present in the Protein Data Bank structures were sometimes propagated into the current models, which emphasizes the need for a curated structural database devoid of errors. Modeling non-canonical structures, including CDR-H3, remains the biggest challenge for antibody structure prediction. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Blind prediction performance of RosettaAntibody 3.0: Grafting, relaxation, kinematic loop modeling, and full CDR optimization
Antibody Modeling Assessment II (AMA-II) provided an opportunity to benchmark RosettaAntibody on a set of 11 unpublished antibody structures. RosettaAntibody produced accurate, physically realistic models, with all framework regions and 42 of the 55 non-H3 CDR loops predicted to under an Ångström. The performance is notable when modeling H3 on a homology framework, where RosettaAntibody produced the best model among all participants for four of the 11 targets, two of which were predicted with sub-Ångström accuracy. To improve RosettaAntibody, we pursued the causes of model errors. The most common limitation was template unavailability, underscoring the need for more antibody structures and/or better de novo loop methods. In some cases, better templates could have been found by considering residues outside of the CDRs. De novo CDR H3 modeling remains challenging at long loop lengths, but constraining the C-terminal end of H3 to a kinked conformation allows near-native conformations to be sampled more frequently. We also found that incorrect VL–VH orientations caused models with low H3 RMSDs to score poorly, suggesting that correct VL–VH orientations will improve discrimination between near-native and incorrect conformations. These observations will guide the future development of RosettaAntibody. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Assessment of protein side-chain conformation prediction methods in different residue environments
Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic-detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. Proteins 2014. © 2014 Wiley Periodicals, Inc.
The relevance of mode coupling to energy/information transfer during protein function, particularly in the context of allosteric interactions, is widely accepted. However, existing evidence in favor of this hypothesis comes essentially from model systems. We here report a novel formal analysis of the near-native dynamics of myosin II, which allows us to explore the impact of the interaction between possibly non-Gaussian vibrational modes on fluctuational dynamics. We show that an information-theoretic measure based on mode coupling alone yields a ranking of residues with a statistically significant bias favoring the functionally critical locations identified by experiments on myosin II. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Crystal structure of JHP933 from Helicobacter pylori J99 shows two-domain architecture with a DUF1814 family nucleotidyltransferase domain and a helical bundle domain
The jhp0933 gene in the plasticity region of Helicobacter pylori J99 encodes a hypothetical protein (JHP933), which may play some roles in pathogenesis. Here, we have determined the crystal structure of JHP933 at 2.17 Å. It represents the first crystal structure of the DUF1814 protein family. JHP933 consists of two domains: an N-terminal domain of the nucleotidyltransferase (NTase) fold and a C-terminal helix bundle domain. A highly positively charged surface patch exists adjacent to the putative NTP binding site. Structural similarity of JHP933 to known NTases is very remote, suggesting that it may function as a novel nucleotidyltransferase. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Comparative analysis of sequence co-variation methods to mine evolutionary hubs: Examples from selected GPCR families
Co-variation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify co-varying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G-protein-coupled receptors (GPCRs) in which a divergence event occurred. The co-variation methods analyzed were based on the χ² test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the co-variation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two methods out of seven, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a networking structure with a central residue involved in several high-scoring pairs. This networking structure was observed for the three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and the sub-family specificity. These central residues can be viewed as evolutionary hubs, in relation with an epistasis-based mechanism of functional divergence within a protein family. Proteins 2014. © 2014 Wiley Periodicals, Inc.
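As an illustration of two quantities named in this abstract, sequence entropy and a mutual-information co-variation score, the following minimal sketch computes both for alignment columns. It uses the plain textbook definitions and should not be read as any of the seven specific methods compared by the authors. The function names and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_i, col_j):
    """MI(i, j) = H(i) + H(j) - H(i, j), using observed frequencies only."""
    joint = list(zip(col_i, col_j))
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


# Hypothetical toy alignment: columns 0 and 2 co-vary perfectly, column 1 is conserved.
alignment = ["ACDE", "ACDE", "GCHE", "GCHK"]
columns = list(zip(*alignment))

mi_covarying = mutual_information(columns[0], columns[2])
mi_conserved = mutual_information(columns[0], columns[1])
```

A fully conserved column has zero entropy and therefore zero mutual information with every other column, which is one simple way to see why co-variation scores depend on residue conservation.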
Our understanding of protein folding, stability and function has begun to more explicitly incorporate dynamical aspects. Nuclear magnetic resonance has emerged as a powerful experimental method for obtaining comprehensive site-resolved insight into protein motion. It has been observed that methyl-group motion tends to cluster into three “classes” when expressed in terms of the popular Lipari-Szabo model-free squared generalized order parameter. Here the origins of the three classes or bands in the distribution of order parameters are examined. As a first step, a Bayesian-based approach, which makes no a priori assumption about the existence or number of bands, is developed to detect the banding of O²axis values derived either from NMR experiments or molecular dynamics simulations. The analysis is applied to seven proteins with extensive molecular dynamics simulations of these proteins in explicit water to examine the relationship between O² and fine details of the motion of methyl-bearing side chains. All of the proteins studied display banding, with some subtle differences. We propose a very simple yet plausible physical mechanism for banding. Finally, our Bayesian method is used to analyze the measured distributions of methyl group motions in the catabolite activating protein and several of its mutants in various liganded states, and we discuss the functional implications of the observed banding for protein dynamics and function. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Second Antibody Modeling Assessment (AMA-II)
To assess the state of the art in antibody 3D modeling, eleven unpublished high-resolution x-ray Fab crystal structures from diverse species and covering a wide range of antigen-binding site conformations were used as a benchmark to compare Fv models generated by seven structure prediction methodologies. The participants included: Accelrys Inc, Chemical Computing Group (CCG), Schrödinger, Jeff Gray's lab at Johns Hopkins University, Macromoltek, Astellas Pharma/Osaka University and Prediction of ImmunoGlobulin Structure (PIGS). The sequences of the benchmark structures were submitted to the modelers and PIGS, and a set of models was generated for each structure. We provide here an overview of the organization, participants and main results of this second antibody modeling assessment (AMA-II). Also, we compare the results with the first antibody assessment published in this journal (Almagro et al., Proteins 79: 3050, 2011). Proteins 2014. © 2014 Wiley Periodicals, Inc.
The influenza fusion peptide promotes lipid polar head intrusion through hydrogen bonding with phosphates and N-terminal membrane insertion depth
Influenza infection requires fusion between the virus envelope and a host cell endosomal membrane. The influenza hemagglutinin fusion peptide (FP) is essential to viral membrane fusion. It was recently proposed that FPs would fuse membranes by increasing lipid tail protrusion, a membrane fusion transition state. The details of how FPs induce lipid tail protrusion, however, remain to be elucidated. To decipher the molecular mechanism by which FPs promote lipid tail protrusion, we performed MD simulations of the wild-type (WT) FP, fusogenic mutant F9A, and nonfusogenic mutant W14A in model bilayers. This paper presents the peptide-lipid interaction responsible for lipid tail protrusion and a related lipid perturbation, polar head intrusion, where polar heads are sunk under the membrane surface. The backbone amides from the four N-terminal peptide residues, deeply inserted in the membrane, promoted both perturbations through H-bonding with lipid phosphates. Polar head intrusion correlated with peptide N-terminal insertion depth and activity: the N-termini of WT and F9A were inserted deeper into the membrane than that of nonfusogenic W14A. Based on these results, we propose that FP-induced polar head intrusion would complement lipid tail protrusion in catalyzing membrane fusion by reducing repulsions between the headgroups of juxtaposed membranes. The presented model provides a framework for further research on membrane fusion and influenza antivirals. Proteins 2014. © 2014 Wiley Periodicals, Inc.
To understand the relationship between protein sequence and structure, this work extends the knob-socket model in an investigation of β-sheet packing. Over a comprehensive set of β-sheet folds, the contacts between residues were used to identify packing cliques: sets of residues that all contact each other. These packing cliques were then classified based on size and contact order. From this analysis, the 2 types of 4-residue packing cliques necessary to describe β-sheet packing were characterized. Both occur between 2 adjacent hydrogen-bonded β-strands. First, defining the secondary structure packing within β-sheets, the combined socket or XY:HG pocket consists of 4 residues: i,i+2 on one strand and j,j+2 on the other. Second, characterizing the tertiary packing between β-sheets, the knob-socket XY:H+B consists of a 3-residue XY:H socket (i,i+2 on one strand and j on the other) packed against a knob B residue (residue k distant in sequence). Depending on the packing depth of the knob B residue, 2 types of knob-sockets are found: side-chain and main-chain sockets. The amino acid composition of the pockets and knob-sockets reveals the sequence specificity of β-sheet packing. For β-sheet formation, the XY:HG pocket clearly shows sequence specificity of amino acids. For tertiary packing, the XY:H+B side-chain and main-chain sockets exhibit distinct amino acid preferences at each position. These relationships define an amino acid code for β-sheet structure and provide an intuitive topological mapping of β-sheet packing. Proteins 2014. © 2014 Wiley Periodicals, Inc.
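The definition of a packing clique given in this abstract (a set of residues that all contact each other) corresponds to a clique in a residue contact graph. The following is a minimal sketch of how maximal cliques could be enumerated with the basic Bron-Kerbosch recursion. This algorithm is swapped in for illustration, since the abstract does not say how the authors enumerate cliques, and the contact list is a hypothetical toy example in which residues i,i+2 = 1,3 on one strand contact j,j+2 = 10,12 on the other.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend this clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Hypothetical residue-residue contacts (residue numbers are illustrative only).
contacts = [(1, 3), (1, 10), (3, 10), (1, 12), (3, 12), (10, 12), (12, 20)]
adjacency = {}
for a, b in contacts:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

cliques = maximal_cliques(adjacency)
```

In this toy graph the 4-residue clique (1, 3, 10, 12) plays the role of a combined pocket between two adjacent strands, while the pair (12, 20) is a separate 2-residue clique.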
Molecular dynamics simulation of the phosphorylation-induced conformational changes of a tau peptide fragment
Aggregation of the microtubule associated protein tau (MAPT) within neurons of the brain is the leading cause of tauopathies such as Alzheimer's disease. MAPT is a phospho-protein that is selectively phosphorylated by a number of kinases in vivo to perform its biological function. However, it may become pathogenically hyperphosphorylated, causing aggregation into paired helical filaments and neurofibrillary tangles. The phosphorylation induced conformational change on a peptide of MAPT (htau225−250) was investigated by performing molecular dynamics simulations with different phosphorylation patterns of the peptide (pThr231 and/or pSer235) in different simulation conditions to determine the effect of ionic strength and phosphate charge. All phosphorylation patterns were found to disrupt a nascent terminal β-sheet pattern (226VAVVR230 and 244QTAPVP249), replacing it with a range of structures. The double pThr231/pSer235 phosphorylation pattern at experimental ionic strength resulted in the best agreement with NMR structural characterization, with the observation of a transient α-helix (239AKSRLQT245). PPII helical conformations were only found sporadically throughout the simulations. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Mycobacterium tuberculosis evades host immune responses by colonizing macrophages. Intraphagosomal M. tuberculosis is exposed to environmental stresses such as reactive oxygen and nitrogen intermediates as well as acid shock and inorganic phosphate (Pi) depletion. Experimental evidence suggests that expression levels of mycobacterial protein PstS3 (Rv0928) are significantly increased when M. tuberculosis bacilli are exposed to Pi starvation. Hence, PstS3 may be important for survival of M. tuberculosis in conditions where there is a limited supply of Pi. We report here the structure of PstS3 from M. tuberculosis at 2.3-Å resolution. The protein presents a structure typical for ABC phosphate transfer receptors. Comparison with its cognate receptor PstS1 showed a different distribution of surface charges in proximity to the Pi recognition site, suggesting complementary roles of the two proteins in Pi uptake. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Thermally stable proteins are desirable for research and industrial purposes, but redesigning proteins for higher thermal stability can be challenging. A number of different techniques have been used to improve the thermal stability of proteins, but the extents of stability enhancement were sometimes unpredictable and not significant. Here, we systematically tested the effects of multiple stabilization techniques including a bioinformatic method and structure-guided mutagenesis on a single protein, thereby providing an integrated approach to protein thermal stabilization. Using a mesophilic adenylate kinase (AK) as a model, we identified stabilizing mutations based on various stabilization techniques, and generated a series of AK variants by introducing mutations both individually and collectively. The redesigned proteins displayed a range of increased thermal stabilities, the most stable of which was comparable to a naturally evolved thermophilic homologue with more than a 25° increase in its thermal denaturation midpoint. We also solved crystal structures of three representative variants including the most stable variant, to confirm the structural basis for their increased stabilities. These results provide a unique opportunity for systematically analyzing the effectiveness and additivity of various stabilization mechanisms, and they represent a useful approach for improving protein stability by integrating the reduction of local structural entropy and the optimization of global noncovalent interactions such as hydrophobic contact and ion pairs. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Effects of protein engineering and rational mutagenesis on crystal lattice of single chain antibody fragments
Protein crystallization is dependent upon, and sensitive to, the intermolecular contacts that assist in ordering proteins into a three-dimensional lattice. Here we used protein engineering and mutagenesis to affect the crystallization of single chain antibody fragments (scFvs) that recognize the EE epitope (EYMPME) with high affinity. These hypercrystallizable scFvs are under development to assist difficult proteins, such as membrane proteins, in forming crystals, by acting as crystallization chaperones. Guided by analyses of intermolecular crystal lattice contacts, two second-generation anti-EE scFvs were produced, which bind to proteins with installed EE tags. Surprisingly, although noncomplementarity determining region (CDR) lattice residues from the parent scFv framework remained unchanged through the processes of protein engineering and rational design, crystal lattices of the derivative scFvs differ. Comparison of energy calculations and the experimentally-determined lattice interactions for this basis set provides insight into the complexity of the forces driving crystal lattice choice and demonstrates the availability of multiple well-ordered surface features in our scFvs capable of forming versatile crystal contacts. Proteins 2014. © 2014 Wiley Periodicals, Inc.
ProDomAs, protein domain assignment algorithm using center-based clustering and independent dominating set
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more pressing. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented in C++ and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms the other methods applied to the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. Proteins 2014. © 2014 Wiley Periodicals, Inc.
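The first step described in this abstract, choosing cluster centers from an independent dominating set of the protein graph, can be illustrated with a minimal sketch. It relies on the fact that any maximal independent set is also a dominating set, and builds one greedily in order of decreasing degree. The greedy ordering, the rule for assigning the remaining vertices, the function names, and the toy graph are all assumptions for illustration. They are not taken from ProDomAs, which is written in C++ and may select its set differently.

```python
def greedy_independent_dominating_set(adjacency):
    """Greedily build a maximal independent set, which is also a dominating set."""
    centers = set()
    # Visit high-degree vertices first; ties are broken by vertex label.
    for v in sorted(adjacency, key=lambda u: (-len(adjacency[u]), u)):
        if not (adjacency[v] & centers):  # no neighbor is already a center
            centers.add(v)
    return centers


def initial_clusters(adjacency, centers):
    """Attach every non-center vertex to its lowest-labeled adjacent center."""
    clusters = {c: [c] for c in centers}
    for v in sorted(adjacency):
        if v not in centers:
            clusters[min(adjacency[v] & centers)].append(v)
    return {c: sorted(members) for c, members in clusters.items()}


# Hypothetical toy graph: a path of five vertices, 0-1-2-3-4.
toy_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
centers = greedy_independent_dominating_set(toy_graph)
clusters = initial_clusters(toy_graph, centers)
```

Independence guarantees that no two centers are adjacent, and domination guarantees that every other vertex has at least one center among its neighbors, so each vertex can join an initial cluster.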
Free energetics of rigid body association of ubiquitin binding domains: A biochemical model for binding mediated by hydrophobic interaction
Weak intermolecular interactions, such as hydrophobic associations, underlie numerous biomolecular recognition processes. Ubiquitin is a small protein that represents a biochemical model for exploring thermodynamic signatures of hydrophobic association, as it is widely held that a major component of ubiquitin's binding to numerous partners is mediated by hydrophobic regions on both partners. Here, we use atomistic molecular dynamics simulations in conjunction with the Adaptive Biasing Force sampling method to compute potentials of mean force (the reversible work, or free energy, associated with the binding process) to investigate the thermodynamic signature of complexation in this well-studied biochemical model of hydrophobic association. We observe that, much like in the case of a purely hydrophobic solute (e.g., graphene, carbon nanotubes), association is favored by entropic contributions from release of water from the interprotein regions. Moreover, association is disfavored by loss of enthalpic interactions, but unlike in the case of purely hydrophobic solutes, in this case protein-water interactions are lost and not compensated for by additional water-water interactions generated upon release of interprotein and, more so, hydration water. We further find that relative orientations of the proteins that mutually present hydrophobic regions of each protein to its partner are favored over those that do not. In fact, the free energy minimum as predicted by a force field based method recapitulates the experimental NMR solution structure of the complex. Proteins 2014. © 2014 Wiley Periodicals, Inc.
Multiple Gaussian Network Modes alignment reveals dynamically variable regions: The hemoglobin case
Gaussian Network Model (GNM) modes of motion are calculated for a dataset of hemoglobin (Hb) structures, and modes with dynamic similarity to the T state are multiply aligned. The sole criterion for the alignment is the mode shape itself and not sequence or structural similarity. The standard deviation of the GNM value score along the alignment is calculated, and regions with high standard deviation are defined as dynamically variable. The analysis shows that the α1β1/α2β2 interface is a dynamically variable region, whereas the α1β2/α2β1 and α1α2/β1β2 interfaces are not. The results are in accordance with the T → R2 transition of hemoglobin. We suggest that dynamically variable regions are regions that are likely to undergo structural change in the protein upon binding, conformational transition, or any other relevant chemical event. The presented technique of multiple dynamics-based alignment of modes is novel and may offer new insight into the relation between protein dynamics and function. Proteins 2014. © 2014 Wiley Periodicals, Inc.
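The scoring step in this abstract, taking the standard deviation of mode values at each aligned position and flagging positions with high values, can be sketched as follows. The sketch assumes that the modes have already been aligned into rows of equal length. The function name, the use of the population standard deviation, the threshold of 0.2, and the toy mode values are hypothetical choices, since the abstract does not state how "high" is defined.

```python
from statistics import pstdev


def dynamically_variable_positions(aligned_modes, threshold):
    """Return positions whose mode values vary strongly across the aligned structures.

    aligned_modes: one row per structure, one mode value per aligned position.
    threshold: assumed cutoff on the per-position standard deviation.
    """
    per_position_sd = [pstdev(values) for values in zip(*aligned_modes)]
    variable = [i for i, sd in enumerate(per_position_sd) if sd > threshold]
    return variable, per_position_sd


# Hypothetical aligned mode shapes for three structures and three positions.
aligned_modes = [
    [0.1, 0.5, -0.2],
    [0.1, -0.5, -0.2],
    [0.1, 0.5, -0.2],
]
variable, per_position_sd = dynamically_variable_positions(aligned_modes, threshold=0.2)
```

In the toy data only the middle position changes between structures, so it is the only one flagged as dynamically variable.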
Molecular external structure is important for molecular function, with voids on the surface and in the interior being among the most important features. Hence, recognition of molecular voids and accurate computation of their geometrical properties, such as volume, area and topology, are crucial, yet most popular algorithms are based on the crude use of sampling points and thus yield only approximations, even with a significant amount of computation. In this article, we propose an analytic approach to the problem using the Voronoi diagram of atoms and the beta-complex. The correctness and efficiency of the proposed algorithm are mathematically proved and experimentally verified. The benchmark test clearly shows the superiority of BetaVoid over two popular programs: VOIDOO and CASTp. The proposed algorithm is implemented in the BetaVoid program, which is freely available at the Voronoi Diagram Research Center (http://voronoi.hanyang.ac.kr). Proteins 2014. © 2014 Wiley Periodicals, Inc.
Obtaining optimal cofactor balance to drive production is a challenge in metabolically engineered microbial production strains. To facilitate identification of heterologous enzymes with desirable altered cofactor requirements from native content, we have developed Cofactory, a method for prediction of enzyme cofactor specificity using only primary amino acid sequence information. The algorithm identifies potential cofactor binding Rossmann folds and predicts the specificity for the cofactors FAD(H2), NAD(H), and NADP(H). The Rossmann fold sequence search is carried out using hidden Markov models whereas artificial neural networks are used for specificity prediction. Training was carried out using experimental data from protein–cofactor structure complexes. The overall performance was benchmarked against an independent evaluation set obtaining Matthews correlation coefficients of 0.94, 0.79, and 0.65 for FAD(H2), NAD(H), and NADP(H), respectively. The Cofactory method is made publicly available at http://www.cbs.dtu.dk/services/Cofactory. Proteins 2014. © 2014 Wiley Periodicals, Inc.
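For readers interpreting the reported Matthews correlation coefficients, the following minimal sketch shows the standard definition of the coefficient for a binary classifier, computed from the four confusion-matrix counts. The function name and the example counts are hypothetical and are not data from the Cofactory evaluation.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
random_like = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction), which places the reported values of 0.94, 0.79, and 0.65 on a common scale.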