Nucleic Acids Research
The avian bacterial pathogen Mycoplasma gallisepticum is a good model for systems studies owing to its small genome and the simplicity of its regulatory pathways. In this study, we used RNA-Seq and MS-based proteomics to accurately map coding sequences, transcription start sites (TSSs) and transcript 3'-ends (T3Es). We used the obtained data to investigate the roles of TSSs and T3Es in stress-induced transcriptional responses. We identified 1061 TSSs at a false discovery rate of 10% and showed that almost all transcription in M. gallisepticum is initiated from classic TATAAT promoters surrounded by A/T-rich sequences. Our analysis revealed pronounced complexity of operon structure: on average, each coding operon has one internal TSS and T3Es in addition to the primary ones. Our transcriptomic approach, based on the intervals between the two nearest transcript ends, allowed us to identify two classes of T3Es: strong, unregulated, hairpin-containing T3Es and weak, heat shock-regulated, hairpinless T3Es. Comparing gene expression levels under different conditions revealed widespread and divergent transcription regulation in M. gallisepticum. Modeling suggested that the core promoter structure plays an important role in gene expression regulation. We have shown that heat stress activation of cryptic promoters, combined with suppression of hairpinless T3Es, leads to widespread, seemingly non-functional transcription.
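To illustrate what a "classic TATAAT promoter surrounded by A/T-rich sequences" looks like in sequence terms, the following minimal Python sketch scans the region upstream of a mapped TSS for a hexamer close to the TATAAT consensus and reports the A/T content of a sequence. It is not the method used in the study: the function names, the one-mismatch tolerance and the example sequence are assumptions chosen only for illustration.

```python
def find_minus10_box(upstream, consensus="TATAAT", max_mismatches=1):
    """Return (offset, hexamer, mismatches) for the best consensus-like hexamer.

    `upstream` is the sequence ending immediately 5' of the TSS, so its last
    base is position -1. `offset` is the position of the hexamer's first base
    relative to the TSS. Returns None if no hexamer is within the tolerance.
    The mismatch tolerance is an illustrative assumption.
    """
    upstream = upstream.upper()
    width = len(consensus)
    best = None
    for start in range(len(upstream) - width + 1):
        hexamer = upstream[start:start + width]
        mismatches = sum(a != b for a, b in zip(hexamer, consensus))
        # Keep the hit with the fewest mismatches; ties keep the most upstream hit.
        if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
            best = (start - len(upstream), hexamer, mismatches)
    return best


def at_fraction(seq):
    """Fraction of A or T bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical upstream region, not taken from the M. gallisepticum genome.
hit = find_minus10_box("GCGCGCTATAATGCGCGC")
print(hit)                    # (-12, 'TATAAT', 0)
print(at_fraction("TATAAT"))  # 1.0
```

A real analysis would also have to account for the expected spacing between the -10 box and the TSS and for the background A/T content of the genome, neither of which is modeled here.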
To elucidate the molecular mechanism of the integration of long interspersed elements (LINEs), we characterized the 5' ends of more than 200 LINE de novo retrotransposition events into chicken DT40 or human HeLa cells. Human L1 inserts produced 15-bp target-site duplications (TSDs) and zebrafish ZfL2-1 inserts produced 5-bp TSDs in DT40 cells, suggesting that TSD length depends on the LINE species. Further analysis of 5' junctions revealed that the 5'-end-joining pathways of LINEs can be divided into two fundamental types—annealing or direct. We also found that the generation of 5' inversions depends on host and LINE species. These results led us to propose a new model for 5'-end joining, the type of which is determined by the extent of exposure of 3' overhangs generated after the second-strand cleavage and by the involvement of host factors.
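A target-site duplication appears in a sequenced insertion as the same short stretch of target DNA on both sides of the inserted element: the 5' flank ends with it and the 3' flank begins with it. The Python sketch below finds the longest such shared stretch. It is only an illustration of the concept, not the procedure used in the study, and the function name and example flanks are hypothetical.

```python
def target_site_duplication(left_flank: str, right_flank: str) -> str:
    """Return the longest suffix of the 5' flank that is also a prefix of the 3' flank.

    `left_flank` is the genomic sequence ending at the 5' junction of the insert;
    `right_flank` is the genomic sequence starting at the 3' junction.
    Returns an empty string when the flanks share no such repeat.
    """
    left, right = left_flank.upper(), right_flank.upper()
    # Try the longest candidate first so the first match is the longest repeat.
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# Hypothetical flanks sharing a 6-bp repeat.
tsd = target_site_duplication("CCGTAAGATTC", "AGATTCGGTAC")
print(tsd, len(tsd))  # AGATTC 6
```

Very short matches of one or two bases can arise by chance, so a practical pipeline would compare the flanks against the pre-integration target sequence rather than rely on this overlap alone.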
Endonuclease G preferentially cleaves 5-hydroxymethylcytosine-modified DNA creating a substrate for recombination
5-hydroxymethylcytosine (5hmC) has been suggested to be involved in various nucleic acid transactions and cellular processes, including transcriptional regulation, demethylation of 5-methylcytosine and stem cell pluripotency. We have identified an activity that preferentially catalyzes the cleavage of double-stranded 5hmC-modified DNA. Using biochemical methods we purified this activity from mouse liver extracts and demonstrate that the enzyme responsible for the cleavage of 5hmC-modified DNA is Endonuclease G (EndoG). We show that recombinant EndoG preferentially recognizes and cleaves a core sequence when one specific cytosine within that core sequence is hydroxymethylated. Additionally, we provide in vivo evidence that EndoG catalyzes the formation of double-stranded DNA breaks and that this cleavage is dependent upon the core sequence, EndoG and 5hmC. Finally, we demonstrate that the 5hmC modification can promote conservative recombination in an EndoG-dependent manner.
The endoribonuclease RNase E is a key enzyme in RNA metabolism for many bacterial species. In Escherichia coli, RNase E contributes to the majority of RNA turnover and processing events, and the enzyme has been extensively characterized as the central component of the RNA degradosome assembly. A similar RNA degradosome assembly has been described in the α-proteobacterium Caulobacter crescentus, with the interacting partners of RNase E identified as the Krebs cycle enzyme aconitase, a DEAD-box RNA helicase RhlB and the exoribonuclease polynucleotide phosphorylase. Here we report that an additional degradosome component is the essential exoribonuclease RNase D, and its recognition site within RNase E is identified. We show that, unlike its E. coli counterpart, C. crescentus RhlB interacts directly with a segment of the N-terminal catalytic domain of RNase E. The crystal structure of a portion of C. crescentus RNase E encompassing the helicase-binding region is reported. This structure reveals that an inserted segment in the S1 domain adopts an α-helical conformation, despite being predicted to be natively unstructured. We discuss the implications of these findings for the organization and mechanisms of the RNA degradosome.
The intricate network of interactions observed in RNA three-dimensional structures is often described in terms of a multitude of geometrical properties, including helical parameters, base pairing/stacking, hydrogen bonding and backbone conformation. We show that a simple molecular representation consisting of one oriented bead per nucleotide can account for the fundamental structural properties of RNA. In this framework, canonical Watson-Crick, non-Watson-Crick base-pairing and base-stacking interactions can be unambiguously identified within a well-defined interaction shell. We validate this representation by performing two independent, complementary tests. First, we use it to construct a sequence-independent, knowledge-based scoring function for RNA structural prediction, which compares favorably to fully atomistic, state-of-the-art techniques. Second, we define a metric to measure deviation between RNA structures that directly reports on the differences in the base–base interaction network. The effectiveness of this metric is tested with respect to its ability to discriminate between structurally and kinetically distant RNA conformations, where it performs better than standard techniques. Taken together, our results suggest that this minimalist, nucleobase-centric representation captures the main interactions that are relevant for describing RNA structure and dynamics.
Single nucleotide seed modification restores in vivo tolerability of a toxic artificial miRNA sequence in the mouse brain
Huntington's disease is a fatal neurodegenerative disease caused by polyglutamine-expansion in huntingtin (HTT). Recent work showed that gene silencing approaches, including RNA interference (RNAi), improve disease readouts in mice. To advance RNAi to the clinic, we designed miHDS1, with robust knockdown of human HTT and minimized silencing of unintended transcripts. In Rhesus macaque, AAV delivery of miHDS1 to the putamen reduced HTT expression with no adverse effects on neurological status including fine and gross motor skills, no immune activation and no induction of neuropathology out to 6 weeks post injection. Others showed safety of a different HTT-targeting RNAi in monkeys for 6 months. Application of miHDS1 to Huntington's patients requires further safety testing in normal rodents, despite the fact that it was optimized for humans. To satisfy this regulatory requirement, we evaluated normal mice after AAV.miHDS1 injection. In contrast to monkeys, neurological deficits occurred acutely in mouse brain and were attributed to off-target silencing through interactions of miHDS1 with the 3'UTR of other transcripts. While we resolved miHDS1 toxicity in mouse brain and maintained miHDS1-silencing efficacy, these studies highlight that optimizing nucleic acid-based medicines for safety in humans presents challenges for safety testing in rodents or other distantly related species.
The RNA-binding protein L7Ae, known for its role in translation (as part of ribosomes) and RNA modification (as part of sn/oRNPs), has also been identified as a subunit of archaeal RNase P, a ribonucleoprotein complex that employs an RNA catalyst for the Mg2+-dependent 5' maturation of tRNAs. To better understand the assembly and catalysis of archaeal RNase P, we used a site-specific hydroxyl radical-mediated footprinting strategy to pinpoint the binding sites of Pyrococcus furiosus (Pfu) L7Ae on its cognate RNase P RNA (RPR). L7Ae derivatives with single-Cys substitutions at residues in the predicted RNA-binding interface (K42C/C71V, R46C/C71V, V95C/C71V) were modified with an iron complex of EDTA-2-aminoethyl 2-pyridyl disulfide. Upon addition of hydrogen peroxide and ascorbate, these L7Ae-tethered nucleases were expected to cleave the RPR at nucleotides proximal to the EDTA-Fe–modified residues. Indeed, footprinting experiments with an enzyme assembled with the Pfu RPR and five protein cofactors (POP5, RPP21, RPP29, RPP30 and L7Ae–EDTA-Fe) revealed specific RNA cleavages, localizing the binding sites of L7Ae to the RPR's catalytic and specificity domains. These results support the presence of two kink-turns, the structural motifs recognized by L7Ae, in distinct functional domains of the RPR and suggest testable mechanisms by which L7Ae contributes to RNase P catalysis.
ArfA recognizes the lack of mRNA in the mRNA channel after RF2 binding for ribosome rescue
Although trans-translation mediated by tmRNA-SmpB has long been known as the sole system to relieve stalled bacterial ribosomes, ArfA has recently been identified as an alternative factor for ribosome rescue in Escherichia coli. This process requires hydrolysis of nascent peptidyl-tRNA by RF2, which usually acts as a stop codon-specific peptide release factor. It poses a fascinating question of how ArfA and RF2 recognize and rescue the stalled ribosome. Here, we mapped the location of ArfA in the stalled ribosome by directed hydroxyl radical probing. It revealed an ArfA-binding site around the neck region of the 30S subunit in which the N- and C-terminal regions of ArfA are close to the decoding center and the mRNA entry channel, respectively. ArfA and RF2 sequentially enter the ribosome stalled in either the middle or 3' end of mRNA, whereas RF2 induces a productive conformational change of ArfA only when the ribosome is stalled at the 3' end of mRNA. On the basis of these results, we propose that ArfA functions as the sensor to recognize the target ribosome after RF2 binding.
Functional characterization of C. elegans Y-box-binding proteins reveals tissue-specific functions and a critical role in the formation of polysomes
The cold shock domain is one of the most highly conserved motifs between bacteria and higher eukaryotes. Y-box-binding proteins represent a subfamily of cold shock domain proteins with pleiotropic functions, ranging from transcription in the nucleus to translation in the cytoplasm. These proteins have been investigated in all major model organisms except Caenorhabditis elegans. In this study, we set out to fill this gap and present a functional characterization of CEYs, the C. elegans Y-box-binding proteins. We find that, similar to other organisms, CEYs are essential for proper gametogenesis. However, we also report a novel function of these proteins in the formation of large polysomes in the soma. In the absence of the somatic CEYs, polysomes are dramatically reduced with a simultaneous increase in monosomes and disomes, which, unexpectedly, has no obvious impact on animal biology. Because transcripts that are enriched in polysomes in wild-type animals tend to be less abundant in the absence of CEYs, our findings suggest that large polysomes might depend on transcript stabilization mediated by CEY proteins.
The conserved GTPase LepA contributes mainly to translation initiation in Escherichia coli
LepA is a paralog of EF-G found in all bacteria. Deletion of lepA confers no obvious growth defect in Escherichia coli, and the physiological role of LepA remains unknown. Here, we identify nine mutant strains (dksA, molR1, rsgA, tatB, tonB, tolR, ubiF, ubiG or ubiH) in which deletion of lepA confers a synthetic growth phenotype. These strains are compromised for gene regulation, ribosome assembly, transport and/or respiration, indicating that LepA contributes to these functions in some way. We also use ribosome profiling to deduce the effects of LepA on translation. We find that loss of LepA alters the average ribosome density (ARD) for hundreds of mRNA coding regions in the cell, substantially reducing ARD in many cases. By contrast, only subtle and codon-specific changes in ribosome distribution along mRNA are seen. These data suggest that LepA contributes mainly to the initiation phase of translation. Consistent with this interpretation, the effect of LepA on ARD is related to the sequence of the Shine–Dalgarno region. Global perturbation of gene expression in the lepA mutant likely explains most of its phenotypes.
Ribosomes in the balance: structural equilibrium ensures translational fidelity and proper gene expression
At equilibrium, empty ribosomes freely transit between the rotated and un-rotated states. In the cell, the binding of two translation elongation factors to the same general region of the ribosome stabilizes one state over the other. These stabilized states are resolved by expenditure of energy in the form of GTP hydrolysis. A prior study employing mutants of a late assembling peripheral ribosomal protein suggested that ribosome rotational status determines its affinity for elongation factors, and hence translational fidelity and gene expression. Here, mutants of the early assembling integral ribosomal protein uL2 are used to test the generality of this hypothesis. rRNA structure probing analyses reveal that mutations in the uL2 B7b bridge region shift the equilibrium toward the rotated state, propagating rRNA structural changes to all of the functional centers of the ribosome. Structural disequilibrium unbalances the ribosome biochemically: rotated ribosomes favor binding of the eEF2 translocase and disfavor that of the elongation ternary complex. This manifests as specific translational fidelity defects, impacting the expression of genes involved in telomere maintenance. A model is presented describing how cyclic intersubunit rotation ensures the unidirectionality of translational elongation, and how perturbation of rotational equilibrium affects specific aspects of translational fidelity and cellular gene expression.
G-triplex structure and formation propensity
The occurrence of a G-triplex folding intermediate of thrombin binding aptamer (TBA) has been recently predicted by metadynamics calculations, and experimentally supported by Nuclear Magnetic Resonance (NMR), Circular Dichroism (CD) and Differential Scanning Calorimetry (DSC) data collected on a 3' end TBA-truncated 11-mer oligonucleotide (11-mer-3'-t-TBA). Here we present the solution structure of 11-mer-3'-t-TBA in the presence of potassium ions. This structure is the first experimental example of a G-triplex folding, where a network of Hoogsteen-like hydrogen bonds stabilizes six guanines to form two G:G:G triad planes. The G-triplex folding of 11-mer-3'-t-TBA is stabilized by the potassium ion and destabilized by increasing the temperature. The superimposition of the experimental structure with that predicted by metadynamics shows great similarity, with the only significant differences involving two loops. These new structural data show that 11-mer-3'-t-TBA assumes a G-triplex DNA conformation as its stable form, reinforcing the idea that G-triplex folding intermediates may occur in vivo in human guanine-rich sequences. NMR and CD screening of eight different constructs obtained by removing from one to four bases at either the 3' or the 5' end shows that only the 11-mer-3'-t-TBA yields a relatively stable G-triplex.
Structural and biochemical impact of C8-aryl-guanine adducts within the NarI recognition DNA sequence: influence of aryl ring size on targeted and semi-targeted mutagenicity
Chemical mutagens with an aromatic ring system may be enzymatically transformed to afford aryl radical species that preferentially react at the C8-site of 2'-deoxyguanosine (dG). The resulting carbon-linked C8-aryl-dG adduct possesses altered biophysical and genetic coding properties compared to the precursor nucleoside. Described herein are structural and in vitro mutagenicity studies of a series of fluorescent C8-aryl-dG analogues that differ in aryl ring size and are representative of authentic DNA adducts. These structural mimics have been inserted into a hotspot sequence for frameshift mutations, namely, the reiterated G3-position of the NarI sequence within 12mer (NarI(12)) and 22mer (NarI(22)) oligonucleotides. In the NarI(12) duplexes, the C8-aryl-dG adducts display a preference for adopting an anti-conformation opposite C, despite the strong syn preference of the free nucleoside. Using the NarI(22) sequence as a template for DNA synthesis in vitro, mutagenicity of the C8-aryl-dG adducts was assayed with representative high-fidelity replicative versus lesion bypass Y-family DNA polymerases, namely, Escherichia coli pol I Klenow fragment exo– (Kf–) and Sulfolobus solfataricus P2 DNA polymerase IV (Dpo4). Our experiments provide a basis for a model involving a two-base slippage and subsequent realignment process to relate the miscoding properties of C-linked C8-aryl-dG adducts with their chemical structures.
We have determined the 1.50 Å crystal structure of the DNA decamer, d(CCACNVKGCGTGG) (CNVK, 3-cyanovinylcarbazole), which forms a G-quadruplex structure in the presence of Ba2+. The structure contains several unique features including a bulged nucleotide and the first crystal structure observation of a C-tetrad. The structure reveals that water molecules mediate contacts between the divalent cations and the C-tetrad, allowing Ba2+ ions to occupy adjacent steps in the central ion channel. One ordered Mg2+ facilitates 3'-3' stacking of two quadruplexes in the asymmetric unit, while the bulged nucleotide mediates crystal contacts. Despite the high diffraction limit, the first four nucleotides including the CNVK nucleoside are disordered though they are still involved in crystal packing. This work suggests that the bulky hydrophobic groups may locally influence the formation of non-Watson–Crick structures from otherwise complementary sequences. These observations lead to the intriguing possibility that certain types of DNA damage may act as modulators of G-quadruplex formation.
Structural insights into the function of a unique tandem GTPase EngA in bacterial ribosome assembly
Many ribosome-interacting GTPases, with proposed functions in ribosome biogenesis, are also implicated in the cellular regulatory coupling between ribosome assembly process and various growth control pathways. EngA is an essential GTPase in bacteria, and intriguingly, it contains two consecutive GTPase domains (GD), being one-of-a-kind among all known GTPases. EngA is required for the 50S subunit maturation. However, its molecular role remains elusive. Here, we present the structure of EngA bound to the 50S subunit. Our data show that EngA binds to the peptidyl transferase center (PTC) and induces dramatic conformational changes on the 50S subunit, which virtually returns the 50S subunit to a state similar to that of the late-stage 50S assembly intermediates. Very interestingly, our data show that the two GDs exhibit a pseudo-two-fold symmetry in the 50S-bound conformation. Our results indicate that EngA recognizes certain forms of the 50S assembly intermediates, and likely facilitates the conformational maturation of the PTC of the 23S rRNA in a direct manner. Furthermore, in a broad context, our data also suggest that EngA might be a sensor of the cellular GTP/GDP ratio, endowed with multiple conformational states, in response to fluctuations in cellular nucleotide pool, to facilitate and regulate ribosome assembly.
Mammalian synthetic biology may provide novel therapeutic strategies, help decipher new paths for drug discovery and facilitate synthesis of valuable molecules. Yet, our capacity to genetically program cells is currently hampered by the lack of efficient approaches to streamline the design, construction and screening of synthetic gene networks. To address this problem, here we present a framework for modular and combinatorial assembly of functional (multi)gene expression vectors and their efficient and specific targeted integration into a well-defined chromosomal context in mammalian cells. We demonstrate the potential of this framework by assembling and integrating different functional mammalian regulatory networks including the largest gene circuit built and chromosomally integrated to date (6 transcription units, 27 kb) encoding an inducible memory device. Using a library of 18 different circuits as a proof of concept, we also demonstrate that our method enables one-pot/single-flask chromosomal integration and screening of circuit libraries. This rapid and powerful prototyping platform is well suited for comparative studies of genetic regulatory elements, genes and multi-gene circuits as well as facile development of libraries of isogenic engineered cell lines.
Target-responsive DNA-capped nanocontainer used for fabricating universal detector and performing logic operations
Nucleic acids have become a powerful tool in nanotechnology because of their controllable diverse conformational transitions and adaptable higher-order nanostructure. Using single-stranded DNA probes as pore caps for the recognition of various targets, here we present an ultrasensitive universal electrochemical detection system based on graphene and mesoporous silica, and achieve sensitivity with all of the major classes of analytes and simultaneously realize DNA logic gate operations. The concept is based on locking the pores and preventing the signal-reporter molecules from escaping by the target-induced conformational change of the tailored DNA caps. The coupling of a ‘waking up’ gatekeeper with highly specific biochemical recognition is an innovative strategy for the detection of various targets, able to compete with classical methods which need expensive instrumentation and sophisticated experimental operations. The present study introduces a new electrochemical signal amplification concept and also adds a new dimension to the function of graphene-mesoporous material hybrids as multifunctional nanoscale logic devices. More importantly, the development of this approach would spur further advances in important areas, such as point-of-care diagnostics or detection of specific biological contaminations, and holds promise for use in field analysis.
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors to correct analyses. Here I describe a version of the sva approach specifically created for count data or FPKMs from sequencing experiments based on appropriate data transformation. I also describe the addition of supervised sva (ssva) for using control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package and I have made a fully reproducible analysis using these methods available from: https://github.com/jtleek/svaseq.
Insyght: navigating amongst abundant homologues, syntenies and gene functional annotations in bacteria, it's that symbol!
High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght. The tool is freely downloadable for private data set analysis.
The characterization of transcription factor complexes and their binding sites in the genome by affinity purification has yielded tremendous new insights into how genes are regulated. The affinity purification requires either antibodies raised against the factor of interest itself or high-affinity binding of a tag sequence added to the C- or N-terminus of the factor. Unfortunately, fusing extra amino acids to the termini of a factor can interfere with its biological function, or the tag may be inaccessible inside the protein. Here, we describe an effective solution to that problem by integrating the ‘tag’ close to the nuclear localization sequence domain of the factor. We demonstrate the effectiveness of this approach with the transcription factors Fli-1 and Irf2bp2, which cannot be tagged at their extremities without loss of function. This resulted in the identification of novel protein partners and a new hypothesis on the contribution of Fli-1 to hematopoiesis.