RNA polymerases (RNAPs) carry out transcription in all living organisms. All multisubunit RNAPs are derived from a common ancestor, which is apparent from their subunit composition, amino acid sequence, structure, function and molecular mechanisms. Despite this similarity, the organisms that depend on them are extremely diverse, ranging from microbes to man. The RNAP laboratory at the ISMB is following a focussed research programme that aims to characterise the molecular mechanisms of transcription. The major asset of our laboratory is a recombinant 12-subunit archaeal RNAP, which represents a unique and powerful tool to study transcription in vitro. Owing to its superior biochemical tractability, our system is ideally suited to delineate the mechanisms of its eukaryotic counterparts, which are not amenable to rigorous analysis in vitro. The technical expertise in our laboratory is rooted in strong biochemistry, molecular biology and biophysics. We have highly productive collaborations with structural biologists and chemists, and we are increasingly introducing sophisticated tools to our system, including fluorescence and electron spin resonance methods. We believe that this multidisciplinary approach is essential to address important questions about gene expression at the atomic level in the future. Using this strategy we continue making world-class contributions to the field of gene expression.
Introduction – Information processing in Biology.
Since Francis Crick phrased the 'central dogma' of molecular biology in the mid 1950s, according to which DNA-makes-RNA-makes-Protein, scientists from a broad range of backgrounds have investigated the flow of genetic information in biological systems (Watson and Crick, 2003). According to this traditional view, the DNA template-dependent synthesis of DNA is referred to as replication, the DNA template-dependent synthesis of RNA is transcription, and RNA in turn is translated into proteins (Figure 1). Soon after the discovery that nucleic acids not only encode the genetic information, but are also instrumental in translating it into proteins in the form of ribosomes (e.g. rRNA) and their ligands (e.g. tRNA), it became apparent that this assumed flow of information is neither simple nor unidirectional (Figure 1). For example, a plethora of viral genomes are made of RNA, which in retroviruses is reverse transcribed into DNA. The perpetuation of RNA genomes is facilitated by RNA template-dependent RNA synthesis (transcription), implying that replication and transcription can be the same process. What was initially perceived as a quaint catalytic property of self-splicing introns (Cech and Bass, 1986) and of processes involved in tRNA maturation (Altman and Robertson, 1973) rapidly led to the discovery of ribozymes, enzymes made entirely of RNA with a whole range of activities. In the last decade a large number of small noncoding RNA molecules have emerged as potent regulators of replication, transcription and translation, mRNA folding and stability. Thus, RNA molecules are involved in all stages of information processing in extant biology; they encode information and provide structural, regulatory and catalytic properties. What is the reason for this omnipresence of RNA in life: is it due to the versatile properties of RNA alone, or is it a relic of the ancient past of our biosphere? 
Currently no theoretical models provide satisfying and unequivocal answers to the question of the origins of life (Schuster, 2010). However, both the perpetuation of genetic information and the ability to alter that information by mutation or recombination were required for a hypothetical ancestor to be subject to Darwinian evolution. The 'RNA world' hypothesis provides such a model, and RNA polymerases play a fundamental role in this scenario (Joyce, 1999).
The vast majority of organisms on earth store their genetic information in the format of DNA, which is transcribed into RNA, and the RNA is translated into protein (green arrows). In the hypothetical 'RNA world' era more than 4 billion years ago RNA molecules both encoded information and carried out catalysis (highlighted in red). DNA as storage medium and protein as catalyst emerged much later in evolution (grey arrows), and adapted to resemble the flow of information we know from extant life. Note that small noncoding RNA species regulate all three fundamental processes (thin grey arrows).
General structural features of RNAP
RNA polymerisation has been ‘invented’ at least six times during evolution, as judged by the independent structural folds of RNAP active sites. However, it is noteworthy that all RNAPs responsible for transcribing the cellular genomes of all living organisms are, without exception, evolutionarily conserved, which implies that they are derived from a common ancestor. This is reflected in the sequence, structure and function of the RNAP subunits and of the transcription factors that regulate their activity (Table 1).
Multisubunit RNAPs resemble a crab claw whose 'jaws' (Figure 1, highlighted in red) interact with duplex DNA in the direction of transcription. The DNA projects along the floor of the major DNA-binding channel (blue) and is secured by the RNAP 'clamp' (green) until it encounters the active centre (yellow) at the RNAP 'wall'. The DNA-RNA hybrid is perpendicular to the downstream duplex DNA, the strands are separated by the RNAP 'lid', and the transcript is guided by interactions with the RNAP 'stalk' (orange). The NTP entry pore, or secondary channel, is located under the active site and allows access of substrates and cleavage factors to the active site and extrusion of the transcript during backtracking. All RNAP subunits can be divided into three overlapping functional classes. RNAP subunits homologous to Rpo3 (corresponding to alphaI in bacteria), 10, 11 (alphaII) and 12 form the assembly platform (deep blue), whose association nucleates RNAP assembly. The two largest subunits, Rpo1 (beta') and 2 (beta), form the catalytic core that harbours the active site, including the magnesium-chelating carboxylate residues, the bridge and trigger helices, the downstream DNA and DNA-RNA hybrid binding sites, the secondary NTP entry channel, and the loop and switch regions that are instrumental in handling the nucleic acid scaffold, including strand separation. The combination of assembly platform and catalytic core is the minimal subunit configuration of active RNAPs. The other RNAP subunits are not strictly required for basic RNAP operations (including promoter-directed transcription) and have auxiliary functions, adding interaction sites for basal transcription factors and/or nucleic acids. Rpo5 extends the RNAP jaw's interactions with the downstream duplex DNA during transcription initiation, whereas Rpo6 (omega) aids the folding and stability of Rpo1 and acts as an anchorage point for the Rpo4/7 'stalk' complex. 
RNAP subunits Rpo4/7 form a stable heterodimeric subcomplex, which interacts with the nascent RNA transcript during elongation and termination.
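The subunit classification described above can be summarised as a small lookup structure. The following is a minimal Python sketch, not an established data model. The constant and function names are hypothetical. The classes are treated as disjoint for simplicity, although the text describes them as overlapping. Only subunits and bacterial homologues named in the preceding paragraph are included.

```python
# Functional classes of archaeal RNAP subunits, restricted to what the text states.
# All names below are illustrative assumptions, not a published nomenclature API.
ASSEMBLY_PLATFORM = {"Rpo3", "Rpo10", "Rpo11", "Rpo12"}
CATALYTIC_CORE = {"Rpo1", "Rpo2"}
AUXILIARY = {"Rpo4", "Rpo5", "Rpo6", "Rpo7"}

# Bacterial homologues mentioned in the text.
BACTERIAL_HOMOLOGUE = {
    "Rpo1": "beta'",
    "Rpo2": "beta",
    "Rpo3": "alphaI",
    "Rpo11": "alphaII",
    "Rpo6": "omega",
}


def functional_class(subunit):
    """Return the (simplified, non-overlapping) functional class of a subunit."""
    if subunit in ASSEMBLY_PLATFORM:
        return "assembly platform"
    if subunit in CATALYTIC_CORE:
        return "catalytic core"
    if subunit in AUXILIARY:
        return "auxiliary"
    raise KeyError(f"subunit not covered by this sketch: {subunit}")


def is_minimal_active(subunits):
    """True if the set contains the assembly platform plus the catalytic core."""
    return (ASSEMBLY_PLATFORM | CATALYTIC_CORE) <= set(subunits)
```

For example, `is_minimal_active(ASSEMBLY_PLATFORM | CATALYTIC_CORE)` evaluates to `True`, reflecting the statement that these six subunits form the minimal configuration of an active RNAP, whereas the catalytic core alone does not qualify.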
The transcription cycle
Multisubunit RNAPs carry out transcription by repeatedly cycling through initiation, elongation and termination phases. RNAP activity is dependent on, and modulated by, exogenous transcription factors. In the Archaea, TBP, TFB and TFE facilitate transcription initiation, whereas TFS and Spt4/5 regulate transcription elongation. Some initiation factors (e.g. TBP) remain associated with the promoter, ready for the recruitment of the next RNAP in the subsequent cycle, whereas other factors (e.g. TFE) may be retained by elongating RNAPs. Transcription termination in Archaea is facilitated by poly-U signals at the 3' end of the template gene.
Functional architecture of RNAP
Despite a relatively low sequence identity, the two main types of multisubunit RNAP, exemplified by the Bacterial and Archaeal/eukaryotic enzymes, respectively, display an impressive degree of structural homology (Figure 4; homologous RNAP subunits are colour coded). The strictly conserved residues cluster around the RNAP active site (Figure 4, red circle) including the bridge (Figure 4, orange) and trigger helices, form the NTP entry pore and are involved in the handling of the template and nontemplate DNA strands and the RNA strand, or encode flexible motifs including the RNAP clamp (Figure 4, green circle) and the switch regions (Ruprich-Robert and Thuriaux, 2010). How does the active site of multisubunit RNAPs work? X-ray structures of the Bacterial and yeast enzymes have captured conformational intermediates of the NTP addition cycle that correspond to distinct functional states of RNAP, which in combination are the molecular basis for the physical translocation of RNAP along the template gene (Vassylyev et al., 2007a, Vassylyev et al., 2007b, Brueckner et al., 2009). Two structural motifs, the bridge and trigger helices, are crucial to the mechanism. The RNAP-DNA-RNA elongation complex is in equilibrium between pre- and post-translocated states, where the latter corresponds to the RNAP having moved one base pair in the downstream direction. The NTP substrate is inserted into the RNAP active site in the post-translocated 'preinsertion' state. The active site then 'closes' by a structural rearrangement of the trigger motif from a loop to a helical structure, which results in the formation of a trihelix bundle with the bridge helix: the 'insertion' state. In this post-translocated insertion state the two magnesium ions are ideally positioned, the active site is competent for catalysis, and a new phosphodiester bond is formed. Pyrophosphate leaves the active site, and the elongation complex is left in the pre-translocated state. 
Transition of the pre- into post-translocated state involves an 'opening' of the active site into the preinsertion state, i.e. a structural rearrangement of the trihelix bundle into trigger loop and bridge helix - ready for the subsequent NTP binding event and the next nucleotide addition cycle.
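The nucleotide addition cycle described above can be read as a three-state loop. The following Python sketch is a deliberately simplified, deterministic illustration. It ignores the equilibrium between the pre- and post-translocated states and carries no kinetic information. The class, table and function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    PRE_TRANSLOCATED = "pre-translocated"
    PREINSERTION = "post-translocated, preinsertion (trigger loop, open)"
    INSERTION = "post-translocated, insertion (trigger helix, closed)"


# For each state: a description of the step that leaves it, and the state reached.
NEXT_STEP = {
    State.PRE_TRANSLOCATED: (
        "RNAP moves one base pair downstream; active site opens",
        State.PREINSERTION,
    ),
    State.PREINSERTION: (
        "NTP binds; trigger loop folds into a helix (trihelix bundle)",
        State.INSERTION,
    ),
    State.INSERTION: (
        "phosphodiester bond forms; pyrophosphate leaves",
        State.PRE_TRANSLOCATED,
    ),
}


def add_nucleotides(rna_length, n):
    """Run n complete addition cycles, starting in the pre-translocated state.

    Returns the new transcript length and the list of states visited.
    """
    state = State.PRE_TRANSLOCATED
    trace = [state]
    added = 0
    while added < n:
        _description, next_state = NEXT_STEP[state]
        if state is State.INSERTION:
            # Catalysis happens on leaving the insertion state.
            rna_length += 1
            added += 1
        state = next_state
        trace.append(state)
    return rna_length, trace
```

Calling `add_nucleotides(10, 2)` extends a 10-nucleotide transcript to 12 nucleotides and returns the complex to the pre-translocated state after each addition, which mirrors the cyclic order of states given in the text.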
The structurally conserved core of the Bacterial T. aquaticus RNAP (A) and Archaeal S. shibatae RNAP (C) can easily be recognised by close inspection of their X-ray structures. Important functional features such as the active site (B, metal A shown as pink sphere), the bridge helix (highlighted in orange) and the main DNA-binding channel are well conserved. The homologous RNAP subunits are colour-coded according to the key in the figure.
The Archaeal/eukaryote-specific RNAP subunits - which have no homologues in Bacterial RNAPs (Figure 4 C and D, highlighted in magenta) - interact with many of the universally conserved subunits (Figure 4 B, highlighted in blue) and are not clustered at one particular site of the enzyme. The nomenclature for the eukaryotic RNAPII subunits is RPB1-12 (from largest to smallest S. cerevisiae polypeptide), while the Archaeal RNAP subunits are named Rpo1-13 (Table 1) or are designated with a letter code in the older literature (A, B, D, E, F, G, H, K, L, N and P). Four subunits, Rpo3/10/11/12, are required for the efficient assembly of RNAPs; they form the aptly named assembly platform (Werner and Weinzierl, 2002). Rpo3/11 is homologous to the Bacterial alpha homodimer, which is sufficient for bacterial RNAP assembly (Werner et al., 2000) (Figure 5 A and C). Subunits Rpo10 (N) and 12 (P) fill concave depressions in the second largest RNAP subunit (Rpo2 [B]) and thereby act as molecular adaptors between Rpo2 and 3 (RPB2 and 3) (Figure 6 C and D), which explains at least in part their role during RNAP assembly. However, Rpo10 and 12 have additional functions beyond RNAP assembly. For instance, the Archaeal homologue of RPB12, Rpo12 (P), has been shown to play a role during transcription initiation by promoting DNA melting and stabilising the open complex (Reich et al., 2009). Similarly, Rpo5 (H) is instrumental in DNA melting and early transcription (Grunberg et al., 2010). RPB5 consists of two discrete domains. The eukaryote-specific N-terminal domain interacts with the basal initiation factor TFIIB (Lin et al., 1997, Cheong et al., 1995) (Figure 3). The C-terminal domain of RPB5, which corresponds to the full-length Archaeal homologue Rpo5 (H), makes intricate contacts with the C-terminus of the largest RNAP subunit (Rpo1 [A"], Figures 5 and 6). 
Rpo5 (H) and a fragment of Rpo1 form the lower jaw domain of RNAP, which is more extended than its Bacterial counterpart (Hirata et al., 2008; Korkhin et al., 2009) (Figure 4).
Cycling through transcription with Rpo4/7
The most prominent structural feature that distinguishes archaeo-eukaryotic RNAPs from their bacterial counterparts is a stalk-like protrusion located at the periphery of the roughly ellipsoid enzyme, close to the RNA exit channel (Figure 4 and 5). This signature module of archaeo-eukaryotic RNAPs is a heterodimeric complex consisting of RNAP subunits Rpo4/7, which are homologous to RPB4/7 in the Saccharomyces cerevisiae RNAPII (Todone et al., 2001). In addition to the archaeal RNAP, all five types of eukaryotic RNAPs (RNAPI, II, III, IV and V) harbour homologues of the Rpo4/7 complex, which suggests that they are important for RNAP function (Ream et al., 2009). Surprisingly, initial studies with a wholly recombinant archaeal transcription system demonstrated that RNAP lacking the Rpo4/7 complex was not only capable of RNA polymerisation in promoter-independent transcription assays, but also able to initiate transcription in a start-site specific and basal transcription factor dependent manner (Werner and Weinzierl, 2002). These experiments suggested that Rpo4/7 was not strictly required for RNAP function. However, subsequent analysis using more elaborate assays has shown that the Rpo4/7 complex is a highly versatile RNAP module, which plays multiple roles during the transcription cycle.
Model of the functional interplay between the Rpo4/7 (F/E) complex, the RNAP clamp and exogenous transcription factors during the transcription cycle. Please note that Rpo4/7 is often referred to in the literature as the RNAP-F/E complex. During initiation Rpo4/7 and the basal factor TFE interact and are able to modulate the position of the RNAP clamp ('open-to-close'), which in turn facilitates DNA melting ('open' complex formation, A). The winged helix domain (WH) of TFE interacts with the nontemplate strand (NTS) of the promoter. During elongation Rpo4/7 and the elongation factor Spt4/5 ensure increased stability of the elongation complex by closing the RNAP clamp over the DNA binding channel ('keep closed') (B). The Spt5 NGN domain is sequestered to RNAP by binding to the clamp coiled coil (Hirtreiter, 2010) (Grohmann, 2010). The KOW domain of the bacterial Spt5 homologue, NusG, has been shown to interact with ribosomal protein S10 (aka NusE) and thus physically links RNA polymerase to the ribosome during elongation (Burmann et al., 2010; Proshkin et al., 2010). Rpo4/7 augments the termination efficiency of RNAP on weak terminator signals. The mechanism is not fully understood but is likely to involve conformational changes in the active site leading to an opening of the RNAP clamp ('closed-to-open') and subsequent dissociation of the transcription elongation complex (TEC) (C). To date no archaeal transcription termination factors (highlighted with a question mark) have been identified.
The universally conserved elongation factor Spt5
The transcription elongation properties of RNAPs are influenced by their subunit composition (e.g. Rpo4/7) and regulated by exogenous transcription factors including Spt5, the only known RNAP-associated transcription factor that is universally conserved in evolution. We have recently solved the X-ray structure of M. jannaschii Spt4/5 and characterised its function (Figure 7). Our results demonstrate an astonishing degree of conservation in terms of Spt4/5 structure, its interaction with RNAP and its stimulatory properties on transcription elongation. Archaeal Spt5 is, like its bacterial homologue NusG, composed of two domains, the NGN domain (NusG N-terminal domain) and a KOW domain (Kyrpides, Ouzounis and Woese domain). We carried out a deletion analysis, which revealed that Spt5-NGN is the effector domain of Spt5: it mediates the dimerisation with Spt4 and the binding to RNAP, and is required for the elongation activity of Spt4/5. The last two features are reliant on an interaction between a deep hydrophobic cavity in the Spt5-NGN domain and the tip of the RNAP clamp coiled coil, a surface-exposed structural feature that is conserved in all multisubunit RNAPs. A very similar interaction between the Spt5 homologue NusG and its cognate RNAP has been proposed in the bacterial system.
Universal evolutionary conservation of the elongation factor Spt4/5 and NusG. Structural alignment of bacterial NusG NGN domain (A, E. coli, pdb code 2K06), archaeal Spt4/5 NGN (B, M. jannaschii, pdb 3LPE) and eukaryotic Spt4/5 NGN (C, S. cerevisiae, pdb 2EXU). The ultimate C-terminal residue of the NGN domain resolved in the structure is indicated with a dashed circle. Panel D illustrates the central role of Spt5 due to its universal evolutionary conservation in all three domains of life. Panel E shows the X-ray structure of full-length P. furiosus Spt4/5 containing the C-terminal KOW domain (K. Murakami, unpublished results). The NGN domains are highlighted in fire brick red, Spt4 is highlighted in wheat and the Spt5 KOW domain is highlighted in pink.
What are the molecular mechanisms by which Spt4/5 stimulates elongation? The Spt5-NGN binding site on RNAP is approximately 70 Å distant from the active site, which suggests an allosteric mechanism of stimulation. We find it significant that both RNAP subunits Rpo4/7 and Spt4/5 are in close proximity of the RNAP clamp and that both affect transcription elongation in a similar manner: by increasing the processivity in a fashion that is not dependent on the NTS. The latter result makes it unlikely that Rpo4/7 and Spt4/5 function solely by interacting with the NTS, or act by affecting downstream DNA-strand separation, or upstream DNA-strand joining. Rather we propose that Spt4/5 induces a conformational change in the RNAP clamp that is translated into the RNAP interior and results in increased translocation efficiency. This mechanism is reminiscent of NusG and its paralogue RfaH, which have been proposed to stimulate elongation by stabilising the forward translocated state of the RNAP active site. Alternatively, Spt4/5 could bridge the gap over the main DNA-binding channel of RNAP and 'lock' the RNAP clamp in a closed position, which would result in an increased elongation complex stability and enhanced processivity (Figure 2B).
The elongation first hypothesis
Despite the high degree of homology between RNAPs in the three domains of life, the basal transcription factors that are required for transcription initiation in bacteria and archaea/eukaryotes, respectively, are not evolutionarily related. In contrast, the only RNAP-associated transcription factor that is universally conserved in evolution, Spt5/NusG, controls the elongation phase of transcription. What is the significance of this observation, and what does it tell us about the regulation of the ancestral form of RNAPs in the last universal common ancestor (LUCA)? Due to the complete absence of any bona fide sigma factor homologues in Archaea and eukaryotes, and of TBP/TF(II)B homologues in Bacteria, it is unlikely that the RNAP of the LUCA initiated transcription aided by sigma- or TBP/TF(II)B-like transcription factors. Rather than being recruited to proto-promoters in a TATA- or -35/-10 element-dependent manner, RNAPs could have initiated transcription largely nonsequence-specifically by directly associating with the template DNA, without being aided by basal transcription factors. Suitable candidates for these RNAP ‘entry sites’ are T/A-rich sequences, since (i) they have a propensity to distort the DNA topology (DNA bending) and (ii) on energetic grounds, T/A-rich DNA strands melt readily and thus allow loading of the template DNA strand into the RNAP active site. It may thus not be accidental that both TATA elements and -10 boxes in contemporary promoters are rich in T- and A-residues. Auxiliary protein factors could have evolved independently in the bacterial and archaeo-eukaryotic lineages to enhance this process, and eventually these sequences could have co-evolved with their cognate factors, resulting in the TBP:TATA and sigma:-35/-10 box ensembles of extant archaea/eukaryotes and bacteria, respectively. 
It is also worth keeping in mind that single-subunit RNAPs such as T7 RNAP are capable of promoter sequence-dependent transcription initiation without any requirements for additional factors.
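The idea that T/A-rich stretches could serve as RNAP 'entry sites' can be illustrated with a simple sliding-window scan for A/T content. The Python sketch below uses A/T fraction only as a crude proxy for how readily a region melts. It is not a thermodynamic model. The function names, the window length of 8 and the threshold of 0.75 are arbitrary assumptions chosen for illustration.

```python
def at_fraction(seq):
    """Fraction of A and T residues in a DNA sequence (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=8, threshold=0.75):
    """Return (start, A/T fraction) for every window at or above the threshold.

    The window and threshold defaults are illustrative assumptions,
    not values taken from the text.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        fraction = at_fraction(seq[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits
```

Applied to a G/C-rich sequence with an embedded TATA-like stretch, such as `at_rich_windows("GCGCGCTATAAATAGCGCGC", window=8, threshold=1.0)`, the scan reports only the window that starts at position 6 and consists entirely of A and T residues.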
Due to the extensive structural and functional homology between bacterial NusG and archaeal/eukaryotic Spt5, and taking into consideration the highly conserved binding sites on RNAP, it is almost certain that a NusG/Spt5-like transcription factor associated with the LUCA RNAP, modulated its properties, and possibly regulated gene expression. This hypothesis implies that regulation of evolutionarily ancient RNAPs could predominantly have targeted the elongation phase of transcription, and not initiation. It is unclear to what extent the ancestral forms of NusG/Spt5 regulated transcription per se. However, regulation could have occurred by counteracting sequence-specific pausing, via a modulation of the elongation rates, by affecting the likelihood of entering the paused state, or by decreasing the pause duration. All these phenomena have been observed with E. coli NusG. Alternatively, or in addition to regulating transcription of the LUCA RNAP in a gene-specific manner, NusG/Spt5 could have acted as a general processivity factor, possibly even by improving the relatively poor utilisation of RNA templates by RNAP prior to the emergence of DNA as the main coding molecule. Two recent reports demonstrate that NusG plays a crucial role in coupling transcription and translation in vivo by connecting elongating RNAPs and ribosomes, which further underpins the crucial role of NusG/Spt5-like factors in the regulation of gene expression, and their possible very early origin.