The impact of transcription-mediated replication stress on genome instability and human disease

Posted: 23 September 2020

Review Article | Open Access

Published: 30 August 2020


Stefano Gnan, Yaqun Liu, Manuela Spagnuolo & Chun-Long Chen

Genome Instability & Disease volume 1, 207–234 (2020)


Abstract

DNA replication is a vital process in all living organisms. At each cell division, > 30,000 replication origins are activated in a coordinated manner to duplicate the > 6 billion base pairs of the human genome. During differentiation and development, this program must adapt to changes in chromatin organization and gene transcription. Its deregulation can challenge genome stability, and genome instability is a leading cause of many diseases, including cancers and neurological disorders. Over the past decade, great progress has been made in understanding how DNA replication is regulated, how its deregulation challenges genome integrity, and how this leads to human disease. Growing evidence shows that gene transcription has an essential role in shaping the landscape of genome replication. Transcription is also a major source of endogenous replication stress that induces genome instability. In this review, we discuss current knowledge of the various mechanisms by which gene transcription can affect DNA replication, leading to genome instability and human disease.

Introduction

Humans start their life as a single cell that has to divide repeatedly to create the ~ 40 trillion cells that make up the human body (Bianconi et al. 2013). All the genetic information contained in the zygote must be reliably transmitted to all daughter cells to guarantee proper development. At each cell division, DNA replication involves the activation of tens of thousands of replication origins to ensure complete genome duplication. This program must be very robust. It must also adjust to the gene transcription programs and chromatin organization of the different cell types that arise during development. Furthermore, extensive DNA replication and cell division are required throughout the human life span to replace old, dead or damaged cells. If DNA replication fails, genome integrity is challenged and many diseases, such as cancers and neurological disorders, can arise (Ganier et al. 2019; Zeman and Cimprich 2014). It is therefore essential that this program is carried out correctly. However, throughout life, numerous exogenous and endogenous sources of replication stress routinely challenge DNA integrity and lead to genome instability. In particular, growing evidence indicates that gene transcription itself is an important and unavoidable source of endogenous replication stress. Transcription can either suppress replication initiation or generate conflicts with the DNA replication process. In this review, we focus on transcription-mediated replication stress and its impact on human disease. First, we describe the mechanisms of DNA replication initiation and control, as well as their relation to gene transcription. We then discuss the different mechanisms by which transcription acts as a notable source of replication stress that induces genome instability. Finally, we explain how such transcription-mediated replication stress is involved in various human diseases.

Origin licensing and firing: a two-step process

The nuclear genome must be duplicated correctly once and only once per mitotic cell division. To avoid re-replication of the genome, DNA replication is temporally divided into two steps. (i) Origin licensing takes place between mitosis and the beginning of the next interphase; at this stage, all potential replication origins are recognized and loaded with the pre-replication complex (pre-RC). (ii) Origin firing takes place during S phase (Fig. 1). Although most of the available information comes from yeast, the core process seems to be highly conserved in other eukaryotes. In this section, we mainly describe the dynamics in budding yeast and integrate information from other eukaryotes.

Fig. 1

Origin licensing and firing occur in a two-step process. The first step of DNA replication, called origin licensing, consists of loading the pre-replication complex (pre-RC) onto chromatin at all potential replication origins along the genome. This occurs between the end of mitosis and G1 phase. The second step takes place in S phase, when replication origins are activated through the recruitment of limiting factors that convert the pre-RC into the pre-IC (pre-initiation complex). This transition is regulated by the replication timing program, which sets the order in which the genome is replicated: origins in early-replicating regions fire before those in late-replicating regions. To avoid re-replication, components of the pre-RC are sequestered, exported or degraded, thereby preventing re-licensing.


The loading of pre-RCs onto chromatin starts when a hetero-hexamer called the origin recognition complex (ORC, ORC1-6) recognizes origins (Bell and Stillman 1992). ORC then recruits other factors to form the pre-RC. In yeast, replication origins are associated with specific sequences called autonomously replicating sequences (ARS) (Marahrens and Stillman 1992). In higher eukaryotes the situation is less clear, and replication origins are not defined by a specific sequence. Multiple techniques have been used to map replication origins, and their results are not concordant. This suggests that each approach, with its own limitations, may detect a different subset of origins (see Prioleau and MacAlpine 2016; Ganier et al. 2019 for review). Origins identified with the small nascent strand (SNS) method seem to be enriched at several kinds of site:

- transcription start sites (TSSs) (Sequeira-Mendes et al. 2009; Cadoret et al. 2008);
- origin G-rich repeated elements (OGRE) (Cayrou et al. 2012);
- G quadruplexes (G4) (Besnard et al. 2012);
- regions with high CpG and GC content (Cayrou et al. 2011; Delgado et al. 1998; Cadoret et al. 2008).

In contrast, the OK-seq (Okazaki fragment sequencing) method has shown that origins are preferentially positioned within AT-rich intergenic regions upstream and/or downstream of gene bodies (Tubbs et al. 2018; Petryk et al. 2016) (Fig. 2).
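
To illustrate how OK-seq reads can be turned into origin calls, the sketch below computes replication fork directionality (RFD) for each genomic bin from Okazaki fragment reads on the forward (F) and reverse (R) strands. It assumes the convention RFD = (R − F)/(R + F), which is commonly used for OK-seq data, so that positive values indicate mostly rightward-moving forks. The sketch then flags candidate initiation zones, where RFD switches from negative to positive (diverging forks), and candidate termination zones, where it switches from positive to negative (converging forks). The read counts, the zero-crossing rule and the function names are hypothetical simplifications. Published pipelines segment RFD profiles more carefully.

```python
from typing import List


def fork_directionality(forward: List[int], reverse: List[int]) -> List[float]:
    """Per-bin RFD = (R - F) / (R + F).

    Okazaki fragments of a rightward-moving fork map to the reverse strand,
    so RFD = +1 means all forks move rightward and -1 means all move leftward.
    Bins without reads are set to 0.0 (a hypothetical convention).
    """
    rfd = []
    for f, r in zip(forward, reverse):
        total = f + r
        rfd.append((r - f) / total if total > 0 else 0.0)
    return rfd


def zero_crossings(rfd: List[float], ascending: bool) -> List[int]:
    """Return bin indices i where RFD changes sign between bin i and bin i + 1.

    ascending=True  -> negative to positive: diverging forks, candidate initiation zone.
    ascending=False -> positive to negative: converging forks, candidate termination zone.
    """
    hits = []
    for i in range(len(rfd) - 1):
        left, right = rfd[i], rfd[i + 1]
        if ascending and left < 0 < right:
            hits.append(i)
        elif not ascending and left > 0 > right:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Invented read counts for nine consecutive bins.
    forward = [90, 80, 60, 20, 10, 10, 30, 70, 90]
    reverse = [10, 20, 40, 80, 90, 90, 70, 30, 10]
    rfd = fork_directionality(forward, reverse)
    print([round(x, 2) for x in rfd])
    print("initiation between bins:", zero_crossings(rfd, ascending=True))
    print("termination between bins:", zero_crossings(rfd, ascending=False))
```

In this toy profile, forks diverge between bins 2 and 3 and converge between bins 6 and 7. A real analysis would smooth the profile and require a sustained rise or fall in RFD rather than a single sign change.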

Fig. 2

Origin distributions in early- and late-replicating regions. Origin distribution differs between distinct regions of the genome. Within the early-replicating genome (left panel), replication origins are enriched in intergenic regions between active genes. This might be because active transcription can disassemble the pre-RC, or slide the MCM complex away from its original loading position, as RNA polymerase passes over transcribed genes. In late-replicating regions (right panel), which frequently lack gene transcription, replication origins are almost randomly distributed.


Replication origins are also marked by epigenetic signatures, such as H2A.Z and H4K20me2/3, which are needed for ORC1 recruitment (Beck et al. 2012; Long et al. 2020; Kuo et al. 2012). ORC binding is followed by the recruitment of cell division control protein 6 (CDC6), which stabilizes ORC binding (Speck and Stillman 2007), and of CDC10-dependent transcript 1 (CDT1). Through its interaction with ORC, CDT1 is loaded onto chromatin together with the mini-chromosome maintenance (MCM2-7) helicase complex (Evrin et al. 2009; Maiorano et al. 2000; Nishitani et al. 2000). ORC then loads a second MCM complex in an inverted orientation to form an MCM double hexamer (Miller et al. 2019). This completes origin “licensing” (Fig. 1). The pre-RC is not stable on chromatin. Recent studies in yeast and Drosophila have suggested that gene expression can alter origin licensing as RNA polymerase passes over transcribed genes, either by disassembling the pre-RC or by sliding the MCM complex away from its original loading position (Gros et al. 2015; Powell et al. 2015). This might explain why replication initiation sites are preferentially located within intergenic regions between active genes.

To avoid re-replication when cells enter S phase, components of the pre-RC are made inaccessible. Post-translational modifications can cause their inactivation, nuclear export and degradation. In the case of CDT1, sequestration occurs through its interaction with GEMININ (Ballabeni et al. 2013; Petersen et al. 2000, 1999; Nguyen et al. 2000; Li and DePamphilis 2002; Méndez et al. 2002). At the transition between G1 and S phase, fully formed pre-RCs are phosphorylated at specific sites by Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK). These phosphorylation events lead to the recruitment of CDC45, treslin and Mdm2-binding protein (MTBP) (Boos et al. 2013; Kumagai and Dunphy 2017; Heller et al. 2011; Ilves et al. 2010; Jares and Blow 2000). Treslin phosphorylation leads to the recruitment of topoisomerase 2-binding protein 1 (TOPBP1), RecQ-like helicase 4 (RECQL4), the GINS complex and Pol ε. The pre-RCs are then converted into pre-initiation complexes (pre-ICs) (Tanaka et al. 2007; Kumagai et al. 2011; Boos et al. 2011; Muramatsu et al. 2010; Sangrithi et al. 2005). At this point, treslin, MTBP, RECQL4 and TOPBP1 are released, and the active replisome forms upon MCM10 loading (Kanke et al. 2012; Watase et al. 2012; Kanemaki and Labib 2006; Gambus et al. 2006). Finally, other proteins are loaded, such as replication protein A (RPA), proliferating cell nuclear antigen (PCNA) and replication factor C (RFC), and DNA synthesis starts. This step is called origin firing (MacNeill 2012) (Fig. 1).
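
The once-per-cycle logic described above can be summarized as a small state machine. The sketch below is a deliberately simplified model, not a description of any specific study. It rests on three assumptions. Licensing (pre-RC/MCM double-hexamer loading) is allowed only in G1. Firing (conversion of the pre-RC into the pre-IC) requires S phase plus DDK/CDK activity. A fired origin cannot be re-licensed until the next cell cycle. The class and method names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    G1 = "G1"
    S = "S"
    G2M = "G2/M"


class OriginState(Enum):
    UNLICENSED = "unlicensed"
    LICENSED = "licensed"  # pre-RC with MCM double hexamer loaded
    FIRED = "fired"        # converted to pre-IC/replisome; used up for this cycle


class Origin:
    """Toy model of a single replication origin across one cell cycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = OriginState.UNLICENSED

    def license(self, phase: Phase) -> bool:
        # Licensing is refused outside G1. In real cells this block is enforced,
        # e.g., by CDT1 sequestration by GEMININ and pre-RC inactivation/export/degradation.
        if phase is not Phase.G1 or self.state is not OriginState.UNLICENSED:
            return False
        self.state = OriginState.LICENSED
        return True

    def fire(self, phase: Phase, ddk_and_cdk_active: bool = True) -> bool:
        # Firing requires S phase, DDK/CDK activity and a licensed origin.
        if phase is not Phase.S or not ddk_and_cdk_active:
            return False
        if self.state is not OriginState.LICENSED:
            return False
        self.state = OriginState.FIRED
        return True

    def exit_mitosis(self) -> None:
        # A new cycle starts: the origin may be licensed again in the next G1.
        self.state = OriginState.UNLICENSED


if __name__ == "__main__":
    ori = Origin("ori1")
    print(ori.license(Phase.G1))  # licensing in G1 succeeds
    print(ori.fire(Phase.S))      # first firing in S succeeds
    print(ori.fire(Phase.S))      # second firing is refused
    print(ori.license(Phase.S))   # re-licensing in S is refused
```

In this toy model, a second call to `fire`, or a call to `license` during S phase, returns False. This is the model's equivalent of the re-replication block.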

Origin usage and replication timing

Of all the potential replication origins that are loaded with a pre-RC, only a subset will actually fire. Most licensed origins serve as backups in case replication stress stalls the replication forks (Ge et al. 2007; Woodward et al. 2006; Santocanale et al. 1999). Moreover, not all replication origins fire at the same time. Instead, they follow a cell type-specific spatio-temporal program known as the replication timing (RT) program (Dimitrova and Gilbert 1999) (Fig. 2). In mammalian cells, this program is established during G1 phase, ~ 2 h after mitosis, in a time window referred to as the timing decision point (TDP) (Lu et al. 2010; Li et al. 2001, 2003; Wu et al. 2006; Dimitrova and Gilbert 1999). Interestingly, the RT is established before the choice of which origins will be used during S phase. That choice is made later in G1 phase, at the origin decision point (ODP) (Dimitrova and Gilbert 1999; Li et al. 2003). The relation between the TDP and the ODP, and the underlying mechanism(s), need further investigation. RT establishment coincides with the rebuilding of an organized nuclear architecture after mitosis. During this period, chromosomes are anchored to the nuclear periphery (Dimitrova and Gilbert 1999; Li et al. 2001), and topologically associated domains (TADs) and the A/B compartments are established (Dileep et al. 2015). Likewise, RT domains overlap extensively with TADs, and early- and late-replicating domains correspond to the A and B compartments, respectively (Pope et al. 2014; Ryba et al. 2010).

Such a strong correlation between RT and the 3D genome structure led the field to hypothesize that these two processes might be coupled and that one might control the other. This hypothesis has been reinforced by the identification of Rif1 (Rap1-interacting factor 1) as a nuclear structural protein that has an important role in RT regulation (Foti et al. 2016). Conversely, knock-outs or knock-downs of several other nuclear structural proteins, such as cohesin or CTCF, alter chromatin structure but not RT (Oldach and Nieduszynski 2019; Rao et al. 2017; Sima et al. 2019; Nora et al. 2017). Moreover, it has been recently shown that Rif1 haploid cells show alterations in chromatin structure but normal RT, which indicates that although the two processes are coordinated, they can be uncoupled (Gnan et al. 2019).

For years, the field has investigated the possibility that RT could reflect gene transcription. In general, early-replicating regions are enriched with expressed genes, while late-replicating regions are not (Woodfine et al. 2004) (Fig. 2). In addition, during development, regions switching RT from early-to-late (or late-to-early) are associated with genes whose expression is switched off (or on) (Hiratani et al. 2008). However, there are numerous exceptions: for example, regions containing expressed genes can also be replicated late, which challenges a direct link between RT and gene transcription (Rivera-Mulia et al. 2015). In addition, switching off genes at the β-globin locus fails to alter RT when chromatin accessibility is not modified, which also seems to go against this model (Cimbora et al. 2000). Indeed, the change in RT at the β-globin locus is associated with changes in accessibility, which seems to support the idea that RT is associated with chromatin accessibility rather than gene expression (Cimbora et al. 2000). In fact, early-replicating regions are associated with open chromatin states (A compartment), while late-replicating regions are enriched in closed chromatin states (B compartment) (Pope et al. 2014). In a recent article, Dileep and colleagues showed that changes in RT can precede or follow changes in gene transcription or be totally independent from it (Dileep et al. 2019). It is therefore likely that both RT and gene transcription are regulated by some common factors shared between the two processes.

How RT and origin usage are regulated is not fully understood, but both can be explained by a model of differential affinity for limiting factors (Fig. 1). To date, limiting factors have been identified in some organisms. They include proteins that are essential for assembly of the pre-IC, such as CDC45, DBF4/CDC7 (the regulatory/catalytic subunits of DDK), RecQL4, Treslin, TOPBP1 and MTBP orthologs (Mantiero et al. 2011; Wu and Nurse 2009; Collart et al. 2013; Wong et al. 2011; Tanaka et al. 2011). What regulates the affinity of replication origins for these limiting factors is still unclear, but multiple layers of regulation are probably in place. A first possibility relies on chromatin looping, which clusters together the origins that fire and leaves backup origins at the periphery of the loops (Courbet et al. 2008). Along the same lines, the order of firing could be regulated through chromatin accessibility. As discussed previously, early-replicating regions have a more open chromatin state than late-replicating regions (Pope et al. 2014). This might make late-replicating regions inaccessible at the beginning of S phase. Moreover, some proteins regulate RT globally, in effect controlling access to the limiting factors. One of these is RIF1, which is enriched at late-replicating regions. Through its interaction with PP1 (protein phosphatase 1), RIF1 counters DDK activity by dephosphorylating components of the pre-RC. This limits origin firing until late S phase (Cornacchia et al. 2012; Mattarocci et al. 2014; Poh et al. 2014; Hiraga et al. 2014; Sukackaite et al. 2017). In fission yeast, the shelterin complex (also called telosome) regulates the RT of a subgroup of late origins through Rif1. Like Taz1, shelterin can recruit Rif1 to telomeric DNA, and it also brings late-replicating regions into the proximity of Rif1 (Tazumi et al. 2012; Ogawa et al. 2018; Kanoh and Ishikawa 2001). Moreover, depletion of Rap1 or Poz1 (two members of the shelterin complex) can affect RT indirectly. These mutants show abnormal telomere elongation, which relocalizes the PP1 ortholog to telomeres, away from late regions that are Rif1-dependent and Taz1-independent (Hasegawa et al. 2019). Two transcription factors, Forkhead 1 and 2 (Fkh1/2), have also been reported to regulate RT in yeast. These factors group early origins into clusters to facilitate DDK activity (Knott et al. 2012; Fang et al. 2017) via a direct interaction between Fkh1/2 and Dbf4 (Fang et al. 2017). Similarly, Ctf19 and Swi6 recruit DDK to pericentromeric origins, allowing centromeres to replicate early in budding and fission yeasts, respectively (Hayashi et al. 2009; Natsume et al. 2013). In S. cerevisiae, two histone deacetylases, Sir2 and Rpd3, control the RT of origins within the ribosomal DNA (rDNA) array by tuning their ability to compete with single-copy origins for limiting factors (Yoshida et al. 2014). Work is ongoing to identify additional factors and to delineate the mechanisms that control origin usage and RT. Such work will help us better understand the complex relationship between DNA replication, gene transcription and chromatin organization.
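
A toy calculation can make the differential-affinity model concrete. The sketch below is a deterministic mean-field approximation that we assume purely for illustration. A constant pool of limiting factors, implicitly recycled at every step, is shared among the origins that have not yet fired, in proportion to each origin's affinity. Each origin's per-step firing probability is capped at 1. The affinities, the pool size and the function names are hypothetical. The only point is that higher affinity translates into earlier average firing, i.e., an RT-like order.

```python
from typing import Dict, List, Optional


def unfired_probabilities(
    affinity: Dict[str, float], n_factors: float, n_steps: int
) -> Dict[str, List[float]]:
    """Mean-field toy model of origins competing for limiting firing factors.

    At each step, origin i fires with probability
        min(1, n_factors * a_i / sum_j(a_j * p_j)),
    where p_j is the probability that origin j is still unfired.
    Returns, for each origin, p_i after every step.
    """
    unfired = {name: 1.0 for name in affinity}
    history: Dict[str, List[float]] = {name: [] for name in affinity}
    for _ in range(n_steps):
        # Total "demand" for factors from origins that have not fired yet.
        demand = sum(affinity[n] * unfired[n] for n in affinity)
        hazards = {
            n: min(1.0, n_factors * affinity[n] / demand) if demand > 0 else 0.0
            for n in affinity
        }
        # Update all origins with hazards computed from the same demand.
        for n in affinity:
            unfired[n] *= 1.0 - hazards[n]
            history[n].append(unfired[n])
    return history


def median_firing_step(curve: List[float]) -> Optional[int]:
    """First step at which the origin has fired with probability >= 0.5."""
    for step, p in enumerate(curve, start=1):
        if p <= 0.5:
            return step
    return None


if __name__ == "__main__":
    # Hypothetical affinities for an "early", a "mid" and a "late" origin.
    curves = unfired_probabilities(
        {"early": 5.0, "mid": 2.0, "late": 0.5}, n_factors=0.5, n_steps=20
    )
    for name, curve in curves.items():
        print(name, median_firing_step(curve))
```

All origins share the same denominator at each step, so an origin with higher affinity always has a higher per-step firing probability. It therefore reaches its median firing step earlier.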

Transcription-mediated replication stresses and genome instability

As described earlier, replication initiation is controlled through a multi-step process that ensures the entire genome is replicated once and only once per cell division. Gene transcription can interact with the DNA replication program at all stages. During G1 phase, it can affect how origins are set (location, firing time, etc.). During S phase, it can affect origin activation and replication fork progression. Here, we describe in detail how gene transcription influences DNA replication, how this leads to genome instability in normal and pathological conditions, and how it contributes to human disease.

Transcription–replication collision, R-loop formation and genome instability

Once replication forks have been deployed, numerous factors can challenge their progression. One of these is active transcription along the genome. Collisions between the replication fork and the transcription machinery can be either co-directional (CD) or head-on (HO) (Fig. 3). HO collisions can be more dangerous for genome integrity (Hamperl et al. 2017). OK-seq identifies the direction of replication fork movement. It has revealed that origins fire more frequently upstream of the TSSs of active genes, which ensures co-directional replication of the most highly transcribed regions of the genome (Petryk et al. 2016). Under unperturbed conditions, replication termination was also found to be widely localized at the transcription termination sites (TTSs) of transcribed genes. Under replication stress, however, termination can be redistributed into gene bodies. This increases the replication of gene 3′ ends in the HO orientation (Chen et al. 2019), which strongly induces transcription–replication conflicts (TRCs).
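
OK-seq gives fork direction and gene annotations give transcription direction, so the two can be combined to label the expected collision orientation over a gene. The sketch below assumes the common OK-seq convention RFD = (R − F)/(R + F), in which positive values indicate mostly rightward-moving forks. It also assumes that a gene on the '+' strand is transcribed left to right. The function name and the |RFD| cutoff for calling a dominant fork direction are hypothetical.

```python
def collision_orientation(gene_strand: str, mean_rfd: float, min_abs_rfd: float = 0.2) -> str:
    """Classify the dominant transcription-replication orientation over a gene.

    gene_strand: '+' (transcribed left to right) or '-' (right to left).
    mean_rfd: average replication fork directionality over the gene body,
              in [-1, 1]; > 0 means forks mostly move left to right.
    min_abs_rfd: hypothetical cutoff below which no fork direction dominates.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if not -1.0 <= mean_rfd <= 1.0:
        raise ValueError("RFD must lie in [-1, 1]")
    if abs(mean_rfd) < min_abs_rfd:
        return "mixed"
    transcription_direction = 1 if gene_strand == "+" else -1
    fork_direction = 1 if mean_rfd > 0 else -1
    if fork_direction == transcription_direction:
        return "co-directional"
    return "head-on"


if __name__ == "__main__":
    for strand, rfd in [("+", 0.7), ("+", -0.5), ("-", -0.6), ("-", 0.4), ("+", 0.05)]:
        print(strand, rfd, collision_orientation(strand, rfd))
```

Under this scheme, a '+' strand gene whose 3′ end is replicated by leftward-moving forks (negative RFD) would be labelled head-on. This is the configuration the text associates with gene 3′ ends under replication stress.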

Fig. 3

Transcription–replication conflicts lead to fork stalling and genome instability. Replication and transcription machineries share the same DNA template, which causes transcription–replication conflicts (TRCs). These conflicts can occur in a head-on or co-directional orientation. Head-on TRCs are generally considered more deleterious to genome stability and occur preferentially around gene transcription termination sites (TTSs). Replication forks stall when they encounter RNA Pol II, which favors the transient formation of R-loops. Under normal conditions, many factors can prevent harmful R-loop accumulation, such as TOP1, SETX, BRCA1/2 and FANCM. Alternatively, R-loops can be removed directly by RNase H, XRN2 and certain NER endonucleases such as XPG/XPF. If R-loops and stalled forks persist, the ATR-Chk1 pathway is activated and phosphorylates RPA at the stalled forks. Under topological stress, such as TOP1 depletion, DNA damage is induced, which leads to genome instability (Promonet et al. 2020). R-loops also frequently form at gene transcription start sites (TSSs). These do not seem to induce TRCs and are instead involved in other mechanisms, such as transcription regulation.


Recently, numerous studies have revealed that TRCs are frequently associated with a specific structure known as the R-loop (Fig. 3). An R-loop forms as RNA polymerase progresses along the DNA duplex: the newly transcribed RNA re-anneals to the transiently accessible template strand. The resulting DNA:RNA hybrid displaces the non-template strand (Thomas et al. 1976). R-loops form mainly at sequences with high GC content (Sanz et al. 2016). Importantly, Cimprich and colleagues analyzed the genome-wide distribution of R-loops by DNA:RNA hybrid immunoprecipitation followed by next-generation sequencing (DRIP-seq). They found that R-loops form preferentially at regions with HO TRCs (Hamperl et al. 2017). These data reinforce the idea that the CD bias of the human genome might help to minimize HO collisions and the accumulation of deleterious R-loops.
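
Because R-loop formation depends on GC content, one very crude first-pass screen is to scan a sequence with a sliding window and flag GC-rich windows. The sketch below does only that. The window size, step and 70% threshold are hypothetical. GC content alone is used here as an assumed proxy, not as a validated predictor of where R-loops form.

```python
from typing import List, Tuple


def gc_fraction_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (start, GC fraction) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / window))
    return results


def gc_rich_windows(seq: str, window: int = 10, step: int = 5, min_gc: float = 0.7) -> List[int]:
    """Start positions of windows whose GC fraction reaches min_gc (hypothetical cutoff)."""
    return [start for start, frac in gc_fraction_windows(seq, window, step) if frac >= min_gc]


if __name__ == "__main__":
    toy = "ATATATATAT" + "GCGGCGCCGG" + "ATATATATAT"  # invented 30-bp sequence
    print(gc_fraction_windows(toy, window=10, step=5))
    print(gc_rich_windows(toy))
```

The tiny window here only keeps the example readable. A genome-scale scan would use much larger windows and would need other sequence and chromatin features to be informative.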

Cells can also regulate R-loops by opposing their formation. R-loops preferentially form in the presence of negative supercoiling, such as that generated during transcription. To relieve this torsional stress, cells use topoisomerases, which restore normal DNA topology and reduce the accumulation of R-loops (El Hage et al. 2010; Yang et al. 2014). Recently, P. Pasero, C.L. Chen and colleagues found that R-loops are enriched at the TTSs of a subset of highly expressed genes located in early-replicating regions. At these sites, a higher level of HO collisions is frequently associated with the accumulation of phospho-RPA32 (S33), a hallmark of stalled forks. As a result, in cells depleted of topoisomerase 1 (Top1), these regions show more DNA double-strand breaks (DSBs) and more γ-H2AX, a histone mark found around broken replication forks (Promonet et al. 2020).

It should be noted that although R-loops at HO TRCs can be deleterious, R-loops also have important physiological roles in many normal cellular processes. These include the regulation of transcription termination, chromosome segregation and rearrangement events (Skourti-Stathaki and Proudfoot 2014; Kabeche et al. 2018; Skourti-Stathaki et al. 2011; Xu et al. 2017a). Cells therefore maintain the R-loop balance via various strategies that protect genome stability. As mentioned, cells use topoisomerases to reduce topological stress and decrease harmful R-loop accumulation (Fig. 3). Cells also express RNase H and 5′–3′ exonucleases that can digest the RNA in DNA:RNA hybrids. Helicases can also prevent or resolve R-loops, such as DHX9 and Aquarius (AQR) (Sollier et al. 2014; Chakraborty and Grosse 2011), senataxin (SETX) (Groh et al. 2017) and PIF1 (Zhou et al. 2014). R-loop formation is also tightly regulated by spliceosome binding to RNA (Li and Manley 2005; Gómez-González et al. 2011; Li et al. 2007; Pefanis et al. 2015), by RNA-coating proteins and RPA (Aguilera and García-Muse 2012; Nguyen et al. 2018), and by the ATR-Chk1 pathway (Matos et al. 2020). Many studies have shown that mutations affecting these factors can cause R-loop-associated human diseases, which we discuss in more detail later.

Proteins of homologous recombination and non-homologous end joining on stalled forks

Because R-loops obstruct replication fork progression, they can induce genome instability and thus inevitably activate DNA damage repair pathways. In particular, stalled forks arising from TRCs activate the Fanconi anemia (FA) pathway, a repair system involved in resolving R-loop-mediated replication fork collapse (Schwab et al. 2015; García-Rubio et al. 2015). Disrupting the critical FA pathway members FANCD2, FANCA and FANCM impairs the restart of stalled forks. It also leads to genome instability and to DNA damage from R-loop-mediated replication fork collapse (Schwab et al. 2015; García-Rubio et al. 2015). These effects can be reversed by overexpressing RNase H1, a ribonuclease that degrades DNA:RNA hybrids. This reinforces the idea that R-loops are responsible for fork stalling at HO TRC sites (Schwab et al. 2015). Interestingly, a recent study revealed that SLX4, a tumor suppressor, drives the recruitment of FANCD2 to RNA polymerase II via its interaction with RTEL1. This recruitment prevents endogenous transcription-induced replication stress (Takedachi et al. 2020).

Besides the core FA pathway members, other factors involved in homologous recombination (HR) accumulate at DSBs. These include RAD52, RAD51, BRCA1 (also called FANCS) and BRCA2 (also called FANCD1), which limit genome instability through R-loop resolution. RNase H overexpression can reduce their recruitment to DSBs within actively transcribed regions or in specific reporter systems (D’Alessandro et al. 2018; Yasuhara et al. 2018). For example, BRCA1 and BRCA2 prevent the potentially harmful effects of R-loops by recruiting the helicase SETX to R-loops (Hatchi et al. 2015; Zhang et al. 2017). In particular, BRCA1-dependent recruitment of SETX resolves R-loop structures preferentially at TTSs and suppresses DNA damage. Moreover, SETX depletion impairs RAD51 recruitment and favors the accumulation of 53BP1, a key DNA damage response (DDR) factor in non-homologous end joining (NHEJ) (Cohen et al. 2018). These data suggest that DNA:RNA hybrids may favor HR factor accumulation at DSBs within transcribed genes. The accumulated factors could then help eliminate the hybrids so that HR can occur, likely counteracting NHEJ. Interestingly, a recent study revealed that 53BP1 and BRCA1 counteract each other to control the time-dependent switch between fork restart pathways. In this switch, 53BP1 promotes the fast-kinetics pathway and BRCA1 the slow-kinetics pathway (Xu et al. 2017b). BRCA2 depletion also increases R-loop accumulation. BRCA2 might prevent R-loop formation by preventing replication fork collapse and by recruiting the recombinase RAD51, which binds ssDNA, to DSBs (Schlacher et al. 2011). Moreover, BRCA2 recruits RNA polymerase II-associated factor 1 (PAF1) to promoter-bound Pol II. This promotes the release of paused Pol II and decreases R-loop formation (Shivji et al. 2018).

G1 shortening induces abnormal initiation and genome instability within gene bodies

G1 phase is an important period for origin setting. Rapidly proliferating mammalian embryonic stem cells (ESCs) have a short G1 phase (< 2 h) because of an unusual cell cycle structure (Savatier et al. 1994). Such a short G1 phase is considered a characteristic of ESCs that might help to inhibit differentiation and preserve their pluripotent state (Li et al. 2012). Several studies have reported that the short G1 phase of undifferentiated ESCs is related to a unique mechanism of cell cycle regulation. In particular, ESCs differ from somatic cells in several respects:

- they express low cyclin D1 levels and no cyclin D2/D3;
- they lack MAPK and pRB control (Jirmanova et al. 2002; Savatier et al. 1996; White et al. 2005);
- they lack the p53-p21 pathway in response to DNA damage (Aladjem et al. 1998);
- they lack cell cycle-dependent regulation of cyclin E-Cdk2 and cyclin A-Cdk2 activity (Stead et al. 2002; White et al. 2005).

These findings highlight that cell proliferation control in ESCs is fundamentally different from that in differentiated somatic cell lineages (Coronado et al. 2013). ESCs have abundant stores of replication factors and a relaxed chromatin structure, which result in many more replication initiation sites in S phase. Despite their short G1 phase, ESCs can effectively tolerate accumulating replication stress through extensive fork reversal and replication-coupled repair. This allows them to preserve genome stability, even though fast-proliferating ESCs lack mechanisms to delay the G2/M and G1/S transitions in response to incomplete replication (Ahuja et al. 2016).

Conversely, somatic cells have a longer G1 phase, which might help to ensure proper origin licensing and thus complete genome duplication. G1 shortening in somatic cells, e.g., by overexpression of cyclin E, alters the G1-S transition. This may deregulate replication fork progression and cause DNA damage (Jones et al. 2013). Cyclin E, a member of the cyclin family, has a critical role in controlling the G1-S transition. It binds CDK2 to form the cyclin E/CDK2 complex, which phosphorylates numerous downstream proteins (such as RB, p27 and p21). In this way the complex regulates multiple cellular processes, allowing replication initiation and S phase progression (Siu et al. 2012). Ekholm-Reed and colleagues demonstrated that overexpressing cyclin E can shorten G1 phase from about 10–12 h to as little as 2–4 h (Ekholm-Reed et al. 2004). To dissect the mechanisms underlying the replication stress induced by cyclin E overexpression, Macheret and Halazonetis mapped DNA replication and transcription genome-wide in cells with abnormal cyclin E activation (Macheret and Halazonetis 2018). They compared DNA replication initiation profiles (HU-EdU-seq) from cells overexpressing cyclin E with those from cells with normal cyclin E levels. This showed that cyclin E overexpression induces extra origins that are frequently located within intragenic regions (Fig. 4). In addition, analysis of nascent transcript profiles by EU-seq revealed that these novel origins are often located at the 3′ ends of gene bodies. Because G1 is shortened, these regions show lower levels of nascent transcripts in G1 cells. Importantly, fork collapse was observed specifically around these origins and only upon cyclin E overexpression; no fork collapse was observed at constitutive origins (Fig. 4). Overexpressing MYC gives similar results. MYC-inducible activation leads to G1-phase shortening and to the firing of intragenic oncogene-induced (Oi) origins, many of which overlap with cyclin E-induced origins (Macheret and Halazonetis 2018). Overexpression of either gene can thus induce the firing of a novel set of replication origins within the 3′ portion of highly transcribed genes. These origins are normally suppressed by transcription during G1 phase. In cells with a short G1 phase, precocious entry into S phase occurs before all genic regions have been transcribed, which allows origins within genes to fire (Macheret and Halazonetis 2018). Extra intragenic origin firing caused by premature S phase entry is therefore an important source of DNA replication stress that leads to genomic instability in human cells.
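
A back-of-the-envelope calculation shows why a shorter G1 leaves the 3′ ends of long genes untranscribed at S phase entry. Purely for illustration, the sketch assumes a single wave of RNA polymerase II that starts at the TSS at the beginning of G1 and elongates at a constant rate of 3 kb/min. Real elongation rates vary between genes and cells, and the function name is hypothetical. The G1 lengths used (10 h versus 2 h) fall in the ranges quoted above for normal and cyclin E-overexpressing cells.

```python
def untranscribed_fraction(gene_length_kb: float, g1_hours: float,
                           elongation_kb_per_min: float = 3.0) -> float:
    """Fraction of a gene (counted from its 3' end) not yet traversed by
    RNA polymerase II when G1 ends.

    Toy assumptions: one polymerase wave starts at the TSS at the beginning
    of G1 and elongates at a constant, assumed rate (default 3 kb/min).
    """
    if gene_length_kb <= 0 or g1_hours < 0 or elongation_kb_per_min <= 0:
        raise ValueError("invalid input")
    traversed_kb = elongation_kb_per_min * g1_hours * 60.0
    return max(0.0, 1.0 - traversed_kb / gene_length_kb)


if __name__ == "__main__":
    for gene_kb in (100, 500, 1000, 2000):
        normal = untranscribed_fraction(gene_kb, g1_hours=10)
        short = untranscribed_fraction(gene_kb, g1_hours=2)
        print(f"{gene_kb:>5} kb gene: untranscribed after 10 h G1 = {normal:.2f}, "
              f"after 2 h G1 = {short:.2f}")
```

In this toy setting, a 1 Mb gene is fully traversed within a 10 h G1. After a 2 h G1, however, about 64% of its length, at the 3′ end, is still untranscribed. This is consistent in spirit with the 3′ location of the extra intragenic origins.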

Fig. 4

G1 shortening induces abnormal origin firing within active genes, leading to genome instability. In normal cell cycles, G1 is long enough for transcription to inactivate origins across the entire length of genes (top panel). When G1 is greatly shortened by oncogene expression (bottom panel), there is insufficient time for transcription to inactivate all intragenic origins. Oncogene-induced extra origins within intragenic regions can then fire, leading to chromosome breakage. G1 shortening, e.g., induced by cyclin E or Myc, leads to abnormal replication and genome instability, which might contribute to early cancer development.


Interestingly, under replication stress (e.g., high-dose hydroxyurea (HU) treatment), cells can accumulate stalled and collapsed replication forks within specific early-replicating regions known as early-replicating fragile sites (ERFSs) (Barlow et al. 2013). These sites are also enriched around replication origins that contain long (> 20 bp) poly(dA:dT) tracts (Tubbs et al. 2018). It is still unknown whether ERFSs arise through mechanisms similar to or different from those described above, and this warrants further investigation.
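
Finding the long poly(dA:dT) tracts mentioned above is a simple string search. The sketch below assumes that a poly(dA:dT) tract is an uninterrupted run of A (or, equivalently, of T on the given strand) and implements the > 20 bp cutoff as a minimum length of 21. The function name is hypothetical, and real analyses may tolerate mismatches within a tract.

```python
import re
from typing import List, Tuple


def poly_da_dt_tracts(seq: str, min_length: int = 21) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of homopolymeric
    A or T runs of at least min_length bases on the given strand.

    An A-run on one strand is a T-run on the complementary strand, so both
    are reported as poly(dA:dT). min_length=21 encodes the '> 20 bp' cutoff.
    """
    if min_length < 1:
        raise ValueError("min_length must be >= 1")
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_length, min_length))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "GC" + "A" * 25 + "GC" + "T" * 10 + "CG" + "T" * 22  # invented sequence
    print(poly_da_dt_tracts(toy))
```

In the toy sequence, the 25-bp A-run and the 22-bp T-run are reported. The 10-bp T-run falls below the cutoff and is skipped.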

Transcription-mediated suppression of initiation within large genes leads to CFS instability

Transcription–replication collisions and R-loop formation are not the only ways in which transcription can interfere with DNA replication, as illustrated by common fragile sites (CFSs). These sites are under-replicated during mild replication stress, for example, in response to aphidicolin, a DNA polymerase inhibitor that slows the progression of replication forks (Glover et al. 1984). CFSs can be visualized on metaphase chromosomes as gaps or breaks (Glover et al. 1984) and in anaphase as ultrafine bridges between sister chromatids (Chan et al. 2009). They are hotspots for sister chromatid exchange (Glover and Stein 1987), chromosome deletions (Bignell et al. 2010; Pichiorri et al. 2008) and amplifications (Hellman et al. 2002; Miller et al. 2006). These regions are preferential sites for chromosome lesions (such as deletions and/or rearrangements) involved in oncogenesis, neurological disorders and viral DNA integration (see Le Tallec et al. 2014; Ozeri-Galai et al. 2014; Sarni and Kerem 2016; Debatisse and Rosselli 2019 for review). The study of CFSs is challenging because precise genomic maps are lacking. Traditionally, CFSs have been mapped by conventional cytogenetic screening at megabase resolution. In lymphocytes, the number of CFSs ranges from ~ 20 (with a break frequency ≥ 1%) to 230 (including CFSs with lower frequencies) (Mrasek et al. 2010). Only a few of them have been mapped at a finer scale (several hundred kb) by molecular cytogenetic analysis combined with fluorescence in situ hybridization (FISH) (Savelyeva and Brueckner 2014), a very time-consuming approach. As a result, most data derive from studies of individual CFSs, which has produced some conflicting results. In a recent study, CFSs were mapped genome-wide at high resolution using the Repli-Seq technique. The authors compared the RT of cells exposed to a low dose of aphidicolin with that of control cells to define significantly delayed regions (SDRs), which correspond to CFSs (Brison et al. 2019).
This first genome-wide analysis has shed light on the characteristics and mechanisms responsible for CFS instability, demonstrating that stress-induced delay/under-replication is a hallmark of CFSs (Brison et al. 2019).
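To make the comparison behind SDR calling concrete, the following Python sketch flags genomic windows that replicate later in aphidicolin-treated cells than in control cells by more than a fixed threshold, then merges runs of consecutive flagged windows into candidate delayed regions. The RT convention (larger values mean earlier replication), the fixed threshold, the minimum run length and the function name `delayed_regions` are assumptions for illustration. This is not the statistical procedure used by Brison et al. (2019).

```python
from typing import List, Tuple


def delayed_regions(rt_control: List[float],
                    rt_treated: List[float],
                    threshold: float,
                    min_windows: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) window indices (end exclusive) of runs in which the
    treated sample replicates later than the control by more than `threshold`.

    Assumes RT values where larger means earlier replication, so a delay
    shows up as rt_control - rt_treated > threshold.
    """
    if len(rt_control) != len(rt_treated):
        raise ValueError("profiles must cover the same windows")
    regions = []
    start = None
    for i, (c, t) in enumerate(zip(rt_control, rt_treated)):
        delayed = (c - t) > threshold
        if delayed and start is None:
            start = i  # open a new candidate region
        elif not delayed and start is not None:
            if i - start >= min_windows:  # keep only sufficiently long runs
                regions.append((start, i))
            start = None
    if start is not None and len(rt_control) - start >= min_windows:
        regions.append((start, len(rt_control)))  # run reaching the last window
    return regions


# Toy profiles (arbitrary units, larger = earlier replication).
control = [0.8, 0.7, 0.5, 0.4, 0.4, 0.5, 0.7, 0.8]
treated = [0.8, 0.7, 0.3, 0.1, 0.1, 0.3, 0.7, 0.8]
print(delayed_regions(control, treated, threshold=0.15))
```

A real analysis would also need replicate-aware statistics and correction for multiple testing; the sketch only shows the basic logic of contrasting two RT profiles window by window.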

CFSs were long believed to be associated with particular sequences, such as AT-rich stretches that can form secondary structures that block replication fork progression, impede replication completion and lead to DNA breaks. However, recent studies have shown that CFS instability is cell-type specific, which suggests that it is directed by epigenetic features rather than by specific sequence motifs (Le Tallec et al. 2011). Indeed, such sequences at FRA3B (a well-studied CFS on chr3) do not overlap with its break boundaries (Durkin et al. 2008). CFSs are mid-late- and late-replicating regions (Le Beau et al. 1998; Palakodeti et al. 2004; Pelliccia et al. 2008; Hellman et al. 2000; Brison et al. 2019), but late replication alone cannot define them, because late-replicating regions far outnumber CFSs. Interestingly, most fine-mapped CFS cores are replicated in mid-late (rather than late) S phase in untreated cells, and they become the latest-replicating regions only after aphidicolin treatment (Brison et al. 2019). This finding suggests that mechanisms other than late replication per se are responsible for their instability. Remarkably, CFSs are frequently associated with very long expressed genes (> 300 kb) or large transcription domains (sometimes with two or three overlapping genes), although this is not always the case (Mitsui et al. 2010; Ohta et al. 1996; Rozier et al. 2004; Zhu et al. 2006; Helmrich et al. 2007; Denison et al. 2003; Bednarek et al. 2000; Brison et al. 2019). It has been suggested that CFSs might arise from R-loop formation resulting from TRC (Helmrich et al. 2011). However, TRC seems an unlikely cause: in most cases, the replication delay decreases gradually and symmetrically on both sides of CFS cores, independently of gene orientation, and instead reflects the firing time of the flanking origins (Brison et al. 2019).
In addition, R-loops and fork stalling seem to accumulate only within highly active genes located in early-replicating regions, and not within the large, late-replicating and modestly transcribed genes associated with CFSs (Liu and Chen, unpublished results). More importantly, transcription–replication encounters are not necessary for CFS expression, as treatment with transcription inhibitors during S phase does not rescue CFS fragility (Brison et al. 2019). Taken together, these results indicate that mechanisms other than transcription–replication encounters underlie the strong correlation between large genes and CFSs.

Importantly, at FRA3B (Letessier et al. 2011) and FRA16C (a CFS on chr16) (Ozeri-Galai et al. 2011), few or no dormant origins are activated to rescue stalled or slowed replication forks. This lack of activation might be due to the removal of replication origins by transcription (Gros et al. 2015; Powell et al. 2015). Indeed, occupancy by pre-RC components is low over the large genes (> 300 kb) associated with CFSs (Miotto et al. 2016; Sugimoto et al. 2018). Genome-wide analyses of replication origin distribution by OK-Seq or Bubble-Seq along fine-mapped CFSs also support a model in which transcription-dependent suppression of initiation across large genes generates ultra-long (several hundred kb), late-replicating, origin-poor regions whose replication is delayed upon stress (Brison et al. 2019) (Fig. 5a). Moreover, OK-Seq data have revealed that, in most cases, two major initiation zones flank the large transcribed genes hosting CFSs, located immediately upstream and downstream of the gene, respectively. The unidirectional forks emanating from these initiation zones travel across several hundred kb to complete replication of the gene body (Brison et al. 2019) (Fig. 5b). When fork speed was reduced by aphidicolin treatment, these forks could not complete replication. The distance separating the initiation zones that flank the gene is therefore a major parameter in CFS formation.
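The role of the distance between flanking initiation zones can be expressed with a simple calculation. If two initiation zones separated by a distance D fire at the same time and their forks converge at a constant speed v, each fork must cover about D/2, which takes (D/2)/v. The Python sketch below compares this time with the time remaining in S phase. The function name, the ~1,100 kb distance (chosen only to echo the size of WWOX in Fig. 5b), the fork speeds and the remaining S-phase time are all hypothetical values, and the model ignores asynchronous firing, fork stalling and dormant origins.

```python
def completes_before_s_phase_end(distance_kb: float,
                                 fork_speed_kb_per_min: float,
                                 time_left_min: float) -> bool:
    """Return True if two forks, starting simultaneously at initiation zones
    `distance_kb` apart and moving towards each other at a constant speed,
    meet before `time_left_min` minutes have elapsed."""
    if fork_speed_kb_per_min <= 0:
        raise ValueError("fork speed must be positive")
    time_needed_min = (distance_kb / 2) / fork_speed_kb_per_min
    return time_needed_min <= time_left_min


# Hypothetical scenario: initiation zones ~1,100 kb apart that fire with
# 300 min of S phase remaining.
for speed in (2.0, 1.0):  # illustrative unperturbed vs. aphidicolin-slowed forks
    done = completes_before_s_phase_end(1100, speed, 300)
    print(f"fork speed {speed} kb/min: replication completed = {done}")
```

Because the required time grows linearly with D and inversely with v, halving fork speed doubles the time needed, so regions replicated by the longest-travelling forks are the first to remain incomplete under stress.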

Fig. 5

Transcription-dependent suppression of initiation across large genes leads to CFS instability. a Schematic showing how gene transcription shapes the replication landscape responsible for common fragile site (CFS) instability. CFSs are genomic regions that are replicated during mid-late S phase. They are nested within large genes (> 300 kb) whose transcription removes pre-RC complexes from the gene body, so that the gene is replicated by two long-travelling unidirectional replication forks arising from its flanking regions. Under replication stress, DNA replication might not be completed within these regions. This results in a cruciform structure that must be resolved; otherwise, it leads to CFS expression and genome instability. b The replication fork directionality (RFD) profile detected by Okazaki fragment sequencing (OK-Seq) along the FRA16D CFS, which contains the large WWOX gene (1.1 Mb). Each point shows the RFD value computed in a 1 kb window. Red and blue points indicate regions predominantly replicated by rightward and leftward replication forks, respectively. The RFD profile agrees with the model shown in (a): two strong initiation zones (upward transitions in the RFD profile, indicated by the blue box) are located at both ends of the WWOX gene, and the gene body is replicated by long-travelling unidirectional forks (red and blue arrows, respectively). The under-replicated CFS core overlaps with the termination zone (downward transition in the RFD profile, indicated by a red box) at the gene center. A similar RFD pattern is observed at most CFSs (Brison et al. 2019)

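The RFD profile in Fig. 5b can be sketched as follows. We assume the commonly used definition RFD = (R − L)/(R + L), where R and L are the numbers of OK-Seq reads attributed to rightward- and leftward-moving forks in a window; in practice they are assigned from the strand to which the Okazaki fragments map. RFD then ranges from −1 (all leftward) to +1 (all rightward), initiation zones appear as upward transitions and termination zones as downward transitions. The Python functions `rfd` and `transitions`, the fixed span and the threshold are hypothetical simplifications, not the segmentation method used by Brison et al. (2019).

```python
from typing import List, Tuple


def rfd(rightward: List[int], leftward: List[int]) -> List[float]:
    """Replication fork directionality per window: (R - L) / (R + L).

    Windows without reads are set to 0.0 (no directional information).
    """
    if len(rightward) != len(leftward):
        raise ValueError("count vectors must have the same length")
    values = []
    for r, l in zip(rightward, leftward):
        total = r + l
        values.append((r - l) / total if total else 0.0)
    return values


def transitions(profile: List[float], span: int,
                min_change: float) -> List[Tuple[int, str]]:
    """Label window i as an initiation zone ('IZ') if RFD rises by at least
    `min_change` between windows i and i + span, or as a termination zone
    ('TZ') if it falls by at least that amount."""
    calls = []
    for i in range(len(profile) - span):
        delta = profile[i + span] - profile[i]
        if delta >= min_change:
            calls.append((i, "IZ"))
        elif delta <= -min_change:
            calls.append((i, "TZ"))
    return calls


# Toy counts mimicking Fig. 5b: leftward forks upstream, an initiation zone
# at the gene start, rightward forks, a central termination zone, leftward
# forks, and an initiation zone at the gene end.
right = [10, 10, 90, 90, 90, 10, 10, 90]
left = [90, 90, 10, 10, 10, 90, 90, 10]
profile = rfd(right, left)
print(profile)
print(transitions(profile, span=1, min_change=1.0))
```

With real data, the profile would typically be smoothed first, and consecutive calls produced by a gradual transition would need to be merged into a single zone.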

Whatever its molecular cause, under-replication leaves regions that link the two sister chromatids together at the end of S phase (Fig. 5a). These structures may then be resolved by a series of endonucleases (Guervilly et al. 2015; Naim et al. 2013; Ying et al. 2013) that could be recruited to dismantle the replication forks (Deng et al. 2019). These nucleases can create single- and/or double-strand breaks that give cells a last chance to repair the damage during the early stages of mitosis. Importantly, several recent studies have shown that the E3 ubiquitin-protein ligase TRAIP (TRAF-interacting protein) plays a key role in driving replisome disassembly during mitosis and in promoting fork breakage (Sonneville et al. 2019; Wu et al. 2019; Deng et al. 2019). This event might allow factors involved in mitotic DNA synthesis (MiDAS) (Minocherhomji et al. 2015), a form of break-induced replication (BIR), to access the under-replicated CFSs (see Ovejero et al. 2020 for a review). The CFS is expressed if the broken DNA is not properly repaired.

Transcription-mediated replication stresses and human diseases

Defects in DNA replication can lead to various diseases. In the following sections, we focus on some of the most common ones.

Neurological disorders

R-loops can arise from a variety of cellular stresses and lead to deleterious consequences, such as transcriptional irregularities, replication defects and genomic instability, which are linked to numerous pathological conditions (reviewed in Richard and Manley 2017). Among these conditions, various neurological disorders have been linked to R-loops and gene-specific repeat expansions (Table 1).


Disease | Description | Genome instability involvement | References
Aicardi–Goutières syndrome (AGS) | Inflammatory encephalopathy caused by mutations in TREX1, encoding an exonuclease, and in RNASEH2A/2B/2C, encoding subunits of the RNase H2 endonuclease complex | Dysfunctional TREX1 or RNase H2 activity leads to the accumulation of DNA:RNA hybrids | (Lim et al. 2015)
Ataxia with oculomotor apraxia type 1 (AOA1) | Rare autosomal recessive condition caused by mutations in the APTX gene (9p13.3) encoding aprataxin | APTX, which has a role in DNA single-strand break repair, is involved in repairing the non-canonical ribonucleotides introduced into DNA during replication | (Tumbale et al. 2014)
Ataxia with oculomotor apraxia type 2 (AOA2) | Rare disease caused by mutations in the SETX gene (9q34) encoding senataxin; mutations in PIK3R5 (17p13.1) have also been implicated in AOA2 pathogenesis | Senataxin is a DNA/RNA helicase implicated in DNA break repair and in resolving DNA:RNA hybrids (R-loops). Data suggest that both DNA repair and RNA splicing are key factors in AOA2 | (Moreira et al. 2004)
Attention deficit hyperactivity disorder (ADHD) | Clinically heterogeneous neurodevelopmental syndrome presenting the triad of inattention, hyperactivity and increased impulsivity | New risk genes identified in adult ADHD encode the cell adhesion molecules CDH13 and ASTN2 and the regulators of synaptic plasticity CTNNA2 and KALRN, which contain RDCs in NSPCs | (Lesch et al. 2008)
Autism spectrum disorder (ASD) | Highly heritable disorder with altered cognitive ability and abnormalities in language, social cognition and mental flexibility | Exonic copy-number variants and point mutations have been identified in NRXN1 and NRXN2, which encode proteins with roles in synaptic cell adhesion and neurotransmitter secretion. Deletions in NRXN3 and variants of Cadm1/2 are also associated with ASD. In addition, loss-of-function mutations in CTNND2 and deletions of NRXN1, both RDC-containing genes, have been implicated in autism etiology | (Turner et al. 2015; Hu-Lince et al. 2005; Vaags et al. 2012; Casey et al. 2012)
Autosomal recessive juvenile parkinsonism (ARJP) | Hereditary neurodegenerative disorder | The PARK2 gene, which maps within the large FRA6E CFS region, is associated with ARJP; large heterozygous deletions in PARK2 have been observed in ARJP patients | (Denison et al. 2003)
Bipolar disorder | Severe mental disorder comprising depressive and manic or hypomanic episodes | Deletions of the RDC-containing NRXN1 gene and variants of BAI3, LSAMP and NPAS3 are associated with bipolar disorder | (Ferreira et al. 2008; Nurnberger et al. 2014)
Cerebellar ataxia | Clinically and genetically heterogeneous group of inherited neurodegenerative disorders that affect the development of the cerebellum and spinal cord | Missense mutations in the RDC-associated WWOX gene have been found in autosomal recessive cerebellar ataxia | (Mallaret et al. 2014)
Fragile X syndrome (FXS) | Genetic disease characterized by expansion of (CGG)n trinucleotide repeats in the 5′ UTR of the fragile X mental retardation 1 gene (FMR1; Xq27.3) | Transcription through the GC-rich FMR1 5′ UTR induces R-loop formation, leading to DNA methylation-mediated silencing of the FMR1 locus | (Chakraborty et al. 2019; Loomis et al. 2014; Groh et al. 2014; Colak et al. 2014)
Friedreich ataxia (FRDA) | Neurodegenerative condition characterized by GAA repeat expansions in intron 1 of the gene (9q21.11) encoding frataxin (FXN) | R-loops form at the expanded repeats of the FXN gene, correlate with repressive chromatin marks and hinder FXN transcription in patient cells; the R-loop increase leads to transcriptional silencing of FXN and formation of repressive chromatin | (Reddy et al. 2011; Groh et al. 2014)
Huntington’s disease (HD) | Progressive brain disorder characterized by an expanded CAG repeat in the gene (4p16.3) encoding huntingtin (HTT) | R-loop formation at expanded repeats; R-loop induction at CG-rich repeat tracts is transcription-dependent in E. coli and in human cells with reduced RNase H activity | (Reddy et al. 2011)
Juvenile amyotrophic lateral sclerosis (ALS4) | Mutations in the ALS2 gene (2q33-q35) encoding alsin, a protein abundant in motor neurons; less common mutations in ERLIN2 (8p11.2) and SETX (9q34) have been reported | Senataxin is a DNA/RNA helicase implicated in DNA break repair and in resolving DNA:RNA hybrids (R-loops). Missense mutations in a single SETX allele segregate with ALS4 | (Lavin et al. 2013)
Microcephaly syndrome | Rare genetic syndrome characterized by microcephaly and intellectual deficit | Nonsense loss-of-function mutations in the RDC-associated WWOX gene have been identified in microcephaly syndrome | (Abdel-Salam et al. 2014)
Schimke immuno-osseous dysplasia (SIOD) | Multisystem disorder caused by mutations in the SMARCAL1 gene (2q35) encoding the chromatin remodeling protein hHARP | SMARCAL1 acts as an annealing helicase and in the response to DNA replication stress, maintaining genomic integrity at stalled replication forks | (Baradaran-Heravi et al. 2012)
Schizophrenia | Severe lifelong mental disorder arising from interactions between genetic and environmental components | Deletions of the RDC-associated NRXN1 gene and variants of BAI3, CDH13, CSMD1, LSAMP and NPAS3 are associated with schizophrenia | (Børglum et al. 2014; Donohoe et al. 2013)
Spinocerebellar ataxia type 1 (SCA1) | Neurodegenerative condition characterized by CAG repeat expansions in the gene (6p23) encoding ataxin 1 (ATXN1) | Transcription-dependent induction of R-loops at (CTG)·(CAG) repeats | (Reddy et al. 2011)
Spinocerebellar ataxia type 2 (SCA2) | Neurodegenerative condition characterized by CAG repeat expansions in the gene (12q23-q24.1) encoding ataxin 2 (ATXN2) | Transcription-dependent induction of R-loops at (CTG)·(CAG) repeats | (Reddy et al. 2011)


Table 1 Overview of the neurological disorders associated with transcription-mediated replication stress



Trinucleotide repeat expansions within genes provide an additional risk of harmful R-loop formation that disrupts transcription and normal gene expression. For example, the genes encoding huntingtin (HTT; Huntington’s disease), ataxin 1/2 (ATXN1/ATXN2; spinocerebellar ataxias) and frataxin (FXN; Friedreich ataxia) all contain GC-rich or GAA trinucleotide expansions that form R-loops in vitro and are associated with disease (Reddy et al. 2011; Loomis et al. 2014). The mechanism of fragile X syndrome (FXS) also involves a trinucleotide expansion, located in the 5′ untranslated region (UTR) of the FMR1 gene, which leads to DNA methylation-mediated silencing of this locus (Groh et al. 2014; Colak et al. 2014). The expansion favors transcription-dependent R-loops, which are resistant to degradation and co-localize with the repressive H3K9me2 chromatin mark. Using a nascent nuclear run-on analysis, Groh and colleagues showed that in cells from Friedreich ataxia patients, R-loops over the expanded repeats can block RNA polymerase II transcription of the FXN gene. In affected FXS patients, the FMR1 allele carrying a (CGG)n expansion with n > 200 in the 5′ UTR is completely methylated and transcriptionally silenced (Santoro et al. 2012; Groh et al. 2014). To test the role of R-loop formation in this disease, FMR1 transcription was reactivated using the DNA methylation inhibitor 5-aza-2′-deoxycytidine (5-azadC) (Groh et al. 2014). A fourfold increase in R-loops was observed over the exon 1 region upstream of the expansion in FXS cells, whereas changes in control cells were not significant, and the R-loop identity of this signal was confirmed by RNase H treatment. These findings suggest that transcription-dependent R-loops localize to the expanded (CGG) repeat region and regulate FMR1 expression.
Meanwhile, increased R-loop formation leads to transcriptional repression of the FXN gene, suggesting a direct molecular association between R-loop formation and the pathology of Friedreich ataxia (FRDA) (Groh et al. 2014). The formation of R-loops over expanded repeats might therefore favor FXN and FMR1 silencing and might represent a common feature of repeat expansion-associated diseases, contributing to the corresponding pathology in vivo (Groh et al. 2014). Interestingly, FXS cells exhibit high levels of chromosome breaks, particularly under replication stress (Chakraborty et al. 2019). More importantly, FMRP, the protein product of FMR1, is required to limit R-loop accumulation, thereby preventing chromosome breakage (Chakraborty et al. 2019). These data provide a mechanistic link between R-loop formation, replication stress and genome instability in FXS.

Pathways that prevent transcription–replication collisions and R-loop accumulation can be altered, leading to DNA damage and human diseases, including neurological disorders (reviewed in Zeman and Cimprich 2014). For example, dysfunctional TREX1 or RNase H2 is responsible for Aicardi–Goutières syndrome, which is characterized by severe neurological dysfunction and a congenital infection-like phenotype (Lim et al. 2015). Mutations in aprataxin (APTX), a protein acting in the same pathway as RNase H, cause the neurological disorder ataxia with oculomotor apraxia type 1 (AOA1), characterized by cerebellar degeneration (Tumbale et al. 2014). Neurodegenerative disorders have also been associated with the loss of DNA helicases that have clear roles in the replication stress response. Notably, loss of SMARCAL1, which functions at the interface of replication and transcription (Baradaran-Heravi et al. 2012), leads to Schimke immuno-osseous dysplasia (SIOD), a multisystem disorder with notable neurological manifestations. Another example is the loss of the helicase SETX, which helps prevent the formation of aberrant DNA:RNA hybrids. SETX has been associated with juvenile amyotrophic lateral sclerosis (ALS4) and ataxia with oculomotor apraxia type 2 (AOA2) (Moreira et al. 2004; Lavin et al. 2013). It should be noted that mature neurons are non-cycling cells; therefore, R-loops would act either on neurons in a replication-independent manner or on neuronal precursors through the DNA replication process. The extent to which R-loops contribute to these diseases via replication-dependent and/or replication-independent mechanisms needs further investigation.

DSB repair through canonical NHEJ is important for the development of primary neural stem/progenitor cells (NSPCs) (Gao et al. 1998). Previous studies have demonstrated the presence of recurrent endogenous DSBs using high-throughput genome-wide translocation sequencing (HTGTS) (Chiarle et al. 2011; Frock et al. 2015), a sensitive DNA break-joining assay that uses “bait” DNA breaks introduced on different chromosomes to reveal endogenous “prey” DNA breaks. Recurrent DSB clusters (RDCs) have been mapped in NSPCs in response to induced replication stress (Wei et al. 2016, 2018). NSPC RDCs are enriched in the gene bodies of large (> 100 kb), late-replicating genes. Because these characteristics (i.e., large active genes in late-replicating regions) are often associated with CFSs, and because most RDCs appear only after aphidicolin treatment to induce mild replication stress (Wei et al. 2016), a common mechanism (i.e., transcription-dependent suppression of initiation across large genes) might underlie both.

Other studies have suggested that TRC might also contribute to RDC formation (reviewed in Bouwman and Crosetto 2018). Importantly, several neurodevelopmental and neuropsychiatric disorders have been linked to RDC-containing genes in NSPCs that are involved in neural cell adhesion and/or the regulation of synapse formation. For example, molecules involved in cell–cell adhesion and in neural development and growth (including the cadherin-associated proteins Ctnna2 and Ctnnd2, the cadherin Cdh13, Cadm2, the membrane proteins Csmd1 and Csmd3, the glycoprotein Lsamp, the cell adhesion molecules Mdga2, Ntm and Sdk1, the transcription factor Npas3, the neurexin family members Nrxn1/3 and the excitatory neurotransmitter receptor Grik2) are associated with numerous diseases, including attention deficit hyperactivity disorder (ADHD) (Lesch et al. 2008), intellectual disabilities (Belcaro et al. 2015; Motazacker et al. 2007), schizophrenia (Børglum et al. 2014; Donohoe et al. 2013), bipolar disorder (Ferreira et al. 2008; Nurnberger et al. 2014; Noor et al. 2014) and autism spectrum disorder (ASD) (Turner et al. 2015; Hu-Lince et al. 2005; Vaags et al. 2012; Casey et al. 2012). Interestingly, mutations linked to cerebellar ataxia and microcephaly syndrome have been found in the WW domain-containing oxidoreductase (WWOX) gene, within FRA16D, a well-studied CFS (Abdel-Salam et al. 2014; Mallaret et al. 2014). Likewise, the PARKIN (PARK2) gene, located within another CFS, FRA6E, is involved (via germline mutations) in the pathogenesis of Parkinson’s disease (Denison et al. 2003). Thus, RDCs and CFS loci are closely associated with the gene fragility that underlies several of the most frequent neurological disorders.

Cancer

Conflicts between replication and transcription are related to oncogene-induced replication stress and consequently to genomic instability, a hallmark of cancer (Gaillard et al. 2015; Kotsantis et al. 2016; Jones et al. 2013) (Table 2). For example, increased transcriptional activity induced by H-RAS overexpression causes replication stress that depends on R-loop accumulation (Kotsantis et al. 2016). Stork and colleagues showed that treating estrogen receptor-positive (ER+) human breast cancer cells with estrogen (E2) promotes E2-activated transcription and increases DSBs and R-loops, which colocalize particularly in genomic regions containing estrogen-activated genes (Stork et al. 2016). In addition, replication stress induced by oncogene activation during tumorigenesis is associated with increased replication initiation within intragenic regions, leading to conflicts between replication and transcription and to genomic instability (Jones et al. 2013). As described earlier, cyclin E and its kinase partner CDK2 form the cyclin E/CDK2 complex, whose activity is regulated at multiple levels and which seems to be involved in triggering DNA replication initiation and in regulating genes important for proliferation and S-phase progression (Ekholm-Reed et al. 2004). When deregulated, cyclin E contributes to tumorigenesis, and it is overexpressed in many cancer types (Cooley et al. 2010; Fukuse et al. 2000; Niu et al. 2015). Importantly, somatic cells can tolerate the replication stress induced by oncogenes such as cyclin E for several cell cycles before undergoing chromosomal breakage (Neelsen et al. 2013), which could constitute an initiating event in cancer.
Alterations in cyclin E1 and cyclin A2 (encoded by CCNE1 and CCNA2, respectively) have been identified in a subgroup of hepatocellular carcinoma (HCC), named CCN-HCC, which carries rearrangements of the CCNE1 promoter region and recurrent fusions involving CCNA2. CCN-HCC is characterized by the accumulation of hundreds of tandem duplications and templated insertion cycles (Bayard et al. 2018). Under cyclin E overexpression, BIR, which is involved in the repair of DSBs and damaged replication forks, is required for cell cycle progression (Costantino et al. 2014). Because chromosome rearrangements often occur during BIR upon oncogene activation (Smith et al. 2007), the rearrangements found in CCN-HCC, together with the enrichment of breakpoints in early-replicated and actively transcribed regions, might reflect BIR mechanisms triggered by replication stress.


Tumor type | Factor involved | Genome instability involvement | References
Breast (BC)/ovarian cancer (OC) | Breast cancer type 1/2 susceptibility proteins (BRCA1/2) | Loss of the protective effect of BRCA1/2 against replication fork collapse increases DSBs. RNase H1 overexpression partially rescues the genotoxic stress of BRCA1/2-mutant BC and OC cell lines. In addition, ~ 10 kb tandem duplications frequently observed in BRCA1-mutant breast and ovarian cancers arise by a replication restart-bypass mechanism | (Zhang et al. 2017; Willis et al. 2017)
Breast (BC)/ovarian cancer (OC) | Estrogen | Estrogen treatment induces an increase in R-loops and DNA breaks in ER+ BC cells. R-loops have been identified in the regions of estrogen-activated genes that are frequently mutated in BC, and DSB formation in the vicinity of these R-loops has been reported | (Stork et al. 2016)
Breast (BC)/ovarian cancer (OC) | Cadherin 13 (CDH13) | Intragenic deletions of the RDC-associated CDH13 gene are associated with BC; a combination of hypermethylation and deletion causes loss of CDH13 activity in OC | (Kadota et al. 2010; Kawakami et al. 1999)
Burkitt's lymphoma (BL)/multiple myeloma (MM) | Tudor domain-containing protein 3 (TDRD3) and topoisomerase IIIB (TOP3B) | TDRD3, tightly complexed with TOP3B, is recruited to the c-MYC CpG island promoter to reduce R-loop formation and suppress chromosomal translocations. Oncogenic c-Myc/Igh translocations have been reported in BL, and chromosomal abnormalities involving the c-MYC locus have been identified in MM cell lines | (Küppers and Dalla-Favera 2001; Shou et al. 2000; Yang et al. 2014)
Ewing's sarcoma (EWS) | Breast cancer type 1 susceptibility protein (BRCA1) | Damage-induced transcription and R-loop accumulation, together with depletion of functional BRCA1, are associated with transcriptional stress and consequent DNA damage. A BRCA1-mutated gene set is significantly enriched in EWS cells | (Gorthi et al. 2018)
Glioblastoma (GBM)/astrocytoma | Neuronal PAS domain protein 3 (NPAS3) | Loss-of-function/loss-of-heterozygosity mutations in the RDC-associated NPAS3 gene have been identified in highly proliferative GBM and astrocytoma | (Moreira et al. 2011)
Glioma | Brain-specific angiogenesis inhibitor 3 (BAI3) | Expression of the RDC-associated BAI3 gene is decreased in high-grade glioma | (Kee et al. 2004)
Hepatocellular carcinoma (HCC) | Cyclin A2 (CCNA2) and cyclin E1 (CCNE1) | Recurrent fusions involving CCNA2 and recurrent rearrangements of the CCNE1 promoter region are associated with HCC, possibly induced by replication stress | (Bayard et al. 2018)
Lung cancer (LC) | Cadherin 13 (CDH13) | Deletions with loss of expression of the RDC-associated CDH13 gene, combined with partial hypermethylation, are associated with LC | (Sato et al. 1998)
Osteosarcoma (OSC) | Limbic system-associated membrane protein (LSAMP) | Loss of activity of the RDC-associated LSAMP gene is associated with OSC | (Kresse et al. 2009)
Prostate cancer (PC) | Cell adhesion molecule 2 (CADM2) | Rearrangements that disrupt the RDC-associated CADM2 gene have been associated with PC | (Berger et al. 2011)
Prostate cancer (PC) | CUB and Sushi multiple domains 3 (CSMD3) | Rearrangements in the RDC-associated CSMD3 gene have been associated with PC | (Berger et al. 2011)
Prostate cancer (PC) | Diacylglycerol kinase beta (DGKB) | Complex inter-chromosomal rearrangements fuse the RDC-associated DGKB gene with MIPOL1 in a PC cell line | (Maher et al. 2009)
Stomach adenocarcinoma (STAD) | CUB and Sushi multiple domains 1 and 3 (CSMD1/3) | The RDC-associated CSMD1 and CSMD3 genes are among the 10 most frequently mutated genes in STAD | (Wang et al. 2020)
Common defects in numerous cancer types | Fragile histidine triad protein (FHIT), WW domain-containing oxidoreductase (WWOX), Parkin (PARK2) | Deletions in the FHIT, WWOX and PARK2 tumor suppressor genes, located within the unstable FRA3B, FRA16D and FRA6E CFS regions, respectively, have been detected in precancerous lesions of various tumors. Genome instability at CFSs has been linked to the formation of chromosome rearrangements in cancers | (Glover et al. 2017; Pandis et al. 1997; Kameoka et al. 2004; Bednarek et al. 2000; Ludes-Meyers et al. 2003; Krummel et al. 2000; Paige et al. 2000; Denison et al. 2003; Letessier et al. 2007; Iwakawa et al. 2012)


Table 2 Overview of tumors associated with transcription-mediated replication stress


The contribution of BRCA1 and BRCA2 loss of function to cancer development is well established, particularly in breast and ovarian cancers. Tandem duplications (~ 10 kb in length) frequently observed in BRCA1-mutant breast and ovarian cancers are generated by a replication restart-bypass mechanism that is completed by end joining or by microhomology-mediated template switching (Willis et al. 2017). This finding supports the view that BRCA1 and BRCA2 have an important role in protecting replication forks (Xu et al. 2017b; Schlacher et al. 2011). Without the protection these proteins confer against replication fork collapse, cells show an increase in DSBs. Cancer cells lacking BRCA1/2 are therefore more sensitive to PARP (poly(ADP-ribose) polymerase) inhibitors such as olaparib, rucaparib, niraparib or talazoparib, which block an alternative repair pathway on which these cells rely (reviewed in Ubhi and Brown 2019). PARP inhibitors are now frequently used as targeted therapy for cancers with defects in BRCA1/2 or in other critical HR components, such as Rad51. Interestingly, the cancer-associated genotoxic stress arising from BRCA1/2 mutations can be partially rescued by overexpressing RNase H1 in cancer cell lines, suggesting that aberrant R-loop formation also contributes to malignancy (Hill et al. 2014; Hatchi et al. 2015; Zhang et al. 2017).

In addition, Ewing’s sarcoma has been linked to damage-induced transcription, an accumulation of R-loops related to transcriptional stress and the subsequent depletion of functional BRCA1, all of which ultimately result in DNA damage (Gorthi et al. 2018). Moreover, R-loops might have a role in the oncogenic c-MYC–Igh translocation commonly seen in Burkitt's lymphoma and multiple myeloma. In this setting, Tudor domain-containing protein 3 (TDRD3), in complex with TOP3B, is recruited to the c-MYC CpG island promoter, where it prevents R-loop accumulation and suppresses chromosomal translocations (Küppers and Dalla-Favera 2001; Shou et al. 2000; Yang et al. 2014). Finally, cancer-derived somatic SLX4 mutations and germline RTEL1 mutations associated with Hoyeraal–Hreidarsson syndrome (HHS), which abrogate the SLX4–RTEL1 interaction, impair the recruitment of FANCD2 to RNA Pol II to resolve R-loops arising from transcription-induced replication stress, and contribute to cancer development (Takedachi et al. 2020).

As described previously, large genes expressed in NSPCs are prone to DSBs and translocations. Genes identified within RDCs are also frequently altered in tumors (Wei et al. 2016). For example, LSAMP lies within a small, frequently deleted region and has been proposed to act as a tumor suppressor (Kresse et al. 2009). The cadherin CDH13 is involved in cell–cell adhesion and neural growth and is deleted in different tumor types (Kawakami et al. 1999; Kadota et al. 2010; Sato et al. 1998). The synaptic cell surface protein NRXN3 is altered in medulloblastoma. In prostate cancer, CADM2 and CSMD3 are rearranged, and DGKB is involved in inter-chromosomal gene fusions (Berger et al. 2011; Maher et al. 2009). Moreover, a recent report found that CSMD3 and CSMD1 are among the most frequently mutated genes in stomach adenocarcinoma (Wang et al. 2020). NPAS3, which helps regulate genes involved in neurogenesis, is deleted in high-grade astrocytoma and glioblastoma (Moreira et al. 2011). Finally, the cell adhesion molecule BAI3 has been implicated in glioma progression (Kee et al. 2004).

Deletions in CFSs are considered one of the major genetic variations observed during tumor development. The first large gene found to be spanned by a highly unstable CFS region was the fragile histidine triad gene (FHIT), located within FRA3B. FHIT alterations, such as deletions or loss of expression, have been observed in various tumors, including breast cancer and B-cell lymphoma (Pandis et al. 1997; Kameoka et al. 2004). Another example is WWOX, which lies within FRA16D, the second most active common fragile site (Bednarek et al. 2000; Ludes-Meyers et al. 2003), and is frequently deleted in several tumors (Krummel et al. 2000; Paige et al. 2000). The third most frequent CFS locus is FRA6E, which contains PARK2, a gene encoding an E3 ubiquitin ligase; PARK2 inactivation can accelerate cell-cycle progression and induce cyclin D1 accumulation (reviewed in Glover et al. 2017). Like FHIT and WWOX, PARK2 is a tumor suppressor. Deletion of PARK2 has been described in various cancers and causes a loss of its activity (Letessier et al. 2007; Iwakawa et al. 2012; Denison et al. 2003). Loss of PARK2 activity can induce chromosome instability related to tumor formation. This effect might be due to the deregulation of several mitotic regulators that are normally controlled by PARK2, such as Plk1, Aurora A/B, Cyclin B1, Cdc20 and UbcH10. These alterations can lead to mitotic defects such as prometaphase-like arrest and failures of anaphase and cytokinesis. Given that loss of PARK2 induces multiple chromosomal defects, PARK2 appears to have an important role in maintaining genomic stability (Lee et al. 2015).

Several CFS-associated genes have protective roles by promoting the DDR, a critical mechanism for maintaining genome stability. Indeed, inactivation of several tumor suppressors located within CFSs deregulates the DDR. In particular, the tumor suppressors FHIT and WWOX act in the DDR by regulating apoptosis through interactions with the pro-apoptotic p53 family of transcription factors. Thus, loss of function of these tumor suppressors, together with mutations in other genes such as p53, contributes substantially to the uncontrolled proliferation that promotes genome instability (reviewed in Hazan et al. 2016). Many other genes that span CFS regions, such as CTNNA1/3, DLG2, DMD, GRID2, IL1RAPL1, LRP1B, NBEA and RORA, are well described and have been linked to different tumor types (reviewed in Gao and Smith 2015). The high number of large genes contained in CFS regions can be explained by transcription-mediated suppression of replication initiation within these large genes (Brison et al. 2019) (see previous section for details), which creates large regions devoid of replication initiation events and leads to genome instability under replication stress, as frequently observed in cancer.
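
To make the last point concrete, the time needed to replicate an origin-poor region can be estimated with simple arithmetic. The Python sketch below is an illustrative back-of-the-envelope model, not the analysis of Brison et al. (2019): it assumes that no origin fires inside the region, that two forks enter simultaneously from its two edges, and that both move at a constant speed. The function name, the 1 Mb region size and the fork speeds are hypothetical values chosen only for illustration.

```python
def min_replication_time_hours(region_kb: float, fork_speed_kb_per_min: float) -> float:
    """Minimum time (in hours) for two forks entering an origin-free region
    from opposite ends to converge, assuming both move at a constant speed.
    This is a simplified model used for illustration only."""
    if region_kb < 0 or fork_speed_kb_per_min <= 0:
        raise ValueError("region_kb must be >= 0 and fork_speed_kb_per_min > 0")
    # Each of the two converging forks only has to copy half of the region.
    minutes = (region_kb / 2) / fork_speed_kb_per_min
    return minutes / 60


if __name__ == "__main__":
    # Hypothetical illustrative values, not taken from this review.
    region_kb = 1000  # an assumed 1 Mb origin-poor region inside a large gene
    for speed in (2.0, 1.0, 0.5):  # kb/min; lower values mimic forks slowed by stress
        hours = min_replication_time_hours(region_kb, speed)
        print(f"{region_kb} kb at {speed} kb/min: {hours:.1f} h")
```

In this simple model the required time scales linearly with region length and inversely with fork speed, so halving fork speed, as can happen under replication stress, doubles the time needed; the longest origin-poor regions are therefore the ones most likely to remain incompletely replicated when cells enter mitosis.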

Other pathological conditions

Transcription-mediated replication stress is also involved in a number of other pathological conditions, such as immunodeficiencies, infertility, Prader–Willi syndrome and facial anomalies syndromes (Table 3). In particular, genome instability induced by co-transcriptional R-loop formation has been linked to FA, a genetic disease characterized by bone marrow failure and a strong predisposition to cancer. FA is caused by germline mutations in any of 22 FA genes, including BRCA1/2 (Yamamoto et al. 2005; van Twest et al. 2017; Nepal et al. 2017). Of note, FANCD2, a core FA protein, accumulates at transcribed genes and has a role in resolving R-loops and transcription–replication conflicts by recruiting RNA processing factors (Schwab et al. 2015; García-Rubio et al. 2015). In particular, the FANCI–FANCD2 (ID2) heterodimer is mono-ubiquitinated by the FANCL E3 ubiquitin ligase during S phase and under conditions of replication stress (van Twest et al. 2017; Rajendra et al. 2014). Several reports have shown increased R-loop levels in FA mutant cells (Schwab et al. 2015; García-Rubio et al. 2015; Liang et al. 2019), indicating that FANCD2 mono-ubiquitination is required both to prevent R-loop accumulation and for FANCD2 to colocalize with R-loops in actively transcribed genomic regions. Although BRCA1 and BRCA2 also belong to the FA gene family, surprisingly, breast or ovarian cancer rarely, if ever, develops in FA patients. It should be noted that FA is primarily an autosomal recessive disorder, in which two mutated alleles are required to cause the disease, whereas BRCA1/2 defects linked to breast or ovarian cancer are mostly found in heterozygous carriers. Patients with homozygous BRCA2 loss (BRCA2−/−) generally die from complications of aplastic anemia well before the age at which breast or ovarian cancer typically develops.
In addition, no FA patients carrying biallelic BRCA1 mutations had been identified, suggesting that biallelic loss of BRCA1 might be embryonic lethal (reviewed in D’Andrea 2010). It is not completely clear how the loss of a single DNA-repair pathway can induce bone marrow failure, developmental abnormalities and a predisposition to cancer in FA patients; we anticipate that this question will remain an active area of research.


Table 3 Other pathological conditions associated with transcription-mediated replication stress

Disease | Description | Genome instability involvement | References
--- | --- | --- | ---
Fanconi anemia (FA) | A genetic disorder with progressive pancytopenia, congenital malformations and a predisposition to develop tumors, characterized by mutations in genes involved in DNA repair and genomic stability | FANCD2 accumulates in nuclear foci or on chromatin in an R-loop-dependent manner, and has a role in resolving R-loop/replication fork conflicts by recruiting RNA processing factors | van Twest et al. 2017; Rajendra et al. 2014
Immunodeficiency, centromeric region instability, facial anomalies syndrome (ICF) | A genetic syndrome characterized by immunodeficiency and rearrangements in the vicinity of the centromeres of chromosomes 1, 16 and 9. DNA hypomethylation and mutations in the DNA methyltransferase gene DNMT3B have been reported | ICF cells with DNMT3B mutations exhibit severe hypomethylation at subtelomeres. In ICF cells, formation of telomeric DNA:RNA hybrids enhances telomere shortening, highlighting the contribution of epigenetic modifications to telomere-specific length regulation | Sagie et al. 2017
Prader–Willi syndrome (PWS) | A genetic syndrome mainly characterized by severe hypotonia, mental deficiency and hyperphagia with a risk of obesity. Affected patients exhibit a loss of paternal gene expression in the 15q11-q13 region, due to small deletions of the SNORD116 locus | An accumulation of R-loops in the G-rich repeats of the SNORD116 locus induces transcription-dependent nucleosome displacement and chromatin decondensation of the paternal allele | Powell et al. 2013

Prader–Willi syndrome (PWS) is a genetic disorder caused by the loss of paternal gene expression in the 15q11-q13 chromosomal region, due to small deletions of the SNORD116 locus. Interestingly, R-loops form within the G-rich repeats of the SNORD116 locus, inducing transcription-dependent nucleosome displacement and chromatin decondensation of the paternal allele (Powell et al. 2013). The SNORD116 locus also mediates the effects of the TOP1 inhibitor topotecan, which increases R-loop levels and stalls transcriptional progression. Among the genetic syndromes characterized by immunodeficiency and linked to R-loop formation is the immunodeficiency, centromeric region instability and facial anomalies (ICF) syndrome. This syndrome is caused by mutations in the DNA methyltransferase 3B gene (DNMT3B) and is characterized by subtelomeric hypomethylation associated with abnormally short telomeres. Transcription of telomeric repeat-containing RNA (TERRA) has an important role in regulating telomere length and replication. Mature TERRA forms DNA:RNA hybrids with its C-rich DNA template strand; these telomeric hybrids are particularly abundant in cancer cells that maintain their telomeres through the telomerase-independent alternative lengthening of telomeres (ALT) pathway (Arora et al. 2014). Moreover, ICF cells show elevated TERRA transcription together with telomere shortening or loss, suggesting that telomeric hybrids promote instability at ICF telomeres. Indeed, Sagie and colleagues showed that telomeric hybrids, together with other as yet unknown factors that regulate telomere length, enhance telomere shortening, pointing to a contribution of epigenetic modifications (e.g., compromised methylation by DNMT3B) to telomere-specific length regulation (Sagie et al. 2017). How DNA:RNA hybrids, replication stress and genome instability are related in these disorders, and how such relationships could be exploited to find additional targeted therapies, remain to be investigated in future studies.

Conclusion and perspectives

In conclusion, studies over the past few years have provided new and important insights into replication stress and genome instability. Increasing evidence indicates that gene transcription has an essential role in shaping the landscape of human genome replication, while also being a major source of endogenous replication stress that induces genome instability and leads to human disease. Transcription-mediated replication stress arises in both early- and late-replicating regions through two major mechanisms: head-on transcription–replication conflicts frequently occur at the transcription termination sites of highly expressed genes in early-replicating regions, whereas transcription-dependent suppression of initiation across large genes, which creates large origin-poor regions, is responsible for CFS instability in late-replicating regions. Owing to technical limitations, most studies have used only cell lines as model systems. Ongoing development of high-throughput single-molecule (Müller et al. 2019; Klein et al. 2017) and single-cell (Dileep and Gilbert 2018; Takahashi et al. 2019) approaches to study the DNA replication program will provide new tools to address these questions directly in patient samples. We expect these advances to bring new insights into the detailed mechanisms by which transcription-mediated replication stress affects genome instability and human disease. Such insights should help to select the patients who are likely to respond to a given targeted therapy (such as PARP, ATR or TOP1 inhibitors) directed against factors involved in the corresponding processes, and to develop new targeted therapies to better fight cancers and other human diseases.

Abbreviations

  • ARS: Autonomously replicating sequence
  • BIR: Break-induced replication
  • BRCA1/2: Breast cancer type 1/2 susceptibility protein
  • CD: Co-directional
  • CDC45/6: Cell division control protein 45/6
  • CDK: Cyclin-dependent kinase
  • CDT1: CDC10-dependent transcript 1
  • CFS: Common fragile site
  • DDK: Dbf4-dependent kinase
  • DDR: DNA damage response
  • DSB: DNA double-strand break
  • ER: Estrogen receptor
  • ERFS: Early-replicating fragile site
  • ESC: Embryonic stem cell
  • FA: Fanconi anemia
  • FISH: Fluorescence in situ hybridization
  • FXS: Fragile X syndrome
  • G4: G-quadruplex
  • HO: Head-on
  • HR: Homologous recombination
  • MCM: Mini-chromosome maintenance
  • MiDAS: Mitotic DNA synthesis
  • NHEJ: Non-homologous end joining
  • NSPC: Neural stem/progenitor cell
  • ODP: Origin decision point
  • OGRE: Origin G-rich repeated elements
  • OK-seq: Okazaki fragment sequencing
  • ORC: Origin recognition complex
  • PCNA: Proliferating cell nuclear antigen
  • pre-IC: Pre-initiation complex
  • pre-RC: Pre-replication complex
  • RDC: Recurrent DSB cluster
  • RFC: Replication factor C
  • Rif1: Rap1-interacting factor 1
  • RPA: Replication protein A
  • RT: Replication timing
  • SDR: Significantly delayed region
  • SNS: Small nascent strand
  • TAD: Topologically associated domain
  • TDP: Timing decision point
  • TRC: Transcription–replication conflict
  • TSS: Transcription start site
  • TTS: Transcription termination site
  • UTR: Untranslated region

References

  1. Abdel-Salam, G., Thoenes, M., Afifi, H. H., Körber, F., Swan, D., & Bolz, H. J. (2014). The supposed tumor suppressor gene WWOX is mutated in an early lethal microcephaly syndrome with epilepsy, growth retardation and retinal degeneration. Orphanet Journal of Rare Diseases, 9, 1–7.

  2. Aguilera, A., & García-Muse, T. (2012). R Loops: From transcription byproducts to threats to genome stability. Molecular Cell, 46, 115–124.

  3. Ahuja, A. K., Jodkowska, K., Teloni, F., Bizard, A. H., Zellweger, R., Herrador, R., et al. (2016). A short G1 phase imposes constitutive replication stress and fork remodelling in mouse embryonic stem cells. Nature Communications, 7, 1–11.

  4. Aladjem, M. I., Spike, B. T., Rodewald, L. W., Hope, T. J., Klemm, M., Jaenisch, R., et al. (1998). ES cells do not activate p53-dependent stress responses and undergo p53-independent apoptosis in response to DNA damage. Current Biology, 8, 145–155.

  5. Arora, R., Lee, Y., Wischnewski, H., Brun, C. M., Schwarz, T., & Azzalin, C. M. (2014). RNaseH1 regulates TERRA-telomeric DNA hybrids and telomere maintenance in ALT tumour cells. Nature Communications, 5, 1–11.

  6. Ballabeni, A., Zamponi, R., Moore, J. K., Helin, K., & Kirschner, M. W. (2013). Geminin deploys multiple mechanisms to regulate Cdt1 before cell division thus ensuring the proper execution of DNA replication. Proceedings of the National Academy of Sciences U S A, 110, E2848–E2853.

  7. Baradaran-Heravi, A., Cho, K. S., Tolhuis, B., Sanyal, M., Morozova, O., Morimoto, M., et al. (2012). Penetrance of biallelic SMARCAL1 mutations is associated with environmental and genetic disturbances of gene expression. Human Molecular Genetics, 21, 2572–2587.

  8. Barlow, J. H., Faryabi, R. B., Callén, E., Wong, N., Malhowski, A., Chen, H. T., et al. (2013). Identification of early replicating fragile sites that contribute to genome instability. Cell, 152, 620–632.

  9. Bayard, Q., Meunier, L., Peneau, C., Renault, V., Shinde, J., Nault, J. C., et al. (2018). Cyclin A2/E1 activation defines a hepatocellular carcinoma subclass with a rearrangement signature of replication stress. Nature Communications, 9, 5235.

  10. Beck, D. B., Burton, A., Oda, H., Ziegler-Birling, C., Torres-Padilla, M. E., & Reinberg, D. (2012). The role of PR-Set7 in replication licensing depends on Suv4-20h. Genes & Development, 26, 2580–2589.

  11. Bednarek, A. K., Laflin, K. J., Daniel, R. L., Liao, Q., Hawkins, K. A., & Aldaz, C. M. (2000). WWOX, a novel WW domain-containing protein mapping to human chromosome 16q23.3-24.1, a region frequently affected in breast cancer. Cancer Research, 60, 2140–2145.

  12. Belcaro, C., Dipresa, S., Morini, G., Pecile, V., Skabar, A., & Fabretto, A. (2015). CTNND2 deletion and intellectual disability. Gene, 565, 146–149.

  13. Bell, S. P., & Stillman, B. (1992). ATP-dependent recognition of eukaryotic origins of DNA replication by a multiprotein complex. Nature, 357, 128–134.

  14. Berger, M. F., Lawrence, M. S., Demichelis, F., Drier, Y., Cibulskis, K., Sivachenko, A. Y., et al. (2011). The genomic complexity of primary human prostate cancer. Nature, 470, 214–220.

  15. Besnard, E., Babled, A., Lapasset, L., Milhavet, O., Parrinello, H., Dantec, C., et al. (2012). Unraveling cell type-specific and reprogrammable human replication origin signatures associated with G-quadruplex consensus motifs. Nature Structural & Molecular Biology, 19, 837–844.

  16. Bianconi, E., Piovesan, A., Facchin, F., Beraudi, A., Casadei, R., Frabetti, F., et al. (2013). An estimation of the number of cells in the human body. Annals of Human Biology, 40, 463–471.

  17. Bignell, G. R., Greenman, C. D., Davies, H., Butler, A. P., Edkins, S., Andrews, J. M., et al. (2010). Signatures of mutation and selection in the cancer genome. Nature, 463, 893–898.

  18. Boos, D., Sanchez-Pulido, L., Rappas, M., Pearl, L. H., Oliver, A. W., Ponting, C. P., et al. (2011). Regulation of DNA replication through Sld3-Dpb11 interaction is conserved from yeast to humans. Current Biology, 21, 1152–1157.

  19. Boos, D., Yekezare, M., & Diffley, J. F. X. (2013). Identification of a heteromeric complex that promotes DNA replication origin firing in human cells. Science, 340, 981–984.

  20. Børglum, A. D., Demontis, D., Grove, J., Pallesen, J., Hollegaard, M. V., Pedersen, C. B., et al. (2014). Genome-wide study of association and interaction with maternal cytomegalovirus infection suggests new schizophrenia loci. Molecular Psychiatry, 19, 325–333.

  21. Bouwman, B. A. M., & Crosetto, N. (2018). Endogenous DNA double-strand breaks during DNA transactions: Emerging insights and methods for genome-wide profiling. Genes, 9, 632.

  22. Brison, O., El-Hilali, S., Azar, D., Koundrioukoff, S., Schmidt, M., Nähse, V., et al. (2019). Transcription-mediated organization of the replication initiation program across large genes sets common fragile sites genome-wide. Nature Communications, 10, 5693.

  23. Cadoret, J.-C., Meisch, F., Hassan-Zadeh, V., Luyten, I., Guillet, C., Duret, L., et al. (2008). Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proceedings of the National Academy of Sciences U S A, 105, 15837–15842.

  24. Casey, J. P., Magalhaes, T., Conroy, J. M., Regan, R., Shah, N., Anney, R., et al. (2012). A novel approach of homozygous haplotype sharing identifies candidate genes in autism spectrum disorder. Human Genetics, 131, 565–579.

  25. Cayrou, C., Coulombe, P., Vigneron, A., Stanojcic, S., Ganier, O., Peiffer, I., et al. (2011). Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Research, 21, 1438–1449.

  26. Cayrou, C., Coulombe, P., Puy, A., Rialle, S., Kaplan, N., Segal, E., et al. (2012). New insights into replication origin characteristics in metazoans. Cell Cycle, 11, 658–667.

  27. Chakraborty, P., & Grosse, F. (2011). Human DHX9 helicase preferentially unwinds RNA-containing displacement loops (R-loops) and G-quadruplexes. DNA Repair (Amst), 10, 654–665.

  28. Chakraborty, A., Jenjaroenpun, P., McCulley, A., Li, J., El, H. S., Haarer, B., et al. (2019). Fragile X Mental Retardation Protein regulates R-loop formation and prevents global chromosome fragility. bioRxiv. https://doi.org/10.1101/601906.

  29. Chan, K. L., Palmai-Pallag, T., Ying, S., & Hickson, I. D. (2009). Replication stress induces sister-chromatid bridging at fragile site loci in mitosis. Nature Cell Biology, 11, 753–760.

  30. Chen, Y.-H., Keegan, S., Kahli, M., Tonzi, P., Fenyö, D., Huang, T. T., et al. (2019). Transcription shapes DNA replication initiation and termination in human cells. Nature Structural & Molecular Biology, 26, 67–77.

  31. Chiarle, R., Zhang, Y., Frock, R. L., Lewis, S. M., Molinie, B., Ho, Y. J., et al. (2011). Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells. Cell, 147, 107–119.

  32. Cimbora, D. M., Schübeler, D., Reik, A., Hamilton, J., Francastel, C., Epner, E. M., et al. (2000). Long-distance control of origin choice and replication timing in the human beta-globin locus are independent of the locus control region. Molecular and Cellular Biology, 20, 5581–5591.

  33. Cohen, S., Puget, N., Lin, Y.-L., Clouaire, T., Aguirrebengoa, M., Rocher, V., et al. (2018). Senataxin resolves RNA:DNA hybrids forming at DNA double-strand breaks to prevent translocations. Nature Communications, 9, 533.

  34. Colak, D., Zaninovic, N., Cohen, M. S., Rosenwaks, Z., Yang, W. Y., Gerhardt, J., et al. (2014). Promoter-bound trinucleotide repeat mRNA drives epigenetic silencing in fragile X syndrome. Science, 343, 1002–1005.

  35. Collart, C., Allen, G. E., Bradshaw, C. R., Smith, J. C., & Zegerman, P. (2013). Titration of four replication factors is essential for the Xenopus laevis midblastula transition. Science, 341, 893–896.

  36. Cooley, A., Zelivianski, S., & Jeruss, J. S. (2010). Impact of cyclin E overexpression on Smad3 activity in breast cancer cell lines. Cell Cycle, 9, 4900–4907.

  37. Cornacchia, D., Dileep, V., Quivy, J.-P., Foti, R., Tili, F., Santarella-Mellwig, R., et al. (2012). Mouse Rif1 is a key regulator of the replication-timing programme in mammalian cells. EMBO Journal, 31, 3678–3690.

  38. Coronado, D., Godet, M., Bourillot, P. Y., Tapponnier, Y., Bernat, A., Petit, M., et al. (2013). A short G1 phase is an intrinsic determinant of naïve embryonic stem cell pluripotency. Stem Cell Research, 10, 118–131.

  39. Costantino, L., Sotiriou, S. K., Rantala, J. K., Magin, S., Mladenov, E., Helleday, T., et al. (2014). Break-induced replication repair of damaged forks induces genomic duplications in human cells. Science, 343, 88–91.

  40. Courbet, S., Gay, S., Arnoult, N., Wronka, G., Anglana, M., Brison, O., et al. (2008). Replication fork movement sets chromatin loop size and origin choice in mammalian cells. Nature, 455, 557–560.

  41. D’Alessandro, G., Whelan, D. R., Howard, S. M., Vitelli, V., Renaudin, X., Adamowicz, M., et al. (2018). BRCA2 controls DNA:RNA hybrid level at DSBs by mediating RNase H2 recruitment. Nature Communications, 9, 1–17.

  42. D’Andrea, A. D. (2010). Susceptibility pathways in Fanconi’s anemia and breast cancer. The New England Journal of Medicine, 362, 1909.

  43. Debatisse, M., & Rosselli, F. (2019). A journey with common fragile sites: From S phase to telophase. Genes, Chromosomes and Cancer, 58, 305–316.

  44. Delgado, S., Gomez, M., Bird, A., & Antequera, F. (1998). Initiation of DNA replication at CpG islands in mammalian chromosomes. EMBO Journal, 17, 2426–2435.

  45. Deng, L., Wu, R. A., Sonneville, R., Kochenova, O. V., Labib, K., Pellman, D., et al. (2019). Mitotic CDK Promotes Replisome Disassembly, Fork Breakage, and Complex DNA Rearrangements. Molecular Cell, 73, 915–929.e6.

  46. Denison, S. R., Callahan, G., Becker, N. A., Phillips, L. A., & Smith, D. I. (2003). Characterization of FRA6E and its potential role in autosomal recessive juvenile parkinsonism and ovarian cancer. Genes, Chromosomes and Cancer, 38, 40–52.

  47. Dileep, V., & Gilbert, D. M. (2018). Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nature Communications, 9, 427.

  48. Dileep, V., Ay, F., Sima, J., Vera, D. L., Noble, W. S., & Gilbert, D. M. (2015). Topologically associating domains and their long-range contacts are established during early G1 coincident with the establishment of the replication-timing program. Genome Research, 25, 1104–1113.