The eukaryotic genome is a complex and dynamic structure. About 50% of the human genome, and possibly more, consists of mobile genetic elements (MGE), which were long considered unnecessary ballast [1,2]. Recent studies convincingly demonstrate the important role of these elements in the evolution of genomes and of gene regulation [3-15]. MGE are not only mutagenic factors; they can also be a source of diverse regulatory sequences, such as alternative splice sites and cis-regulatory modules (clusters of transcription factor binding sites), or can act as alternative promoters [16-24]. MGE tend to integrate into non-coding regions of the genome (introns, gene-flanking regions and intergenic regions). On average, approximately 89.5% of MGE occur in human introns, whereas exons account for slightly more than 10%, in particular about 4% for protein-coding genes. Up to 20% of genes contain MGE in the untranslated regions of their mRNA, where they can affect the regulation of gene expression; in particular, MGE in the 5'UTR affect the initiation of translation. About 25% of human genes are known to contain MGE in their promoter regions, and there is now convincing evidence that MGE, in particular the promoters of retroelements, are involved in regulating the transcriptional activity of genes [21,26]. In some cases MGE act as alternative promoters, which either increases the expression level of the corresponding gene or changes the tissue specificity of its expression.
The human MGMT gene encodes the repair enzyme O6-methylguanine-DNA methyltransferase, which removes alkyl groups from the O6 position of guanine in DNA and protects cells from their toxic and mutagenic effects. Expression of this gene and the activity of the enzyme vary widely both between individuals and within an individual, indicating that its regulation is complex.
Given the complexity of this problem, it is worthwhile to focus on MGE, whose role in gene regulation has recently been discussed quite widely. Numerous databases now hold a large amount of data on the occurrence of mobile elements, but these data have been neither characterized nor generalized. The MGE of the MGMT gene may likewise be a source of diverse regulatory sequences, which prompted us to study this question.
Materials and Methods
The nucleotide sequence of the MGMT gene was taken from Ensembl. Data on the promoter region of the gene were obtained from GenBank (X61657), and data on the potential promoter regions of the gene from the AceView database. MGE were searched for and identified with the CENSOR program. Homology between the sequences studied was determined with BLAST 2.2.32. Functional sites were identified with TFSEARCH: Searching Transcription Factor Binding Sites (ver 1.3). The search for potential regulatory sequences was performed with SITECON, SiteGA and BLASTN 2.2.26 using the resources of the TRRD database (Transcription Regulatory Regions Database), and with CISTER: Cis-element Cluster Finder, WWW Signal Scan and Tfsitescan. These methods and databases were used in the relevant sections of the article.
Results and Discussion
Distribution of MGE in the human MGMT repair enzyme gene
The human MGMT gene is located in the telomeric region of chromosome 10 at position 10q26 and consists of one non-coding and four coding exons and four introns. Mobile genetic elements are present in all intron sequences of the human MGMT gene and absent from its exons (Figure 1). The largest share of MGE fragments was detected in intron 3 (44.74%) and the smallest in intron 4 (8.26%). In most introns, non-LTR retrotransposons (in particular LINE elements) predominate among the MGE classes. The share of endogenous retroviruses and of DNA transposons is insignificant in all introns (Table 1).
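As an illustration of how the distribution described above can be tabulated, the following minimal Python sketch computes each intron's share of the total MGE length from a list of annotated fragments. The fragment coordinates are hypothetical placeholders rather than values from Table 1, and measuring the share by length (rather than by fragment count) is an assumption, since the text does not state which measure was used.

```python
from collections import defaultdict

# HYPOTHETICAL annotation records: (intron, start, end, mge_class).
# Coordinates are illustrative only and are NOT taken from Table 1.
FRAGMENTS = [
    ("intron 1", 100, 400, "LINE"),
    ("intron 2", 50, 650, "LINE"),
    ("intron 2", 700, 1000, "SINE"),
    ("intron 3", 10, 1210, "LINE"),
    ("intron 4", 20, 220, "DNA transposon"),
]


def share_by_intron(fragments):
    """Return {intron: percent of total MGE length}.

    Coordinates are treated as 1-based and inclusive, so a fragment
    spanning start..end contributes end - start + 1 bases.
    """
    lengths = defaultdict(int)
    for intron, start, end, _mge_class in fragments:
        lengths[intron] += end - start + 1
    total = sum(lengths.values())
    return {name: round(100.0 * bp / total, 2) for name, bp in lengths.items()}


if __name__ == "__main__":
    for intron, percent in sorted(share_by_intron(FRAGMENTS).items()):
        print(f"{intron}: {percent}%")
```

The same function could be grouped by MGE class instead of by intron to obtain the class composition reported per intron.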
Clusters of MGE were found in the intron sequences of the studied gene. Intron 2 contains three composite clusters which, in addition to LINE elements, include representatives of other classes (Table 2). Intron 3 also contains three clusters of LINE elements, one of which is composite (Table 3).
It is known that MGE tend to cluster in intragenic regions of genes and in introns. Among MGE, Alu repeats are especially prone to clustering, and it has been suggested that these clusters are involved in chromosomal rearrangements. In the human MGMT gene, all identified clusters include fragments of LINE1 elements, alone or in combination with representatives of other MGE classes (Tables 2 and 3).
The role of MGE in the regulation of gene activity has recently been actively discussed in the literature. For retrotransposons, in particular for LINE elements located in intron sequences, it has been shown that they can interfere with transcription of the gene by causing exonization, intron retention or cryptic polyadenylation, or can act as a “speed bump” that regulates the passage of RNA polymerase II [32,33]. A “gene-breaking” hypothesis has also been proposed, according to which a LINE1 element located in an intron in the orientation opposite to the direction of transcription can “break” the transcript into two parts.
Thus, in the human MGMT gene MGE are present in the intron sequences. In two introns they form composite cluster structures that can be a source of diverse regulatory sequences and can potentially affect expression of the gene.
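The text does not state the criterion by which neighbouring MGE fragments were grouped into clusters. The sketch below shows one possible criterion, offered only as an assumption: fragments separated by no more than a chosen gap (`max_gap`, a hypothetical parameter) form a cluster, and a cluster is called composite when it contains more than one MGE class. All coordinates in the example are illustrative.

```python
def find_clusters(fragments, max_gap=50, min_size=2):
    """Group (start, end, mge_class) fragments into clusters.

    ASSUMPTION: two fragments belong to the same cluster when the gap
    between them is at most max_gap bp; runs shorter than min_size
    fragments are not reported. This is not the authors' stated rule.
    """
    clusters, current = [], []
    for frag in sorted(fragments):
        # Close the running cluster when the next fragment is too far away.
        if current and frag[0] - max(f[1] for f in current) > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(frag)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def is_composite(cluster):
    """A cluster is composite when it mixes more than one MGE class."""
    return len({mge_class for _start, _end, mge_class in cluster}) > 1


if __name__ == "__main__":
    # HYPOTHETICAL intron annotation, for illustration only.
    example = [
        (100, 400, "LINE"), (420, 720, "SINE/Alu"), (730, 900, "LINE"),
        (5000, 5300, "LINE"), (5310, 5600, "LINE"),
        (9000, 9100, "DNA transposon"),
    ]
    for cluster in find_clusters(example):
        kind = "composite" if is_composite(cluster) else "single-class"
        print(cluster[0][0], cluster[-1][1], kind)
```

A different gap threshold would change the number and boundaries of the clusters, so any such analysis should report the criterion it uses.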
Composite cluster structures of MGE in the introns of the investigated gene as a source of potential regulatory sequences
Of the six clusters identified within the introns of the human MGMT gene, we selected for further analysis those that, in addition to fragments of LINE elements, also contain Alu repeats. There are two such composite clusters, one in intron 2 (Table 2, cluster number 1) and one in intron 3 (Table 3, cluster number 1). The first cluster is 1339 bp long. In addition to a full-length AluSg repeat oriented opposite to the direction of transcription, it contains fragments of two LINE elements. Within the AluSg repeat, homology was detected with binding sites for four transcription factors: YY1, p300, C/EBP and HSF2, a transcription activator of heat shock genes.
Interestingly, analysis of the consensus sequence of Alu repeats showed that it contains conserved sites to which the transcription factor YY1 can theoretically bind. The human genome is known to contain a large number of consensus binding sites for YY1, and some of them (24%) are located in Alu repeats. About 50% of Alu repeats are reported to contain a potential YY1 binding site. This led to the suggestion that Alu repeats play a role in both the activation and the suppression of transcription. The suggestion is indirectly supported by data on the ability of the YY1 protein to bind to regulatory regions of genes and thereby increase or suppress transcription. The ability of YY1 to recruit other proteins to its DNA binding site can also affect the activity of the genome. For example, YY1 binds specifically to the hRPD3 protein, which has histone deacetylase activity, and may thereby initiate, in certain regions of the genome, the assembly of a complex that converts chromatin into an inactive state. Such an interaction can change the level of transcription. In addition, the YY1 protein may interact with the ADPRT protein, which has ADP-ribosyl transferase activity.
It has been shown that the YY1 protein, interacting with the ADPRT protein, stimulates its auto-ribosylation. On the basis of these data, a mechanism has been proposed for negative regulation of transcription with the participation of Alu repeats. The YY1 protein bound to an Alu repeat “attracts” the ADPRT protein and initiates its auto-ribosylation. As a result, a complex forms in a certain region of the gene that sterically hinders assembly of the transcription initiation complex. According to the authors, genes under the control of hormone response elements may be regulated in a similar way. This is indicated by data on the interaction of the ADPRT protein with the TR/RXR complex, which hinders the binding of nuclear receptors to the initiation complex. The consensus sequence of Alu repeats contains an AGGTCA block to which transcription factors of the nuclear hormone receptor family can bind. The bulk (more than 70%) of the potential hormone response elements for thyroid hormone, retinoic acid and estrogens are located in Alu repeats. It has been shown in CV-1 cells that, by binding nuclear receptors, the Alu-associated DR-4 element can regulate transcriptional activity depending on the presence of thyroid hormones. Proteins of the large family of nuclear hormone receptors can bind not only to the AGGTCA block but also to the various variants that arise through its duplication. It has been suggested that Alu repeats are “containers” holding sets of potential binding sequences for various transcription factors [41,42].
Within the fragments of LINE elements that are part of the composite cluster in intron 2 of the MGMT gene, homology with binding sites for 22 transcription factors was detected. Among the identified potential sites, we would like to highlight the binding sites for the heat shock factor HSF2, C/EBP, SRY, STAT, Oct, GATA and AP-1.
In addition, a TATA box and binding sites for the glucocorticoid receptor (GR) and ROR alpha 1 (an orphan nuclear hormone receptor) were identified. This is particularly interesting because the presence of glucocorticoid receptor binding sites within Alu repeats has been shown previously. It has also been found that hormone response elements for thyroid hormone, estrogen and retinoic acid are mainly located in Alu repeats.
The presence of a TATA box may be a prerequisite for the existence of an alternative promoter within intron 2 of the human MGMT gene. It is worth noting that, in addition to its internal promoter, L1 contains an antisense promoter (ASP). Analysis of a database of human expressed sequences revealed 49 chimeric transcripts that begin in the L1 ASP and are part of the mRNA of known genes. In 45 cases the direction of transcription from the ASP coincided with that from the gene promoter, and in four cases ASP activity led to the formation of RNA complementary to that of the corresponding gene. It is assumed that the L1 ASP may serve as an alternative promoter of a gene and may lead either to a chimeric mRNA that is translated into the same protein (when L1 is located upstream of the transcription start point), or to 5'-truncated mRNAs whose translation yields various N-terminal forms of the protein (when L1 is located in an intron of the gene).
In addition, the activity of the ASP of an L1 that is located in an intron and oriented in the same direction as the gene deserves special attention, since in this case chimeric RNAs are formed that contain sequences complementary to exons of the gene and can potentially regulate the activity of that gene through the mechanism of RNA interference.
It is also worth mentioning the “gene-breaking” hypothesis, according to which an L1 located in an intron in the orientation opposite to that of the gene can “break” the transcript into two parts: (1) an RNA that covers the upstream exons and ends at the antisense polyadenylation site of L1, and (2) a transcript that starts from the L1 ASP and encompasses the downstream exons.
The second composite cluster we investigated is located within intron 3 (Table 3, cluster number 1). It is 1527 bp long and consists of two fragments of LINE elements and a full-length AluSz6 repeat. In this case the orientation of all components of the composite cluster coincides with the direction of transcription of the gene. Interestingly, like the AluSg repeat from cluster 1 in intron 2, the AluSz6 repeat also contains a response element for the heat shock factor HSF2. Within the fragments of LINE elements, as in the previous cluster, homology was found, among other potential sites, with binding sites for C/EBP, SRY, STAT, Oct, GATA and AP-1, with a TATA box, and with binding sites for ROR alpha 1 (an orphan nuclear hormone receptor).
As can be seen from the results presented in Table 4, for some transcription factors, sequences homologous to their binding sites are present in retroelements belonging to different families.
Other sites are present only in the fragments of LINE elements, including the TATA box and the potential hormone response element for the retinoid-related orphan receptor ROR alpha 1 (an orphan nuclear hormone receptor).
Thus, analysis of the two composite clusters in intron 2 and intron 3 of the human MGMT gene, which include Alu repeats and fragments of LINE elements, showed that both Alu repeats contain sequences homologous to heat shock response elements, and that the fragments of LINE elements in both clusters contain TATA box sequences. This gives grounds for considering the analyzed composite MGE clusters within the intron sequences of the human MGMT gene as potential alternative promoters.
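The kind of motif search discussed in this section can be illustrated with a simple exact-match scan. The sketch below looks on both strands for the AGGTCA half-site, for a DR-4 arrangement (two AGGTCA blocks separated by four nucleotides) and for a TATA box. It is a deliberate simplification and not the method used in this work: TFSEARCH and the other programs score weight matrices, whereas this example uses regular expressions, and the TATA pattern `TATA[AT]A[AT]` is an assumed textbook consensus. The test sequence is invented.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# ASSUMED consensus patterns for illustration; real tools use weight matrices.
MOTIFS = {
    "HRE half-site": "AGGTCA",
    "DR-4": "AGGTCA[ACGT]{4}AGGTCA",
    "TATA box": "TATA[AT]A[AT]",
}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq):
    """Return sorted (motif, strand, start) hits.

    start is 1-based and always given on the plus strand, i.e. it is the
    leftmost plus-strand position covered by the match.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for strand, target in (("+", seq), ("-", revcomp(seq))):
            # The lookahead lets overlapping matches be reported.
            for match in re.finditer(f"(?=({pattern}))", target):
                length = len(match.group(1))
                if strand == "+":
                    start = match.start() + 1
                else:
                    start = len(seq) - match.start() - length + 1
                hits.append((name, strand, start))
    return sorted(hits)


if __name__ == "__main__":
    # INVENTED sequence containing a DR-4 element and a TATA box.
    demo = "GGAGGTCACTGAAGGTCATTTATAAAAGCC"
    for hit in scan(demo):
        print(hit)
```

Exact matching of this kind finds only perfect copies of a consensus, so it will miss the degenerate sites that matrix-based programs are designed to detect.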
Motifs of regulatory sequences in the promoter regions of the MGMT gene within the MGE
The reference promoter of the gene (X61657) is 1157 bp long and covers exon 1 and part of intron 1. It lacks TATA and CAAT sequences, contains CG-rich regions and structurally resembles the promoters of housekeeping genes. It also contains SP1, AP-1, AP-2 and NF-kappaB sites, two glucocorticoid response elements (GRE) that bind the glucocorticoid receptor, and a 59-bp element, located at the first exon-intron boundary, that is necessary for efficient transcription of reporter constructs [46,47].
Fragments of MGE in the promoter region of the investigated gene
Analysis of the promoter region of the human MGMT gene identified two fragments of LTR repeats of mammalian retroviruses, namely the LTR repeats of ERV3 endogenous retroviruses MLT1C2 and MLT1C, which are located in the distal part of the promoter sequence (Table 5) and make up about 23% of the total length of the promoter. It should be noted that the previously described glucocorticoid response elements (GRE), with coordinates 28-42 and 63-77, are located within one of the LTR fragments, namely MLT1C2.
In addition, a fragment of the Mutator-like non-autonomous DNA transposon SETARIA1 was found (Table 5). Interestingly, the minimal promoter (886-955) and the SP1 site (862-867) lie within this SETARIA1 sequence. Thus, the promoter region of the human MGMT gene contains fragments of three MGE, which together make up almost half its length.
It is known that up to 25% of genes contain MGE in their promoter regions. Genes associated with metabolic processes are especially enriched in MGE. For the human DNase II and CAML genes, it has been shown that the presence of Alu repeats in the promoter may affect expression of these genes. The human MSLN gene (mesothelin, a megakaryocyte-potentiating factor) is known to have two promoters, one formed by an LTR sequence and the other by a MIR element (a tRNA-like SINE). The only currently known promoter of the BAAT gene, which is expressed specifically in the liver and is involved in the development of a hereditary disease associated with bile metabolism, is also an LTR sequence. The regulation of transcription of the NAIP gene, which encodes an inhibitor of apoptosis, is an interesting case. It has been shown that the promoter regions of this gene in humans and rodents share no homology, yet in both, LTRs serve as alternative promoters.
In humans, LTR integration led to the formation of a tissue-specific promoter that is active mainly in the testes, whereas in rodents two LTRs capable of initiating transcription have been described. One of them is the main, constitutive promoter, active in all rodents, and the other is an alternative promoter found only in the mouse. It is important to note that the human and rodent LTRs in the promoter region of the NAIP gene are not related. This case is an example of the independent recruitment of LTRs to regulate the transcription of orthologous genes.
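The positional relationships described in this section for the MGMT promoter (functional elements lying inside MGE fragments, and the fraction of the promoter covered by MGE) reduce to interval checks. The sketch below uses the element coordinates and promoter length quoted in the text, but the MGE fragment coordinates are hypothetical placeholders, since Table 5 is not reproduced here; the percentage it prints therefore illustrates the calculation only and is not a result.

```python
PROMOTER_LENGTH = 1157  # bp, reference promoter X61657 (from the text)

# Functional elements with the coordinates quoted in the text.
ELEMENTS = {
    "GRE 1": (28, 42),
    "GRE 2": (63, 77),
    "SP1 site": (862, 867),
    "minimal promoter": (886, 955),
}

# HYPOTHETICAL MGE coordinates: placeholders, NOT the values of Table 5.
MGE = {
    "MLT1C2": (1, 120),
    "MLT1C": (130, 270),
    "SETARIA1": (840, 1100),
}


def containing_mge(element, mge=MGE):
    """Name of the MGE fragment that fully contains the element, or None."""
    start, end = element
    for name, (m_start, m_end) in mge.items():
        if m_start <= start and end <= m_end:
            return name
    return None


def coverage_percent(mge=MGE, length=PROMOTER_LENGTH):
    """Percent of the promoter covered by MGE; overlapping bases count once."""
    covered = set()
    for m_start, m_end in mge.values():
        covered.update(range(m_start, m_end + 1))
    return round(100.0 * len(covered) / length, 1)


if __name__ == "__main__":
    for name, coords in ELEMENTS.items():
        print(name, coords, "->", containing_mge(coords))
    print("MGE coverage of promoter (placeholder data):", coverage_percent(), "%")
```

With the actual fragment coordinates from Table 5 substituted for the placeholders, the same two functions would reproduce the containment and coverage statements made above.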
Potential cis-elements within the fragments of mobile genetic elements in the promoter regions of the investigated gene
Among the potential cis-regulatory sequences within the LTR repeats, we identified a response element for the heat shock factor HSF2; binding sites for several members of the GATA family of proteins (important regulators of the specification and differentiation of various tissues), MZF1 (involved in the control of cell proliferation and carcinogenesis), NF-kappaB (participates in activating transcription of numerous cytokine and immunoregulatory genes), AML-1 (regulation of hematopoiesis, angiogenesis and neurogenesis), C/EBP (controls the differentiation of various cell types and plays a key role in regulating cell proliferation through interaction with cell cycle proteins), CRE-BP (an important role in the development and functioning of the nervous system), Nkx-2.5 (regulation of the expression of tissue-specific genes, heart development, and temporal and spatial patterns of development) and CDP (regulates gene expression, morphogenesis and differentiation and plays a role in cell cycle progression, especially in S phase); and a potential hormone response element for the retinoid-related orphan receptor ROR alpha 1 (an orphan nuclear hormone receptor) (Figure 2).
Within the sequence of the fragment of the DNA transposon SETARIA1 (Figure 2), in addition to numerous potential binding sites for the Sp1 factor (one of the major transcription factors, involved in regulation of the cell cycle, changes in chromatin structure and regulation of DNA methylation), recognition motifs were found for the transcription factors GATA-2 (a hematopoietic transcription factor), c-Rel (participates in immune and inflammatory reactions, developmental processes, cell growth and apoptosis) and USF (one of the key regulators of gene expression).
Recently, the presence of numerous regulatory sequences has been shown not only in LTR elements, which have been called “regulatory information packages”, but also in other MGE.
When integrated into the promoter regions of genes, MGE can be a source of cis-regulatory modules, which are clusters of transcription factor binding sites [21,48,53]. In particular, binding sites for 20 transcription factors were found in the consensus sequence of Alu repeats, and the functional activity of most of them has been proved experimentally. In addition, functional binding sites for retinoic acid receptors and hormone response elements have been identified in Alu repeats [54-57]. As already noted, the bulk of the potential hormone response elements for thyroid hormone, retinoic acid and estrogens are located in Alu repeats.
The direct involvement of Alu repeats in the regulation of expression has been shown for genes associated with differentiation and development, namely the PTH, FcεRI-γ, CD8α, CHRNA3, BRCA-1 and PLOD-1 genes. Binding sites have been identified in Alu repeats for transcription factors involved in hematopoiesis, T-cell differentiation and the development of various organs (eyes, teeth, heart, lungs and brain), which is further evidence of the participation of Alu repeats in ontogenesis.
Thus, in the reference promoter of the human MGMT gene, three sequences homologous to MGE fragments were identified: two LTR repeats and a fragment of a DNA transposon. The fact that one of the LTR repeats contains the previously described glucocorticoid response elements (GRE), and that the known minimal promoter and the SP1 site lie within the fragment of the DNA transposon, supports an important role of MGE in gene regulation. In addition, the MGE fragments identified in the promoter region of human MGMT are enriched in potential cis-regulatory sequences that may be involved in the regulation of this gene.
Fragments of the MGE as components of potential alternative promoters
In addition to the reference promoter, the AceView database contains information on eight potential promoter regions (2,000 bp long) for the gene in question. For four of them (aAug10, cAug10, eAug10 and iAug10), the database indicates that the sequence may contain a promoter. BLAST analysis of the eight potential promoter regions of the human MGMT gene showed that three sequences (cAug10, hAug10 and iAug10) are homologous to the reference sequence, whereas five show no such homology. Of the latter, the dAug10 sequence is located within intron 1, and the four other sequences (aAug10, eAug10, fAug10 and gAug10) are located within intron 2.
Fragments of the MGE in alternative promoter regions of the investigated gene
Four of the eight potential promoter regions studied were shown to contain fragments of MGE (Table 6 and Figure 3). Interestingly, these are exactly the sequences that, according to the AceView database, may include a promoter. These sequences were therefore analyzed in detail. Two sequences of possible alternative promoters (cAug10 and iAug10), which overlap with the reference sequence (hO6P), contain additional MGE fragments, MIRc (a non-LTR retrotransposon) and MER117 (a DNA transposon), whereas the two fragments of LTR repeats of mammalian retroviruses, namely the LTR repeat of the MaLR retrovirus-like element (MLT1C) and the LTR repeat of the endogenous retrovirus ERV3 (MLT1C2), were identified by the CENSOR program as a single fragment of an endogenous retrovirus because of the larger size of the submitted nucleotide sequence. This feature of the program should be taken into account in further research. In the sequences aAug10 and fAug10, located within intron 2, fragments of MGE of different families were identified (Figure 3). Two sequences deserve particular attention: Tigger15a (a DNA transposon) and AluSx1 (a non-LTR retrotransposon), MGE that are specific to mammals and primates, respectively.
Potential regulatory sequences within identified MGE fragments
All of the MGE fragments we investigated in the alternative promoters of the human MGMT gene are enriched in a variety of regulatory sequences (Table 7). In particular, the AluSx1 repeat contains, besides those already described, a number of new transcription factor binding sites, as well as tissue-specific promoter, enhancer and silencer sequences (Table 7) [48,54-57]. Alu repeats are known to affect cellular processes by providing new transcription termination sites or splice sites, or by acting as alternative promoters [16,19,21,53]. Such events may interfere with the normal functioning of the gene, but may sometimes lead to the formation of proteins with new functions. DNA transposons are also enriched in regulatory sequences. The fragment of the Tigger15a element we examined contains a number of transcription factor binding sites and response elements (ERE, HRE, MEF-2 and C/EBP), as well as promoter, enhancer and silencer sequences and a locus control region-like region.
The human genome contains not only MGE specific to mammals, primates or humans, and their fragments or copies, but also fragments of MGE of other organisms: animals, plants and even bacteria [61-64]. Such sequences are also enriched in a variety of regulatory elements and can affect the transcriptional activity of the human MGMT repair gene. It is worth noting that the presence of a TATA box within the MGE of the potential promoter region fAug10 (shown in bold in Table 7) may be a prerequisite for the existence of an alternative promoter.
Thus, analysis of the nucleotide sequences of the potential alternative promoter regions of the human MGMT gene from the AceView database showed that two sequences (cAug10 and iAug10) overlap with the reference promoter and the other two (aAug10 and fAug10) are located within intron 2.
All of them contain MGE fragments that are enriched in a variety of regulatory sequences and can affect the regulation of the transcriptional activity of the human MGMT gene.
There is no doubt that MGE can affect the expression of eukaryotic genes. First and foremost, this involves their promoters and associated regulatory sequences. A global analysis of retrotransposon expression in the human genome revealed ~275,000 transcription start sites (TSS), which is ~31% of all known TSS in the human genome, although their level of activity is significantly lower than that of conventional genes. Transcription of MGE sequences also affects protein-coding transcripts. It has been shown that 576 promoters of human retrotransposons or their fragments are used as alternative promoters for the transcription of known genes. Cases of MGE-derived enhancers and of MGE involvement in the transcriptional networks of human, animal and plant genomes have also been described. All these facts indicate the important role of MGE in the phylogeny and ontogenesis of eukaryotes, but their global significance still needs confirmation.
In the human MGMT gene, MGE are present both in the intron sequences and in the promoter region. In the intron sequences, MGE form composite cluster structures that are a source of various regulatory sequences and have the potential to form alternative promoters. In the promoter region, three MGE sequences were identified: two LTR repeats and a fragment of a DNA transposon. The fact that one of the LTR repeats contains the previously described glucocorticoid response elements (GRE), and that the minimal promoter and the SP1 site lie within the fragment of the DNA transposon, supports an important role of MGE in gene regulation. In addition, the MGE fragments in the promoter region of human MGMT are enriched in potential cis-regulatory sequences that may also be involved in the regulation of this gene.