Phenotype-related features, such as age, are linked to genetic components through regulatory pathways, and epigenetic modifications, such as DNA methylation, are major regulators in the translation of genotype to phenotype. Age-related CpG sites have been identified and shown to have predictive value in the estimation of chronological age from biological samples. Currently, CpG sites are chosen based only on their statistical correlation with age, not on their location on the chromosome or on the function of the genes in which they are located. There seems to be a lack of information on whether the identified genes have a direct or indirect relation to the aging process, or whether the statistical correlation is more likely to be random. In this work, we analyzed studies on DNA methylation age markers published in the last six years and identified a total of 59 genes as age-related. The five genes most used by different prediction models were ELOVL2, FHL2, KLF14, TRIM59 and C1orf132. After further investigation, we found that although gene function is not the primary criterion when choosing targets to build age prediction models, it is clear that the statistical correlation found between DNA methylation and age is not random. The sites and genes chosen by different studies mostly have a direct association with aging: they play important roles in metabolism, cell proliferation, migration and immune signaling pathways, and are also involved in inflammatory responses and in cancer development.
Age prediction; DNA methylation; Forensic epigenetics; Investigative lead
DNA samples left behind at crime scenes are often used for identification purposes by comparing them to reference samples or to a forensic database. In cases where there is no match, investigators may turn to alternative approaches using advanced technologies to gain valuable leads on phenotypic traits (externally visible characteristics) of the person who left the DNA material. Previous studies have found that phenotypes can be predicted by using information obtained from different DNA markers, such as single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (InDels), and epigenetic markers [1-11]. It is well established that epigenetic modifications are major regulators in the translation of genotype to phenotype. Epigenetic regulation encompasses various levels of gene expression and involves modifications of DNA, histones, RNA and chromatin, with functional consequences for the human genome [12-14]. DNA methylation is the most well-characterized epigenetic modification. It is usually associated with transcriptional repression and involves the addition of a methyl group (–CH3) to the cytosine of CpG dinucleotides. Differences in DNA methylation levels in specific regions of the genome can affect the expression of target genes and downstream phenotypes. As differential methylation patterns became more widely described, research began to focus on understanding how variations in human DNA methylation patterns can be used to describe and predict phenotype-related features, such as age. Age-related CpG sites have been identified and shown to have predictive value in the estimation of chronological age from biological samples [15-18].
The most commonly used methodology to identify CpG sites relies on data obtained from DNA methylation arrays, such as the Infinium HumanMethylation450 BeadChip array (Illumina, CA). This platform interrogates over 450,000 individual CpG sites and quantifies DNA methylation by measuring the signal intensity emitted by methylated and unmethylated probes. The analysis software then uses these probe intensities to calculate beta-values, which provide an estimated methylation status. For example, a CpG site with a beta-value of 0.9 is estimated to be 90% methylated [18]. Most age prediction studies start from raw methylation data from different tissue samples and extract beta-values for each CpG site. CpG sites highly correlated with age are then selected, and a statistical model for age prediction is developed [19]. Based on this description of the selection process, it is clear that CpG sites are not primarily chosen based on their location on the chromosome or on the function of the genes in which they are located. Rather, they are chosen based only on their statistical correlation with age. Published studies on age prediction usually present the sites chosen and briefly mention the genes involved. There seems to be a lack of information regarding the identified genes and whether they have a direct or indirect relation to the aging process, or whether the statistical correlation is more likely to be random. Hence, this study aimed to evaluate the current progress in forensic age prediction based on DNA methylation patterns and to investigate the identified sites/genes and their significance and correlation with human aging.
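The selection workflow described above can be illustrated with a minimal Python sketch. It computes beta-values from methylated and unmethylated probe intensities and ranks CpG sites by the absolute Pearson correlation between beta-value and age. The offset of 100 in the denominator is an assumption based on the commonly cited Illumina convention; the intensities, ages and site names are invented, and `beta_value`, `pearson` and `rank_sites_by_age_correlation` are hypothetical helpers rather than functions of any array-analysis package.

```python
from math import sqrt


def beta_value(methylated, unmethylated, offset=100.0):
    """Estimate methylation as M / (M + U + offset); the result lies in [0, 1)."""
    return methylated / (methylated + unmethylated + offset)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_sites_by_age_correlation(betas_by_site, ages):
    """Return (site, r) pairs sorted by |r|, strongest correlation first."""
    scored = [(site, pearson(betas, ages)) for site, betas in betas_by_site.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data (invented): four donors and three hypothetical CpG sites.
# Each tuple is (methylated intensity, unmethylated intensity).
ages = [20, 35, 50, 65]
intensities = {
    "site_A": [(2000, 7900), (3500, 6400), (5000, 4900), (6500, 3400)],  # gains methylation with age
    "site_B": [(5000, 4900), (5100, 4800), (4900, 5000), (5000, 4900)],  # roughly flat
    "site_C": [(8000, 1900), (7000, 2900), (6000, 3900), (5000, 4900)],  # loses methylation with age
}
betas_by_site = {
    site: [beta_value(m, u) for m, u in pairs] for site, pairs in intensities.items()
}
ranking = rank_sites_by_age_correlation(betas_by_site, ages)

for site, r in ranking:
    print(f"{site}: r = {r:+.3f}")
```

In this toy input, the two sites whose methylation changes steadily with age rank above the flat one, regardless of the direction of the change. Nothing in the ranking uses the genomic location or the function of the gene, which is the point made in the text.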
A comprehensive review of articles on forensic age prediction was conducted to identify previously published age-related DNA methylation markers and to gather information on CpG sites, genes, age estimation error and sample type. The most commonly used CpG sites in different age prediction models were identified, and the genes where these markers are located were investigated. The identified genes and their correlation with any aging process were evaluated using information found in the National Center for Biotechnology Information (NCBI) [20] and GeneCards [21] databases.
Despite all the work done so far in investigating DNA methylation markers for age prediction, some aspects of age estimation based on epigenetic data need to be better understood. As seen in the studies published to date, there seems to be a functional link between DNA methylation and age, since the regulatory regions of some genes tend to become more methylated with increasing age [22]. However, while studies report CpG sites identified as being related to human aging, there is a lack of further information on the genes where these CpG sites are located and on whether these sites are directly or indirectly associated with the aging process.
A total of 29 studies on DNA methylation age markers published in the last six years were analyzed to evaluate the current progress in forensic age prediction and to better investigate the sites and genes identified by different authors. The results of this investigation are presented in Table 1. Most of the age prediction models were built on results gathered from blood sample analysis. However, an ideal age prediction tool should work for different body fluids, and for this reason, more recent studies have focused on expanding the tissue types analyzed to include at least the most common types of body fluids found at crime scenes, such as saliva and semen [23-47].
Study | Genes | Mean absolute deviation or prediction error | Sample type
Weidner et al. [23] | ITGA2B, ASPA, PDE4C | 4.5 years | blood
Yi et al. [15] | TBOX3, GPR137, ZIC4, ZDHHC22, MEIS1, UBE2E1, PTDSS2, UBQLN1 | NA | blood
Zbiec-Piekarska et al. [17] | ELOVL2, C1orf132, TRIM59, KLF14, FHL2 | 3.9 years | blood
Lee et al. [24] | TTC7B, NOX4 | 5.4 years | semen
Bekaert et al. [25] | ASPA, PDE4C, ELOVL2, EDARADD | 3.8 years (blood) and 4.9 years (teeth) | blood, teeth
Xu et al. [26] | ADAR, ITGA2B, PDE4C | 5.1 years | blood
Huang et al. [27] | ASPA, ITGA2B, NPTX2 | 7.9 years | blood
Park et al. [18] | ELOVL2, ZNF423, CCDC102B | 3.4 years | blood
Freire-Aradas et al. [28] | ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, C1orf132 | 3.1 years | blood
Giuliani et al. [29] | ELOVL2, FHL2, PENK | 2.3 years (pulp), 7.1 years (dentin) and 2.5 years (cementum) | teeth
Zubakov et al. [30] | DMH1, DMH2, DMH3, FHL2, ELOVL2 | 4.3 years | blood
Hong et al. [31] | SST, CNGA3, KLF14, TSSK6, TBR1, SLC12A5, PTPN7 | 3.2 years | saliva
Alghanim et al. [32] | KLF14, SCGN | 7.1 years (saliva) and 10.3 years (blood) | blood, saliva
Cho et al. [33] | ELOVL2, C1orf132, TRIM59, KLF14, FHL2 | 3.3 years | blood
Naue et al. [34] | ELOVL2, F5, KLF14, TRIM59 | 3.6 years | blood
Vidaki et al. [35] | NHLRC1, SCGN, CSNK1D | 7.1 years (blood) and 3.2 years (saliva) | blood, saliva
Li et al. [36] | NHLRC1, SCGN, ASPA, EDARADD, CSNK1D, LAG3 | 4.1 years (healthy samples) and 7.1 years (diseased samples) | blood
Alfieri et al. [37] | VGF, TRIP10, KLF14, CSNK1D, FZD9, C21orf63, SSRP1, NHLRC1, ERG, FXN, P2RXL1, SCGN | 4.1 years (blood) and 7.3 years (saliva) | blood, saliva
Feng et al. [38] | TRIM59, PDE4C, ELOVL2, CCDC102B, C1orf132, RASSF5 | 2.8 years | blood
Jung et al. [39] | ELOVL2, FHL2, KLF14, C1orf132, TRIM59 | 3.8 years | blood, saliva, buccal swabs
Alsaleh et al. [40] | FHL2, ELOVL2 | 4.6 years | blood
Gentile et al. [41] | ELOVL2, C1orf132, TRIM59, KLF14, FHL2 | 5.4 years (highest MAD by age group) | saliva
Xu et al. [42] | SALL4, MBP, C17orf76, B3GALT6, NOC2L, SNN, NPTX2, SLC22A18, TMEM106A, LEP, SCAP, C16orf30, FLJ25410 | 4.7 years | bone marrow, dermal fibroblast, buccal, prostate NL, sperm, saliva, colon, breast NL, muscle, placenta, liver, fat adip, brain occipital cortex, brain CRBLM
Dias et al. [43] | ELOVL2, FHL2, EDARADD, PDE4C, C1orf132 | 8.8 years | blood
Heidegger et al. [44] | ELOVL2, FHL2, KLF14, MIR29B2CHG, TRIM59 | NA | blood
Lee et al. [45] | TMEM51, TRIM59, ELOVL2, EPHA6 | NA | bones
Li et al. [46] | NOX4 | 4.2 years (liquid semen and fresh seminal stains), 4.4 years (aged seminal stains), 3.9 years (mixed stains) | sperm
Pan et al. [47] | ASPA, EDARADD, CCDC102B, ZNF423, ITGA2B, KLF14, FHL2 | 4.6 years | blood
Sukawutthiya et al. [22] | ELOVL2 | 7.1 years | blood
Table 1: Information on DNA methylation-based age prediction studies.
NA = not available
MAD values are shown for the validation/training sets
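The error metric in Table 1 can be made concrete with a small Python sketch. It assumes that the mean absolute deviation (MAD) is the average absolute difference between predicted and chronological age, and it uses a single hypothetical CpG site with an ordinary least-squares line as the prediction model. Published models typically combine several sites and may use other regression methods, so this is only an illustration. All values are invented, and `fit_line` and `mean_absolute_deviation` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_deviation(predicted, actual):
    """Average absolute difference between predicted and chronological ages."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Training set (invented): beta-values at one hypothetical CpG site and donor ages.
train_betas = [0.20, 0.35, 0.50, 0.65]
train_ages = [20, 35, 50, 65]
slope, intercept = fit_line(train_betas, train_ages)

# Validation set (invented): predict age from methylation, then score with MAD.
test_betas = [0.30, 0.42, 0.58]
test_ages = [27, 45, 60]
predicted_ages = [slope * b + intercept for b in test_betas]
mad = mean_absolute_deviation(predicted_ages, test_ages)

print(f"MAD on the validation set: {mad:.2f} years")
```

Scoring on samples that were not used to fit the model, as in the validation set here, gives a less optimistic error estimate than scoring on the training data.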
A total of 59 genes were identified as being age-related, and the five most used were ELOVL2, FHL2, KLF14, TRIM59 and C1orf132. All are protein-coding genes except C1orf132, which is an ncRNA gene. Interestingly, most of the identified genes (44 of the 59) were not used by multiple studies and were part of only one age prediction model. More information on the less cited genes can also be found in Supplementary Table 1.
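A tally of this kind can be reproduced with a short Python sketch. The input below is only the first five rows of Table 1, so the counts it produces are not the totals reported in this work. The alias map, which treats MIR29B2CHG and C1orf132 as the same gene, and the helper name `count_gene_usage` are assumptions made for illustration.

```python
from collections import Counter

# Subset of Table 1 (first five rows only), used as example input.
studies = {
    "Weidner et al. [23]": ["ITGA2B", "ASPA", "PDE4C"],
    "Yi et al. [15]": ["TBOX3", "GPR137", "ZIC4", "ZDHHC22", "MEIS1",
                       "UBE2E1", "PTDSS2", "UBQLN1"],
    "Zbiec-Piekarska et al. [17]": ["ELOVL2", "C1orf132", "TRIM59", "KLF14", "FHL2"],
    "Lee et al. [24]": ["TTC7B", "NOX4"],
    "Bekaert et al. [25]": ["ASPA", "PDE4C", "ELOVL2", "EDARADD"],
}

# Alternative gene symbols mapped to one name so they are not counted twice.
ALIASES = {"MIR29B2CHG": "C1orf132"}


def count_gene_usage(studies, aliases=None):
    """Count in how many studies each gene appears (at most once per study)."""
    aliases = aliases or {}
    counts = Counter()
    for genes in studies.values():
        counts.update({aliases.get(gene, gene) for gene in genes})
    return counts


usage = count_gene_usage(studies, ALIASES)
shared = sorted(gene for gene, n in usage.items() if n > 1)
single = sorted(gene for gene, n in usage.items() if n == 1)

print("Used by more than one study:", shared)
print("Used by a single study:", len(single), "genes")
```

Normalizing aliases before counting matters because some studies report the same locus under different symbols.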
Gene | Number of studies using the gene | Gene role | GeneCards/NCBI reference
ELOVL2 | 16 | metabolism, and transferase and fatty acid elongase activities | https://www.genecards.org/cgi-bin/carddisp.pl?gene=ELOVL2
FHL2 | 11 | assembly of extracellular membranes | https://www.genecards.org/cgi-bin/carddisp.pl?gene=FHL2&keywords=FHL2
KLF14 | 10 | transcriptional repression | https://www.genecards.org/cgi-bin/carddisp.pl?gene=KLF14&keywords=KLF14
TRIM59 | 8 | regulator for innate immune signaling pathways |
C1orf132/MIR29B2CHG | 7 | mir-29 microRNA precursor | https://www.genecards.org/cgi-bin/carddisp.pl?gene=MIR29B2CHG&keywords=C1orf132
ASPA | 6 | maintain white matter | https://www.genecards.org/cgi-bin/carddisp.pl?gene=ASPA&keywords=ASPA
PDE4C | 6 | regulation of cellular concentration of cAMP |
CCDC102B | 4 | centrosome cohesion and centrosome linker assembly |
EDARADD | 4 | morphogenesis of ectodermal organs |
ITGA2B | 4 | blood coagulation system |
SCGN | 4 | KCL-stimulated calcium flux and cell proliferation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SCGN
CSNK1D | 3 | diverse cellular growth and survival processes | https://www.genecards.org/cgi-bin/carddisp.pl?gene=CSNK1D
NHLRC1 | 3 | suppression of cellular toxicity by protein degradation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=NHLRC1
NOX4 | 2 | signal transduction, cell differentiation and tumor cell growth | https://www.genecards.org/cgi-bin/carddisp.pl?gene=NOX4
NPTX2 | 2 | excitatory synapse formation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=NPTX2&keywords=NPTX2
ZNF423 | 2 | signal transduction during development |
ADAR | 1 | RNA editing |
B3GALT6 | 1 | transfer of galactose and glycosaminoglycan synthesis |
C16orf30 | 1 | cell adhesion and cellular permeability at adherens junctions | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TMEM204
C17orf76/LRRC75A | 1 | protein-protein interactions |
C21orf63/EVA1C | 1 | carbohydrate binding | https://www.genecards.org/cgi-bin/carddisp.pl?gene=EVA1C&keywords=C21orf63
CNGA3 | 1 | normal vision and olfactory signal transduction | https://www.genecards.org/cgi-bin/carddisp.pl?gene=CNGA3
DMH | 1 | integrator of intermediate filaments, actin and microtubule cytoskeleton networks | https://www.genecards.org/cgi-bin/carddisp.pl?gene=DST&keywords=DMH
EPHA6 | 1 | transferase activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=EPHA6
ERG | 1 | embryonic development, cell proliferation, differentiation, angiogenesis, inflammation, and apoptosis | https://www.genecards.org/cgi-bin/carddisp.pl?gene=ERG
F5 | 1 | blood coagulation cascade | https://www.genecards.org/cgi-bin/carddisp.pl?gene=F5
FLJ25410/SEPTIN12 | 1 | cytokinesis, exocytosis, embryonic development, and membrane dynamics | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SEPTIN12
FXN | 1 | mitochondrial iron transport and respiration | https://www.genecards.org/cgi-bin/carddisp.pl?gene=FXN
FZD9 | 1 | neuromuscular junction assembly, neural progenitor cells viability and neuroblast proliferation and apoptotic cell death | https://www.genecards.org/cgi-bin/carddisp.pl?gene=FZD9
GPR137 | 1 | MTORC1 complex translocation to lysosomes, autophagy and epithelial cell function | https://www.genecards.org/cgi-bin/carddisp.pl?gene=GPR137
LAG3 | 1 | innate immune system and class I MHC mediated antigen processing and presentation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=LAG3
LEP | 1 | regulation of energy homeostasis, immune and inflammatory responses, hematopoiesis, angiogenesis, reproduction, bone formation and wound healing | https://www.genecards.org/cgi-bin/carddisp.pl?gene=LEP
MBP | 1 | protease binding and structural constituent of myelin sheath | https://www.genecards.org/cgi-bin/carddisp.pl?gene=MBP
MEIS1 | 1 | hematopoiesis, megakaryocyte lineage development and vascular patterning | https://www.genecards.org/cgi-bin/carddisp.pl?gene=MEIS1
NOC2L | 1 | inhibition of histone acetyltransferase activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=NOC2L
P2RXL1 | 1 | identical protein binding and ion channel activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=P2RX6
PENK | 1 | physiologic functions, including pain perception and responses to stress | https://www.genecards.org/cgi-bin/carddisp.pl?gene=PENK
PTDSS2 | 1 | metabolism and glycerophospholipid biosynthesis | https://www.genecards.org/cgi-bin/carddisp.pl?gene=PTDSS2
PTPN7 | 1 | cell growth, differentiation, mitotic cycle, and oncogenic transformation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=PTPN7
RASSF5 | 1 | tumor suppressor | https://www.genecards.org/cgi-bin/carddisp.pl?gene=RASSF5
SALL4 | 1 | development of abducens motor neurons and maintenance and self-renewal of embryonic and hematopoietic stem cells | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SALL4
SCAP | 1 | cholesterol and lipid homeostasis | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SCAP
SLC12A5 | 1 | protein kinase binding and potassium:chloride symporter activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SLC12A5
SLC22A18 | 1 | tumor suppressor |
SNN | 1 | toxic effects of organotins and endosomal maturation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SNN
SSRP1 | 1 | chromatin transcriptional elongation factor FACT | https://www.genecards.org/cgi-bin/carddisp.pl?gene=SSRP1
SST | 1 | hormone activity |
TBOX3 | 1 | developmental processes, limb pattern formation and regulation of PML function in cellular senescence | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TBX3&keywords=T-BOX3
TBR1 | 1 | cortical development, including neuronal migration, laminar and areal identity, and axonal projection | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TBR1
TMEM106A | 1 | activation and polarization of macrophages | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TMEM106A
TMEM51/C1orf72 | 1 | unknown | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TMEM51&keywords=TMEM51
TRIP10 | 1 | identical protein binding and lipid binding |
TSSK6 | 1 | sperm production and function, and DNA condensation during postmeiotic chromatin remodeling | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TSSK6
TTC7B | 1 | regulation of phosphatidylinositol 4-phosphate (PtdIns(4)P) synthesis | https://www.genecards.org/cgi-bin/carddisp.pl?gene=TTC7B&keywords=TTC7B
UBE2E1 | 1 | photodynamic therapy-induced unfolded protein response and regulation of activated PAK-2p34 by proteasome mediated degradation | https://www.genecards.org/cgi-bin/carddisp.pl?gene=UBE2E1
UBQLN1 | 1 | protein degradation mechanisms and pathways | https://www.genecards.org/cgi-bin/carddisp.pl?gene=UBQLN1
VGF | 1 | growth factor activity and neuropeptide hormone activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=VGF
ZDHHC22 | 1 | protein-cysteine S-palmitoyltransferase activity | https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZDHHC22
ZIC4 | 1 | development and X-linked visceral heterotaxy and holoprosencephaly type 5 | https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZIC4
Supplementary Table 1: Identified genes and their functions. All genes are protein coding genes, except for C1orf132 (ncRNA gene type).
CpG sites in the ELOVL2 gene were the most used by multiple studies (Supplementary Table 1). The ELOVL2 gene also appears to be part of the predictive models with the lowest mean absolute deviation. ELOVL2 (elongation of very-long-chain fatty acids-like 2) plays a role in metabolism, including omega-3 and omega-6 metabolism, and it is also related to transferase and fatty acid elongase activities [48]. Chen et al. [49] investigated the association of ELOVL2 with functional and anatomical aging in vivo, focusing on the mouse retina. ELOVL2 is involved in the elongation of long-chain omega-3 and omega-6 polyunsaturated fatty acids, which play essential roles in retinal function. The authors reported that the ELOVL2 promoter region is increasingly methylated with age in the retina and that the decrease in the expression of this gene can be a regulator of a molecular aging clock in the retina. Also, according to Bacalini et al. [50], ELOVL2 methylation is related to cell replication, and the target regions within ELOVL2 become hypermethylated with cell divisions. Based on other studies [51], the authors suggest that aging is affected by the number of cell divisions and that an extended life span is related to a decrease in division rate. The same authors also propose that, since ELOVL2 hypermethylation occurs in different tissues, this locus is a target of methyltransferase activity throughout cell replications.
The second most used locus was the FHL2 (Four and a Half LIM Domains Protein 2) gene, which encodes a protein that plays a role in the assembly of extracellular membranes [52]. This gene is also related to transcription activity, cell proliferation, apoptosis, adhesion, migration, structural stability, and tissue repair and inflammation [53]. According to Cao et al. [54], FHL2 plays an important role in cancer cell invasion, migration and adhesion to the extracellular matrix, and mutations in this gene and post-translational modifications may also contribute to carcinogenesis.
The KLF14 (Kruppel Like Factor 14) gene was used in age prediction models proposed by ten different studies. This gene encodes a protein that functions as a transcriptional co-repressor [55]. Genome-wide association studies (GWAS) have also shown that KLF14 plays an important role in the development of metabolic diseases and that variants near this gene are associated with T2DM (type 2 diabetes mellitus) and HDL-C (high-density lipoprotein cholesterol) [56]. Assuncao et al. [57] showed that, besides being a regulator of lipid metabolism, KLF14 also mediates lipid signaling. Further supporting these findings, the study in [56] showed that the inhibition of glucose uptake induced by high glucose and high insulin can be prevented by the overexpression of KLF14. More recently, KLF14 has been shown to be involved in chronic inflammatory responses and in atherosclerosis development [58,59].
The fourth most used gene among the studies analyzed in this work was TRIM59 (Tripartite Motif Containing 59), a protein-coding gene involved in innate immune signaling pathways [60]. TRIM59 was also shown to be involved in the induction of cellular senescence by affecting Ras and RB signaling pathways [61] and to promote growth and migration in different types of cancer [59,62].
C1orf132 (Chromosome 1 Open Reading Frame 132), currently known as MIR29B2CHG (MIR29B2 And MIR29C Host Gene), is the fifth most cited gene. It is an RNA gene of the lncRNA class and a mir-29 microRNA precursor [63]. MicroRNAs regulate the translation of proteins, and members of the mir-29 family were shown to be down-regulated in different types of cancer [64].
As presented in this section and in the supplemental material (Supplementary Table 1), the sites and genes chosen by different studies and that contribute to age prediction mostly have a direct association with aging: they play important roles in metabolism, cell proliferation, migration and immune signaling pathways, and are also involved in inflammatory responses and in cancer development.
During life, humans are exposed to many environmental factors that affect the level of methylation in different genes. Most likely, some locations in the chromosomes are more sensitive to changes in the environment, and for this reason they present a stronger correlation with the aging process and provide better markers for predictive models [59]. It is also worth pointing out that, in order to create more accurate and reliable age prediction models for forensic practice, researchers should also explore RNA genes, such as MIR29B2CHG. This type of gene has been shown to have direct and indirect effects on DNA methylation and protein expression, and it can add value to prediction models when combined with other strongly age-associated genes, such as ELOVL2.
Although the function of genes is not the primary criterion when choosing targets to build age prediction models, it is clear that the statistical correlation between DNA methylation and age obtained from the analysis of methylation arrays is not random. The information presented in this work shows how important it is to investigate and discuss the function of the genes and sites used in prediction models. Future publications should include functional information on the targeted genes to provide an extra layer of information to forensic investigators using this type of investigative tool.
Georgia Karantenislis is conducting an independent graduate-level research project as part of the Advanced Science Research (ASR) program.
None.
Citation: Silva DSBS and Karantenislis G (2021) Analysis of DNA Methylation Sites used for Forensic Age Prediction and their Correlation with Human Aging. Forensic Leg Investig Sci 7: 054.
Copyright: © 2021 Silva DSBS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.