Purpose: Epilepsy, characterized by recurrent unprovoked seizures, is a common neurological disorder related to a wide variety of genetic, developmental and acquired brain conditions. Genetically determined epilepsies are associated with many potential genetic variants, identified predominantly in Caucasian populations. Research in US Hispanic/Latino populations has been limited. Whole exome sequencing (WES), a rapid and reliable method, has identified causative mutations for a number of diseases. We therefore applied WES to identify pathogenic mutations for epilepsy in the Latino population.
Methods: Of 14 symptomatic subjects recruited from the Departments of Psychiatry and Neurology at the Texas Tech University Health Sciences Center (TTUHSC) in El Paso, Texas, 11 met the diagnostic criteria for epilepsy. WES was performed using Illumina Nextera Rapid Capture Exome Enrichment kits, and bioinformatic predictions were made in the SVS program using data from dbNSFP, which provides scores from multiple functional prediction programs and two conservation scores.
Results: A total of 47,392 variants in exons or at splice-site boundaries were identified. After filtering, 18 non-synonymous rare variants were identified in 8 of the 11 patients. To identify high-impact genes and biological pathways, we conducted further bioinformatic analysis, "hotspot" analysis and literature searches. These analyses suggested that the CREBBP gene and pathways such as "cell cycle" and "Notch signaling" may be important in the pathophysiology of epilepsy.
Conclusion: This WES study identified additional candidate epilepsy genes in the US Latino population; however, validation and segregation analysis using a family design are needed to confirm these findings.
Epilepsy, characterized by recurrent unprovoked seizures, is the most common neurologic disorder after headache, with a prevalence of 5 to 10 per 1,000 persons and an incidence of 50 to 120 per 100,000 per year in the United States. Abundant evidence for a genetic contribution to epilepsy in humans comes from family and twin studies, which show concordance rates for epilepsy of 50 to 60% in monozygotic twins and 15% in dizygotic twins. Because it affects a variety of mental and physical functions, epilepsy takes a marked toll in morbidity and economic cost and imposes a tremendous burden on patients and on the health system in general.
Previous studies of epilepsy have suggested that genetics plays a major role in the disease, largely by identifying ion channels and neurotransmitters important in epileptogenesis. However, the mechanisms by which disease-associated variants (single nucleotide polymorphisms (SNPs) and/or copy number variations) in these channels and neurotransmitters act are poorly understood. With recent technological advances, genome-wide association (GWA) and candidate gene studies have suggested a few potential risk variants for epilepsy, reported predominantly in Caucasian populations, which fall into two broad categories: 1) genes/loci associated with primary epilepsy syndromes; and 2) genes associated with disorders of brain development that are accompanied by epilepsy. However, progress beyond the identification of disease-causing mutations toward clarifying the mechanisms of disease pathophysiology has been slow. In addition, individual susceptibility alleles contribute only modestly to overall disease risk, and very few markers discovered with conventional gene-discovery strategies (e.g., GWA and candidate gene studies) show meaningful predictive power for the disease. To date, 14 GWA studies, 38 meta-analyses and 339 genes in epilepsy have been reported according to HuGE Navigator. Moreover, the genes/loci identified so far collectively account for only a small fraction of the observed heritability of this disease. Little is known about the extent to which rare alleles contribute to the heritability of epilepsy, and rare, potentially deleterious variants in protein-coding regions may not be detected by GWA studies. Recent studies have focused on genetic factors beyond the channelopathies.
Recently, progress toward fully resolving the genetic basis of complex human diseases has been substantially aided by next-generation sequencing technologies, including whole exome sequencing. Using whole genome sequencing and a parent-offspring design, researchers at the University of Arizona discovered a de novo heterozygous missense mutation (c.5302A>G [p.Asn1768Asp]) in the voltage-gated sodium-channel gene SCN8A in a proband, a 15-year-old female with a severe epileptic encephalopathy comprising early-onset seizures, features of autism, intellectual disability, ataxia and sudden unexplained death in epilepsy.
To better understand the etiology of epilepsy, a recent study in Maryland evaluated 18 genes strongly suspected to be associated with major forms of primary and syndromic epilepsy. Of the 1,600 participants, 261 (16%) were found to carry mutations in these genes, and patients with infantile-onset epilepsy had an even higher diagnostic yield of about 20%.
From these results, it was presumed that about 20% of all cases had a Mendelian origin. However, it remains unclear whether genetic aberrations beyond the initially suspected 18 genes could be responsible for symptoms of epilepsy. Because only about 20% of epilepsy cases can be explained by current genetic findings, new sequencing methods should be employed to identify additional genetic causes of the disease.
By sequencing only the protein-coding portions of the human genome, a study participant can be screened for mutations much faster and more cost-effectively than with conventional genome sequencing (<$600 per sample vs. $3,000-6,000 per sample). Coupled with advanced sequence analysis software, this approach can locate rare mutations within whole exomes and shed light on the affected proteins and, in turn, on the disrupted metabolic pathways presumed to be responsible for the disease. Whole exome sequencing, which generates sequence data from hundreds of millions of short DNA fragments, promises to accelerate discovery of the genetic causes of disease in both basic research and clinical settings. The increased speed and decreased cost of whole exome sequencing are leading to the identification of novel mutations for monogenic and polygenic diseases [8-14], including epileptic encephalopathies (for review, see ), to application in clinical practice, and to improved medical care.
Importantly, limited research has been done in the US Latino population to date, and the available evidence suggests that general health information and knowledge of the etiology and genetic basis of epilepsy among Latinos are insufficient. A previous report stated that 400,000 Latino Americans are among the nation's 2.7 million people affected with epilepsy (15%). Many of these individuals fear epilepsy because of traditional associations between seizures and death, and a majority of surveyed Latinos were unwilling to disclose that a family member has the disorder (cited from Epilepsy and the Latino Community, Health Awareness, 2006, http://www.napsnet.com/pdf_archive/60/67222.pdf). Consequently, it is poorly understood whether US Latinos/Hispanics and non-Hispanic Americans with epilepsy have different risks and/or disease-causing mutations. We therefore focused on patients with epilepsy from the El Paso-Las Cruces metropolitan population, which is ~80% Latino (mostly of Mexican heritage), to uncover the genetic basis of the disease.
To gain more insight into the genetic basis of epilepsy at the global level in the US Latino population, we applied whole exome sequencing, one of the next-generation sequencing (NGS) technologies, together with newly developed statistical and bioinformatic tools, in 14 adult patients with epilepsy. The objective of the study was to apply whole exome sequencing as an unbiased means to detect common and rare genetic variants and to identify disruptive de novo mutations for epilepsy in the US Latino population.
This preliminary investigation into the genetic etiology of epilepsy included 14 patients, 12 of whom (85.7%) were of Latino descent (the other two were South Pacific Islander (Filipino) and Caucasian); all were ascertained in El Paso, TX.
Subjects were recruited from the outpatient clinic of the Department of Neurology at the Texas Tech University Health Sciences Center (TTUHSC) in El Paso, Texas. Detailed phenotypic information was collected at enrollment and entered into a de-identified database to facilitate genetic studies. Clinical information is updated periodically when individuals return to the clinic for routine care and treatment. From this database, we selected 14 related individuals with epilepsy from the US Latino population for the whole exome sequencing study. After careful review of the medical records, three of the initial 14 patients were found not to meet the diagnostic criteria for epilepsy.
Therefore, 11 patients with epilepsy (3 males, 8 females) who met the following criteria were selected for this study (Table 1): 1) clinical symptoms with EEG documentation of epileptiform abnormalities, based on the 1989 International League Against Epilepsy (ILAE) syndrome definitions. The subjects exhibited generalized seizures of varying severity as well as several other comorbidities; in the study design, we made sure to include patients with a past medical history of tonic-clonic (grand mal) seizures, the majority of whom were of Latino ancestry; 2) neurodevelopmentally normal, with no other identifiable/symptomatic explanation for their epilepsy, consistent with current ILAE terminology for idiopathic generalized epilepsy; and 3) generalized seizure spectrum, including any combination of three seizure types: generalized tonic-clonic, myoclonic and absence. Diagnoses were assigned by trained neurologists at TTUHSC. Age at onset of illness was defined as the first distinct episode meeting the criteria for epilepsy described above.
||Family history of seizures
||Seizures occur in clusters without aura and with hemifacial spasm
||Seizure activity with myoclonus
||Slightly larger R temporal horn with normal hippocampus
||Partial without aura
||Cerebrocerebellar atrophy but no pathology
||Unprovoked and abrupt loss of consciousness with collapse, left-side-predominant stiffness and jerking, and urinary incontinence. Sometimes can occur with fever
||Abnormal: diffuse generalized theta activity, highly suggestive of post-ictal state.
||(+) aunt with febrile seizures
||History of seizures that recently reappeared after a 10-year period, myoclonic head jerks to the right side, asymmetric pupils (R 4 mm, L 2 mm)
||Anticipation, auditory hallucination, does not fall or lose consciousness
||Sporadic left frontotemporal sharp waves, multiple artifacts (less reliable findings)
||Tonic-clonic seizures with urinary incontinence and tongue biting
||Occasional tonic-clonic activity with urinary incontinence and postictal lethargy and confusion
||Abnormal. Paroxysmal activity consistent with primary epilepsy.
||Patient was not hospitalized for seizure but wrist trauma. History of seizure is reported by patient.
||Sporadic high amplitude R frontotemporal sharp waves, Intermittent irregular R anterior quadrant slowing
||Right mesial temporal lobe sclerosis, incidental pituitary microadenoma, chronic ischemic white matter changes.
||(+) multiple family members
||Complex partial type with aura
||Partial with aura
||EEG not available in the medical chart
||Changes post right temporal craniectomy showing encephalomalacia without residual or recurrent signs of active neurocysticercosis; small arachnoid cysts in the right temporal fossa and bilateral Meckel's cave; focal encephalomalacia within the R superior temporal gyrus
||Both types of seizures occur with menses
||Mild generalized slowing
Table 1: Clinical details of 11 patients with epilepsy.
Subjects #4 and #7 are Filipino and Caucasian; the remaining 9 subjects are Hispanic.
“Hotspots” are defined as cytogenetic locations reported in epilepsy-related phenotypes in several patients by at least two studies.
“+” means family member(s) has/have seizure-related phenotypes; “-” means family members have no seizure-related phenotypes.
This study has been carried out in compliance with the institutional review board at TTUHSC and the relevant ethics boards at the collection sites. Written informed consent was obtained from all study participants or their legal guardians.
Whole exome sequencing
The genomic DNA was extracted from blood using standard methods, as in our previous study. Whole exome sequencing (deep next-generation sequencing) was performed on an Illumina MiSeq at the Genome Core Facility at TTUHSC El Paso. A total of 50 ng of DNA from each patient was used for whole exome sequencing with Illumina Nextera Rapid Capture Exome Enrichment kits (Illumina, FC-140-1001) to identify potentially pathogenic variants (de novo, novel or known), followed by statistical and bioinformatic analyses. Sequence data generated on the instrument were aligned, and variants were called on the aligned reads with the BWA [49]/GATK [50] pipeline. After alignment, recalibration, de-duplication and quality assurance (QA) were performed. The output was in the form of Variant Call Format (VCF) files.
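The alignment and calling steps above can be sketched as the following command sequence. This is a minimal illustration only, not the study's actual pipeline: the tool versions, the use of GATK4 command-line syntax, samtools for sorting, paired-end FASTQ input, the known-sites file used for base quality recalibration and all file names are assumptions. The function only builds the command strings and does not run them; the study's QA step is not represented.

```python
"""Hypothetical sketch of a BWA/GATK exome pipeline that ends in a VCF file.

Assumptions (not from the study): GATK4 syntax, samtools for sorting,
paired-end FASTQ input, a known-sites VCF for recalibration, placeholder names.
"""
from typing import List, Tuple


def build_pipeline(sample: str, ref: str, r1: str, r2: str,
                   known_sites: str) -> List[Tuple[str, str]]:
    """Return (step, shell command) pairs for one sample, in run order."""
    # Read-group tag so downstream GATK tools know which sample the reads are from
    rg = f"'@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA'"
    return [
        # Align reads to the reference genome with BWA-MEM
        ("align", f"bwa mem -R {rg} {ref} {r1} {r2} > {sample}.sam"),
        # Coordinate-sort the alignments into a BAM file
        ("sort", f"samtools sort -o {sample}.sorted.bam {sample}.sam"),
        # De-duplication: mark PCR/optical duplicate reads
        ("dedup", f"gatk MarkDuplicates -I {sample}.sorted.bam "
                  f"-O {sample}.dedup.bam -M {sample}.dup_metrics.txt"),
        # Recalibration, part 1: model systematic base-quality errors
        ("recal_table", f"gatk BaseRecalibrator -I {sample}.dedup.bam -R {ref} "
                        f"--known-sites {known_sites} -O {sample}.recal.table"),
        # Recalibration, part 2: apply the model to produce a recalibrated BAM
        ("apply_bqsr", f"gatk ApplyBQSR -I {sample}.dedup.bam -R {ref} "
                       f"--bqsr-recal-file {sample}.recal.table -O {sample}.recal.bam"),
        # Call SNVs and indels; the output is a compressed VCF file
        ("call", f"gatk HaplotypeCaller -I {sample}.recal.bam -R {ref} "
                 f"-O {sample}.vcf.gz"),
    ]


if __name__ == "__main__":
    for step, cmd in build_pipeline("patient01", "ref.fa", "p01_R1.fastq.gz",
                                    "p01_R2.fastq.gz", "known_sites.vcf.gz"):
        print(f"# {step}\n{cmd}")
```

Returning the commands instead of executing them keeps the sketch testable and makes the order of the alignment, de-duplication, recalibration and calling steps explicit.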
To identify potentially damaging rare variants, we first selected variants that were possibly functional (non-synonymous, frameshift and splice site) and rare (minor allele frequency <1% in the 1000 Genomes Project, http://www.1000genomes.org/). We then computationally analyzed single nucleotide variants (SNVs) and insertions or deletions (indels) with algorithms that account for biochemical, evolutionary and structural information, using Genome Browse-SVS (version 8.2.1) (http://www.goldenhelix.com/GenomeBrowse/; Golden Helix, Bozeman, MT). The frequency of the observed variants in the general population was also evaluated using publicly available databases, including dbSNP (http://www.ncbi.nlm.nih.gov/snp), the Exome Variant Server (EVS), Seattle, WA (http://evs.gs.washington.edu/EVS/), the National Heart, Lung and Blood Institute (NHLBI) Exome Sequencing Project (ESP) 6500 exomes, and the 1000 Genomes Project (http://www.1000genomes.org/data). Finally, we analyzed the possible effects of the variants on protein function within the SVS program, using data from dbNSFP, which provides scores from multiple functional prediction programs (SIFT, Polyphen2, Mutation Taster, Mutation Assessor and FATHMM) as well as two conservation scores (GERP++ and PhyloP).
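The two-stage filter described above (possibly functional and rare, then scored for predicted effect on protein function) can be sketched as follows. This is a hypothetical illustration, not the SVS workflow: the consequence labels, the single-letter "damaging" codes, the minimum number of damaging calls and the example genes are all placeholders. Real dbNSFP prediction codes differ between tools (for example, SIFT uses D/T while MutationTaster uses A/D/N/P), so a real filter would map each tool's codes separately.

```python
"""Hypothetical sketch of a rare, possibly functional variant filter."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed consequence labels treated as "possibly functional"
FUNCTIONAL = {"missense", "nonsense", "frameshift", "splice_site"}
# Assumed codes meaning "damaging" or "disease causing" (simplified across tools)
DAMAGING_CODES = {"D", "A"}


@dataclass
class Variant:
    gene: str
    consequence: str
    maf_1000g: Optional[float]  # None means the variant is absent from 1000 Genomes
    predictions: Dict[str, str] = field(default_factory=dict)  # tool -> code


def is_rare_functional(v: Variant, maf_cutoff: float = 0.01) -> bool:
    """Stage 1: possibly functional and MAF < 1% (or not observed at all)."""
    rare = v.maf_1000g is None or v.maf_1000g < maf_cutoff
    return v.consequence in FUNCTIONAL and rare


def n_damaging(v: Variant) -> int:
    """Count the prediction tools that call the variant damaging."""
    return sum(1 for code in v.predictions.values() if code in DAMAGING_CODES)


def filter_variants(variants: List[Variant], min_damaging: int = 1) -> List[Variant]:
    """Stage 2: keep rare functional variants with enough damaging calls."""
    return [v for v in variants
            if is_rare_functional(v) and n_damaging(v) >= min_damaging]


if __name__ == "__main__":
    demo = [
        Variant("GENE_A", "missense", 0.002, {"SIFT": "D", "MutationTaster": "A"}),
        Variant("GENE_B", "synonymous", None),               # not functional
        Variant("GENE_C", "missense", 0.15, {"SIFT": "D"}),  # too common
    ]
    print([v.gene for v in filter_variants(demo)])  # prints ['GENE_A']
```

The `min_damaging` threshold is a hypothetical knob; the study does not state a numeric cut-off at this stage.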
Pathway enrichment analysis
To identify the functions of the epilepsy-associated genes, we performed pathway enrichment analysis on the identified genes using the R package SubpathwayMiner (version 3.0), a software package for flexible identification of pathways. Metabolic and regulatory pathways were obtained using getMetabolicGEGEUEMGraph and getNonMetabolicGEGEUEMGraph, respectively. For each pathway, we calculated the significance of the overlap between the epilepsy-associated genes and the genes annotated to the pathway using the hypergeometric test:
$$P = 1 - \sum_{x=0}^{m-1} \frac{\binom{M}{x}\binom{G-M}{N-x}}{\binom{G}{N}}$$
where G is the total number of genes in the whole genome, N is the number of epilepsy-associated genes, M is the number of genes annotated in the pathway, and m is the number of epilepsy-associated genes in the pathway. A pathway with P < 0.05 was considered significantly enriched.
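The hypergeometric enrichment test described above can be computed directly with Python's standard library. In this minimal sketch (not SubpathwayMiner's own code), G is the number of genes in the genome, N the number of epilepsy-associated genes, M the number of genes annotated to the pathway and m the number of epilepsy-associated genes in that pathway; the function returns the probability of observing m or more overlapping genes by chance. The symbol names G and m and the example counts in the usage line are illustrative assumptions.

```python
"""Upper-tail hypergeometric p-value for pathway enrichment (sketch)."""
from math import comb


def enrichment_p(G: int, N: int, M: int, m: int) -> float:
    """P(X >= m) when N genes are drawn from G genes, M of which are in the pathway."""
    if not (0 <= N <= G and 0 <= M <= G and 0 <= m <= min(N, M)):
        raise ValueError("inconsistent gene counts")
    # Summing the upper tail directly avoids cancellation error in 1 - (lower tail);
    # math.comb returns 0 when N - x exceeds G - M, so impossible terms vanish.
    upper = sum(comb(M, x) * comb(G - M, N - x) for x in range(m, min(N, M) + 1))
    return upper / comb(G, N)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes, 18 candidates, a 60-gene pathway, 2 overlaps
    p = enrichment_p(G=20000, N=18, M=60, m=2)
    print(f"p = {p:.3g}, enriched: {p < 0.05}")
```

Exact integer binomial coefficients keep the result accurate even for genome-scale G, at the cost of large intermediate integers.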
A subnetwork of epilepsy-associated genes
To identify associations among the epilepsy-associated genes, we mapped them onto Mentha, a resource for browsing integrated protein-interaction networks. Next, we identified the genes that link at least one pair of epilepsy-associated genes in the protein interaction network. Finally, a subnetwork was constructed by extracting the linking genes and the epilepsy-associated genes, together with the interactions between them.
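The subnetwork construction described above can be sketched as below. The sketch assumes one specific reading of "genes which link at least one pair of epilepsy-associated genes": a non-seed gene that directly interacts with two or more seed genes, so that it connects them through a two-step path. The edge list stands in for Mentha interactions; it is not Mentha's actual data format, and the gene names are hypothetical.

```python
"""Hypothetical sketch: seed genes plus 'linker' genes from a PPI edge list."""
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_subnetwork(edges: Iterable[Edge], seeds: Set[str]) -> Tuple[Set[str], Set[Edge]]:
    """Return (linker genes, subnetwork edges) for the given seed genes."""
    edges = [(a, b) for a, b in edges if a != b]  # drop self-interactions
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # A linker is a non-seed gene adjacent to at least two distinct seed genes
    linkers = {g for g, nbrs in neighbors.items()
               if g not in seeds and len(nbrs & seeds) >= 2}
    # Seeds absent from the network cannot contribute edges, so drop them
    keep = (seeds & set(neighbors)) | linkers
    # Keep every interaction whose endpoints are both seeds or linkers
    sub_edges = {tuple(sorted((a, b))) for a, b in edges if a in keep and b in keep}
    return linkers, sub_edges


if __name__ == "__main__":
    ppi = [("SEED1", "HUB"), ("HUB", "SEED2"), ("SEED2", "SEED3"), ("SEED3", "OTHER")]
    linkers, sub = build_subnetwork(ppi, {"SEED1", "SEED2", "SEED3"})
    print(sorted(linkers), sorted(sub))
```

Restricting linkers to direct neighbors of two seeds keeps the subnetwork small; allowing longer paths would pull in many more intermediate genes.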
In the initial import, variants identified by whole exome sequencing that were not shared between study participants were removed, revealing 47,392 variants in common within coding regions of DNA (exons) or near splice-site boundaries. The exon-enriched DNA libraries were sequenced on an Illumina MiSeq next-generation sequencer to a median coverage of 126×.
Of these 47,392 variants, four database tracks (SIFT, OMIM, NHLBIESP6500SI-V2 Exome Variant Frequencies, and dbNSFP NS Functional Predictions 2.4, GHI) were used to remove common variants that were unlikely to contribute to loss-of-function or gain-of-aberrant-function mutations.
Filtering yielded 18 non-synonymous rare variants in eight of the 11 adult patients with epilepsy. These variants were located on 12 different chromosomes and are displayed in Table 2. Sequencing quality was high, with an average read depth of 126×. Three of the 11 patients (#8, #11 and #12) carried two or more mutations. Subject #12 carried three gene mutations, two of which (MFGE8 and CREBBP) were predicted to be damaging by five functional prediction programs. Subject #8 carried seven gene mutations: three (GPRC5D, SPG20 and NPC1) had a combined function score (CFS) > 3.5 and two (POTEH and PCSK1) had a CFS > 2.5 (Table 2). The biological functions of these five genes include lipid metabolic process, cellular process, transport and regulation of transcription (Table 2).
||Entrez Gene ID
||Putative Biological Function
||Chr. Loc a.
||nervous system development
||lipid metabolic process
|| regulation of transcription
||catabolic process, proteolysis
||DNA replication, cell cycle
||regulation of transcription
||cellular defense response
|| regulation of transcription
||nucleic acid transport
||alternative splicing, disease mutation
||structural molecule activity
Table 2: Possible pathogenic mutations identified in adult patients with epilepsy.
a. Chr. Loc.: chromosome location of each gene mutation.
b. Combined Function Score (CFS): scores are based on five functional prediction programs (SIFT, Polyphen2, Mutation Taster, Mutation Assessor and FATHMM); e.g., a CFS of 4.5 means that four programs predicted the variant to be damaging or disease causing and one of the five predicted it to be probably damaging. For the predictions from each of the five programs, see Supplementary Table S1.
c. Single nucleotide variant (SNV).
e. “Hotspots” are defined as cytogenetic locations reported in epilepsy-related phenotypes in several patients by at least two studies.
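Footnote b implies a simple scoring rule: each of the five programs contributes 1 for a "damaging" or "disease causing" call and 0.5 for a "probably damaging" call, which is how four damaging calls plus one probably damaging call give 4.5. The sketch below encodes that reading; the label strings and the zero weight for all other calls are assumptions, not taken from the study's supplementary data.

```python
"""Sketch of the Combined Function Score (CFS) as read from footnote b."""
from typing import Dict

# Assumed weights: a full point for damaging/disease causing, half for probably damaging
WEIGHTS = {"damaging": 1.0, "disease_causing": 1.0, "probably_damaging": 0.5}
PROGRAMS = ("SIFT", "Polyphen2", "MutationTaster", "MutationAssessor", "FATHMM")


def combined_function_score(calls: Dict[str, str]) -> float:
    """Sum the weights of the five programs' calls; missing or other labels count as 0."""
    return sum(WEIGHTS.get(calls.get(prog, ""), 0.0) for prog in PROGRAMS)


if __name__ == "__main__":
    example = {"SIFT": "damaging", "Polyphen2": "probably_damaging",
               "MutationTaster": "disease_causing", "MutationAssessor": "damaging",
               "FATHMM": "damaging"}
    print(combined_function_score(example))  # prints 4.5, the footnote's example
```

Under this reading, the thresholds used in the text (CFS > 3.5 and > 2.5) correspond to roughly four and three concordant damaging calls, respectively.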
In addition, three of the 11 patients (subjects #4, #7 and #9) did not carry any of the 18 mutations. Clinically, there were no specific similarities or differences between these patients and the remaining 8 patients who carried the predicted pathogenic mutations. However, comorbid depression, anxiety or PTSD was less common in the patients without these mutations (33%; #4, #7 and #9) than in the remaining 8 patients who harbored them (55%).
None of these 18 mutations was found in publicly available databases, including dbSNP (http://www.ncbi.nlm.nih.gov/snp), the Exome Variant Server (EVS), Seattle, WA (http://evs.gs.washington.edu/EVS/), the NHLBI Exome Sequencing Project (6,503 individuals; ESP6500 exomes) and the 1000 Genomes Project (http://www.1000genomes.org/data).
“Hotspots” were defined as chromosomal deletions, insertions or linkage signals reported in several patients with epilepsy or epilepsy-related traits in at least two studies, identified by a PubMed search combining the cytogenetic location of each newly identified epilepsy gene mutation (e.g., 6p21.3) with the keyword epilepsy (last updated May 1, 2016). The pathogenic mutations identified in the current patients were located in five such hotspots, 6p21.3, 15q26.1, 19q13.42, 2q31.1 and 16p13.3, which have been observed in epilepsy-related phenotypes; these regions were given priority because they contain breakpoints precisely defined by molecular cytogenetics.
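The hotspot rule described above (a cytogenetic location reported in several patients by at least two studies) can be expressed as a small filter over literature records. In this hypothetical sketch each record is (study ID, cytoband, number of patients); the patient threshold standing in for "several", the exact string matching of cytobands, and the example records are assumptions. A real analysis would compare genomic coordinates rather than band names, because reported deletions often span several bands.

```python
"""Hypothetical sketch of the literature-based 'hotspot' rule."""
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Report = Tuple[str, str, int]  # (study_id, cytoband, n_patients)


def find_hotspots(reports: Iterable[Report], min_studies: int = 2,
                  min_patients: int = 2) -> Set[str]:
    """Cytobands reported by >= min_studies studies covering >= min_patients patients."""
    studies: Dict[str, Set[str]] = defaultdict(set)
    patients: Dict[str, int] = defaultdict(int)
    for study_id, band, n in reports:
        studies[band].add(study_id)
        patients[band] += n
    return {band for band in studies
            if len(studies[band]) >= min_studies and patients[band] >= min_patients}


def variants_in_hotspots(variant_bands: Dict[str, str], hotspots: Set[str]) -> Dict[str, str]:
    """Keep candidate genes whose cytoband is a hotspot."""
    return {gene: band for gene, band in variant_bands.items() if band in hotspots}


if __name__ == "__main__":
    literature = [("study_A", "BAND_1", 2), ("study_B", "BAND_1", 3),
                  ("study_C", "BAND_2", 1)]
    hot = find_hotspots(literature)
    print(variants_in_hotspots({"GENE_X": "BAND_1", "GENE_Y": "BAND_2"}, hot))
```

Counting distinct study IDs, rather than records, keeps one large report from qualifying a region on its own.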
Pathway analysis of the 18 proteins
Pathway enrichment analysis showed that the 18 epilepsy-associated genes were significantly involved in a number of important biological pathways, such as “Notch signaling pathway” and “Cell cycle” (P ≤ 0.05, Figure 1A). Previous studies have associated the loss of functional Kir channels with re-entry of cells into the cell cycle and with gliosis, and a loss of functional Kir channels has been shown in temporal lobe epilepsy.
We constructed a subnetwork by mapping the 18 epilepsy-associated genes onto the Mentha protein interaction network and found that eight of the 18 genes interacted indirectly with one another, mediated by important genes such as CREBBP (Figure 1B).
Figure 1: Pathway enrichment analysis.
1A: The 18 epilepsy-associated genes were significantly involved in a number of important biological pathways, such as “Notch signaling pathway” and “Cell cycle” (P ≤ 0.05). 1B: The CREBBP pathway and its relevant proteins. Proteins shown in green have been pathogenetically associated with epilepsy; yellow nodes are background or other pathways.
Epilepsy, defined by the presence of recurrent seizures, is associated with abnormalities in cognition, psychiatric status and social-adaptive behavior that are now referred to as neurobehavioral comorbidities. Given the increasing evidence of disease-risk and disease-causing genetic variants for epilepsy in non-Hispanic populations, we hypothesized that the same and/or additional pathogenic mutations would be identified using a cutting-edge technology, whole exome sequencing. With stringent quality control and high coverage (126×), we identified a total of 18 rare, heterozygous, predicted pathogenic variants, each present in at least one of eight patients out of a total of 11 patients, 9 of whom were of Latino descent.
The major finding of this study is the identification of a number of potential disease-causing mutations. The current results provide pilot evidence that the CREBBP gene, the Notch signaling pathway and the cell cycle might be involved in the pathophysiology of epilepsy. A recent study using an acute animal model also demonstrated a correlation between aberrant Notch signaling and epileptic seizures. Furthermore, pathway analysis based on epilepsy-associated SNPs identified by genome-wide studies supports a role for the cell cycle in disease pathophysiology. However, additional studies are warranted to examine the mechanisms underlying these putative disease-causing mutations in epilepsy and to confirm the findings in a large cohort from this unique population. To our knowledge, this is the first report of pathogenic mutations identified by whole exome sequencing in patients with epilepsy in the Latino population (based on a PubMed search on June 13, 2016).
Among the 11 patients, patient #8, who carried six mutations, had generalized epilepsy with occasional tonic-clonic activity, urinary incontinence, and postictal lethargy and confusion; the EEG showed paroxysmal activity consistent with primary epilepsy. Patient #12, who carried three mutations, had simple partial seizures with aura, and MRI demonstrated encephalomalacia.
A number of the newly identified mutations were also located in “hotspots” previously reported in epilepsy-related phenotypes. For example, 6p21.3 microdeletions, in the region harboring the CYP21A2 Leu308Phe mutation identified in the current study, have been observed in more than four patients across different studies, including a recent report. The main features of patients with 6p21.3 deletions are developmental delay with severe speech impairment, seizures and behavioral abnormalities. Structural changes at 15q26.1 (the location of the MFGE8 mutation observed in the current study) have been identified in a number of patients with epilepsy-related phenotypes. A genome-wide linkage meta-analysis of 379 families with genetic generalized epilepsy mapped a locus to 19q13.42 (the location of the NLRP2 Glu292Asp mutation in the current study). Five studies reported 2q31.1 deletions in patients with seizure-related phenotypes, including migrating partial seizures of infancy and two patients with developmental delay and seizures. More than 30 children with 2q interstitial deletions have been reported (for review, please see ). Moreover, a recent study identified copy number variations in 16p13 in 60 patients with a combination of intellectual disability and genetic generalized epilepsy. Of two studies [32,33] reporting positive linkage signals in larger families with epilepsy phenotypes, one also identified a TBC1D24 mutation located at 16p13.3 in a family with epilepsy.
Furthermore, KEGG pathway enrichment analysis (p < 0.05) was performed for the 18 candidate proteins to illustrate the relationships between epilepsy disease pathways and other pathways (Figure 1). This analysis showed that CREBBP was related to seven other proteins in which mutations have been identified in patients with epilepsy.
We are aware of several limitations of this study. 1) These pathogenic mutations need to be confirmed by a different approach; validation of the 18 gene mutations by standard Sanger sequencing is ongoing. 2) Other variants, such as copy number variations, including recently discovered putatively causal structural abnormalities in epilepsy and epilepsy-related traits, may not be captured by whole exome sequencing. 3) Whole exome sequencing does not assess the impact of non-coding regions of the genome; whole genome sequencing is considered the most comprehensive genetic test, although it may be hampered by challenges in data analysis and cost. Therefore, future copy number variation analysis, whole genome sequencing and/or targeted gene sequencing will provide an opportunity for more in-depth molecular profiling of the fundamental biological processes affected by the variants identified in the current study.
These first discoveries of functional genetic mutations using whole exome sequencing provide insight into the susceptibility of the US Latino population to epilepsy. These mutations represent important candidates for further investigation into the pathogenesis of epilepsy and may reveal potential drug targets for eventual therapy. Future studies will focus on how these functional mutations may influence the risk of epilepsy and will confirm the findings in a large cohort and/or family study design.
We are grateful to all the families who participated in the study and to the many dedicated neurologists at TTUHSC and local hospitals in El Paso for help with patient ascertainment, particularly Dr. Richard D. Brower, a neurologist at TTUHSC El Paso, for his patient ascertainment and helpful discussion.
We thank Dr. Michael Escamilla, a professor at TTUHSC El Paso, for allowing us to use his sample collection.
This study was supported, in part, by the TTUHSC Seed grant (PI, Dr. Xu) and TTUHSC SARP Mini grant (PI, Dr. Xu).