Allopolyploidization is an important event in plants, since it enhances heterosis and wide environmental adaptation. Common wheat, Triticum aestivum (AABBDD), arose through hybridization between T. turgidum (AABB) and Aegilops tauschii (DD) and subsequent whole genome duplication. To identify homoeologous genes expressed from the three distinct genomes of common wheat, we comprehensively surveyed available Expressed Sequence Tags (ESTs), based on 26,241 proofed full-length cDNA sequences. In total, 76,568 homoeologous genes were classified. These homoeologous genes were grouped into 36,389 gene clusters and assigned to each chromosome and/or chromosome arm of common wheat. Transcript specific homoeologous genes could be identified. In addition to protein coding genes, non-coding genes were located on chromosomes and/or chromosome arms. About half of the homoeologous genes acted as single copy genes, indicating diploidization of these genes. Preferential gene expression from the B genome was found not only in single copy genes, but also in genes with multiple copies. Wheat specific genes were mostly present in single copies and were expressed more from the B genome than from the other genomes. GO classification showed that the expressed genes have typical functions that characterize hexaploid wheat. This reference set of expressed genes in common wheat should be an indispensable genome resource.
Widespread occurrence of polyploidization (whole genome duplication) in plants provides evidence that it has advantages in development, adaptation and diversification [1-3]. Allopolyploidy resulting from interspecific or intergeneric hybridization and multiplication of more than two sets of genomes provides evolutionary advantages through speciation and environmental adaptation of higher plants, including many important crops [2,4-6]. At certain times, whole genome duplication has led to paleopolyploidy, showing structural genetic diploidization and subgenome fractionation (selective loss and retention of protein coding genes and non-coding RNA genes) leading to balance at the steady state of intergenomic orchestration [7-9]. These processes leading to allopolyploidization should bring about a broad range of genetic and epigenetic responses such as chromosome deletions, rearrangements, transpositions and epigenetic modifications [4,10-20].
Common wheat, Triticum aestivum (2n = 6x = 42, genome formula AABBDD), formed through two additive allopolyploidizations. About 0.5 million years ago, the first allopolyploidization occurred by hybridization between the wild relatives Aegilops speltoides (2n = 2x = 14, SS≈BB) and T. urartu (2n = 2x = 14, AA). Common wheat was spontaneously produced about 10,000 years ago by the second allopolyploidization, between the early-cultivated allotetraploid T. turgidum ssp. dicoccum (2n = 4x = 28, AABB) and wild goat grass, Ae. tauschii ssp. strangulata (2n = 2x = 14, DD), followed by chromosome doubling of unreduced gametes [21-24]. Common wheat has been widely cultivated across the world, since it shows more features of heterosis, such as growth vigor, environmental adaptability, and disease resistance, than tetraploids [25]. Since there was a time lag between the two allopolyploidization events of common wheat, it should provide a model system to study genetic interactions among three genomes.
Orchestration of allopolyploid genomes after whole genome duplication leads to genome fractionation (unequal gene loss) as well as neo- and subfunctionalization of duplicated genes due to alternative nucleotide substitution rates. In addition to the biased fractionation of polyploid genomes, genes located on dominant genome regions have a tendency toward higher expression [26-28]. Indeed, genomic asymmetry due to the non-random retention of controlling genes favoring one genome over others is manifested in allopolyploid wheat by the control of various genetic traits and syntenic genes [8,29]. Whole genome shotgun sequencing of Chinese Spring wheat showed that allohexaploid wheat lost 10,000 to 16,000 genes during the course of allohexaploidization [30]. Furthermore, accelerated alteration of homoeologous genes, such as nucleotide mutations and alternative splicing, has been reported [31]. These structural changes of the common wheat genome are likely to have occurred during allotetraploidization, mainly because of the duration of the allopolyploid [32,33]. However, precise genome-wide data are required to show which homoeologous genes are expressed among the three genomes of common wheat, for a better understanding of gene regulation in allopolyploids. Hence, the present study aims to clarify the transcriptome of homoeologous genes in common wheat, based on the full-length cDNA clones.
Here, we took advantage of the Full-Length (FL) cDNA sequence data of common wheat to complete a reference set of its expressed homoeologous genes. We used all Expressed Sequence Tags (ESTs) of common wheat that had been cloned from cDNAs containing a poly(A)+ tail and sequenced from both ends of the inserts. The full-length cDNAs that cover the coding sequences (CDSs) or non-coding RNAs were proofed from these ESTs, including the CAP-trapped cDNAs. They were classified into homoeologous genes expressed from the A, B and D genomes, and these homoeologous genes were grouped into gene clusters corresponding to those of the diploid [34]. Chromosome locations of these homoeologous genes were determined to show the subgenome fractionation of expressed genes [35].
Completion of Full-Length (FL) cDNA data of common wheat
Figure 3: Number of wheat gene clusters homologous to grass genes.
Figure 4: Gene ontology analysis of wheat specific genes.
Gene ontology of gene clusters (2x_WhSp) was compared to that of barley (Hv), rice (Os), Brachypodium (Bd), and Sorghum (Sb). GO terms were categorized into three groups. Significant differences (χ² test) are shown as * (at the 5% level) and ** (at the 1% level). Subcategories are shown underneath the grouped GO terms.
| | Protein coding genes: Homology with the grass genes | Protein coding genes: Wheat specific gene | Protein coding genes: Subtotal | Non-coding genes: miRNA* | Non-coding genes: ncRNA** | Non-coding genes: Not identified | Non-coding genes: Subtotal |
|---|---|---|---|---|---|---|---|
| Assigned to the IWGSC genome sequence | 22,792 | 7,477 | 30,269 | 204 | 12 | 927 | 1,143 |
| | (61,452) | (8,497) | (69,949) | | | | |
| Not assigned to the IWGSC genome sequence | 1,956 | 200 | 2,156 | 585 | 18 | 2,218 | 2,821 |
| | (2,441) | (214) | (2,655) | | | | |
| Total | 24,748 | 7,677 | 32,425 | 789 | 30 | 3,145 | 3,964 |
| | (63,893) | (8,711) | (72,604) | | | | |
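The totals in Table 1 can be checked against the headline figures quoted in the text (36,389 gene clusters and 76,568 homoeologous genes). The sketch below is only an illustrative cross-check, not part of the original analysis. It assumes that the parenthesised values are homoeologous-gene counts for the protein coding columns and that each non-coding gene is counted once in the homoeologous-gene total; both readings are inferred from the arithmetic, and all variable names are hypothetical.

```python
# Gene-cluster counts transcribed from Table 1.
clusters = {
    "assigned": {"grass_homolog": 22792, "wheat_specific": 7477,
                 "miRNA": 204, "ncRNA": 12, "not_identified": 927},
    "not_assigned": {"grass_homolog": 1956, "wheat_specific": 200,
                     "miRNA": 585, "ncRNA": 18, "not_identified": 2218},
}
# Parenthesised values from Table 1, read here as homoeologous-gene counts
# (an assumption; the table itself does not label them).
homoeologs = {
    "assigned": {"grass_homolog": 61452, "wheat_specific": 8497},
    "not_assigned": {"grass_homolog": 2441, "wheat_specific": 214},
}

CODING = ("grass_homolog", "wheat_specific")
NON_CODING = ("miRNA", "ncRNA", "not_identified")


def total(table, columns):
    """Sum the given columns over every row of a nested-dict table."""
    return sum(row[col] for row in table.values() for col in columns)


coding_clusters = total(clusters, CODING)          # 32425
non_coding_clusters = total(clusters, NON_CODING)  # 3964
coding_homoeologs = total(homoeologs, CODING)      # 72604

total_clusters = coding_clusters + non_coding_clusters
# Assumption: non-coding genes contribute one homoeologous gene each.
total_homoeologs = coding_homoeologs + non_coding_clusters

print(total_clusters)    # 36389
print(total_homoeologs)  # 76568
```

Under these assumptions the table reproduces both headline numbers, which supports reading the parenthesised rows as homoeologous-gene counts.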
Figure 5: Chromosome assignment of wheat homoeogenes.
(A) A total of 71,092 homoeogenes were assigned to each chromosome and/or chromosome arm.
(B) The number of located homoeogenes correlated positively with the DNA content of chromosomes except for 2DL and 5BL.
Figure 6: Chromosome assignment of characteristic genes.
Chromosome distribution of wheat specific and non-coding RNA homoeogenes was compared to the distribution of all homoeogenes.
| Expressed genomes / No. expressed genes | 1 | 2 | 3 | 4-6 | 7-12 | >13 | Total |
|---|---|---|---|---|---|---|---|
| A | 4199 | 349 | 43 | 13 | 1 | 0 | 4605 |
| B | 5236 | 410 | 85 | 35 | 2 | 0 | 5768 |
| D | 4268 | 290 | 49 | 6 | 0 | 0 | 4613 |
| Subtotal | 13703 | 1049 | 177 | 54 | 3 | 0 | 14986 (49.5) |
| A + B | - | 1506 | 488 | 236 | 29 | 1 | 2260 |
| A + D | - | 1482 | 508 | 248 | 21 | 0 | 2259 |
| B + D | - | 1515 | 503 | 236 | 18 | 0 | 2272 |
| Subtotal | | 4503 | 1499 | 720 | 68 | 1 | 6791 (22.4) |
| A+B+D | - | - | 3951 | 3824 | 626 | 91 | 8492 (28.1) |
| Total | 13703 | 5552 | 5627 | 4598 | 697 | 92 | 30269 |
| | (45.3) | (18.3) | (18.6) | (15.2) | (2.3) | (0.3) | |
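The parenthesised percentages in Table 2 follow directly from the counts. As a minimal sketch (the dictionary layout and variable names are illustrative, and "-" entries are entered as 0), the shares of gene clusters expressed from one, two or three genomes, and the shares per copy-number class, can be recomputed as follows:

```python
# Counts transcribed from Table 2; columns are the copy-number classes
# 1, 2, 3, 4-6, 7-12 and >13. Cells shown as "-" are entered as 0.
table2 = {
    "A":     [4199, 349, 43, 13, 1, 0],
    "B":     [5236, 410, 85, 35, 2, 0],
    "D":     [4268, 290, 49, 6, 0, 0],
    "A+B":   [0, 1506, 488, 236, 29, 1],
    "A+D":   [0, 1482, 508, 248, 21, 0],
    "B+D":   [0, 1515, 503, 236, 18, 0],
    "A+B+D": [0, 0, 3951, 3824, 626, 91],
}

# Number of gene clusters expressed from 1, 2 or 3 genomes.
by_genomes = {1: 0, 2: 0, 3: 0}
for label, counts in table2.items():
    n_genomes = label.count("+") + 1
    by_genomes[n_genomes] += sum(counts)

grand_total = sum(by_genomes.values())
genome_share = {n: round(100 * c / grand_total, 1)
                for n, c in by_genomes.items()}

# Totals and shares per copy-number class (the bottom rows of Table 2).
column_totals = [sum(col) for col in zip(*table2.values())]
column_share = [round(100 * c / grand_total, 1) for c in column_totals]

print(grand_total)    # 30269
print(genome_share)   # {1: 49.5, 2: 22.4, 3: 28.1}
print(column_share)   # [45.3, 18.3, 18.6, 15.2, 2.3, 0.3]
```

The recomputed values agree with the table: about half of the clusters are expressed from a single genome, and 45.3% of all clusters are expressed as single copy genes.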
A reference set of transcripts provides indispensable clues for gene prediction. Hence, we comprehensively surveyed genes expressed in various tissues of common wheat grown under ordinary conditions and under biotic and abiotic stress conditions, including CAP-trapped cDNAs (FL cDNAs) [34]. Although collections of FL cDNAs are recognized as significant genetic resources, full-set surveys of FL cDNAs expressed from each genome of common wheat (allohexaploid: AABBDD) are not readily available. Therefore, we completed the sequencing of an additional 4,886 CAP-trapped FL cDNAs, so that 21,693 sequences for Chinese Spring wheat are now available. In addition to these CAP-trapped FL cDNAs, the insert sequences of 4,548 independent cDNA clones that cover the protein coding regions were determined. In total, the nucleotide sequences of 26,241 FL-cDNA clones are available. This number is comparable to the number of Arabidopsis genes annotated from the genome (TAIR 10 https://www.arabidopsis.org/), suggesting that almost all expressed wheat genes containing a poly(A)+ tail can be captured with the cDNA clones (Figure S1) [40-43]. This is an indispensable genome resource for predicting the expressed genes in wheat.
Based on these wheat FL cDNAs, all of the available wheat ESTs, including one-pass sequences of CAP-trapped wheat cDNAs, were clustered. As a result, 76,568 homoeologously expressed genes (homoeologous genes) were identified (Figure 1). These classified expressed genes were clone-based and relatively abundant. The homoeologous genes were grouped to estimate the gene members of common wheat, designated as 36,389 gene clusters (Figure 1), of which 32,425 were protein coding genes (Table 1). This estimated gene number is comparable to the gene numbers predicted from the genome sequences of diploid, tetraploid, and hexaploid wheats [30,35,44-46]. Overall GO analysis of these homoeologous genes yielded GO terms similar to those of the grouped gene clusters of diploids and other cereal genes (Figure 2), suggesting that the present list of cDNA clones covers almost all expressed genes in common wheat.
The A and B genomes of common wheat have a long history of co-existence, ca. 0.5 million years, before pollination with Ae. tauschii (DD) and genome-wide duplication about 10,000 years ago gave rise to the allohexaploid [22,23,47]. Accumulation of genomics data in cereals enables characterization of the expression features of the allohexaploid wheat genes located on the three distinct genomes [48-50]. Thus, the orchestration of expressed homoeologous genes in natural hexaploid wheat at a steady state should be clarified. In this study, the number of expressed homoeologous genes in each gene cluster was estimated. About half of the expressed genes in common wheat were expressed from only one of the three genomes, whereas about a quarter were expressed from two genomes and the remaining quarter from all three genomes (Table 2). On the other hand, almost all (ca. 95%) wheat specific genes were transcribed from one genome (Research Data SF1A), suggesting a characteristic feature of wheat specific genes. Preferential gene expression from the B genome was found for both single copy genes and multigene families (Table 2 and Research Data SF1B). This expression preference was also found in wheat specific genes (Table 1). Furthermore, significantly fewer wheat specific homoeologous genes were expressed as single copy genes by the D genome, while the numbers of wheat specific single copy genes assigned to the A and B genomes were not significantly different (Table 2 and Research Data SF1A and SF1B). These lines of evidence suggest both stronger negative regulation of the D genome for wheat specific genes and maternal effects on the expression of homoeologous genes [33,51]. The observation that certain chromosome arms, most of which were of the B genome, harbored more expressed homoeologous genes than expected (Figure 5 and Figure S2) suggests that gene regulation system(s) might operate on specific chromosome regions.
Preferential transcription of genes from one progenitor genome has been reported in cotton, Arabidopsis and maize as well as in wheat [17,26,52-55].
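To illustrate how the B-genome preference among single copy genes could be examined, the sketch below applies a χ² goodness-of-fit test to the single-copy counts in Table 2 (A: 4199, B: 5236, D: 4268). This is not the authors' procedure, which is not described in this section. The null hypothesis of equal counts per genome is a simplifying assumption that ignores differences in gene content among the genomes, and the function name is hypothetical.

```python
import math

# Single copy genes expressed from one genome only (Table 2, column "1").
single_copy = {"A": 4199, "B": 5236, "D": 4268}


def chi_square_equal(observed):
    """Chi-square statistic against equal expected counts per category."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


chi2 = chi_square_equal(list(single_copy.values()))

# With 2 degrees of freedom the chi-square survival function is exactly
# exp(-x / 2), so no statistics library is needed for the p-value.
p_value = math.exp(-chi2 / 2)

CRITICAL_1PCT_DF2 = 9.210  # tabulated 1% critical value for df = 2
significant = chi2 > CRITICAL_1PCT_DF2

print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}, significant at 1%: {significant}")
```

Under this simplified null hypothesis the statistic is roughly 147, far above the 1% critical value, which is consistent with the excess of B-genome single copy genes described in the text. A more careful test would derive expected counts from the gene content of each genome.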
GO analysis revealed that single copy genes of common wheat play characteristic roles, such as signal transduction and stress responses, that are distinct from those of other categories of genes (Figure S3). In addition to the GO categories of single copy genes found in common among the three genomes, single copy genes of the B genome fell into further categories (Figure S3B). The categories of genes expressed from two of the three genomes, and of those expressed from all three genomes, were concerned with basic metabolism (Figure S4). Moreover, multigenes expressed from all three genomes, among which genes of the B genome exhibited preferential expression, showed characteristic functional categories such as stress responses in addition to metabolic processes (Figure S5). These data suggest functional partitioning of the respective homoeologous genes. Genetic alterations and epigenetic regulation are known to play roles in gene expression of polyploids [56]. Although substantial DNA loss, especially from the A and B genomes, has been reported in common wheat, genetic alterations of allohexaploid wheat alone cannot explain the observed expression profiles of homoeologous genes: the number of expressed homoeologous genes was similar for the A and D genomes, while more expressed homoeologous genes were found for the B genome (Tables 1 and 2) [8,12,30,57,58]. This suggests that epigenetic regulation operating on the genes in each genome is substantial [59-62]. In fact, silencing of homoeologs through altered DNA methylation and repression of counterpart homoeologous genes by miRNAs and siRNAs play important roles in controlling the expression of target genes [51,63,64].
Final collection of CAP-trapped cDNA sequences
The work was supported in part by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural Innovation, KGS1002). Most of the DNA sequencing was carried out at the RIKEN Center for Life Science Technologies, Division of Genomic Technologies and Genome Network Analysis Support Facility. Bioinformatic work was partly conducted on the supercomputer system of the National Institute of Genetics (NIG), Research Organization of Information and Systems (ROIS).
KM collected cDNA clones, designed images, carried out computational analyses and participated in manuscript writing. KK designed images and performed DNA sequencing of cDNA clones. YK performed computational analyses and developed the database. KM contributed to constructing full-length cDNA libraries, sequencing the full-length cDNA clones, and constructing the full-length cDNA database of common wheat. HT, MT, NS and JK developed a new sequencing strategy for cDNA clones and carried out sequencing and computational analyses. YN and KN contributed to computational data analyses and construction of the wheat transcriptome database. YO and JK designed the research and wrote the manuscript. All authors read and approved the final manuscript.
Citation: Mishina K, Kawaura K, Kamiya Y, Kajita Y, Mochida K, et al. (2018) Transcriptome of Homoeologous Genes Deduced from the Full-Length cDNA Clones of Common Wheat, Triticum aestivum L. J Genet Genomic Sci 3: 001.
Copyright: © 2018 Yasunari Ogihara, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.