Genome Sequence of the Cultivated Cotton Gossypium arboreum Decoded
The genome sequence of the cultivated cotton Gossypium arboreum (AA) has been decoded, following the successful sequencing of Gossypium raimondii (DD) in 2012. The work was carried out mainly by the Institute of Cotton Research (ICR) of the Chinese Academy of Agricultural Sciences (CAAS). The study was published online in Nature Genetics on May 18, 2014.
ICR initiated the Cotton Genome Project (CGP) in December 2007, together with its collaborating partner USDA-ARS. Sequencing these two diploid species, the putative genome donors of tetraploid cottons, lays an important foundation for the genome sequencing, assembly and evolutionary analysis of tetraploid cottons.
In this study, a highly homozygous cultivar of G. arboreum, Shixiya 1, was sequenced. A total of 193.6 Gb of clean sequence, covering the genome 112.6-fold, was obtained by paired-end sequencing. About 90.4% of the assembly was anchored and oriented on 13 pseudochromosomes. Repetitive DNA sequences were found to occupy 68.5% of the genome, and 41,330 protein-coding genes were predicted in G. arboreum.
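As a rough check on these figures, dividing the total clean sequence by the fold coverage gives the genome size the two numbers imply. The short Python sketch below is only an illustration and is not taken from the study: the function name is hypothetical, and the clean-sequence total is assumed to be 193.6 Gb.

```python
def implied_genome_size_gb(total_bases_gb: float, fold_coverage: float) -> float:
    """Return the genome size (Gb) implied by total sequenced bases and average depth.

    Hypothetical helper for illustration: coverage = total bases / genome size,
    so genome size = total bases / coverage.
    """
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_bases_gb / fold_coverage


# Assumed inputs, read from the paragraph above.
clean_sequence_gb = 193.6   # total clean sequence, in Gb (assumption)
fold_coverage = 112.6       # reported average coverage

size_gb = implied_genome_size_gb(clean_sequence_gb, fold_coverage)
print(f"Implied genome size: {size_gb:.2f} Gb")
```

Under these assumptions the implied size is about 1.7 Gb. This is a back-of-the-envelope figure only and should not be read as the genome size reported by the authors.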
Furthermore, molecular phylogenetic analyses suggested that G. arboreum and G. raimondii diverged about 5 (2-13) million years ago. The two species shared two whole-genome duplications before speciation. Insertions of long terminal repeats in the past 5 million years are responsible for the twofold difference in the sizes of these genomes. These results will help in understanding the complexity of the cotton genome and the genetic diversity of the cotton genus.
Comparison of the G. arboreum genome with the G. raimondii and Theobroma cacao genome sequences identified differences in the expression patterns of NBS domain–encoding genes. Genes related to disease resistance were significantly expanded in G. raimondii compared with T. cacao, whereas their number in G. arboreum was similar to that in T. cacao. This might be the main reason for the large difference between the two cottons in resistance to Verticillium wilt. Tandem duplications seem to have played a significant role in the expansion of the NBS-encoding gene family in G. raimondii after its divergence from G. arboreum ~5 million years ago, whereas segmental loss contributed to the contraction of this family in G. arboreum.
Ethylene is an important signaling molecule that promotes fiber elongation in cotton. Dot plots of promoter regions showed that a deletion of ~130 bp beginning at −470 bp relative to the transcription start site of GaACO1 resulted in loss of a putative MYB-binding site. Very high levels of ACO transcripts in G. raimondii ovules in conjunction with an ethylene burst might force an early fiber senescence phenotype, whereas the inactivation of ACO gene transcription in G. arboreum ovules might be responsibl