Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication

Genome Biology 2009, 10:R111. Open Access. Research. Mun et al. 2009, Volume 10, Issue 10, Article R111.

Jeong-Hwan Mun*, Soo-Jin Kwon*, Tae-Jin Yang†, Young-Joo Seol*, Mina Jin*, Jin-A Kim*, Myung-Ho Lim*, Jung Sun Kim*, Seunghoon Baek*, Beom-Soon Choi‡, Hee-Ju Yu§, Dae-Soo Kim¶, Namshin Kim¶, Ki-Byung Lim¥, Soo-In Lee*, Jang-Ho Hahn*, Yong Pyo Lim#, Ian Bancroft** and Beom-Seok Park*

Addresses:
* Department of Agricultural Biotechnology, National Academy of Agricultural Science, Rural Development Administration, 150 Suin-ro, Gwonseon-gu, Suwon 441-707, Korea.
† Department of Plant Science, College of Agriculture and Life Sciences, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-921, Korea.
‡ National Instrumentation Center for Environmental Management, College of Agriculture and Life Sciences, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-921, Korea.
§ Vegetable Research Division, National Institute of Horticultural and Herbal Science, Rural Development Administration, Tap-dong 540-41, Gwonseon-gu, Suwon 441-440, Korea.
¶ Korea Research Institute of Bioscience and Biotechnology, 111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea.
¥ School of Applied Biosciences, College of Agriculture and Life Sciences, Kyungpook National University, Daegu 702-701, Korea.
# Department of Horticulture, Chungnam National University, 220 Kung-dong, Yusong-gu, Daejon 305-764, Korea.
** John Innes Centre, Norwich Research Centre, Colney, Norwich NR4 7UH, UK.

Correspondence: Beom-Seok Park. Email: pbeom@rda.go.kr

© 2009 Mun et al.; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Brassica rapa genome: Euchromatic regions of the Brassica rapa genome were sequenced and mapped onto the corresponding regions in the Arabidopsis thaliana genome.

Abstract

Background: Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a strong challenge of structural and comparative crop genomics.

Results: We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago.
Conclusions: This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.

Published: 12 October 2009. Genome Biology 2009, 10:R111 (doi:10.1186/gb-2009-10-10-r111). Received: 18 May 2009. Revised: 9 August 2009. Accepted: 12 October 2009. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2009/10/10/R111

Background

Flowering plants (angiosperms) have evolved in genome size since their sudden appearance in the fossil records of the late Jurassic/early Cretaceous period [1-4]. The genome expansion seen in angiosperms is mainly attributable to occasional polyploidy. Estimation of polyploidy levels in angiosperms indicates that the genomes of most (>90%) extant angiosperms, including many crops and all the plant model species sequenced thus far, have experienced one or more episodes of genome doubling at some point in their evolutionary history [5,6]. The accumulation of transposable elements (TEs) has been another prevalent factor in plant genome expansion. Recent studies on maize, rice, legumes, and cotton have demonstrated that the genome sizes of these crop species have increased significantly due to the accumulation and/or retention of TEs (mainly long terminal repeat retrotransposons (LTRs)) over the past few million years; the percentage of the genome made up of transposons is estimated to be between 35% and 52% based on sequenced genomes [7-12]. However, genome expansion is not a one-way process in plant genome evolution.
Functional diversification or stochastic deletion of redundant genes by accumulation of mutations in polyploid genomes and removal of LTRs via illegitimate or intra-strand recombination can result in downsizing of the genome [13-15]. Nevertheless, neither of the aforementioned mechanisms has been demonstrated to occur frequently enough to balance genome size growth, and plant genomes tend, therefore, to expand over time.

The progress in whole genome sequencing of model genomes presents an important challenge in plant genomics: to apply the knowledge gained from the study of model genomes to biological and agronomical questions of importance in crop species. Comparative structural genomics is a well-established strategy in applied agriculture in several plant families. However, comparative analyses of modern angiosperm genomes, which have experienced multiple rounds of polyploidy followed by differential loss of redundant sequences, genome recombination, or invasion of LTRs, are characterized by interrupted synteny with only partial gene orthology even between closely related species, such as cereals [16], legumes [17,18], and Brassica species [19,20]. Furthermore, functional divergence of duplicated genes limits interpretation of function based on orthology, which complicates knowledge transfer from model to crop plants.
Thus, better delimitation of comparative genome arrangements reflecting evolutionary history will allow information obtained from fully sequenced model genomes to be used to target syntenic regions of interest and to infer parallel or convergent evolution of homologs important to biological and agronomical questions in closely related crop genomes.

The mustard family (Brassicaceae or Cruciferae), the fifth largest monophyletic angiosperm family, consists of 338 genera and approximately 3,700 species in 25 tribes [21], and is fundamentally important to agriculture and the environment, accounting for approximately 10% of the world's vegetable crop produce and serving as a major source of edible oil and biofuel [22]. Brassicaceae includes two important model systems: Arabidopsis thaliana (At), the most scientifically important plant model system for which complete genome sequence information is available, and the closely related, agriculturally important Brassica complex - B. rapa (Br, A genome), B. nigra (Bn, B genome), B. oleracea (Bo, C genome), and their three allopolyploids, B. napus (Bna, AC genome), B. juncea (Bj, AB genome), and B. carinata (Bc, BC genome). Syntenic relationships and polyploidy history in these two model systems have been investigated, although details about macro- and microsyntenic relationships between At and Brassica are limited and fragmented. Previous studies demonstrated broad-range chromosome correspondence between the At and Brassica genomes [23,24], and a few studies have demonstrated specific cases of conservation of gene content and order with frequent disruption by interspersed gene loss and genome recombination [19,20]. Although this issue is contentious, there is evidence that Brassicaceae genomes have undergone three rounds of whole genome duplication (WGD; hereafter referred to as 1R, 2R, and 3R, which are equivalent to the γ, β, and α duplication events) [5,25,26].
One profound finding from comparative analyses is the triplicate nature of the Brassica genome, indicating the occurrence of a whole genome triplication event (WGT, 4R) soon after divergence from the At lineage approximately 17 to 20 million years ago (MYA) [19,20,26]. This result strongly suggests that comparative genomic analyses using single gene-specific amplicons or those based on small scale synteny comparisons will fail to identify all related genome segments, and thus not be able to provide accurate indications of orthology between the At and Brassica genomes. However, obtaining sufficient sequence information from Brassica genomes to identify genome-wide orthologous relationships between the At and Brassica genomes is a major challenge.

Br was recently chosen as a model species representing the Brassica 'A' genome for genome sequencing [27,28]. This species was selected because it has already proved a useful model for studying polyploidy and because it has a relatively small (approximately 529 megabase-pair (Mbp)) but compact genome with genes concentrated in euchromatic spaces. However, widespread repetitive sequences in the Br genome hinder direct application of whole genome shotgun sequencing. Instead, targeted sequencing of specific regions of the Br genome could be informed by the reference At genome by selecting genomic clones based on sequence similarity; this approach is referred to as comparative tiling [29]. Here, we report sequencing of large-scale regions of the Br euchromatic genome, covering almost all of the At euchromatic regions, obtained using the comparative tiling method.
We performed a genome-wide sequence comparison of Br and At and analyzed the number of substitutions per synonymous site (Ks) between the two genomes and among related Brassica sequences to identify syntenic relationships and to further refine our understanding of the evolution of polyploidy. We also investigated genome microstructure conservation between the two genomes. In this study, we provide a foundation to reconstruct both the ancestral genome of the Brassica progenitor and the evolutionary history of the Brassica lineage, which we anticipate will provide a robust model for Brassica genomic studies and facilitate the investigation of the genome evolution of domesticated crop species.

Results

Generation of Br euchromatic sequence contigs and genome coverage

Bacterial artificial chromosome (BAC) sequence assembly generated 410 Br sequence contigs (sequences composed of more than one BAC sequence) covering 65.8 Mbp (Tables S1 and S2 in Additional data file 1). These sequence contigs span 75.3 Mbp of the At genome, representing 92.2% of the total At euchromatic region (Figure 1 and Table 1). A total of 43.9 Mbp remain as uncovered gaps: among these, 6.4 Mbp are attributable to euchromatin gaps, and the remaining 37.5 Mbp to pericentromeric heterochromatin gaps.

The genome coverage of the gene-rich Br sequences was estimated by representation in two different datasets: expressed sequence tag (EST) sequences and conserved single-copy genes. Based on a BLAT analysis of 32,395 Br unigenes (a set of ESTs that appear to arise from the same transcription locus) against the sequence contigs, the proportion of hits recovered under stringent conditions (see Materials and methods) was 29.2%. This result was largely consistent with the proportion of rosid-conserved single-copy genes showing matches to Br sequences. A TBLASTN comparison of 1,070 At-Medicago truncatula (Mt) conserved single-copy genes against Br sequences revealed a 24.3% match.
Both methods indicate approximately 30% coverage of euchromatin in the dataset analyzed; thus, the euchromatic region of Br is estimated to be approximately 220 Mbp, 42% of the whole genome given that the genome size of Br is 529 Mbp [30].

Table 1. Summary of B. rapa chromosome sequences comparatively tiled on the A. thaliana genome

At chromosome   Br BACs   Br sequence contigs   Br total sequence length (Mbp)   Coverage of At genome (Mbp)   At euchromatin gaps (Mbp)   At heterochromatin gaps (Mbp)
At1             147       105                   16.5                             18.5                          1.4                         10.5
At2             98        59                    10.3                             12.4                          1.4                         6.0
At3             124       89                    14.2                             15.7                          0.4                         7.4
At4             97        73                    11.3                             11.4                          0.9                         6.2
At5             123       84                    13.5                             17.3                          2.3                         7.4
Total           589       410                   65.8                             75.3                          6.4                         37.5

Sequence length and coverage were calculated according to Tables S1 and S2 in Additional data file 1.

Characteristics of the B. rapa gene space

Gene annotation was carried out using our specialized Br annotation pipeline. Gene prediction of the Br sequence data using a variety of ab initio, similarity-based, and EST/full-length cDNA-based methods resulted in the construction of 15,762 gene models. Taken together with the genome coverage of Br sequences, the overall number of protein-coding genes in the Br genome is at least 52,000 to 53,000, which is higher than those of other plant genomes sequenced thus far, including At [7], rice (Oryza sativa (Os)) [8], poplar (Populus trichocarpa (Pt)) [9], grape [10], papaya [11], and sorghum [12]. However, the estimated total number of genes in the Br genome is only twice that of At. Details of the annotation are available online at the URL cited in the 'Data used in this study' section in the Materials and methods.

The gene structure and density statistics are shown in Table 2. The base composition of Br and At genes is very similar. The average length of Br genes (ATG to stop codon) is 73% that of At genes. This is consistent with previous reports on Bo [19,20,26]. This difference appears to be due to one less exon per gene and shorter exon and intron lengths in Br. The average gene density of 1 per 4.2 kilobase-pairs (kbp) in Br is slightly lower than that in At (1 per 3.8 kbp). Thus, the At/Br ratio of gene density is 0.90, indicating slightly less compact organization of Br euchromatin than At euchromatin. Moreover, the distance between the homologous block endpoints in Br and At has an R^2 of 0.63 with a dAt/dBr slope of 1.36 (Figure S1 in Additional data file 2). This result indicates that gene-containing regions in At occupy approximately 30 to 40% more space than their Br counterparts. Based on these data and the results mentioned above, we postulate that the euchromatic genome of Br has shrunken by approximately 30% compared to its syntenic At counterpart. Most of the genome shrinkage in Br could be explained by the deletion of roughly one-third of the redundant proteome as well as TEs in the euchromatic Br genome. Only 14% of the Br genes were tandem duplicates compared with 27% of At genes in a 100-kbp window interval. In addition, only 45 nucleotide binding site-encoding genes were identified in Br, suggesting that the total number of nucleotide binding site-encoding genes in the Br genome is likely to be almost the same as that in At (approximately 200) [31,32]. A database search revealed that a total of 12,802 (81%) of the predicted Br genes have similarity (<E-10) to proteins in the non-redundant nucleotide database of the National Center for Biotechnology Information (NCBI); 2,960 (19%) are Br unique genes. To assess the putative function of the genes that recorded no hits to non-redundant proteins, we assigned functional categories to the Br unique genes using gene ontology analysis; however, this analysis could not identify a putative function for approximately 85% of the Br unique genes.
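The extrapolations quoted in this section follow from simple proportional arithmetic, which the short Python sketch below reproduces as a sanity check. It is only an illustration: the variable and function names are hypothetical, the 30% coverage is taken as a single point value (the text reports 29.2% and 24.3% from two methods and rounds to "approximately 30%"), and the script is not part of the authors' analysis pipeline.

```python
# Back-of-the-envelope check of the estimates quoted in the text.
# All names are illustrative; the numbers are the ones reported in the article.

SEQUENCED_MBP = 65.8        # non-redundant Br euchromatic sequence (Mbp)
COVERAGE = 0.30             # assumed point value for "approximately 30%" coverage
GENOME_MBP = 529.0          # reported Br genome size (Mbp)
GENE_MODELS = 15_762        # gene models predicted in the sequenced portion
BR_KBP_PER_GENE = 4.2       # Br: one gene per 4.2 kbp
AT_KBP_PER_GENE = 3.8       # At: one gene per 3.8 kbp
UNIQUE_GENES = 2_960        # Br genes without a database hit
NO_FUNCTION_FRACTION = 0.85 # share of unique genes with no putative function


def extrapolate(observed: float, coverage: float) -> float:
    """Scale an observed quantity up to the whole euchromatin, assuming uniform coverage."""
    return observed / coverage


euchromatin_mbp = extrapolate(SEQUENCED_MBP, COVERAGE)    # about 220 Mbp
euchromatin_share = euchromatin_mbp / GENOME_MBP          # a little over 40% of the genome
total_genes = extrapolate(GENE_MODELS, COVERAGE)          # within 52,000 to 53,000
density_ratio = AT_KBP_PER_GENE / BR_KBP_PER_GENE         # about 0.90
unique_share = UNIQUE_GENES / GENE_MODELS                 # about 19%
novel_share = unique_share * NO_FUNCTION_FRACTION         # about 16%

print(f"Estimated euchromatin size: {euchromatin_mbp:.0f} Mbp")
print(f"Share of whole genome:      {euchromatin_share:.1%}")
print(f"Estimated gene count:       {total_genes:.0f}")
print(f"Gene density ratio:         {density_ratio:.2f}")
print(f"Unique genes:               {unique_share:.0%}")
print(f"Unique, no known function:  {novel_share:.0%}")
```

Because every estimate divides by the same coverage value, a modest change in that assumption shifts the euchromatin size and gene count proportionally, which is why the text reports them as approximate ranges.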
Thus, we can conclude that 16% of the proteome of Br has acquired a novel function since the Br-At divergence.

Repetitive sequence analysis revealed that 6% of euchromatic Br sequences are composed of TEs, a twofold greater amount than identified in the counterpart At euchromatic genome, presumably due to a greater number of LTRs and long interspersed elements (Table 3). In addition, low complexity repetitive sequences are relatively abundant in the Br euchromatic region, indicating Br-specific expansion of repetitive sequences. The distribution of repetitive sequences and TEs along the chromosomes was not uneven (Figure S2 in Additional data file 2). It has previously been reported, based on partial draft genome shotgun sequences, that Bo (approximately 696 Mbp) has a significantly higher proportion of both class I and class II TE sequences than At [33]. Taken together with these previous reports [34,35], TEs appear to be partly responsible for genome expansion in the Brassica lineage, and these TEs appear to accumulate predominantly in the heterochromatic regions of Br.

Synteny between the B. rapa and A. thaliana genomes

To identify syntenic regions in the Br and At genomes, we compared the whole proteome between the two genomes using BLASTP analysis, and putative synteny blocks were plotted using DiagHunter and GenoPix2D programs [36]. The non-redundant chromosome-ordered genome sequence in the Br build was 62.5 Mbp. An additional 3.2 Mbp had not yet been assigned to chromosomes and was therefore not used for synteny analysis. We examined the synteny blocks at three different levels: whole genome (Figure 2a), large-scale synteny blocks in chromosome-to-chromosome windows (Figure 2b; Additional data file 3), and microsynteny <2.5 Mbp (the synteny can be viewed at the URL cited in the 'Data used in this study' section in the Materials and methods).
Although the Br genome build was partial and incomplete with only approximately 30% of euchromatin represented and some misordered contigs present, the level of synteny between the genomes was prominent and distinct. The DiagHunter program detected 227 highly homologous syntenic blocks with 72% of the sequenced and anchored Br sequence assigned to synteny blocks in At and 72% of At euchromatic sequence assigned to synteny blocks in Br when multiple blocks overlapping the same region were counted (Figure 2a). Considering the history of frequent genome duplication events in Brassicaceae, this result strongly indicates the presence of secondary or tertiary blocks resulting from WGT.

The Br and At genomes share a minimum of 20 large-scale synteny blocks with substantial microsynteny; these synteny blocks extend the length of whole chromosome arms. At shows synteny of chromosome arms with multiple chromosome blocks of Br, apparently corresponding to triplicated remnants (Figure 2b). At1S (short arm), At2L (long arm), At4L, and At5 have three long-range synteny counterparts in three independent Br chromosomes. However, At1L and At3 have only one or two synteny blocks in the Br genome. Moreover, some genome regions of At, including a smaller section of At2S and At4S, show no significant synteny with Br counterparts, indicating chromosome-level deletion of triplicated segments. Incidentally, Br shows synteny with a major single chromosome along almost the entire length (A1, A2, A4, and A10) or fragments of multiple At chromosomes in a complicated mosaic pattern, indicating frequent recombination of Br chromosomes. Notable regions of synteny are shown in Figure 2b, and are At1S-A6/A8/A9, At1L-A7, At2L-A3/A4/A5, At3S-A3/A5, At3L-A7/A9, At4L-A1/A3/A8, and At5-A2/A3/A10 (synteny view available at the URL cited in the 'Data used in this study' section in the Materials and methods).
Additional synteny blocks scattered throughout genome regions, probably due to recombination, were also identified.

Within individual synteny blocks, microsynteny (conservation of gene content and order) was considerable. The average degree of proteome conservation for all predicted synteny blocks was 52 ± 13% in the blocks (Table S3 in Additional data file 1). This value is almost the same as that of the Mt-Lotus japonicus comparison in which an ancient WGD event at a similar time period (Ks 0.7 to 0.9) as the Br-At WGD but earlier speciation (Ks 0.6) than Br-At was detected [18]. The underestimated value reported here presumably reflects significant gene loss and rearrangement after WGT in the Br lineage resulting in genome shrinkage, based on the fact that deletion events in syntenic blocks of the Br genome were twofold more frequent than in the At genome. Genes without corresponding homologs in syntenic regions contributed to 15 ± 7% of all genes from Br but 33 ± 13% from At (Table S3 in Additional data file 1; Additional data file 3). Genes encoding proteins involved in transcription or signal transduction were not found to be significantly more retained in syntenic blocks than those encoding proteins classified as having other func-

Figure 1. In silico allocation of 410 B. rapa BAC sequence contigs to A. thaliana chromosomes. BAC sequence contigs (blue bars) were aligned to At chromosomes based on significant and directional matches of sequences using a BLASTZ cutoff of <E-6.
[Figure 1 graphic: At chromosomes 1 to 5 drawn against megabase-pair scales (At Chr.1, 0 to 30M; At Chr.2, 0 to 19M; At Chr.3, 0 to 23M; At Chr.4, 0 to 18M; At Chr.5, 0 to 26M), with the Br sequence contigs numbered 1 to 410 placed along them (1-105 on Chr.1, 106-164 on Chr.2, 165-253 on Chr.3, 254-326 on Chr.4, 327-410 on Chr.5) and a color scale running from Low to High.]