The first generation of a BAC-based physical map of Brassica rapa

BMC Genomics (BioMed Central), Open Access
Research article

Jeong-Hwan Mun 1, Soo-Jin Kwon 1, Tae-Jin Yang 1,5, Hye-Sun Kim 2, Beom-Soon Choi 3, Seunghoon Baek 1, Jung Sun Kim 1, Mina Jin 1, Jin A Kim 1, Myung-Ho Lim 1, Soo In Lee 1, Ho-Il Kim 1, Hyungtae Kim 2, Yong Pyo Lim 4 and Beom-Seok Park* 1

Address: 1 Brassica Genomics Team, National Institute of Agricultural Biotechnology, Rural Development Administration, 225 Seodun-dong, Gwonseon-gu, Suwon 441-707, South Korea, 2 Macrogen, 60-24 Gasan-dong, Geumcheon-gu, Seoul 153-023, South Korea, 3 National Instrumentation Center for Environmental Management, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-921, South Korea, 4 Department of Horticulture, Chungnam National University, 220 Kung-dong, Yusong-gu, Daejon 305-764, South Korea and 5 Department of Plant Science, College of Agriculture and Life Sciences, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-921, South Korea

Email: Jeong-Hwan Mun - munjh@rda.go.kr; Soo-Jin Kwon - sjkwon@rda.go.kr; Tae-Jin Yang - tjyang@snu.ac.kr; Hye-Sun Kim - sonne20@macrogen.com; Beom-Soon Choi - bschoi@nicem.snu.ac.kr; Seunghoon Baek - yohan_bosco@hotmail.com; Jung Sun Kim - jsnkim@rda.go.kr; Mina Jin - genemina@rda.go.kr; Jin A Kim - jakim@rda.go.kr; Myung-Ho Lim - mlim312@rda.go.kr; Soo In Lee - silee@rda.go.kr; Ho-Il Kim - hikim@rda.go.kr; Hyungtae Kim - htkim@macrogen.com; Yong Pyo Lim - yplim@cnu.ac.kr; Beom-Seok Park* - pbeom@rda.go.kr
* Corresponding author

Abstract

Background: The genus Brassica includes the most extensively cultivated vegetable crops worldwide. Investigation of the Brassica genome presents excellent opportunities to study plant genome evolution and divergence of gene function associated with polyploidy and genome hybridization. A physical map of the B. rapa genome is a fundamental tool for analysis of Brassica "A" genome structure.
Integration of a physical map with an existing genetic map by linking genetic markers and BAC clones in the sequencing pipeline provides a crucial resource for the ongoing genome sequencing effort and assembly of whole genome sequences.

Results: A genome-wide physical map of the B. rapa genome was constructed by capillary electrophoresis-based fingerprinting of 67,468 Bacterial Artificial Chromosome (BAC) clones using the five-restriction-enzyme SNaPshot technique. The clones were assembled into contigs by means of FPC v8.5.3. After contig validation and manual editing, the resulting contig assembly consists of 1,428 contigs and is estimated to span 717 Mb in physical length. This map provides 242 anchored contigs on 10 linkage groups to serve as seed points from which to continue bidirectional chromosome extension for genome sequencing.

Conclusion: The map reported here is the first physical map for the Brassica "A" genome based on the High Information Content Fingerprinting (HICF) technique. This physical map will serve as a fundamental genomic resource for accelerating genome sequencing, assembly of BAC sequences, and comparative genomics between Brassica genomes. The current build of the B. rapa physical map is available at the B. rapa Genome Project website for the user community.

Published: 12 June 2008
BMC Genomics 2008, 9:280 doi:10.1186/1471-2164-9-280
Received: 6 November 2007
Accepted: 12 June 2008
This article is available from: http://www.biomedcentral.com/1471-2164/9/280
© 2008 Mun et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background

The genus Brassica is one of the most important vegetable crop genera in the world because it contributes to human diet, condiments, animal feed, forage, and edible or industrial oil. Many cultivated species of Brassica are also increasingly recognized as good sources of healthy metabolites such as vitamin C, soluble fiber, and multiple anticancer glucosinolate compounds including diindolylmethane and sulforaphane [1]. In addition, the current emphasis on rapeseed oil as a biofuel or a renewable resource for industry worldwide makes Brassica a good target of metabolic engineering.

The close phylogenetic relationship between the Brassica species and the model plant Arabidopsis thaliana predicts that knowledge transfer from Arabidopsis for Brassica crop improvement would be straightforward. However, the complex genome organization of the Brassica species, a result of multiple rounds of polyploidy and genome hybridization, makes the identification of orthologous relationships of genes between the genomes very difficult. In particular, a comparative genomics study of the Flowering Locus C region between the B. rapa and A. thaliana genomes revealed that the Brassica genome triplicated 13 to 17 million years ago, very soon after divergence from the Arabidopsis lineage. Subsequent extensive interspersed gene loss or gain events and large-scale chromosomal rearrangements, including segmental duplications or deletions, in the Brassica lineage complicated the orthologous relationships of the loci between the two genomes [2]. Hybridization between Brassica species is another source of Brassica genome complexity. Interspecific breeding between three diploid Brassica species, B. rapa (AA genome), B. nigra (BB genome), and B. oleracea (CC genome), resulted in the creation of three new allotetraploid hybrid species, B. juncea (AABB genome), B. napus (AACC genome), and B. carinata (BBCC genome) [3]. Thus, investigation of the Brassica genome provides substantial opportunities to study the divergence of gene function and genome evolution associated with polyploidy, extensive duplication and hybridization.

Several crop Brassica species have had their genomes characterized in depth. With favorable genetic attributes, B. rapa has been selected as a model species representing the Brassica "A" genome and is the focus of multinational genome projects. The early fruits of investigation with this well-characterized genome are evident in the recent advance in our understanding of Brassica "A" genome structure and evolution [2,4-7]. Linkage maps have been constructed for B. rapa ssp. pekinensis cv. Jangwon [4], cv. VCS (Kim et al., our unpublished data), and cv. Chiifu [5]. These genetic maps, with associated markers and comparative genomics studies, have enabled the identification of quantitative trait loci (QTL) for club root resistance and flowering time. Large EST databases are publicly available, and a 24 K oligo microarray has been developed and used to examine the transcriptome profile of B. rapa [8]. More than 127,000 Bacterial Artificial Chromosome (BAC) end sequences and about 580 seed BAC sequences of phase 2 or 3 are also available at the National Center for Biotechnology Information (NCBI) database. In parallel to these activities, international programs are collaborating to characterize the Brassica "A" genome at the whole genome sequence level through a BAC-by-BAC sequencing approach [9].

A crucial component of successful genome sequencing with the BAC-by-BAC strategy is the availability of a genome-wide, BAC-based physical map [10]. To date, the utility of a physical map has been reported by major genome sequencing projects of human [11], A. thaliana [12], Oryza sativa [13], and Medicago truncatula [14].
These physical maps were constructed with a combination of fingerprinting of restriction enzyme-digested BAC fragments on agarose gels and assembly of the fingerprints by means of the FingerPrinted Contigs (FPC) software package [15]. The agarose method has been successful, but it has limited throughput because of the need for human band calling. This is a time-consuming process requiring ample skill even when using image software [16]. Another disadvantage of the agarose method is that few large fragments are generated, and they are difficult to size. Bands manually selected using the agarose method can often lead to a poor map [17,18]. Fluorescence-labeled fingerprinting methods using DNA sequencing gel [19,20] or capillary electrophoresis [21,22] are alternative methods that have been developed to make larger and more accurate contigs with increased throughput. Fluorescence-labeled capillary electrophoresis methods include the 3-enzyme method [22] and the High-Information Content Fingerprinting (HICF) methods, which use a type IIS restriction enzyme [16] or the SNaPshot labeling technique [21,23-25]. These methods facilitate improved physical map construction both in terms of throughput and quality of fingerprinting compared to the agarose method due to their automatic workflow and higher resolution [17,22]. However, an increase in the number of enzymes and labeling colors in the HICF method can give partial digestion, star activity, and low labeling efficiency [23]. Accordingly, several whole-genome HICF assembly maps have been built for small fungal genomes [23,24] as well as for the large genomes of maize [16] and catfish [25].

Brassica rapa has a haploid genome size of 550 megabase pairs (Mb) [26]. Here we report the first genome-wide, BAC-based SNaPshot physical map of the Brassica "A" genome.
To build a physical map, we fingerprinted about 99,000 BAC clones by the HICF method using an ABI SNaPshot labeling kit and constructed a BAC clone contig map by means of FPC v8.5.3. Sequence-tagged site genetic markers incorporated in the genetic map anchored the euchromatic portion of the physical map to chromosomal loci. The resulting physical map facilitates selection of BAC clones for the B. rapa whole genome sequencing effort.

Results and discussion

BAC library source and fingerprinting

Construction of a physical map for a genome that has evolved through polyploidy, extensive genome duplication or hybridization presents substantial challenges to genome analysis. A successful contig build of the B. rapa genome relies on the quality and availability of deep-coverage, large-insert genomic libraries. Three large-insert BAC libraries of B. rapa ssp. pekinensis cv. Chiifu are available in the public sector, providing >34-fold genome coverage [7,8]. The first step in constructing a physical map is the generation of fingerprints representing restriction digests of BAC DNA using efficient techniques [20,27]. We chose the HICF fingerprinting method based on its well-established format with a commercially available SNaPshot labeling kit (ABI) and increased throughput using the ABI 3730xl sequencer [17,21]. A total of 99,456 BAC clones (~22.5× coverage) from the three independent libraries were fingerprinted by digestion with a combination of five restriction enzymes (EcoRI, BamHI, XbaI, XhoI, and HaeIII), followed by SNaPshot reagent labeling with four colors at the 3' ends of the restriction fragments and sizing on the ABI 3730xl (Table 1). Fragment sizes from the capillary fingerprinting chromatograms were collected with GeneMapper. There was an average of 114 restriction fragments produced per BAC clone.
The average band size was calculated as 1.09 kb, given an average BAC insert size of 124 kb. The fingerprint data were then imported into GenoProfiler [28] to convert them to a format suitable for FPC analysis. Of these fingerprints, 5,767 (5.8%) were removed from the data set because of clones with no insert, failure in fingerprinting, clones having fewer than 50 bands or more than 200 bands in the range of 50–500 bp, or cross-contamination. Thus, a total of 93,689 clones (94.2%) were successfully fingerprinted and used for contig assembly.

Contig assembly

With BAC fingerprints, the creation of a physical map of a eukaryotic genome is a three-step process. First, the fingerprints are assembled into contigs, which are accurately ordered sets of contiguous overlapping clones [29]. Second, the contigs are anchored on the genetic map to accurately represent the true order [30,31]. Third, questionable contigs are broken to increase contig reliability, or contigs associated with adjacent regions of the genome are fused to form larger contigs [32]. Genome duplication, repetitive sequence blocks, questionable clones (Q clones), and/or fingerprinting error complicate these steps and can result in contigs containing false-positive overlaps of clones [16,29]. Therefore, as a prelude to developing a reliable physical map of B. rapa, it is worth discarding low quality or problematic data before the fingerprint assembly to avoid chimeric contigs. Moreover, the eliminated clones can later be placed back onto the physical map after the contig merger is completed [21]. In the three B. rapa BAC library sources, up to 29% of clones were estimated to contain centromeric or pericentromeric repetitive sequences [6].
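The band-count filter and the average band size quoted above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the thresholds (50 to 200 valid bands, sized 50 to 500 bp) are taken from the text, and the authors' actual filtering was done with GeneMapper, GenoProfiler and FPC, not with this code.

```python
def count_valid_bands(fragment_sizes_bp, low_bp=50, high_bp=500):
    """Count fragments whose size falls inside the valid sizing window."""
    return sum(1 for size in fragment_sizes_bp if low_bp <= size <= high_bp)


def passes_band_filter(fragment_sizes_bp, min_bands=50, max_bands=200):
    """Keep a clone only if it has 50 to 200 valid bands (thresholds from the text)."""
    return min_bands <= count_valid_bands(fragment_sizes_bp) <= max_bands


def average_band_size_kb(avg_insert_kb, avg_bands_per_clone):
    """Average genomic span represented by one band (insert size / bands per clone)."""
    return avg_insert_kb / avg_bands_per_clone


if __name__ == "__main__":
    # 124 kb average insert and 114 bands per clone are the values given in the text.
    print(round(average_band_size_kb(124, 114), 2))
    # Share of the 99,456 fingerprinted clones that were retained (93,689).
    print(round(100 * 93_689 / 99_456, 1))
```

With the figures given in the text, the first value reproduces the reported 1.09 kb and the second the reported 94.2% retention rate.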
To screen out the clones having heterochromatic repetitive sequences before contig assembly, we removed 26,221 clones (28.0%) containing centromeric repetitive sequences (CentBr and CRB) in at least one end or pericentromeric repetitive sequences (PCRBr, 5S, and 25S rDNA) in both ends, based on a BLASTN search of BAC end sequences (Table 1). Thus, a total of 67,468 BAC clone fingerprints with an average band size of 1.39 kb (Table 2), equivalent to 15.2× coverage of the B. rapa genome, were finally converted into the FPC database. Of these 67,468 clones, 37,041 (8.4×) were from the HindIII library, 24,767 (5.7×) from the BamHI library, and 5,660 (1.0×) from the Sau3AI library.

To assemble the physical map contigs of the B. rapa genome from BAC fingerprints, we used the program FPC v8.5.3. Before contig assembly, a series of tests were performed to determine the FPC parameters suitable for contig assembly of the full data set. Contig build at high stringency prevents chimeric joining of duplicated regions, whereas starting builds at low stringency results in maps with larger contigs that encompass more genome space [16].

Table 1: Characteristics of the three source BAC libraries of Brassica rapa ssp. pekinensis cv. Chiifu that were used in the HICF map.

| Libraries (a) | Genomic DNA partially digested with | Average insert size (kb) | No. of 384 plates | No. of BACs | Average no. of valid bands per clone (b) | Genome coverage (c) | No. of BACs with successful fingerprints | No. of BACs with repetitive sequences |
|---|---|---|---|---|---|---|---|---|
| KBrH | HindIII | 125 | KBrH001-147 | 56,448 | 124 | 12.9× | 53,443 | 16,402 |
| KBrB | BamHI | 126 | KBrB001-096 | 36,864 | 104 | 8.5× | 34,371 | 9,604 |
| KBrS | Sau3AI | 100 | KBrS001-016 | 6,144 | 94 | 1.2× | 5,875 | 215 |
| Total | | 124 | 259 | 99,456 | 114 | 22.5× | 93,689 | 26,221 |

(a) For details of the BAC libraries, see [7] and [8].
(b) Valid bands are those in the range of 50 to 500 bp.
(c) Genome coverage was estimated based on the haploid genome equivalent of B. rapa as 550 Mb.
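The coverage figures quoted for the 67,468 clones kept after repeat screening are consistent with the usual estimate, number of clones times average insert size divided by genome size. The sketch below is a minimal illustration of that arithmetic, assuming the per-library average insert sizes from Table 1 still apply to the filtered clone sets; the function and variable names are hypothetical, and the authors may have used slightly different inputs.

```python
GENOME_SIZE_KB = 550_000  # haploid genome size of 550 Mb, as stated in the text


def genome_coverage(n_clones, avg_insert_kb, genome_kb=GENOME_SIZE_KB):
    """Estimate fold coverage as total cloned DNA divided by genome size."""
    return n_clones * avg_insert_kb / genome_kb


# Clones retained after repeat screening, paired with the Table 1 insert sizes
# (assumption: the library-wide average insert size holds for the retained clones).
FILTERED_LIBRARIES = {
    "HindIII": (37_041, 125),
    "BamHI": (24_767, 126),
    "Sau3AI": (5_660, 100),
}

if __name__ == "__main__":
    for library, (n_clones, insert_kb) in FILTERED_LIBRARIES.items():
        print(library, round(genome_coverage(n_clones, insert_kb), 1))
    # All retained clones with the overall average insert size of 124 kb.
    print("total", round(genome_coverage(67_468, 124), 1))
```

Under these assumptions the per-library values round to the 8.4×, 5.7× and 1.0× given in the text, and the total rounds to 15.2×.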
Thus, the best approach should rely on the structural characteristics of the target genome. The automatic contig build was tried on a randomly chosen data set with different cutoff values from 1e-40 to 1e-80. Based on this preliminary test, the initial cutoff value was chosen to be 1e-45. The initial parameter is reasonably stringent because the contigs generated at this cutoff value included up to 70% of the clones with less than 10% questionable clones (Q clones), which can cause chimeric assembly. Assembly at higher stringency improved the build by reducing Q clones, but contig coverage was reduced significantly. For example, a contig build at 1e-70 included only 40% of the fingerprints in contigs and left 60% as singletons. Based on this analysis, we assembled the physical map contigs in three steps. First, a cutoff value of 1e-45 was used for automatic contig assembly. Second, the "DQer" function was used to break up Q contigs (contigs containing more than 10% Q clones) from the initial builds. Third, the remaining contigs were end-merged by the "End to End" function, and then singletons were added to the ends of contigs by the "Singles to End" function at 6 successively lower cutoffs, starting at 1e-40 and terminating at 1e-15. At each round, an additional "DQer" step was used to break up all bad contigs containing more than 15% Q clones (Table 2). As a result, the first contig build resulting from automatic assembly and DQer contained 4,726 contigs assembled from 42,427 (63%) clones, while 25,041 (37%) clone fingerprints remained as singletons. Following an iterative process of consecutive FPC functions, "End to End", "Singles to End", and "DQer", each successive round decreased the contig number, singleton number, and genome coverage but increased the average contig length (Table 2). The merger of singletons into the assembly is generally held responsible for most of the increase in the number of Q clones in the map [16].
However, Table 2 shows that only ~34% of the singletons integrated into the ends of the contigs contributed to the increase of Q clones in the build. This result suggests that many clones remained as singletons at the initial stringency cutoff not simply because their fingerprints were of low quality but because they may come from regions of low coverage. If this is true, the BAC libraries we used would not deeply cover the whole B. rapa genome.

An unsatisfactory aspect of this assembly is its large number of Q clones (Table 2). The Q clones in this assembly corresponded to 15% of the clones. This is a larger proportion than in the cases reported for catfish (7.3%) [25] and maize (11%) [16]. A large number of Q clones may result from fingerprinting error due to partial digestion, star activity, or low labeling efficiency. Though we removed the fingerprints containing centromeric repeat sequences, the remaining data set still included highly repetitive DNA sequences. If repetitive sequences significantly affect contig assembly, deep contigs (too many clones assembled in a small region) can be made. The impact of repetitive DNA sequences on the contig assembly was therefore estimated. Of the 1,417 contigs, three were found to be deep contigs. Chloroplast DNA can be a source of deep contig assembly [33]. However, BLAST analysis of the B. rapa chloroplast sequence against BAC-end sequences from the deep contigs suggested that these deep contigs may be derived from B. rapa genomic DNA. These three deep contigs included 71–84% of their clones as Q clones, which contribute ~48.3% of all Q clones in the initial build. Thus, when the three deep contigs of the initial build are killed because of false-positive overlaps, the Q clones in the remaining 1,414 contigs correspond to 7.7% of all clones.

The initial build, named B. rapa physical map Build 1, has 1,417 contigs with an average length of 512 kb covering 725 Mb, 1.3× coverage of the genome.
The total coverage of the physical contigs suggests that most contigs are not sufficiently overlapping and that the gaps between the contigs need to be closed by additional fingerprinting. However, with our current assembly, more fingerprinting of the same libraries would not be effective in increasing coverage of the contigs and closing the gaps efficiently, because a high proportion of the BAC clones cover repetitive sequence regions and some regions of the genome could be poorly represented in libraries generated by restriction enzyme digestion. For this reason, we will add more fingerprint data from a randomly sheared BAC library that is under construction, and will develop a new contig build.

Table 2: Summary of the B. rapa physical map autobuild produced from assembly of the 67,468 BAC clones.

| Build (a) | Contigs | Avg. contig length (kb) (b) | Longest contig (kb) | Genome coverage | Physical length (Mb) (b) | Q clones (%) | Contigs of size ≥100 | Contigs of size 99-50 | Contigs of size 49-25 | Contigs of size 24-10 | Contigs of size <10 | Singletons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial 1e-45 | 4,726 | 208 | 7,596 | 1.8× | 985 | 6,376 (9.5) | 9 | 34 | 251 | 892 | 3,540 | 25,041 |
| Merge 1e-40 | 4,057 | 230 | 5,548 | 1.7× | 935 | 6,457 (9.6) | 10 | 64 | 318 | 800 | 2,865 | 23,977 |
| Merge 1e-30 | 3,001 | 287 | 7,329 | 1.6× | 860 | 6,927 (10.3) | 24 | 126 | 384 | 606 | 1,861 | 21,351 |
| Merge 1e-20 | 1,801 | 421 | 8,686 | 1.4× | 759 | 8,832 (13.1) | 82 | 182 | 299 | 370 | 868 | 17,086 |
| Merge 1e-15 | 1,417 | 512 | 9,390 | 1.3× | 725 | 10,135 (15.0) | 111 | 177 | 241 | 269 | 619 | 14,001 |

Each HICF assembly was performed starting with a complete build, followed by iteration of the DQer, end-merge, and singleton-merge routines by means of FPC v8.5.3.
(a) Additional DQer, end-merge, and singleton-merge routines at 1e-35 and 1e-25 are not shown.
(b) Each fingerprint band was estimated to represent an average of 1.39 kb. This was estimated by dividing the average insert size of the BAC clones (124 kb, Table 1) by the average number of valid bands per clone (6,005,758 valid bands in total across the 67,468 BAC clones used for the map contig assembly).
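The figures in Table 2 can be cross-checked with a few lines of Python. The sketch below is a hedged consistency check, not part of the authors' pipeline: the tuple layout and variable names are assumptions made for illustration, and the only inputs are the numbers printed in Table 2, its footnote, and the text.

```python
TOTAL_CLONES = 67_468      # clones used for contig assembly
GENOME_MB = 550            # haploid genome size used for coverage estimates
TOTAL_VALID_BANDS = 6_005_758
AVG_INSERT_KB = 124

# (build, contigs, avg length kb, physical length Mb, Q clones, size-class counts, singletons)
TABLE_2 = [
    ("Initial 1e-45", 4726, 208, 985, 6376, (9, 34, 251, 892, 3540), 25041),
    ("Merge 1e-40", 4057, 230, 935, 6457, (10, 64, 318, 800, 2865), 23977),
    ("Merge 1e-30", 3001, 287, 860, 6927, (24, 126, 384, 606, 1861), 21351),
    ("Merge 1e-20", 1801, 421, 759, 8832, (82, 182, 299, 370, 868), 17086),
    ("Merge 1e-15", 1417, 512, 725, 10135, (111, 177, 241, 269, 619), 14001),
]


def q_clone_percent(q_clones, total=TOTAL_CLONES):
    """Q clones as a percentage of all clones in the FPC database."""
    return 100 * q_clones / total


def band_size_kb():
    """Average insert size divided by the average number of valid bands per clone."""
    return AVG_INSERT_KB / (TOTAL_VALID_BANDS / TOTAL_CLONES)


def singleton_q_share():
    """Increase in Q clones relative to singletons merged between first and last build."""
    first, last = TABLE_2[0], TABLE_2[-1]
    merged_singletons = first[6] - last[6]
    added_q_clones = last[4] - first[4]
    return 100 * added_q_clones / merged_singletons


if __name__ == "__main__":
    for name, contigs, avg_kb, length_mb, q, bins, _ in TABLE_2:
        print(name, sum(bins) == contigs, round(q_clone_percent(q), 1),
              round(length_mb / GENOME_MB, 1), round(contigs * avg_kb / 1000))
    print(round(band_size_kb(), 2), round(singleton_q_share()))
```

The size-class counts sum to the contig total in every row, the physical length divided by 550 Mb matches the listed genome coverage, and the last line reproduces the 1.39 kb band size and the ~34% share of merged singletons mentioned in the text.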
Validation of contigs and manual editing

Several different approaches were used to evaluate the reliability of the B. rapa contig assembly. First, marker anchors were used as an effective tool to validate contig structure and orientation. We analyzed whether positive BAC clones of single-locus RFLP markers were assembled into the same segment of a contig. For example, as shown in Figure 1, a total of seven positive BAC clones were identified through a HindIII BAC library screen using a single-locus marker, BAN245, designed from a hydrolase gene (Fig. 1A). An FPC database search showed that six of the positive clones were assembled into the same segment of contig 415, and one clone was located very close to the others on the consensus band (CB) map (Fig. 1B). Marker anchors strongly supported proper assembly of contigs. We anchored 187 contigs on an existing genetic map [4] using 315 genetic markers (Table 3 and Table S1 in additional file 1). Among the 187 contigs containing BAC clones associated with framework genetic markers, 37 contigs having at least two marker anchors were selected to validate the contig build. The framework markers within each contig displayed close genetic linkage. Even the nine questionable contigs (more than 10 Q clones per contig) among the 37 contigs showed consistent anchoring of the marker pairs on the genetic map. Figure 2 presents an example of contig validation by mapping, in which a contig spanning the region of 86–91 cM of linkage group R9 was examined. A single-locus RFLP marker, BAN235, designed from a pectinesterase (PE) gene expressed in anther, was first used to screen the HindIII library, and three positive BAC clones (KBrH016E13, KBrH059J05, and KBrH071P14) were identified at high stringency. An FPC database search detected the corresponding contig containing the positive clones. Contig 180 consisted of 68 BAC clones and was 1.3 Mb in size.
Two SSR markers, KS31203 and KS31191, were designed from the BAC clones KBrH001H24 and KBrH076J01, respectively, which were found at the two ends of the contig. Genetic mapping of the SSR markers showed close genetic linkage on linkage group R9, consistent with the clone order in the contig. This result was supported by sequence analysis of the selected BAC clones. BAC sequence analysis of 11 selected clones in this contig successfully generated two overlapping sequence blocks in accordance with the genetic mapping result. Additional mapping and BAC sequencing enabled merger of contig 180 with five adjacent contigs to make a large contig extending to 3.1 Mb in size (data not shown).

As a second validation, the grouping of a multigene family was examined to determine whether clones containing paralogous genes would be correctly assembled in the HICF map. As shown in Figure 3, the contigs spanning the regions containing the pectinesterase gene family members were investigated. At least 14 members of the PE gene family were identified from a B. rapa EST database search. Screening of the HindIII library using an RFLP marker, BAN2, designed from a PE gene identified 22 positive BAC clones. Southern blot analysis of the 22 clones by EcoRV digestion and hybridization with the BAN2 probe grouped the clones into at least four different types according to shared main bands (Fig. 3A). We analyzed the contig assembly of the 19 clones that were successfully fingerprinted from the 22 positive BAC clones. HICF assembly of the 19 clones resulted in the grouping of 14 clones in six independent contigs consistent with the observed Southern hybridization pattern (groups I to VI corresponding to contigs 672, 180, 205, 1428, 224, and 596, respectively); the remaining five clones were singletons (Fig. 3B and Fig. S1 in additional file 2). The clones of groups II/III and groups IV/V shared the same main hybridization bands of Type 2 and Type 3, respectively, but they were assembled in separate physical contigs.
These results strongly support the assumption that paralogous clones are correctly assembled in independent contigs or remain as singletons in the current build. We found five additional cases of correctly assembled homeologous regions (data not shown).

Finally, the reliability of the assembly has been confirmed by the results of ongoing genome sequencing of B. rapa. Integration of physical contigs into the genetic loci identified a conflict between anchors of sequence-tagged site markers. Contig 166 was found to have been assembled by a false join. Two of the markers anchored on this contig, KS50140 and KR50161, belonged to linkage group R3, but the KS10551 marker was assigned to R9. We checked the CB maps of the fingerprint order of this contig and found that

Table 3: Summary of sequence-tagged site genetic markers used for contig integration into the B. rapa genetic map.

| Item | Count |
|---|---|
| Total number of markers used | 315 |
| Total number of positive clones | 306 |
| Positive clones in contigs | 234 |
| Positive singleton clones | 72 |
| Number of markers in contigs | 242 |
| Number of markers in singletons | 73 |
| Number of contigs containing genetic markers | 187 |
| Contigs containing one genetic marker | 150 |
| Contigs containing more than one genetic marker | 37 |
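The kind of anchoring conflict described for contig 166 can be flagged automatically once each marker is recorded with its contig and linkage group. The following Python sketch is illustrative only: the function name and the input layout are assumptions, and the example records use just the marker assignments stated in the text (contig 166 on R3 and R9, contig 180 on R9).

```python
from collections import defaultdict


def find_linkage_conflicts(marker_anchors):
    """Return contigs whose anchored markers map to more than one linkage group.

    marker_anchors: iterable of (contig, marker, linkage_group) tuples.
    """
    groups_by_contig = defaultdict(set)
    for contig, _marker, linkage_group in marker_anchors:
        groups_by_contig[contig].add(linkage_group)
    return {
        contig: sorted(groups)
        for contig, groups in groups_by_contig.items()
        if len(groups) > 1
    }


# Marker anchors mentioned in the text; the tuple layout is hypothetical.
ANCHORS = [
    (166, "KS50140", "R3"),
    (166, "KR50161", "R3"),
    (166, "KS10551", "R9"),
    (180, "KS31203", "R9"),
    (180, "KS31191", "R9"),
]

if __name__ == "__main__":
    print(find_linkage_conflicts(ANCHORS))
```

A contig flagged this way is only a candidate false join; as in the text, the consensus band map still has to be inspected before the contig is split.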