Methods: Aligning wild rice clones and contigs to the O. sativa Genome

We have constructed a synteny pipeline to align clones and contigs from the wild species to O. sativa by combining the alignments of BAC end sequences (BES) to O. sativa with the fingerprint contig (FPC) maps. The pipeline consists of 5 steps:

  1. upload FPC maps and GSS BES read data to the Gramene mappings database
  2. align BES to the O. sativa genome
  3. determine the best alignments for each clone
  4. assemble the clone positions to determine the regions where the contig is found to align
  5. utilize the data to create visualizations in CMap as part of Gramene

In the first step, the OMAP FPC maps for the wild rice species were loaded into the Gramene 'mappings' relational database so that the data model captured the locations of FPContigs on chromosomes and of clones on FPContigs. Next, all BES were loaded into the 'mappings' database and cross-referenced to their parent clones, so that pairs of clone end reads could be readily identified.

In the second step, BES were aligned to the O. sativa pseudomolecules (TIGR v4 assembly) using BLAT with its standard parameters, and the alignments from the top 10 scoring hits were loaded into the database.

In the third step, mappings between clones and the O. sativa genome were inferred where both BES of a given clone aligned and were separated by a distance within two standard deviations of the mean clone size. The mean clone size was determined by examining the band counts for the clones in the FPC map and extrapolating the clone size in base pairs using an estimated band size of 1195 bp. The two BES of a clone could map in either the same or different orientations, to allow for possible inversions, but could not map to different chromosomes or to regions on the same chromosome that fell outside the expected clone size range (+/- 2 standard deviations from the mean clone size). A clone could map to multiple regions of the genome if the hits in those regions had the same score, yielding multiple top hits.

In the fourth step, FPContig mappings on the genome were inferred where three or more overlapping clones from a given contig were located. The start and end of the clones became the start and end points of each region. If a gap existed between clones, a separate region was formed by the clones on the other side of the gap. Discrete sections of each FPContig were free to map to multiple locations on the genome.

Lastly, we exported the data from the Gramene mappings database to create tracks in the Genome Browser, as well as maps and correspondences to related maps in the CMap viewer.
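
The logic of the third and fourth steps can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. The `Hit` record, the function names, and the tuple layout are hypothetical. The sketch also makes three assumptions that the text does not state: the clone span is measured between the outermost coordinates of the two end hits, the standard deviation is the sample standard deviation of the extrapolated sizes, and "three or more overlapping clones" means a chain of at least three clones whose placements overlap. Tie handling among equal-scoring hits is left out.

```python
from dataclasses import dataclass
from statistics import mean, stdev

BAND_SIZE_BP = 1195  # estimated band size given in the text


@dataclass(frozen=True)
class Hit:
    """One BES alignment on the reference (hypothetical record layout)."""
    chrom: str
    start: int
    end: int


def clone_size_window(band_counts, n_sd=2.0):
    """Return (low, high) clone size bounds in bp from FPC band counts."""
    sizes = [n * BAND_SIZE_BP for n in band_counts]
    mu, sd = mean(sizes), stdev(sizes)
    return mu - n_sd * sd, mu + n_sd * sd


def place_clone(hit_a, hit_b, window):
    """Return (chrom, start, end) if the two end hits form a valid pair.

    Orientation is deliberately ignored, since either orientation is allowed.
    """
    if hit_a.chrom != hit_b.chrom:
        return None  # ends on different chromosomes are rejected
    start = min(hit_a.start, hit_b.start)
    end = max(hit_a.end, hit_b.end)
    low, high = window
    if low <= end - start <= high:
        return (hit_a.chrom, start, end)
    return None  # span falls outside the expected clone size range


def contig_regions(placements, min_clones=3):
    """Merge overlapping clone placements of one contig into regions.

    A gap between clones closes the current region and starts a new one.
    Only regions supported by at least min_clones clones are kept.
    """
    regions = []
    current = None  # [chrom, start, end, clone_count]
    for chrom, start, end in sorted(placements):
        if current and chrom == current[0] and start <= current[2]:
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            if current and current[3] >= min_clones:
                regions.append(tuple(current[:3]))
            current = [chrom, start, end, 1]
    if current and current[3] >= min_clones:
        regions.append(tuple(current[:3]))
    return regions
```

In use, `clone_size_window` would be called once per FPC map, `place_clone` once per candidate pair of end hits, and `contig_regions` once per FPContig on the placements of its clones. Because a gap starts a new region, a single contig can yield several regions.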