Methods: Aligning wild rice clones and contigs to the O. sativa Genome
We have constructed a synteny pipeline to align clones and contigs
from the wild species to O. sativa by combining the alignments of BAC
end sequences (BES) to O. sativa with the fingerprint contig (FPC)
maps. The pipeline consists of five steps:
- upload FPC maps and GSS BES read data to the Gramene mappings database
- align BES to the O. sativa genome
- determine the best alignments for each clone
- assemble the clone positions to determine the regions to which each
contig aligns
- use the data to create visualizations in CMap as part of
Gramene
In the first step of the process, the OMAP FPC maps for the
wild rice species were loaded into the Gramene 'mappings' relational
database such that the data model captured the locations of FPContigs
on chromosomes and of clones on FPContigs. Next, all BES were loaded
into the 'mappings' database, and cross references were made between
the BES entries and their parent clones. Pairs of clone end reads
could therefore be readily identified. In the second step, BES were
aligned to the O. sativa pseudomolecules (TIGR v4 assembly) using BLAT.
The standard BLAT parameters were used, and the alignments from the
top 10 scoring hits were loaded into the database. In the third step,
mappings between clones and the O. sativa genome were inferred where
both BES of a given clone aligned and were separated by a distance
within two standard deviations of the mean clone size. The mean clone
size was determined by examining the band counts for the clones in the
FPC map and extrapolating the clone size in base pairs using an
estimated band size of 1195 bp. The two BES of a clone could align in
either the same or different orientations, to allow for possible
inversions, but could not align to different chromosomes or to
positions on the same chromosome that fell outside the expected clone
size range (+/- 2 standard deviations from the mean clone size). A
clone could map to multiple regions of the genome if the hits in those
regions had the same score, yielding multiple top hits. In the fourth
step, FPContig
mappings on the genome were inferred wherever three or more
overlapping clones from a given contig were located. The outermost
start and end of these clones became the start and end points of each
region. If a gap existed, a separate region was formed by the clones
on the other side of the gap. Discrete sections of each FPContig were
free to map to multiple locations on the genome. Lastly, we exported
the data from our Gramene mappings database to create tracks in the
Genome Browser, as well as maps and correspondences to related maps in
the CMap viewer.
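
The clone-placement rule of the third step can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the Hit record, the function names, the use of the sample standard deviation, and the ranking of pairs by the sum of their two hit scores are all hypothetical choices. Only the band size of 1195 bp, the same-chromosome requirement, and the +/- 2 standard deviation window come from the description above.

```python
from dataclasses import dataclass
from statistics import mean, stdev

BAND_SIZE_BP = 1195  # estimated band size stated in the text


@dataclass(frozen=True)
class Hit:
    """One BES alignment on the reference (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    strand: str
    score: int


def clone_size_stats(band_counts):
    """Return (mean, standard deviation) of clone sizes in bp.

    Sizes are extrapolated as band count * BAND_SIZE_BP. The sample
    standard deviation is an assumption; the text does not say which
    estimator was used.
    """
    sizes = [n * BAND_SIZE_BP for n in band_counts]
    return mean(sizes), stdev(sizes)


def span(a, b):
    """Genomic distance covered by two hits, outer edge to outer edge."""
    return max(a.end, b.end) - min(a.start, b.start)


def consistent_pairs(hits_a, hits_b, mean_size, sd_size):
    """Keep hit pairs on the same chromosome whose span lies within
    mean +/- 2 SD. Strand is deliberately not checked, so that
    inverted placements are allowed."""
    lo = mean_size - 2 * sd_size
    hi = mean_size + 2 * sd_size
    return [
        (a, b)
        for a in hits_a
        for b in hits_b
        if a.chrom == b.chrom and lo <= span(a, b) <= hi
    ]


def top_pairs(pairs):
    """Return every pair tied for the best combined score, so a clone
    can map to several regions. Summing the two scores is an assumption."""
    if not pairs:
        return []
    best = max(a.score + b.score for a, b in pairs)
    return [(a, b) for a, b in pairs if a.score + b.score == best]
```

In use, clone_size_stats would be run once per FPC map, and consistent_pairs followed by top_pairs once per clone, on the stored hits for its two end reads.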
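
The region assembly of the fourth step can be sketched in the same way. This is again only an illustration: the function name and the tuple layout are hypothetical, and "overlapping" is interpreted here as a chain of clones in which each clone overlaps the run built so far, which the text does not spell out. The threshold of three clones and the rule that a gap starts a new region are taken from the description above.

```python
from collections import defaultdict

MIN_CLONES = 3  # "three or more overlapping clones", as stated in the text


def contig_regions(clone_positions, min_clones=MIN_CLONES):
    """Infer genome regions for one FPContig.

    clone_positions: iterable of (chrom, start, end) tuples, one per
    placed clone of the contig (hypothetical layout).
    Returns a list of (chrom, region_start, region_end, clone_count).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in clone_positions:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Clone overlaps the current run: extend the region.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Gap: close the current run and start a new one.
                if count >= min_clones:
                    regions.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if count >= min_clones:
            regions.append((chrom, cur_start, cur_end, count))
    return regions
```

Because each chromosome and each run of overlapping clones is handled separately, one contig can yield several regions, which matches the statement that discrete sections of an FPContig may map to multiple locations.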