This report describes the protocol used to align the sorghum_wgs features to tigrv4-genome.
Thu Aug 10 11:32:26 2006


Source of sorghum_wgs : Downloaded from trace archive under the query SPECIES_CODE='SORGHUM BICOLOR' 

Alignment procedure details 
--------------------------- 

A total of 6749814 sorghum_wgs features were aligned to tigrv4-genome using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The resulting alignments were then filtered by the procedure described below, which is applied in general to 'CrossSpecies-Genomic' data sets.

Initial summary
# alignments : 3561092
# unique Features these alignments represent: 2561132
% of total features these alignments represent : 37.94 %
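
The percentage above is the number of unique features divided by the 6749814 input features. A minimal Python sketch of that arithmetic follows; the name percent_of_total is hypothetical and not part of the original pipeline.

```python
# Total number of sorghum_wgs features, as stated in the report.
TOTAL_FEATURES = 6749814

def percent_of_total(unique_features, total=TOTAL_FEATURES):
    """Return unique_features as a percentage of total, rounded to 2 decimals."""
    return round(100.0 * unique_features / total, 2)

print(percent_of_total(2561132))  # 37.94
```

The same function reproduces the percentages reported after each filtering step when given the corresponding unique-feature count.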

The lengths of the matches are distributed as follows
Hit_length	# alignments
--------	--------
100		1051724
150		474113
200		293229
250		229178
300		189089
350		158306
400		144863
450		126744
500		118610
550		108168
600		92785
650		81728
700		75494
750		81175
800		86617
10000		249269
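
The table appears to count each alignment under the smallest listed value that is greater than or equal to its match length, so the listed values act as inclusive upper bounds. This is an inference from the counts, not something the report states. A Python sketch of such binning follows; bin_counts is a hypothetical helper and the input lengths are toy values, not data from the report.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the bins, as listed in the table above.
BIN_EDGES = list(range(100, 801, 50)) + [10000]

def bin_counts(lengths, edges=BIN_EDGES):
    """Count values per bin; each value goes to the first edge >= value.

    Values above the last edge are counted under the last edge (assumption).
    """
    counts = Counter()
    for length in lengths:
        idx = min(bisect_left(edges, length), len(edges) - 1)
        counts[edges[idx]] += 1
    return counts

# Toy lengths, not data from the report.
print(sorted(bin_counts([42, 100, 101, 799, 2500]).items()))
# [(100, 2), (150, 1), (800, 1), (10000, 1)]
```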

Alignments with matches shorter than 100 bp are filtered out
# remaining alignments : 2519454
# unique Features these alignments represent: 1744503
% of total features these alignments represent : 25.85 %
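
A minimal Python sketch of this length filter over PSL records follows. It assumes the match length is the `matches` column (first field) of each PSL line; the report does not say which quantity was used. The names filter_by_match_length and make_line, and the sample records, are hypothetical.

```python
MIN_MATCH_BP = 100  # threshold stated in the report

def filter_by_match_length(psl_lines, min_match=MIN_MATCH_BP):
    """Keep PSL data lines whose `matches` field (column 1) is >= min_match."""
    kept = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header or malformed lines: PSL data lines have 21 columns
        # and start with an integer.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        if int(fields[0]) >= min_match:
            kept.append(line)
    return kept

def make_line(matches, qname):
    """Build a toy PSL line; only the columns used here are meaningful."""
    cols = [matches, 0, 0, 0, 0, 0, 0, 0, "+", qname, 500, 0, matches,
            "chr1", 1000000, 0, matches, 1, f"{matches},", "0,", "0,"]
    return "\t".join(str(c) for c in cols)

toy = [make_line(99, "readA"), make_line(100, "readB"), make_line(350, "readC")]
kept = filter_by_match_length(toy)
print([line.split("\t")[9] for line in kept])  # ['readB', 'readC']
```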

Gap distribution of the remaining alignments
gaps	# alignments
--------	--------
1000		2239555
2000		17905
3000		4960
4000		2937
5000		1503
6000		1796
7000		2026
8000		3179
9000		1895
10000		1515
20000		9428

Alignments with gaps > 4000 bp are filtered out
# remaining alignments : 2265357
# unique Features these alignments represent: 1554276
% of total features these alignments represent : 23.03 %
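
The report does not define how the gap of an alignment was measured. The Python sketch below assumes it is the largest distance on the genome between consecutive alignment blocks, computed from the PSL `blockSizes` and `tStarts` columns. All function names are hypothetical.

```python
MAX_GAP_BP = 4000  # threshold stated in the report

def parse_int_list(field):
    """Parse a PSL comma-terminated list such as '120,80,' into [120, 80]."""
    return [int(x) for x in field.split(",") if x]

def largest_target_gap(block_sizes, t_starts):
    """Largest gap (bp) between consecutive blocks on the target sequence."""
    gaps = [
        t_starts[i + 1] - (t_starts[i] + block_sizes[i])
        for i in range(len(t_starts) - 1)
    ]
    # A single-block alignment has no gaps.
    return max(gaps, default=0)

def passes_gap_filter(block_sizes_field, t_starts_field, max_gap=MAX_GAP_BP):
    """True if the alignment is kept, i.e. its largest gap is <= max_gap."""
    sizes = parse_int_list(block_sizes_field)
    starts = parse_int_list(t_starts_field)
    return largest_target_gap(sizes, starts) <= max_gap

print(passes_gap_filter("120,80,", "1000,1500,"))  # True  (gap of 380 bp)
print(passes_gap_filter("120,80,", "1000,6000,"))  # False (gap of 4880 bp)
```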

Frequency distribution of the remaining features
# hits	# features
--------	--------
1		1276251
2		123267
3		28611
4		45580
5		31626
6		41411
8		4263
9		1059
10		798
20		1368
30		34
40		5
50		1
100		2

Features with more than three hits are deleted.
# remaining alignments : 1608618
# unique Features these alignments represent: 1428129
% of total features these alignments represent : 21.16 %
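
The counts above can be checked against the frequency table: keeping features with one, two or three hits gives 1276251 + 123267 + 28611 features and 1276251 + 2 x 123267 + 3 x 28611 alignments. A Python sketch of the filter and of this check follows; filter_by_hit_count and the toy input are hypothetical.

```python
from collections import Counter

MAX_HITS = 3  # features with more than three hits are removed

def filter_by_hit_count(alignments, max_hits=MAX_HITS):
    """alignments: list of (feature_name, alignment_record) pairs.

    Returns the pairs whose feature has at most max_hits alignments.
    """
    hits = Counter(name for name, _ in alignments)
    return [(name, rec) for name, rec in alignments if hits[name] <= max_hits]

# Toy input: feature 'c' has four hits and is dropped entirely.
toy = [("a", 1), ("b", 1), ("b", 2), ("c", 1), ("c", 2), ("c", 3), ("c", 4)]
print([name for name, _ in filter_by_hit_count(toy)])  # ['a', 'b', 'b']

# Consistency check using the first three rows of the frequency table.
features_kept = 1276251 + 123267 + 28611
alignments_kept = 1 * 1276251 + 2 * 123267 + 3 * 28611
print(features_kept, alignments_kept)  # 1428129 1608618
```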

% Identity distribution of the remaining alignments
% Identity	# alignments
--------	--------
10		3
20		928
30		8310
40		39123
50		104748
60		173161
70		226204
80		201434
90		541929
100		312778

Alignments with percent identity lower than 60 are deleted.
# remaining alignments : 1305042
# unique Features these alignments represent: 1167319
% of total features these alignments represent : 17.29 %

Following is the final summary
# alignments : 19246
# unique Features these alignments represent: 17249
% of total features these alignments represent : 17.25 %