This report describes the protocol used to align the Wheat_EST features to tigrv4-genome.
Mon Apr 17 12:17:41 2006


Source of Wheat_EST: the Gramene markers database; originally downloaded from GenBank with the query 'txid4564[orgn] AND gbdiv_est[PROP]'

Alignment procedure details 
--------------------------- 

A total of 623606 Wheat_EST features were aligned to tigrv4-genome using blat with the parameter -minIdentity=50, followed by pslReps with -singleHit. The alignments were then filtered as described below; this filtering procedure is applied in general to 'CrossSpecies-Coding' data sets.
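
The two-step alignment could be driven from a script along the following lines. This is a minimal sketch, not the pipeline actually used: all file names are hypothetical placeholders, and only the two options named in this report (-minIdentity=50 and -singleHit) are set. Any other options used in the original run are not recorded here.

```python
import shlex

def build_alignment_commands(genome_fa, est_fa, raw_psl, best_psl, psr_out):
    """Return the blat and pslReps argument lists for the two-step alignment.

    All file names are hypothetical; only -minIdentity=50 and -singleHit
    come from the report.
    """
    # blat usage: blat database query [options] output.psl
    blat_cmd = ["blat", genome_fa, est_fa, "-minIdentity=50", raw_psl]
    # pslReps usage: pslReps [options] in.psl out.psl out.psr
    reps_cmd = ["pslReps", "-singleHit", raw_psl, best_psl, psr_out]
    return blat_cmd, reps_cmd

blat_cmd, reps_cmd = build_alignment_commands(
    "tigrv4-genome.fa", "Wheat_EST.fa", "raw.psl", "best.psl", "best.psr")
print(shlex.join(blat_cmd))
print(shlex.join(reps_cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) on a machine where both tools are installed.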

Initial summary
# alignments : 464583
# unique Features these alignments represent: 410784
% of total features these alignments represent : 65.87 %

The lengths of the matches are distributed as follows
Hit_Length	# alignments
--------	--------
100	 46176
150	 46791
200	 53978
250	 56147
300	 57641
350	 51940
400	 47804
450	 37852
500	 26182
550	 17258
600	 11080
650	 5802
700	 3082
750	 1438
800	 564
10000	 848
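
A length table of this shape can be produced with a simple binning routine. The sketch below assumes that each Hit_Length label is an inclusive upper bound of its bin (so "150" counts lengths 101 to 150); the report does not state this, and the function name and toy input are illustrative only.

```python
from bisect import bisect_left

# Bin labels copied from the table above. Treating each label as an
# inclusive upper bound is an assumption, not something the report states.
BIN_EDGES = [100, 150, 200, 250, 300, 350, 400, 450,
             500, 550, 600, 650, 700, 750, 800, 10000]

def bin_hit_lengths(lengths):
    """Count match lengths per bin; lengths above the last edge are ignored."""
    counts = {edge: 0 for edge in BIN_EDGES}
    for length in lengths:
        idx = bisect_left(BIN_EDGES, length)  # index of first edge >= length
        if idx < len(BIN_EDGES):
            counts[BIN_EDGES[idx]] += 1
    return counts

# Toy input (not data from the report).
toy_counts = bin_hit_lengths([100, 101, 150, 151, 9000])
print(toy_counts)
```

The counts in the table sum to 464583, the total number of alignments, so every alignment falls into exactly one bin.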

Alignments with matches shorter than 150 bp are deleted.
# remaining Alignments : 372614
# unique Features these remaining alignments represent: 324958
% of total features these alignments represent : 52.11 %
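
The length filter can be sketched on PSL output as follows. This assumes that "matches" refers to the first PSL column (number of matching bases); the report may instead mean total aligned length. The helper toy_psl_line and its values are hypothetical and exist only to make the example self-contained.

```python
MIN_MATCH_BP = 150  # threshold stated in the report

def filter_short_alignments(psl_lines, min_match=MIN_MATCH_BP):
    """Keep PSL records whose 'matches' column (column 1) is >= min_match."""
    kept = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header or malformed lines: a PSL record has 21 columns
        # and starts with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        if int(fields[0]) >= min_match:
            kept.append(line)
    return kept

def toy_psl_line(matches, q_name):
    """Build a 21-column PSL-like line for demonstration (hypothetical values)."""
    fields = [str(matches), "0", "0", "0", "0", "0", "0", "0", "+", q_name,
              "500", "0", str(matches), "chr1", "1000000", "0", str(matches),
              "1", f"{matches},", "0,", "0,"]
    return "\t".join(fields)

toy_lines = ["psLayout version 3",
             toy_psl_line(149, "est_a"),
             toy_psl_line(150, "est_b"),
             toy_psl_line(300, "est_c")]
kept = filter_short_alignments(toy_lines)
print([line.split("\t")[9] for line in kept])
```

An alignment with exactly 150 matching bases is kept, in line with the "less than 150 bp" wording.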

Frequency distribution of the remaining features
# hits	# features
--------	--------
1	 301884
2	 13415
3	 3205
4	 2393
5	 1770
6	 1274
8	 788
9	 157
10	 40
20	 32
30	 0
40	 0
50	 0
100	 0

Features with more than three hits are deleted.
# remaining Alignments : 338329
# unique Features these remaining alignments represent: 318504
% of total features these alignments represent : 51.07 %
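
The hit-count filter and the resulting totals can be reproduced from the frequency table. In this minimal sketch the function name and the (feature name, record) pair layout are illustrative assumptions; the counts are copied from the report.

```python
from collections import Counter

MAX_HITS = 3            # features with more hits than this are dropped
TOTAL_FEATURES = 623606  # number of Wheat_EST features, from the report

def filter_by_hit_count(alignments, max_hits=MAX_HITS):
    """alignments: list of (feature_name, record) pairs (hypothetical layout)."""
    hits = Counter(name for name, _ in alignments)
    return [(name, rec) for name, rec in alignments if hits[name] <= max_hits]

# Cross-check against the frequency table: rows for 1, 2 and 3 hits.
HIT_TABLE = {1: 301884, 2: 13415, 3: 3205}
remaining_features = sum(HIT_TABLE.values())
remaining_alignments = sum(n * count for n, count in HIT_TABLE.items())
percent_of_total = round(100 * remaining_features / TOTAL_FEATURES, 2)
print(remaining_features, remaining_alignments, percent_of_total)

# Toy input (not data from the report): est_b has four hits and is removed.
toy = [("est_a", 1), ("est_b", 1), ("est_b", 2), ("est_b", 3), ("est_b", 4)]
toy_kept = filter_by_hit_count(toy)
```

The values computed from the table agree with the summary above: 318504 features, 338329 alignments, 51.07 %.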

% Identity distribution of the remaining alignments
% Identity	# alignments
--------	--------
10	 0
20	 2
30	 1
40	 13
50	 56
60	 234
70	 1407
80	 16261
90	 207339
95	 103626
100	 9390

The gaps are distributed as follows
Gaps	# features
--------	--------
1000	 270782
2000	 44855
3000	 10203
4000	 2587
5000	 1310
6000	 637
7000	 1176
8000	 451
9000	 254
10000	 178