47. Comparison of eleven chloroplast genes between an Indica and a Japonica cultivar of rice

Elizabeth C. KEMMERER1 and Ray Wu1,2

1) Field of Botany; 2) Section of Biochemistry, Molecular and Cell Biology, Biotechnology Building, Cornell University, Ithaca, New York 14853, U.S.A.

We present here a comparison of eleven chloroplast genes that have been sequenced in both an Indica and a Japonica cultivar of Oryza sativa. In the past few years, our lab has studied several chloroplast genes from an Indica cultivar of Oryza sativa (var. Labelle). The entire chloroplast genome has been sequenced by Hiratsuka et al. (1989) using a Japonica cultivar, Oryza sativa (var. Nihonbare). This provided an excellent opportunity to compare chloroplast genes between two closely related cultivars. The following genes were used in this comparison:

1) the atpB gene encoding ATPase subunit beta (Moon et al. 1987a; Hiratsuka et al. 1989);

2) the atpE gene encoding ATPase subunit epsilon (Moon et al. 1987a; Hiratsuka et al. 1989);

3) the petA gene encoding apocytochrome f (Wu et al. 1986; Hiratsuka et al. 1989);

4) the psbA gene encoding the QB protein (Wu et al. 1987; Hiratsuka et al. 1989);

5) the rbcL gene encoding the large subunit of ribulose bisphosphate carboxylase (Moon et al. 1987b; Hiratsuka et al. 1989);

6) the rpl2 gene encoding ribosomal protein L2 (Moon and Wu 1988; Hiratsuka et al. 1989);

7) the rpl23 gene encoding ribosomal protein L23 (Moon and Wu 1988; Hiratsuka et al. 1989);

8) the rps14 gene encoding ribosomal protein S14 (Cote and Wu 1989; Hiratsuka et al. 1989);

9) the rps19 gene encoding ribosomal protein S19 (Moon and Wu 1988; Hiratsuka et al. 1989);

10) the trnM gene encoding tRNAMet (Moon and Wu 1987a; Hiratsuka et al. 1989);

11) the trnHI gene encoding tRNAHis (Moon and Wu 1988; Hiratsuka et al. 1989).

The coding regions of these eleven chloroplast genes were aligned using version 6.0 of the University of Wisconsin Genetics Computer Group Sequence Analysis Software Package for VAX/VMS computers (Devereux et al. 1984). As can be seen in Table 1, there are very few nucleotide changes between the genes in their coding regions. Four genes are perfectly conserved: atpE, rps14, trnHI and trnM. The others have six or fewer changes in their coding regions, giving a percent similarity of 98.2% or more for every gene.

Table 1. Percent similarity, synonymous substitutions, nonsynonymous substitutions and intron substitutions between eleven chloroplast genes from an Indica and a Japonica cultivar of Oryza sativa

____________________________________________________________________________________
                     Percent     Length  Synonymous     Nonsynonymous  Intron
Gene                 similarity  (bp)    substitutions  substitutions  substitutions
____________________________________________________________________________________
atpB                 99.7        1494    1              4              -
atpE                 100.0       411     0              0              -
petA                 99.7        960     0              2              -
psbA                 99.7        1059    3              0              -
rbcL                 99.6        1431    2              4              -
rpl2, minus intron   99.99       818     2              4              -
rpl2, plus intron    98.9        1487    -              -              18
rpl23                98.6        279     1              2              -
rps14                100.0       309     0              0              -
rps19                98.2        279     0              5              -
trnHI                100.0       75      0              0              -
trnM                 100.0       73      0              0              -
____________________________________________________________________________________
Synonymous substitution: a change in the nucleotide sequence that does not result in a change in the amino acid sequence.
Nonsynonymous substitution: a change in the nucleotide sequence that results in a change in the encoded amino acid sequence.

It is interesting to note that most of these changes are nonsynonymous substitutions leading to amino acid changes. Intuitively, one would expect the majority of the changes to be synonymous substitutions. It is our belief that some of these are not true sequence differences; rather, they reflect an error in sequencing or typesetting. For example, the codons ATC (Ile) CAA (Gln) in the Japonica rbcL gene are ATG (Met) GAA (Glu) in the Indica cultivar. It is unlikely that this adjacent CC/GG change is a true difference, given that these are two of only 6 nucleotide changes in the 1431 bp sequence. In the petA gene, the codon GTA (Val) in the Japonica cultivar is GAT (Asp) in the Indica. Again, the TA/AT change is more likely to be a sequencing or typesetting error. In determining a sequence, if one nucleotide is missed, the result is a frame shift, which may cause premature termination or the bypassing of a termination signal. Experienced sequencers know that one often gets band compression in GC-rich regions, especially when several G or C residues occur in tandem. These observations show that the possibility of sequencing or typesetting errors should be taken into account when making sequence comparisons.
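The distinction between synonymous and nonsynonymous changes can be illustrated with a short sketch. This is not the GCG software used for the comparison; it is a minimal Python illustration that assumes the standard genetic code applies to these chloroplast genes, and the names CODON_TABLE, classify_codon_pair and compare_coding_sequences are hypothetical. Note that it counts differing codons rather than individual nucleotide changes, so its tallies need not match those in Table 1.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
# Assumption: the standard table is adequate for these chloroplast genes.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_pair(codon_a, codon_b):
    """Return 'identical', 'synonymous' or 'nonsynonymous' for two codons."""
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


def compare_coding_sequences(seq_a, seq_b):
    """Tally codon-level differences between two aligned, gap-free coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0}
    for start in range(0, len(seq_a), 3):
        kind = classify_codon_pair(seq_a[start:start + 3], seq_b[start:start + 3])
        counts[kind] += 1
    return counts


# Codon pairs quoted in the text (Japonica first, Indica second).
for japonica, indica in [("ATC", "ATG"), ("CAA", "GAA"), ("GTA", "GAT")]:
    print(japonica, CODON_TABLE[japonica], "->", indica, CODON_TABLE[indica],
          classify_codon_pair(japonica, indica))
```

All three codon pairs quoted above are classified as nonsynonymous, in agreement with the amino acid changes given in the text (Ile to Met, Gln to Glu, Val to Asp).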

We used the rpl2 and rbcL genes to study the differences between the non-coding regions of the Japonica and Indica cultivars. The intron in the rpl2 gene is 665 bp long in the Indica cultivar and 662 bp long in the Japonica cultivar. There are 18 differences within this intron region, which is significantly more than the 6 differences in the 818 bp coding region. There are 18 bp in the 5' flanking region between the rpl2 initiating ATG codon and the termination codon of the rpl23 gene upstream of it. This region has the same length and nucleotide composition in the Indica and Japonica cultivars. In the 3' flanking region of the rpl2 gene, there are 55 bp between the rpl2 termination codon and the end of the trnHI gene. There are only 3 differences in this region between the two cultivars. These flanking regions are very highly conserved. However, the rpl2 gene lies in a gene cluster in the large inverted repeat region, which may put further constraints upon the flanking region sequences of the gene. We therefore also examined the flanking regions of the rbcL gene, which is not present in a gene cluster. A comparison of 100 bp of 5' and 3' flanking region sequences showed no changes in length or nucleotide sequence between the Indica and Japonica cultivars. Therefore, even flanking regions are highly conserved between the two cultivars.
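The intron and coding-region counts for rpl2 are easier to compare once they are normalized by length. The following is a rough worked example only: the function name differences_per_100_bp is hypothetical, and using the Indica intron length (665 bp) as the denominator is an assumption, since the two cultivars differ by 3 bp in intron length.

```python
def differences_per_100_bp(differences, length_bp):
    """Express a count of sequence differences as a rate per 100 bp."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return 100.0 * differences / length_bp


# Counts and lengths quoted in the text for the rpl2 gene.
# Assumption: the Indica intron length (665 bp) is used as the denominator.
intron_rate = differences_per_100_bp(18, 665)
coding_rate = differences_per_100_bp(6, 818)

print(f"rpl2 intron: {intron_rate:.2f} differences per 100 bp")
print(f"rpl2 coding: {coding_rate:.2f} differences per 100 bp")
print(f"intron/coding ratio: {intron_rate / coding_rate:.1f}")
```

On these figures the intron carries roughly 2.7 differences per 100 bp against about 0.7 in the coding region.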

In conclusion, there are very few differences between the sequences of eleven chloroplast genes from Oryza sativa Indica and Japonica cultivars. There are more differences in the intron sequence than in the coding sequence of the rpl2 gene. Up to 100 bp of the flanking sequences of the rpl2 and rbcL genes were as highly conserved as the coding sequences.

This work was supported by research grants RF84066, Allocation No. 3, from the Rockefeller Foundation, and GM29179 from the NIH, U.S. Public Health Service.

References

Cote, J. C. and R. Wu, 1989. Sequence of the chloroplast rps14 gene encoding the chloroplast ribosomal protein S14 from rice. Nucl. Acids Res. 17: 1780.

Devereux, J., P. Haeberli and O. Smithies, 1984. A comprehensive set of sequence analysis programs for the VAX. Nucl. Acids Res. 12: 387-395.

Hiratsuka, J., H. Shimada, R. Whittier, T. Ishibashi, M. Sakamoto, M. Mori, C. Kondo, Y. Honji, C. R. Sun, B. Y. Meng, Y. Q. Li, A. Kanno, Y. Nishizawa, A. Hirai, K. Shinozaki and M. Sugiura, 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. 217: 185-194.

Moon, E., T. H. Kao and R. Wu, 1987a. Rice chloroplast DNA molecules are heterogeneous as revealed by DNA sequences of a cluster of genes. Nucl. Acids Res. 15: 611-630.

____, ____ and ____, 1987b. Sequence of the chloroplast-encoded atpB-atpE-trnM gene clusters from rice. Nucl. Acids Res. 15: 4358-4359.

____ and R. Wu, 1988. Organization and nucleotide sequence of genes at both junctions between the two inverted repeats and the large single-copy region in the rice chloroplast genome. Gene 70: 1-12.

Wu, N. H., J. C. Cote and R. Wu, 1986. Nucleotide sequence of the rice cytochrome f gene and the presence of sequence variation near this gene. Gene 50: 271-278.

____, ____ and ____, 1987. Structure of the chloroplast psbA gene encoding the QB protein from Oryza sativa L. Dev. Genet. 8: 339-350.