Comparative Genomics

Data available

  • Gene trees are constructed using the longest protein for every gene in Ensembl Genomes. Homologues are deduced from these trees. Proteins are clustered based on Best-Reciprocal Hits and Blast Score Ratios, and each cluster of proteins is aligned using Muscle. Finally, TreeBeST is used to produce a gene tree from each multiple alignment, reconciling it with the species tree to call duplication events. More information →
  • Whole genome alignments are performed using multiple species. More information →
  • Ancestral sequences are calculated from multi-species whole genome alignments. More information →
  • Conservation scores and constrained elements are calculated from the whole genome multiple alignments. More information →
  • Syntenies are calculated from the multiple alignments. More information →
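
The clustering step described for gene trees (Best-Reciprocal Hits combined with Blast Score Ratios) can be illustrated with a minimal Python sketch. This is not the Ensembl Genomes pipeline code. The bit scores are invented toy values, the function names and the `min_bsr` threshold are hypothetical, and the score ratio is assumed here to be the query-to-subject bit score divided by the query's self-hit score. The alignment (Muscle) and tree-building (TreeBeST) steps are not covered.

```python
from collections import defaultdict

# Hypothetical toy bit scores keyed by (query, subject). Self-hits give the
# highest score each query can reach. These are NOT real Ensembl Genomes data.
SCORES = {
    ("A1", "A1"): 200, ("B1", "B1"): 180,
    ("A2", "A2"): 150, ("B2", "B2"): 160,
    ("A1", "B1"): 150, ("B1", "A1"): 150,
    ("A1", "B2"): 40,  ("B2", "A1"): 40,
    ("A2", "B2"): 120, ("B2", "A2"): 120,
    ("A2", "B1"): 30,  ("B1", "A2"): 30,
}


def best_hits(scores):
    """Map each query to its highest-scoring non-self subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query == subject:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return pairs in which each protein is the other's best hit."""
    best = best_hits(scores)
    return {tuple(sorted((q, s))) for q, s in best.items() if best.get(s) == q}


def blast_score_ratio(scores, query, subject):
    """Assumed definition: pairwise bit score over the query's self-hit score."""
    return scores[(query, subject)] / scores[(query, query)]


def cluster_proteins(scores, min_bsr=0.5):
    """Join reciprocal best hits whose score ratio passes in both directions."""
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in reciprocal_best_hits(scores):
        ratio = min(blast_score_ratio(scores, a, b),
                    blast_score_ratio(scores, b, a))
        if ratio >= min_bsr:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for protein in list(parent):
        groups[find(protein)].append(protein)
    return sorted(sorted(members) for members in groups.values())


CLUSTERS = cluster_proteins(SCORES)
print(CLUSTERS)
```

In this sketch best hits are taken across all proteins at once; a real pipeline would more likely compute them per pair of genomes. Raising `min_bsr` makes clustering stricter, so weakly similar reciprocal hits are left unclustered.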

Access

Data can be accessed using the Compara Perl API, BioMart, or comparative genomics pages on the browser. Gene trees can be viewed from any 'Gene' page on the browser, and exported via the control panel.

Species Trees

Pan Taxonomic Compara

The Pan Taxonomic Compara is built using representative genomes from all significant clades represented in Ensembl and Ensembl Genomes, to offer a broad view of homologous relationships across the taxonomy. It is available from here.

Clade Compara

The taxonomy tree for all species is sufficiently well defined to be used in the peptide analysis.