Comparative Genomics

Data available

  • Gene trees are constructed using the longest protein for every gene in Ensembl Genomes. Proteins are clustered based on Best Reciprocal Hits and BLAST Score Ratios, and each cluster of proteins is aligned using MUSCLE. TreeBeST then produces a gene tree from each multiple alignment and reconciles it with the species tree to call duplication events. Homologues are deduced from these trees. More information →
  • Families are constructed by clustering of all Ensembl Genomes proteins, i.e. not only the longest protein. Proteins from complete genomes present in Ensembl and UniProtKB (SwissProt and TrEMBL) are added to extend the protein set. More information →
  • Whole genome alignments are performed using multiple species. More information →
  • Ancestral sequences are calculated from multi-species whole genome alignments. More information →
  • Conservation scores and constrained elements are calculated from the whole genome multiple alignments. More information →
  • Syntenies are calculated from the multiple alignments. More information →
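
The clustering step of the gene tree pipeline can be illustrated with a minimal sketch. It is not the production Compara code: the function names, the toy scores, the 0.33 threshold and the exact Blast Score Ratio definition (pair score divided by the larger of the two self-scores) are all assumptions made for illustration. Two proteins are linked when they are Best Reciprocal Hits or when their ratio reaches the threshold, and clusters are the connected groups of linked proteins.

```python
from collections import defaultdict


def best_hits(scores):
    """Map each query protein to its highest-scoring non-self target."""
    best = {}
    for (query, target), score in scores.items():
        if query == target:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def blast_score_ratio(scores, a, b):
    """Pair score normalised by the larger self-score (assumed definition)."""
    return scores.get((a, b), 0.0) / max(scores[(a, a)], scores[(b, b)])


def cluster_proteins(scores, bsr_threshold=0.33):
    """Group proteins linked by a Best Reciprocal Hit or a sufficient ratio.

    `scores` maps (query, target) pairs to BLAST bit scores and must
    include a self-score for every protein. The threshold is hypothetical.
    """
    proteins = sorted({p for pair in scores for p in pair})
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    best = best_hits(scores)
    for a, b in scores:
        if a == b:
            continue
        is_brh = best.get(a) == b and best.get(b) == a
        if is_brh or blast_score_ratio(scores, a, b) >= bsr_threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(sorted(group) for group in groups.values())


# Toy, made-up bit scores for five proteins.
toy_scores = {
    ("A", "A"): 100, ("B", "B"): 100, ("C", "C"): 100,
    ("D", "D"): 100, ("E", "E"): 100,
    ("A", "B"): 80, ("B", "A"): 80,   # reciprocal best hits
    ("C", "D"): 20, ("D", "C"): 20,   # reciprocal best hits, low ratio
    ("A", "C"): 10, ("C", "A"): 10,   # neither criterion met
    ("A", "E"): 50, ("E", "A"): 50,   # not reciprocal, but ratio 0.5
}
clusters = cluster_proteins(toy_scores)
```

With these toy scores, E joins the A/B cluster through the ratio criterion alone, while C and D are grouped only because they are each other's best hit. Each resulting cluster would then be passed on to the alignment and tree-building steps.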

Access

Data can be accessed using the Compara Perl API, BioMart, or comparative genomics pages on the browser. Gene trees can be viewed from any 'Gene' page on the browser, and exported via the control panel.

Species Trees

Pan Taxonomic Compara

The Pan Taxonomic Compara is built using representative genomes from all significant clades represented in Ensembl and Ensembl Genomes, to offer a broad view of homologous relationships across the taxonomy. The species tree used is a combination of Ensembl's taxonomy-based tree and a manually calculated tree for bacteria. It is available from here.

Clade Compara

The species tree for plants was generated by Gramene.