Magnaporthe grisea Multigene Family Analysis

This analysis clusters genes with similar protein sequences. It does not categorize genes by function; genes are grouped only by protein similarity to other genes within the genome.

See Genes by Multigene Family for a listing of all multigene families.

Methods

We identified multigene families in the Magnaporthe grisea genome by running blastp on the entire proteome. To avoid grouping genes that merely share a protein domain, we classified as paralogs only those genes whose alignments (E <= 1e-5, score > 10) covered more than 60% of the longer gene and showed more than 30% amino acid identity. *
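The filter described above can be sketched as follows. This is a minimal illustration, not the code used for this analysis: the `Hit` record, its field names, and the function `is_paralog_hit` are hypothetical, and the sketch assumes that coverage is measured as the aligned length divided by the length of the longer protein.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One blastp alignment between two proteins (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    score: float
    identity_pct: float  # percent identical residues within the alignment
    align_len: int       # number of aligned residues
    query_len: int       # full length of the query protein
    subject_len: int     # full length of the subject protein


def is_paralog_hit(hit, max_evalue=1e-5, min_score=10,
                   min_coverage=0.60, min_identity=30.0):
    """Return True if the alignment passes the thresholds stated in the text."""
    if hit.query == hit.subject:
        return False  # ignore a protein aligned to itself
    # Assumption: coverage is aligned length over the longer protein's length.
    coverage = hit.align_len / max(hit.query_len, hit.subject_len)
    return (hit.evalue <= max_evalue
            and hit.score > min_score
            and coverage > min_coverage
            and hit.identity_pct > min_identity)


# A near full-length match passes; a short domain-only match does not.
full_length = Hit("geneA", "geneB", 1e-20, 200.0, 45.0, 300, 400, 350)
domain_only = Hit("geneA", "geneC", 1e-20, 120.0, 55.0, 100, 400, 380)
print(is_paralog_hit(full_length), is_paralog_hit(domain_only))
```

The coverage test is what removes domain-only matches: the second hit has a strong E-value and identity but spans only a quarter of the longer protein.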

Genes were grouped into multigene families by single linkage: two genes fall in the same family if they are connected by a chain of qualifying alignments, even if they do not align directly with each other. To measure the similarity between genes within a family, we calculated the Average Percent Identity over all blastp alignments between genes within the family. We also computed the family Completeness, the ratio (observed number of hits)/(total possible number of hits) between all genes in the family.
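Single-linkage grouping can be illustrated with a small union-find structure. This is a sketch under the assumption that the input is a list of gene pairs that passed the paralog filter; the function name `single_linkage` and the gene identifiers are made up for the example.

```python
def single_linkage(pairs):
    """Group genes into families: any chain of pairwise hits joins a family."""
    parent = {}

    def find(gene):
        # Locate the representative of a gene's family, compressing the path.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return list(families.values())


# g1 and g3 share no direct hit but are linked through g2.
example_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
example_families = single_linkage(example_pairs)
print(sorted(sorted(family) for family in example_families))
```

Because a single qualifying hit is enough to merge two groups, large families can contain members with no direct alignment to one another, which is what the Completeness ratio is meant to expose.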

Note: Multigene family names are suspect! Each family name was set automatically from one of the genes within the family and may not be the best possible name for the family. Work remains to better determine these labels.

Families were assigned unique number IDs, in decreasing order of family size. Thus family 1 is the largest family, followed by family 2, etc.
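The numbering scheme can be sketched as a sort by decreasing size. The function name `number_families` is hypothetical, and the alphabetical tie-break between families of equal size is an assumption made only so the example is deterministic; the source does not say how ties are ordered.

```python
def number_families(families):
    """Assign IDs 1, 2, 3, ... in decreasing order of family size."""
    # Assumption: ties in size are broken by the sorted member names.
    ordered = sorted(families, key=lambda family: (-len(family), sorted(family)))
    return {family_id: family for family_id, family in enumerate(ordered, start=1)}


numbered = number_families([{"g4", "g5"}, {"g1", "g2", "g3"}, {"g6", "g7"}])
for family_id, members in numbered.items():
    print(family_id, sorted(members))
```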

There is currently a slight error in the calculation of Completeness that causes some families to report a percentage greater than 100%. A few protein pairs have several overlapping alignments, and each alignment is counted as a separate hit. This error will be fixed in the next few days.
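The effect of overlapping alignments on Completeness can be reproduced with a small sketch. It assumes that the total possible number of hits is the number of unordered gene pairs, n(n-1)/2, which the text does not state explicitly; the function name and the `dedupe` flag are hypothetical.

```python
def completeness(n_genes, alignments, dedupe=True):
    """Ratio of observed hits to possible hits within one family.

    alignments: list of (gene_a, gene_b) tuples, one per blastp alignment.
    Assumption: possible hits = number of unordered gene pairs.
    """
    possible = n_genes * (n_genes - 1) // 2
    if dedupe:
        # Count each gene pair once, however many alignments it has.
        observed = len({frozenset(pair) for pair in alignments})
    else:
        # Count every alignment, which can exceed the number of pairs.
        observed = len(alignments)
    return observed / possible


# Three genes, all pairs hit, but two pairs have a second overlapping alignment.
family_alignments = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "c"), ("c", "b")]
print(completeness(3, family_alignments, dedupe=False))
print(completeness(3, family_alignments, dedupe=True))
```

Counting every alignment gives a ratio above 1 for this family, while counting each gene pair once caps the ratio at 1.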

Additionally, the Average Percent Identity is currently calculated as the number of identities divided by the length of the alignment, rather than as the percent identity over the entire length of the two protein sequences. This error will also be corrected in the next few days.
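The difference between the two identity measures can be shown with a short sketch. The function names are hypothetical, and using the longer of the two proteins as the denominator for the whole-sequence measure is an assumption; the text only says "the entire length of two protein sequences".

```python
def identity_over_alignment(identities, align_len):
    """Percent identity relative to the aligned region only."""
    return 100.0 * identities / align_len


def identity_over_sequences(identities, len_a, len_b):
    """Percent identity relative to the full proteins.

    Assumption: the denominator is the length of the longer protein.
    """
    return 100.0 * identities / max(len_a, len_b)


# 90 identical residues in a 200-residue alignment between proteins
# of 400 and 350 residues.
print(identity_over_alignment(90, 200))
print(identity_over_sequences(90, 400, 350))
```

The alignment-based figure is never lower than the whole-sequence figure as long as the alignment is no longer than the longer protein, so families scored this way can look more similar than they are.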

Data

There are 662 multigene families, containing a total of 2928 genes. The average multigene family contains 4.4 genes, and family sizes range from 2 to 166 genes. The table below shows the distribution of family sizes.
  • 662 Multigene Families
  • 2928 genes in a Multigene Family
  • 166 genes in the largest Multigene Family
  • 4.4 genes in the average Multigene Family
# Genes/Family   Number Families   % Families
2                375               12.8
3                135               4.6
4                52                1.8
5                26                0.9
6                12                0.4
7                9                 0.3
8                6                 0.2
9                4                 0.1
10               5                 0.2
11               6                 0.2
12               2                 0.1
13               3                 0.1
14               4                 0.1
15               3                 0.1
16               1                 0.0
20               1                 0.0
22               1                 0.0
23               2                 0.1
25               1                 0.0
28               2                 0.1
31               1                 0.0
32               1                 0.0
35               2                 0.1
36               1                 0.0
41               1                 0.0
49               1                 0.0
58               1                 0.0
77               1                 0.0
86               1                 0.0
105              1                 0.0
166              1                 0.0
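The summary figures can be re-derived from the size distribution in the table above. The sketch below copies the family sizes and counts from the table; the variable and function names are hypothetical. The percentage computed here uses the total number of families as the denominator, which is an assumption about what a "% Families" column is meant to express.

```python
# Family size -> number of families of that size, copied from the table.
size_counts = {
    2: 375, 3: 135, 4: 52, 5: 26, 6: 12, 7: 9, 8: 6, 9: 4, 10: 5,
    11: 6, 12: 2, 13: 3, 14: 4, 15: 3, 16: 1, 20: 1, 22: 1, 23: 2,
    25: 1, 28: 2, 31: 1, 32: 1, 35: 2, 36: 1, 41: 1, 49: 1, 58: 1,
    77: 1, 86: 1, 105: 1, 166: 1,
}


def summarize(counts):
    """Compute totals and the mean family size from a size distribution."""
    total_families = sum(counts.values())
    total_genes = sum(size * n for size, n in counts.items())
    return {
        "families": total_families,
        "genes": total_genes,
        "largest": max(counts),
        "mean_size": total_genes / total_families,
    }


def percent_of_families(counts, size):
    """Share of all families that have exactly `size` genes, in percent."""
    return 100.0 * counts[size] / sum(counts.values())


summary = summarize(size_counts)
print(summary["families"], summary["genes"], summary["largest"],
      round(summary["mean_size"], 1))
print(round(percent_of_families(size_counts, 2), 1))
```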

References

* Similar to the methods used for Vibrio cholerae (BLASTX, E <= 1e-5, > 60% of the length of the query ORF) [1], Thermotoga maritima (BLASTX, E <= 1e-5, > 60% of the length of the query ORF) [3], Helicobacter pylori (FASTA, > 60% of the length of the smaller ORF) [4], and Archaeoglobus fulgidus (FASTA, > 60% of the length of the smaller ORF) [2].

[1] Heidelberg, John F. et al. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406, 477-483 (2000).

[2] Klenk, Hans-Peter et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364-370 (1997).

[3] Nelson, Karen E. et al. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399, 323-329 (1999).

[4] Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539-547 (1997).