MitImpact 2

MitImpact is a collection of pre-computed pathogenicity predictions for all possible nucleotide changes that cause non-synonymous substitutions in human mitochondrial protein-coding genes. Version 2.9.1 substantially updates the tool published in (PMID: 25516408) with the features and contents below.

User guide

Search by Genomic position

Genomic positions

Search by dbSNP ID

dbSNP ID

Search by Gene or Protein position

Gene or Protein position

List of mitochondrial gene and protein identifiers

#   Gene Symbol  Ensembl Gene ID  Ensembl Protein ID  UniProt Name  UniProt ID  NCBI Gene ID  NCBI Protein ID
1   MT-ND1       ENSG00000198888  ENSP00000354687     NU1M_HUMAN    P03886      4535          YP_003024026.1
2   MT-ND2       ENSG00000198763  ENSP00000355046     NU2M_HUMAN    P03891      4536          YP_003024027.1
3   MT-ND3       ENSG00000198840  ENSP00000355206     NU3M_HUMAN    P03897      4537          YP_003024033.1
4   MT-ND4       ENSG00000198886  ENSP00000354961     NU4M_HUMAN    P03905      4538          YP_003024035.1
5   MT-ND4L      ENSG00000212907  ENSP00000354728     NU4LM_HUMAN   P03901      4539          YP_003024034.1
6   MT-ND5       ENSG00000198786  ENSP00000354813     NU5M_HUMAN    P03915      4540          YP_003024036.1
7   MT-ND6       ENSG00000198695  ENSP00000354665     NU6M_HUMAN    P03923      4541          YP_003024037.1
8   MT-ATP6      ENSG00000198899  ENSP00000354632     ATP6_HUMAN    P00846      4508          YP_003024031.1
9   MT-ATP8      ENSG00000228253  ENSP00000355265     ATP8_HUMAN    P03928      4509          YP_003024030.1
10  MT-CO1       ENSG00000198804  ENSP00000354499     COX1_HUMAN    P00395      4512          YP_003024028.1
11  MT-CO2       ENSG00000198712  ENSP00000354876     COX2_HUMAN    P00403      4513          YP_003024029.1
12  MT-CO3       ENSG00000198938  ENSP00000354982     COX3_HUMAN    P00414      4514          YP_003024032.1
13  MT-CYB       ENSG00000198727  ENSP00000354554     CYB_HUMAN     P00156      4519          YP_003024038.1

Main annotation databases

The putative effect of missense mutations within the 13 mitochondrially encoded proteins was predicted with the following missense pathogenicity predictors:

  1. PolyPhen2 (ver. 2.2.2)
  2. SIFT (ver. 5.0.3)
  3. FatHmm (ver. 2.2, "weighted" and "unweighted" settings)
  4. MutationAssessor (ver. 2.0)
  5. PROVEAN (ver. 1.3)
  6. EFIN (accessed in May 2015)
  7. CADD (ver. 1.2)
  8. VEST (accessed through the CRAVAT webserver in June 2015)
  9. PANTHER (accessed through the Meta-SNP webserver in July 2015)
  10. PhD-SNP (accessed through the Meta-SNP webserver in July 2015)
  11. SNAP (accessed through the Meta-SNP webserver in July 2015)
  12. MutationTaster (ver. 2, accessed in December 2016)
  13. SNPdryad (accessed in December 2016)
  14. DEOGEN (accessed in October 2017)
  15. [NEW] Mitoclass.1 (accessed in December 2018)

Mutations were also annotated by these meta-predictors:

  1. CAROL (accessed in November 2014)
  2. Condel (accessed in November 2014)
  3. COVEC (ver. 0.4)
  4. Meta-SNP (accessed in July 2015)
  5. APOGEE (ver. 1.0)

Predictions can be obtained from the following web URLs:

  • PolyPhen2 - http://genetics.bwh.harvard.edu/pph2
  • SIFT - http://sift.bii.a-star.edu.sg
  • FatHmm - http://fathmm.biocompute.org.uk
  • PROVEAN - http://provean.jcvi.org
  • MutationAssessor - http://mutationassessor.org
  • EFIN - http://paed.hku.hk/efin
  • CADD - http://cadd.gs.washington.edu
  • VEST, CHASM - http://www.cravat.us
  • Meta-SNP, PANTHER, PhD-SNP, SNAP - http://snps.biofold.org/meta-snp
  • COVEC - http://sourceforge.net/projects/covec
  • CAROL - http://www.sanger.ac.uk/science/tools/carol
  • Condel - http://bg.upf.edu/fannsdb
  • MToolBox - https://github.com/mitoNGS/MToolBox/blob/master/MToolBox/data/patho_table.txt
  • MutationTaster - http://www.mutationtaster.org
  • SNPdryad - http://snps.ccbr.utoronto.ca:8080/SNPdryad/
  • [NEW] Mitoclass.1 - https://github.com/tonomartin2/MITOCLASS.1/

Extra annotations

Additional annotations were retrieved from:

  • [UPDATE] Mitomap (accessed in October 2018)
  • [UPDATE] dbSNP (ver. 151)
  • [UPDATE] ClinVar (accessed in June 2018)
  • PhyloP and PhastCons - evolutionary conservation indices (UCSC Gene Tables, group: Comparative Genomics; track: Conservation; tables: phyloP100wayAll and PhastCons100way; UCSC accessed in May 2015)
  • SiteVar - human mtDNA site-specific variability (Hmtdb, Spring 2013)
  • [UPDATE] COSMIC - somatic variants (ver. 87)
  • MISTIC Mutual Information scores (accessed in May 2015)
  • CHASM (accessed through the CRAVAT webserver in June 2015)
  • TransFIC (accessed in May 2015)
  • Compensated Pathogenic Deviations

APOGEE

APOGEE is an LMT-based consensus classifier.
An LMT (Logistic Model Tree) is a machine learning model that combines a decision tree with logistic regression models at its leaves. The predictor variables are used both to choose the splits during tree construction and as candidate inputs for the logistic models.
The difference between a plain decision tree and an LMT is that the former assigns every instance in a leaf to the most frequent class in that leaf, whereas an LMT fits a logistic model in each leaf and gives each instance its own probability of belonging to a class.
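
The contrast between the two kinds of leaf can be illustrated with a minimal Python sketch. It is not APOGEE's code: the function names, the example labels, and the logistic coefficients (`weights`, `bias`) are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def tree_leaf_predict(leaf_labels):
    """Plain decision-tree leaf: every instance gets the leaf's majority class."""
    return Counter(leaf_labels).most_common(1)[0][0]


def lmt_leaf_predict(features, weights, bias):
    """LMT-style leaf: a logistic model gives each instance its own probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Training labels that ended up in one leaf (hypothetical).
leaf_labels = ["pathogenic", "pathogenic", "neutral"]
print(tree_leaf_predict(leaf_labels))  # same answer for every instance in the leaf

# Hypothetical coefficients of the logistic model attached to the same leaf.
weights, bias = [1.5, -2.0], 0.0
for features in ([2.0, 0.5], [0.2, 1.0]):
    # Two instances in the same leaf can receive different probabilities.
    print(round(lmt_leaf_predict(features, weights, bias), 3))
```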

APOGEE handles two pathogenicity classes: neutral and pathogenic. Each mutation is treated as an instance described by the following predictor variables:

  • PhyloP 100V
  • PhastCons 100V
  • PolyPhen2 (HumDiv dataset)
  • SIFT
  • FatHmm (weighted version)
  • PROVEAN
  • Mutation Assessor
  • EFIN (SwissProt dataset)
  • EFIN (HumDiv dataset)
  • CADD Phred
  • PANTHER
  • PhD-SNP
  • SNAP

Once the classification function was defined, we implemented and tested a bootstrap strategy that randomly selected 70% of the pathogenic mutations and an equal number of neutral mutations. In brief, we ran the following procedure for 100 iterations:

  • Sampling the training set, as described above;
  • Estimating the LMT;
  • Predicting the pathogenicity of all the mutations stored in the database.

Each iteration gave an estimate of pathogenicity for each variant. These estimates were summarized by averaging the probabilities for each variant. A variant was deemed harmful if the mean of its probabilities of being harmful over the 100 runs was > 0.5. Compared to a single run of LMT, the bootstrap strategy implies a loss of generalization of the resulting model.
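
The bootstrap loop can be sketched in Python as follows. This is only an illustration under stated assumptions: `fit_stand_in` is a deliberately simple one-feature logistic rule and is NOT an LMT, the neutral mutations are assumed to be sampled at random, each variant is reduced to a single hypothetical score, and all names and data are invented for the example.

```python
import math
import random
from statistics import mean


def fit_stand_in(train):
    """Stand-in for 'estimating the LMT' (NOT an LMT): a one-feature logistic
    rule centred halfway between the two class means of the training sample."""
    patho = [x for x, label in train if label == 1]
    neutral = [x for x, label in train if label == 0]
    midpoint = (mean(patho) + mean(neutral)) / 2.0
    return lambda x: 1.0 / (1.0 + math.exp(-(x - midpoint)))


def bootstrap_probabilities(labelled, all_scores, fit=fit_stand_in,
                            iterations=100, fraction=0.7, seed=0):
    """Mean probability of being harmful for every variant in all_scores."""
    rng = random.Random(seed)
    patho = [v for v in labelled if v[1] == 1]
    neutral = [v for v in labelled if v[1] == 0]
    # 70% of the pathogenic set; assumes at least as many neutral mutations exist.
    n = int(round(fraction * len(patho)))
    sums = [0.0] * len(all_scores)
    for _ in range(iterations):
        # 1. Sample a balanced training set.
        train = rng.sample(patho, n) + rng.sample(neutral, n)
        # 2. Estimate the model (stand-in here).
        model = fit(train)
        # 3. Predict every variant in the database.
        for i, score in enumerate(all_scores):
            sums[i] += model(score)
    return [s / iterations for s in sums]


# Hypothetical labelled data: (score, label) with 1 = pathogenic, 0 = neutral.
labelled = [(1.8 + 0.15 * i, 1) for i in range(10)] + \
           [(-1.8 - 0.15 * i, 0) for i in range(10)]
all_scores = [3.0, -3.0, 0.4]

probabilities = bootstrap_probabilities(labelled, all_scores)
harmful = [p > 0.5 for p in probabilities]  # threshold stated in the text
print(probabilities, harmful)
```

Any function that takes a training sample and returns a probability function can be passed as `fit`, so a real LMT implementation could be substituted for the stand-in.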

The LMT models generated during the 100 iterations can be downloaded here.

Comparison of the performance of classification among meta-predictors

Method            Accuracy  Precision  FDR   MCC    MCR
Meta-SNP          0.54      0.29       0.71  0.09   45.83
CAROL             0.59      0.33       0.67  0.13   40.28
Condel            0.49      0.23       0.78  -0.08  51.16
COVEC WMV         0.59      0.33       0.67  0.12   41.27
MToolBox DS       0.48      0.28       0.72  0.06   51.62
APOGEE bootstrap  0.84      0.73       0.27  0.59   15.97
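
For reference, the columns of such a table can be computed from a confusion matrix as in the Python sketch below. The page does not expand the abbreviations, so the sketch assumes the usual meanings: FDR is the false discovery rate (1 - precision), MCC is the Matthews correlation coefficient, and MCR is the misclassification rate in percent. The counts in the example are hypothetical and are not taken from the MitImpact evaluation.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts, with 'pathogenic' as the positive class.

    Assumes at least one positive prediction (tp + fp > 0).
    """
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "fdr": fp / (tp + fp),                      # false discovery rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "mcr": 100.0 * (fp + fn) / total,           # misclassification rate, %
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```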

Programmatic access to data

Variants can be searched through a new RESTful interface, either directly in your browser or with curl.
The output is formatted as JSON. An empty result set is returned as the string: {"variants": null}.

  1. curl mitimpact.css-mendel.it/api/v2.0/genomic_position/3307
  2. [range query] curl mitimpact.css-mendel.it/api/v2.0/genomic_position/3307-3309
  3. curl mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563
  4. [multiple query] curl mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563,rs28520706,rs1041870
  5. curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=20&id=MT-ATP6"
  6. [multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10&id=ENSG00000198840,P00414"
  7. [multiple range query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10-12&id=ENSG00000198840,P00414"
  8. [multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10,11,13&id=ENSG00000198840,P00414"
  9. curl "mitimpact.css-mendel.it/api/v2.0/pathogenicity?id=ID&min=9", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries: PolyPhen2, SIFT, FatHmm, FatHmmW, PROVEAN, MutationAssessor, EFIN_SP, EFIN_HD, CADD, PANTHER, PhD-SNP, SNAP and MutationTaster.
  10. curl "mitimpact.css-mendel.it/api/v2.0/consensus_pathogenicity?id=ID&min=2", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries the meta-predictors: Meta-SNP, CAROL, Condel, COVEC WMV, MToolBox and APOGEE.

Note: when calling curl from a shell, URLs that contain "?" or "&" must be quoted as shown; otherwise the shell interprets "&" as a command separator.
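
The same queries can be built from a script. The Python sketch below uses only the standard library and follows the URL patterns listed above. Several points are assumptions rather than documented behaviour: the `http://` scheme (curl falls back to HTTP when none is given), the helper names, and that non-empty results are also returned under the "variants" key, since only the empty response is documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed scheme; the examples above give the host without one.
BASE_URL = "http://mitimpact.css-mendel.it/api/v2.0"


def genomic_position_url(position):
    """position: a single position (3307) or a range string ("3307-3309")."""
    return f"{BASE_URL}/genomic_position/{position}"


def dbsnp_url(*rs_ids):
    """One or more dbSNP IDs, joined with commas."""
    return f"{BASE_URL}/dbsnp/{','.join(rs_ids)}"


def protein_position_url(pos, ids):
    """pos: "20", "10-12" or "10,11,13"; ids: gene or protein identifiers."""
    query = urlencode({"pos": pos, "id": ",".join(ids)}, safe=",")
    return f"{BASE_URL}/protein_position?{query}"


def pathogenicity_url(identifier, minimum, consensus=False):
    """consensus=True targets the meta-predictor endpoint."""
    endpoint = "consensus_pathogenicity" if consensus else "pathogenicity"
    return f"{BASE_URL}/{endpoint}?{urlencode({'id': identifier, 'min': minimum})}"


def parse_variants(body):
    """Return the content of "variants", mapping the empty result to []."""
    variants = json.loads(body).get("variants")
    return [] if variants is None else variants


def fetch(url, timeout=30):
    """Perform the request and parse the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_variants(response.read().decode("utf-8"))


print(protein_position_url("10-12", ["ENSG00000198840", "P00414"]))
print(parse_variants('{"variants": null}'))
```

Calling `fetch(genomic_position_url(3307))` would send the first query in the list; the sketch itself makes no network request.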
