MitImpact 3D

MitImpact is a collection of pre-computed pathogenicity predictions for all possible nucleotide changes that cause non-synonymous substitutions in human mitochondrial protein-coding genes. Version 3 substantially updates the tool originally published in PMID: 25516408, with the features and contents described below.

User guide

Search by Genomic position

Genomic positions

Search by dbSNP ID

dbSNP ID

Search by Gene or Protein position

Gene or Protein position

List of mitochondrial gene and protein identifiers

# Gene Symbol Ensembl Gene ID Ensembl Protein ID Uniprot Name Uniprot ID Ncbi Gene ID Ncbi Protein ID
1 MT-ND1 ENSG00000198888 ENSP00000354687 NU1M_HUMAN P03886 4535 YP_003024026.1
2 MT-ND2 ENSG00000198763 ENSP00000355046 NU2M_HUMAN P03891 4536 YP_003024027.1
3 MT-ND3 ENSG00000198840 ENSP00000355206 NU3M_HUMAN P03897 4537 YP_003024033.1
4 MT-ND4 ENSG00000198886 ENSP00000354961 NU4M_HUMAN P03905 4538 YP_003024035.1
5 MT-ND4L ENSG00000212907 ENSP00000354728 NU4LM_HUMAN P03901 4539 YP_003024034.1
6 MT-ND5 ENSG00000198786 ENSP00000354813 NU5M_HUMAN P03915 4540 YP_003024036.1
7 MT-ND6 ENSG00000198695 ENSP00000354665 NU6M_HUMAN P03923 4541 YP_003024037.1
8 MT-ATP6 ENSG00000198899 ENSP00000354632 ATP6_HUMAN P00846 4508 YP_003024031.1
9 MT-ATP8 ENSG00000228253 ENSP00000355265 ATP8_HUMAN P03928 4509 YP_003024030.1
10 MT-CO1 ENSG00000198804 ENSP00000354499 COX1_HUMAN P00395 4512 YP_003024028.1
11 MT-CO2 ENSG00000198712 ENSP00000354876 COX2_HUMAN P00403 4513 YP_003024029.1
12 MT-CO3 ENSG00000198938 ENSP00000354982 COX3_HUMAN P00414 4514 YP_003024032.1
13 MT-CYB ENSG00000198727 ENSP00000354554 CYB_HUMAN P00156 4519 YP_003024038.1

Main annotation databases

The putative effect of missense mutations within the 13 mitochondrially-encoded proteins was calculated by the following missense pathogenicity predictors:

  1. PolyPhen2 (ver. 2.2.2)
  2. SIFT (ver. 5.0.3)
  3. FatHmm (ver. 2.2, "weighted" and "unweighted" settings)
  4. MutationAssessor (ver. 2.0)
  5. PROVEAN (ver. 1.3)
  6. EFIN (accessed in May 2015)
  7. CADD (ver. 1.2)
  8. VEST (accessed through the CRAVAT webserver in June 2015)
  9. PANTHER (accessed through the Meta-SNP webserver in July 2015)
  10. PhD-SNP (accessed through the Meta-SNP webserver in July 2015)
  11. SNAP (accessed through the Meta-SNP webserver in July 2015)
  12. MutationTaster (ver. 2, accessed in December 2016)
  13. SNPdryad (accessed in December 2016)
  14. DEOGEN (accessed in October 2017)
  15. Mitoclass.1 (accessed in December 2018)

Mutations were also annotated by these meta-predictors:

  1. CAROL (accessed in November 2014)
  2. Condel (accessed in November 2014)
  3. COVEC (ver. 0.4)
  4. Meta-SNP (accessed in July 2015)
  5. APOGEE (ver. 1.0)

Predictions can be obtained from the following web URLs:

  • PolyPhen2 - http://genetics.bwh.harvard.edu/pph2
  • SIFT - http://sift.bii.a-star.edu.sg
  • FatHmm - http://fathmm.biocompute.org.uk
  • PROVEAN - http://provean.jcvi.org
  • MutationAssessor - http://mutationassessor.org
  • EFIN - http://paed.hku.hk/efin
  • CADD - http://cadd.gs.washington.edu
  • VEST, CHASM - http://www.cravat.us
  • Meta-SNP, PANTHER, PhD-SNP, SNAP - http://snps.biofold.org/meta-snp
  • COVEC - http://sourceforge.net/projects/covec
  • CAROL - http://www.sanger.ac.uk/science/tools/carol
  • Condel - http://bg.upf.edu/fannsdb
  • MToolBox - https://github.com/mitoNGS/MToolBox/blob/master/MToolBox/data/patho_table.txt
  • MutationTaster - http://www.mutationtaster.org
  • SNPdryad - http://snps.ccbr.utoronto.ca:8080/SNPdryad/
  • Mitoclass.1 - https://github.com/tonomartin2/MITOCLASS.1/

Extra annotations

Additional annotations were retrieved from:
  • Mitomap (accessed in October 2018)
  • dbSNP (ver. 151)
  • ClinVar (accessed in June 2018)
  • PhyloP and PhastCons - evolutionary conservation indices (UCSC Gene Tables, group: Comparative Genomics; track: Conservation; tables: phyloP100wayAll and PhastCons100way; UCSC accessed in May 2015)
  • SiteVar - human mtDNA site-specific variability (Hmtdb, Spring 2013)
  • COSMIC - somatic variants (ver. 87)
  • MISTIC Mutual Information scores (accessed in May 2015)
  • CHASM (accessed through the CRAVAT webserver in June 2015)
  • TransFIC (accessed in May 2015)
  • Compensated Pathogenic Deviations (CPDs) - amino acid substitutions that are reported to be pathogenic in the human population but occur as wild-type residues in non-human orthologous proteins. We identified mitochondrial CPDs by: extracting pathogenicity evidence for non-synonymous human variants from the MITOMAP and dbSNP-ClinVar resources; identifying homologous positions for those variants within the orthologous protein alignment (taxon: Mammalia); analyzing the sequence context around the investigated positions and removing highly divergent sequences; counting the number of surviving orthologous sequences carrying the variants (a measure of how frequently the CPD occurs among mammalian species). We also provide several details for each identified CPD: reference amino acid (human); alternative amino acid (pathogenic in human, or reference in non-human species); position of the amino acid within the multiple alignment; NCBI RefSeq ID, NCBI Taxon ID and scientific name of the species in which the human alternative amino acid is the reference one. See Jordan et al. 2015 (PMID: 26123021) and Azevedo et al. 2017 (PMID: 27703146) for deeper insight into the CPD topic.
  • Site A-B InterP, Site A-B IntraP - Pairwise covariation analysis was carried out with two alternative methods available in the I-COMS resource. For each pair of subunits of every Respiratory Chain (RC) complex (e.g. CO1 vs. CO2, CO2 vs. CO3, CO1 vs. CO3 for Complex IV), co-specific protein reference sequences were concatenated to form a single alignment; for each site pair of that alignment, two covariation measures were calculated (cMI, mfDCA). The top 500 scoring site pairs were retained (suggested cutoff), and those whose sites fall in the two different proteins (“inter-protein” analysis) are shown in MitImpact and were further considered for structural analysis. Intra-protein high-scoring sites: each RC protein was analyzed by building a concatenated alignment with the same “control protein”, ND1. The top 500 scoring pairs were retrieved from both I-COMS methods, but only those with sites located in the queried protein were retained. Note that a given protein site can have different intra-protein or inter-protein high-scoring partner sites. Covariation does not necessarily imply a functional or evolutionary relationship. The cutoff of 500 was chosen because no other reasonable common threshold could be imposed across different protein alignments.
  • EV Mutation - The EV Mutation algorithm (https://marks.hms.harvard.edu/evmutation/index.html) predicts intra-protein sites that significantly co-vary with each other. From its “Download” page we retrieved the covariation scores for the mitochondrially encoded RC subunits (‘Coupling_scores.csv’) and extracted the pairs with score >= 0.063 (top 5% of the score distribution). “EVMutation variant A” is member A of a given pair; “EVMutation variant B” is the associated member B (or members); “Coupling Score” lists the EVMutation scores for variants A and B.
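
The EV Mutation filtering step described above (keep pairs scoring at least 0.063, then list the partners of each site) could be sketched roughly as follows. This is only an illustration: the column names `site_a`, `site_b` and `score` are hypothetical and are not taken from the actual layout of ‘Coupling_scores.csv’, which should be checked before adapting the code.

```python
import csv
import io
from collections import defaultdict

# Cutoff quoted in the text (top 5% of the score distribution).
THRESHOLD = 0.063


def group_couplings(csv_text, threshold=THRESHOLD):
    """Group high-scoring partner sites (member B) under each site A.

    Assumes hypothetical columns site_a, site_b, score; the real file may
    use different names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["score"])
        if score >= threshold:
            groups[row["site_a"]].append((row["site_b"], score))
    return dict(groups)


# Toy input, not real EV Mutation scores.
toy = "site_a,site_b,score\nA10,L25,0.10\nA10,G40,0.07\nA10,T50,0.01\nM1,P2,0.063\n"
couplings = group_couplings(toy)
```

Each key of the result plays the role of “variant A”, and its list holds the associated “variant B” members together with their coupling scores.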

APOGEE

APOGEE is an LMT-based consensus classifier.
LMT (Logistic Model Tree) is a machine learning technique that combines a decision tree with logistic regression models at the leaves. The model is built from a set of predictor variables, which can be used both for the splitting decisions during tree construction and as selected terms in the logistic models.
The difference between a decision tree and an LMT is that the former assigns all the instances that fall in a leaf to the class with the highest frequency in that leaf, whereas an LMT fits a logistic model in each leaf and gives each instance its own probability of belonging to a class.
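
The contrast between the two kinds of leaf can be illustrated with a small toy example. This is a sketch of the general idea only, not APOGEE's implementation: the leaf contents and the logistic coefficients below are hypothetical, hand-picked values, whereas a real LMT would estimate them from the training data.

```python
import math
from collections import Counter

# Toy leaf: (predictor score, true label) pairs; 1 = pathogenic, 0 = neutral.
LEAF = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.95, 1)]


def tree_leaf_predict(leaf):
    """Plain decision tree: every instance in the leaf gets the majority class."""
    majority = Counter(label for _, label in leaf).most_common(1)[0][0]
    return [majority for _ in leaf]


def lmt_leaf_predict(leaf, intercept=-5.0, slope=10.0):
    """LMT-style leaf: a logistic model gives each instance its own probability.

    intercept and slope are hypothetical coefficients chosen for illustration.
    """
    return [1.0 / (1.0 + math.exp(-(intercept + slope * x))) for x, _ in leaf]


tree_labels = tree_leaf_predict(LEAF)
lmt_probabilities = lmt_leaf_predict(LEAF)
```

In this toy leaf the decision tree labels all five instances as pathogenic, while the logistic leaf assigns the two low-scoring instances probabilities below 0.5.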

APOGEE handles two pathogenicity classes: neutral and pathogenic. Each mutation is treated as an instance described by the scores of the following predictors:

  • PhyloP 100V
  • PhastCons 100V
  • PolyPhen2 (HumDiv dataset)
  • SIFT
  • FatHmm (weighted version)
  • PROVEAN
  • Mutation Assessor
  • EFIN (SwissProt dataset)
  • EFIN (HumDiv dataset)
  • CADD Phred
  • PANTHER
  • PhD-SNP
  • SNAP

Once the classification function was defined, we implemented and tested a bootstrap strategy that randomly selected 70% of the pathogenic mutations and an equal number of neutral mutations. In brief, we ran the following algorithm for 100 iterations:

  • Sampling the training set, as described above;
  • Estimating the LMT;
  • Predicting the pathogenicity of all the mutations stored in the database.
Each iteration gave an estimate of pathogenicity for each variant. These estimates were summarized by calculating the mean probability for each variant. A variant was deemed harmful if the mean of its probabilities of being harmful, calculated over the 100 runs, was > 0.5. Compared to a single run of LMT, the bootstrap strategy implies a loss of generalization of the resulting model.
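
The bootstrap loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_stand_in` is a hypothetical placeholder that is NOT a Logistic Model Tree (it fits a simple logistic curve on a single toy score), and all data values are invented. Only the sampling scheme (70% of pathogenic variants, an equal number of neutral ones, 100 iterations, mean probability > 0.5) follows the text.

```python
import math
import random
from statistics import mean


def fit_stand_in(train):
    """Stand-in for the LMT estimation step (this is NOT a Logistic Model Tree).

    Centers an increasing logistic curve on the midpoint between the two
    class means of a single toy score.
    """
    patho_mean = mean(x for x, label in train if label == 1)
    neutral_mean = mean(x for x, label in train if label == 0)
    midpoint = (patho_mean + neutral_mean) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-10.0 * (x - midpoint)))


def bootstrap_predict(pathogenic, neutral, variants, iterations=100,
                      fraction=0.7, seed=0):
    """Average per-variant probabilities over repeated balanced resampling."""
    rng = random.Random(seed)
    n = round(fraction * len(pathogenic))
    sums = {name: 0.0 for name in variants}
    for _ in range(iterations):
        # Sample 70% of the pathogenic set and the same number of neutrals.
        train = [(x, 1) for x in rng.sample(pathogenic, n)]
        train += [(x, 0) for x in rng.sample(neutral, n)]
        model = fit_stand_in(train)
        # Predict every variant in the (toy) database.
        for name, score in variants.items():
            sums[name] += model(score)
    means = {name: total / iterations for name, total in sums.items()}
    # A variant is called harmful when its mean probability exceeds 0.5.
    return {name: (p, p > 0.5) for name, p in means.items()}


# Invented toy scores, for illustration only.
pathogenic = [0.80, 0.90, 0.70, 0.85, 0.95, 0.75, 0.90, 0.80, 0.88, 0.92]
neutral = [0.10, 0.20, 0.15, 0.05, 0.30, 0.25, 0.10, 0.20, 0.12, 0.18, 0.22, 0.08]
calls = bootstrap_predict(pathogenic, neutral, {"v_high": 0.95, "v_low": 0.05})
```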

The LMT models generated during the 100 iterations can be downloaded here.

Comparison of the performance of classification among meta-predictors

Method            Accuracy  Precision  FDR   MCC    MCR (%)
Meta-SNP          0.54      0.29       0.71  0.09   45.83
CAROL             0.59      0.33       0.67  0.13   40.28
Condel            0.49      0.23       0.78  -0.08  51.16
COVEC WMV         0.59      0.33       0.67  0.12   41.27
MToolBox DS       0.48      0.28       0.72  0.06   51.62
APOGEE bootstrap  0.84      0.73       0.27  0.59   15.97
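
For reference, the column measures can be computed from a confusion matrix as in the sketch below, using the standard definitions of accuracy, precision, false discovery rate (FDR), Matthews correlation coefficient (MCC) and misclassification rate (MCR). Reporting MCR as a percentage is an assumption inferred from the table values, and the counts in the example are invented rather than taken from the APOGEE evaluation.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute the table's measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "fdr": 1.0 - precision,            # FDR = FP / (TP + FP)
        "mcc": mcc,
        "mcr_percent": 100.0 * (1.0 - accuracy),  # assumed to be a percentage
    }


# Invented counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```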

Programmatic access to data

Variants can be searched through a new RESTful interface, either directly in your browser or with curl.
The output is formatted in JSON. An empty result set is returned as the string {"variants": null}.

  1. curl "mitimpact.css-mendel.it/api/v2.0/genomic_position/3307"
  2. [range query] curl "mitimpact.css-mendel.it/api/v2.0/genomic_position/3307-3309"
  3. curl "mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563"
  4. [multiple query] curl "mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563,rs28520706,rs1041870"
  5. curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=20&id=MT-ATP6"
  6. [multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10&id=ENSG00000198840,P00414"
  7. [multiple range query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10-12&id=ENSG00000198840,P00414"
  8. [multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10,11,13&id=ENSG00000198840,P00414"
  9. curl "mitimpact.css-mendel.it/api/v2.0/pathogenicity?id=ID&min=9", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries: PolyPhen2, SIFT, FatHmm, FatHmmW, PROVEAN, MutationAssessor, EFIN_SP, EFIN_HD, CADD, PANTHER, PhD-SNP, SNAP and MutationTaster.
  10. curl "mitimpact.css-mendel.it/api/v2.0/consensus_pathogenicity?id=ID&min=2", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries the meta-predictors: Meta-SNP, CAROL, Condel, COVEC WMV, MToolBox and APOGEE.
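
The same queries could be issued from a script. The following is a minimal sketch using only the Python standard library; the `http://` scheme is an assumption (the examples above give no scheme), the helper names are hypothetical, and the shape of a non-empty "variants" value is not documented here, so the parser simply returns whatever is stored under that key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed scheme; the host and path come from the curl examples.
BASE_URL = "http://mitimpact.css-mendel.it/api/v2.0"


def path_url(endpoint, values):
    """Build URLs such as genomic_position/3307-3309 or dbsnp/rs3020563."""
    return f"{BASE_URL}/{endpoint}/{values}"


def query_url(endpoint, **params):
    """Build URLs with a query string; commas are kept for multiple queries."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params, safe=',')}"


def parse_variants(json_text):
    """Return the "variants" value, or an empty list for {"variants": null}."""
    payload = json.loads(json_text)
    return payload.get("variants") or []


def fetch(url, timeout=30):
    """Download and parse one query (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_variants(response.read().decode("utf-8"))


range_query = path_url("genomic_position", "3307-3309")
protein_query = query_url("protein_position", pos="10-12",
                          id="ENSG00000198840,P00414")
empty = parse_variants('{"variants": null}')
```

Calling `fetch(protein_query)` would then perform the request; it is left out of the example so the sketch can be read and run without network access.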
