MitImpact 3D

MitImpact 3D collects genomic, clinical and functional annotations for all possible missense variants of the human mitochondrially-encoded proteins. The latest release focuses on variant interactions by providing scores of sequence co-variation and effect compensation.

User guide

Search by Genomic position

Search by Genomic position and alleles

A genomic position and a variant, in the form REF>ALT, can be specified directly in the browser. A valid request redirects to the result page, where the information about this variant is displayed in the first tab.

Search by dbSNP ID

Search by Gene or Protein position

Annotate a VCF file

List of mitochondrial gene and protein identifiers

#   Gene Symbol  Ensembl Gene ID  Ensembl Protein ID  UniProt Name  UniProt ID  NCBI Gene ID  NCBI Protein ID
1   MT-ND1       ENSG00000198888  ENSP00000354687     NU1M_HUMAN    P03886      4535          YP_003024026.1
2   MT-ND2       ENSG00000198763  ENSP00000355046     NU2M_HUMAN    P03891      4536          YP_003024027.1
3   MT-ND3       ENSG00000198840  ENSP00000355206     NU3M_HUMAN    P03897      4537          YP_003024033.1
4   MT-ND4       ENSG00000198886  ENSP00000354961     NU4M_HUMAN    P03905      4538          YP_003024035.1
5   MT-ND4L      ENSG00000212907  ENSP00000354728     NU4LM_HUMAN   P03901      4539          YP_003024034.1
6   MT-ND5       ENSG00000198786  ENSP00000354813     NU5M_HUMAN    P03915      4540          YP_003024036.1
7   MT-ND6       ENSG00000198695  ENSP00000354665     NU6M_HUMAN    P03923      4541          YP_003024037.1
8   MT-ATP6      ENSG00000198899  ENSP00000354632     ATP6_HUMAN    P00846      4508          YP_003024031.1
9   MT-ATP8      ENSG00000228253  ENSP00000355265     ATP8_HUMAN    P03928      4509          YP_003024030.1
10  MT-CO1       ENSG00000198804  ENSP00000354499     COX1_HUMAN    P00395      4512          YP_003024028.1
11  MT-CO2       ENSG00000198712  ENSP00000354876     COX2_HUMAN    P00403      4513          YP_003024029.1
12  MT-CO3       ENSG00000198938  ENSP00000354982     COX3_HUMAN    P00414      4514          YP_003024032.1
13  MT-CYB       ENSG00000198727  ENSP00000354554     CYB_HUMAN     P00156      4519          YP_003024038.1

Main annotation databases

The putative effect of missense mutations within the 13 mitochondrially-encoded proteins was calculated by the following missense pathogenicity predictors:

  1. PolyPhen2 (ver. 2.2.2)
  2. SIFT (ver. 5.0.3)
  3. FatHmm (ver. 2.2, "weighted" and "unweighted" setting)
  4. MutationAssessor (ver. 2.0)
  5. PROVEAN (ver. 1.3)
  6. EFIN (accessed in May 2015)
  7. CADD (ver. 1.2)
  8. VEST (accessed through the CRAVAT webserver in June 2015)
  9. PANTHER (accessed through the Meta-SNP webserver in July 2015)
  10. PhD-SNP (accessed through the Meta-SNP webserver in July 2015)
  11. SNAP (accessed through the Meta-SNP webserver in July 2015)
  12. MutationTaster ver. 2 (accessed in December 2016)
  13. SNPdryad (accessed in December 2016)
  14. DEOGEN (accessed in October 2017)
  15. Mitoclass.1 (accessed in December 2018)

Mutations were also annotated by these meta-predictors:

  1. CAROL (accessed in November 2014)
  2. Condel (accessed in November 2014)
  3. COVEC (ver. 0.4)
  4. Meta-SNP (accessed in July 2015)
  5. APOGEE (ver. 1.0)

Predictions can be obtained from the following web URLs:

  • PolyPhen2 - http://genetics.bwh.harvard.edu/pph2
  • SIFT - http://sift.bii.a-star.edu.sg
  • FatHmm - http://fathmm.biocompute.org.uk
  • PROVEAN - http://provean.jcvi.org
  • MutationAssessor - http://mutationassessor.org
  • EFIN - http://paed.hku.hk/efin
  • CADD - http://cadd.gs.washington.edu
  • VEST, CHASM - http://www.cravat.us
  • Meta-SNP, PANTHER, PhD-SNP, SNAP - http://snps.biofold.org/meta-snp
  • COVEC - http://sourceforge.net/projects/covec
  • CAROL - http://www.sanger.ac.uk/science/tools/carol
  • Condel - http://bg.upf.edu/fannsdb
  • MToolBox - https://github.com/mitoNGS/MToolBox/blob/master/MToolBox/data/patho_table.txt
  • MutationTaster - http://www.mutationtaster.org
  • SNPdryad - http://snps.ccbr.utoronto.ca:8080/SNPdryad/
  • Mitoclass.1 - https://github.com/tonomartin2/MITOCLASS.1/

APOGEE

APOGEE is an LMT-based consensus classifier.
An LMT (Logistic Model Tree) is a machine learning model that combines a decision tree with logistic regression models at its leaves. The predictor variables are used both to decide the splits during tree construction and, after selection, as inputs to the logistic models.
The difference between a decision tree and an LMT is that the former assigns all the instances belonging to a leaf to the class with the highest frequency in that leaf, whereas an LMT builds a logistic model for the instances in the same leaf and gives each instance its own probability of belonging to a class.

APOGEE handles two pathogenicity classes: neutral and pathogenic. Each mutation is treated as an instance described by the scores of the following predictors:

  • PhyloP 100V
  • PhastCons 100V
  • PolyPhen2 (HumDiv dataset)
  • SIFT
  • FatHmm (weighted version)
  • PROVEAN
  • Mutation Assessor
  • EFIN (SwissProt dataset)
  • EFIN (HumDiv dataset)
  • CADD Phred
  • PANTHER
  • PhD-SNP
  • SNAP

Once the classification function was defined, we implemented and tested a bootstrap strategy that randomly selects 70% of the pathogenic mutations and the same number of neutral mutations. In brief, we ran the following procedure for 100 iterations:

  • sampling the training set, as described above;
  • estimating the LMT;
  • predicting the pathogenicity of all the mutations stored in the database.

Each iteration gave an estimate of pathogenicity for each variant. These estimates were summarized by calculating the mean probability for each variant. A variant was deemed harmful if the mean of its probabilities of being harmful, calculated over the 100 runs, was > 0.5. Compared to a single run of LMT, the bootstrap strategy implies a loss of generalization of the resulting model.
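
The bootstrap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `bootstrap_consensus` and `fit_centroid` are hypothetical, each variant is reduced to a single numeric score, and `fit_centroid` is a deliberately trivial stand-in for the LMT, which is not reimplemented here.

```python
import math
import random
from statistics import mean


def fit_centroid(pos, neg):
    """Hypothetical stand-in for an LMT: a logistic curve centred between
    the mean scores of the two training classes (single feature only)."""
    mu_pos = mean(score for _, score in pos)
    mu_neg = mean(score for _, score in neg)
    midpoint = (mu_pos + mu_neg) / 2
    scale = (mu_pos - mu_neg) or 1.0  # avoid division by zero
    return lambda score: 1 / (1 + math.exp(-4 * (score - midpoint) / scale))


def bootstrap_consensus(pathogenic, neutral, variants, fit=fit_centroid,
                        n_iter=100, frac=0.7, seed=0):
    """Return {variant_id: (mean probability, call)} over balanced resamples.

    pathogenic, neutral, variants: lists of (variant_id, score) tuples.
    """
    rng = random.Random(seed)
    n_train = round(frac * len(pathogenic))
    totals = {vid: 0.0 for vid, _ in variants}
    for _ in range(n_iter):
        # Balanced training set: 70% of the pathogenic variants and the
        # same number of neutral variants, drawn without replacement.
        pos = rng.sample(pathogenic, n_train)
        neg = rng.sample(neutral, n_train)
        model = fit(pos, neg)
        # Predict every variant in the database with this iteration's model.
        for vid, score in variants:
            totals[vid] += model(score)
    result = {}
    for vid, total in totals.items():
        p = total / n_iter
        result[vid] = (p, "pathogenic" if p > 0.5 else "neutral")
    return result
```

Passing a different `fit` callable (one that returns a function mapping a variant's features to a probability) would allow a real classifier to be plugged in without changing the resampling logic.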

The LMT models generated during the 100 iterations can be downloaded here.

Comparison of the performance of classification among meta-predictors

Method            Accuracy  Precision  FDR   MCC    MCR (%)
Meta-SNP          0.54      0.29       0.71  0.09   45.83
CAROL             0.59      0.33       0.67  0.13   40.28
Condel            0.49      0.23       0.78  -0.08  51.16
COVEC WMV         0.59      0.33       0.67  0.12   41.27
MToolBox DS       0.48      0.28       0.72  0.06   51.62
APOGEE bootstrap  0.84      0.73       0.27  0.59   15.97
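
For reference, the metrics in this table can be computed from a confusion matrix as in the sketch below. It assumes the standard textbook definitions and reads MCR as the misclassification rate expressed in percent, which is an interpretation suggested by the tabulated values rather than something stated on this page. The function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute the table's metrics from confusion-matrix counts.

    tp/fp/tn/fn: true/false positives and negatives, with "pathogenic"
    taken as the positive class (an assumption).
    """
    total = tp + fp + tn + fn
    predicted_pos = tp + fp
    precision = tp / predicted_pos if predicted_pos else float("nan")
    fdr = fp / predicted_pos if predicted_pos else float("nan")
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "fdr": fdr,
        "mcc": mcc,
        "mcr_percent": 100 * (fp + fn) / total,
    }
```

With these definitions, Precision and FDR always sum to 1, which agrees with the table up to rounding.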

Analysis of variant interaction

Site co-variation

Pairwise co-variation analysis was performed using two alternative methods implemented in I-COMS (http://i-coms.leloir.org.ar). For each pair of subunits of every Respiratory Chain Complex (e.g. CO1 vs. CO2, CO2 vs. CO3, CO1 vs. CO3 for Complex IV), the tool makes it possible to:

  • create a concatenated alignment given two co-specific protein reference sequences;
  • restrict the sequence search on a pre-defined taxon (Mammalia in this case);
  • calculate two covariation measures for each site pair of alignments (corrected MI, mfDCA).

The top 500 high-scoring site pairs (the cutoff suggested by the I-COMS authors) were retained. Pairs whose members are located in two distinct proteins are named inter-protein; pairs whose members both fall in the same queried protein (which was concatenated with ND1 by default) are named intra-protein. Note that a given protein site can have several intra-protein or inter-protein co-varying partners. Furthermore, site co-variation does not necessarily imply the existence of any real functional or evolutionary relationship. I-COMS was used here because of its simplicity, completeness and responsiveness.
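
To make the idea of scoring and labelling site pairs concrete, the sketch below computes plain mutual information between columns of a concatenated alignment, keeps the highest-scoring pairs, and labels each pair by whether it spans the concatenation boundary. It is an illustration only: I-COMS computes corrected MI and mfDCA, neither of which is reproduced here, and the names `mutual_information` and `top_pairs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Plain (uncorrected) mutual information, in bits, of two columns."""
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((freq_a[a] / n) * (freq_b[b] / n)))
        for (a, b), c in freq_ab.items()
    )


def top_pairs(alignment, boundary, top_n=500):
    """Rank site pairs of a concatenated alignment by mutual information.

    alignment: list of equal-length sequences (protein 1 + protein 2).
    boundary:  number of columns belonging to the first protein.
    Returns up to top_n tuples (score, i, j, kind), best first.
    """
    columns = list(zip(*alignment))
    scored = []
    for i, j in combinations(range(len(columns)), 2):
        score = mutual_information(columns[i], columns[j])
        spans_both = (i < boundary) != (j < boundary)
        kind = "inter-protein" if spans_both else "intra-protein"
        scored.append((score, i, j, kind))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]
```

Because every pair of columns is scored, the loop grows quadratically with alignment length, so this form is only suitable for small examples.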

Raw I-COMS score matrices and the protein alignments generated with I-COMS for the current version of MitImpact are available from this link.


Compensated Pathogenic Deviations

Compensated Pathogenic Deviations (CPDs) are amino acid substitutions that are reported to be pathogenic in the human population, but occur as wild-type residues in non-human orthologous proteins. We identified mitochondrial CPDs by:

  • extracting pathogenicity evidence for non-synonymous human variants from the MITOMAP and dbSNP-ClinVar resources (last accessed in March 2020);
  • identifying homologous positions for those variants that were found in orthologous protein alignments (taxon: Mammalia, alignments available upon request at bioinformatics@css-mendel.it);
  • analyzing the sequence context (± 5bp surrounding the investigated positions) and removing CPD candidates if more than 3 proximal sites differ from the human reference;
  • counting the number of retained sequences carrying the variants out of the total number of aligned sequences.
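
The context filter and the counting step can be sketched as follows. This is one possible reading of the procedure, not the authors' code: it applies the context filter per orthologous sequence, uses only the orthologs as the denominator, and works on 0-based alignment columns. The function name `cpd_support` and all parameter names are hypothetical.

```python
def cpd_support(human, orthologs, pos, alt, flank=5, max_diff=3):
    """Find orthologs that carry a human pathogenic residue as wild type.

    human:     aligned human reference sequence.
    orthologs: dict mapping species name to aligned sequence.
    pos:       0-based alignment column of the variant.
    alt:       pathogenic amino acid observed in humans.
    Returns (names of retained carriers, fraction of orthologs retained).
    """
    start = max(0, pos - flank)
    stop = min(len(human), pos + flank + 1)
    carriers = []
    for name, seq in orthologs.items():
        if seq[pos] != alt:
            continue  # this species does not carry the variant residue
        # Count flanking sites that differ from the human reference.
        diffs = sum(
            1 for k in range(start, stop)
            if k != pos and seq[k] != human[k]
        )
        if diffs <= max_diff:  # discard if more than max_diff sites differ
            carriers.append(name)
    fraction = len(carriers) / len(orthologs) if orthologs else 0.0
    return carriers, fraction
```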

For each putative CPD, we have then defined the:


Binding affinity

Inter- and intra-protein relationships between co-varying variants were investigated energetically. FoldX 4.0 was used to calculate the free-energy changes upon mutation of residues lying at the interaction interface. Alternative amino acids whose ΔΔG for the single mutant exceeded the cutoff suggested by the authors (±0.61 kcal/mol) were tagged as disruptive. Pairs of mutants with a ΔΔG conservatively close to zero (within ±0.1 kcal/mol) were considered structurally compensative.
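
The two thresholds can be expressed as a small classification helper. The sketch assumes that ΔΔG values (in kcal/mol) have already been computed elsewhere, for example with FoldX, and that the compensation test is applied to the ΔΔG of the double mutant; both points are assumptions, and the function names are hypothetical.

```python
DISRUPTIVE_CUTOFF = 0.61    # kcal/mol, cutoff for single mutants
COMPENSATION_CUTOFF = 0.1   # kcal/mol, "close to zero" for mutant pairs


def is_disruptive(ddg_single):
    """A single mutant is disruptive if its ddG exceeds the cutoff
    in either direction."""
    return abs(ddg_single) > DISRUPTIVE_CUTOFF


def is_compensative(ddg_pair):
    """A pair of mutants is structurally compensative if its ddG is
    conservatively close to zero."""
    return abs(ddg_pair) < COMPENSATION_CUTOFF


def classify_pair(ddg_a, ddg_b, ddg_pair):
    """Summarize a co-varying pair from precomputed ddG values."""
    return {
        "a_disruptive": is_disruptive(ddg_a),
        "b_disruptive": is_disruptive(ddg_b),
        "compensative": is_compensative(ddg_pair),
    }
```

For example, `classify_pair(1.2, -0.3, 0.05)` flags the first variant as disruptive, the second as not disruptive, and the pair as compensative.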

In particular, MitImpact reports:


Molecular dynamics simulation

As a pilot study, we selected all pairs of variants obtained with I-COMS and predicted to be energetically compensative, where at least one variant of the pair was reported as pathogenic in the MITOMAP database. For these pairs, we searched the Protein Data Bank for the corresponding human 3D structures and investigated the interaction properties of the wild-type complex as well as of the single- and double-mutated complexes. We then ran ten replicas of four independent classical molecular dynamics simulations of 50 nanoseconds (cf. methods here).

To understand whether a protein carrying both mutations of a pair was stable and close to the wild-type structure, the following measures were calculated on the simulation trajectories:

Programmatic access to data

Variants can be searched through a RESTful interface, either directly in your browser or with curl.
The output is formatted in JSON. An empty result set is returned as: {"variants": null}.

  1. curl 'mitimpact.css-mendel.it/api/v2.0/genomic_position/3307'
  2. [range query] curl 'mitimpact.css-mendel.it/api/v2.0/genomic_position/3307-3309'
  3. [locus and variant] curl 'mitimpact.css-mendel.it/api/v2.0/search_allele/6253/T>A'
  4. curl 'mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563'
  5. [multiple query] curl 'mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563,rs28520706,rs1041870'
  6. curl 'mitimpact.css-mendel.it/api/v2.0/protein_position?pos=20&id=MT-ATP6'
  7. [multiple query] curl 'mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10&id=ENSG00000198840,P00414'
  8. [multiple range query] curl 'mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10-12&id=ENSG00000198840,P00414'
  9. [multiple query] curl 'mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10,11,13&id=ENSG00000198840,P00414'
  10. curl 'mitimpact.css-mendel.it/api/v2.0/pathogenicity?id=ID&min=9', where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries: PolyPhen2, SIFT, FatHmm, FatHmmW, PROVEAN, MutationAssessor, EFIN_SP, EFIN_HD, CADD, PANTHER, PhD-SNP, SNAP and MutationTaster.
  11. curl 'mitimpact.css-mendel.it/api/v2.0/consensus_pathogenicity?id=ID&min=2', where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries the meta-predictors: Meta-SNP, CAROL, Condel, COVEC WMV, MToolBox and APOGEE.
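
The same endpoints can be called from a script. The following Python sketch uses only the standard library; the helper names `build_url` and `fetch_variants` are hypothetical, the `http://` scheme is an assumption (the examples above omit it), and so is the expectation that the server accepts the percent-encoded form of `>`. Apart from the documented empty result `{"variants": null}`, the shape of the JSON payload is not specified on this page, so the code only assumes a top-level "variants" key.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Scheme assumed; the curl examples give only the host and path.
BASE = "http://mitimpact.css-mendel.it/api/v2.0"


def build_url(endpoint, *path, **params):
    """Assemble an API URL from an endpoint, path segments and query
    parameters, percent-encoding characters such as '>'."""
    segments = [BASE, endpoint] + [quote(str(p), safe=",-") for p in path]
    url = "/".join(segments)
    if params:
        url += "?" + urlencode(params, safe=",-")
    return url


def fetch_variants(url, timeout=30):
    """Return the content of "variants", or an empty list when the
    service answers {"variants": null}."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("variants") or []
```

For instance, `build_url("search_allele", 6253, "T>A")` corresponds to example 3, and `build_url("protein_position", pos="10-12", id="ENSG00000198840,P00414")` corresponds to example 8; passing either result to `fetch_variants` performs the request.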
