MitImpact 3D

MitImpact 3 collects genomic, clinical and functional annotations for all possible human missense variants. The latest release focuses on variant interactions by providing scores of sequence co-variation and effect compensation.

User guide

Search by Genomic position

Search by Genomic position and alleles

Search for genomic positions and alleles

It is thus possible to specify a genomic position and a variant, in the form REF>ALT, straight in the browser. If valid, this request will redirect to the result page, where the information about this variant will be displayed in the first tab.

Search by dbSNP ID

Search by Gene or Protein position

Annotate a VCF file

List of mitochondrial gene and protein identifiers

#	Gene Symbol	Ensembl Gene ID	Ensembl Protein ID	Uniprot Name	Uniprot ID	Ncbi Gene ID	Ncbi Protein ID
1	MT-ND1	ENSG00000198888	ENSP00000354687	NU1M_HUMAN	P03886	4535	YP_003024026.1
2	MT-ND2	ENSG00000198763	ENSP00000355046	NU2M_HUMAN	P03891	4536	YP_003024027.1
3	MT-ND3	ENSG00000198840	ENSP00000355206	NU3M_HUMAN	P03897	4537	YP_003024033.1
4	MT-ND4	ENSG00000198886	ENSP00000354961	NU4M_HUMAN	P03905	4538	YP_003024035.1
5	MT-ND4L	ENSG00000212907	ENSP00000354728	NU4LM_HUMAN	P03901	4539	YP_003024034.1
6	MT-ND5	ENSG00000198786	ENSP00000354813	NU5M_HUMAN	P03915	4540	YP_003024036.1
7	MT-ND6	ENSG00000198695	ENSP00000354665	NU6M_HUMAN	P03923	4541	YP_003024037.1
8	MT-ATP6	ENSG00000198899	ENSP00000354632	ATP6_HUMAN	P00846	4508	YP_003024031.1
9	MT-ATP8	ENSG00000228253	ENSP00000355265	ATP8_HUMAN	P03928	4509	YP_003024030.1
10	MT-CO1	ENSG00000198804	ENSP00000354499	COX1_HUMAN	P00395	4512	YP_003024028.1
11	MT-CO2	ENSG00000198712	ENSP00000354876	COX2_HUMAN	P00403	4513	YP_003024029.1
12	MT-CO3	ENSG00000198938	ENSP00000354982	COX3_HUMAN	P00414	4514	YP_003024032.1
13	MT-CYB	ENSG00000198727	ENSP00000354554	CYB_HUMAN	P00156	4519	YP_003024038.1

Main annotation databases

The putative effect of missense mutations within the 13 mitochondrially-encoded proteins was calculated by the following missense pathogenicity predictors:

PolyPhen2 (ver. 2.2.2)
SIFT (ver. 5.0.3)
FatHmm (ver. 2.2, "weighted" and "unweighted" settings)
MutationAssessor (ver. 2.0)
PROVEAN (ver. 1.3)
EFIN (accessed in May 2015)
CADD (ver. 1.2)
VEST (accessed through the CRAVAT webserver in June 2015)
PANTHER (accessed through the Meta-SNP webserver in July 2015)
PhD-SNP (accessed through the Meta-SNP webserver in July 2015)
SNAP (accessed through the Meta-SNP webserver in July 2015)
MutationTaster ver. 2 (accessed in December 2016)
SNPdryad (accessed in December 2016)
DEOGEN (accessed in October 2017)
Mitoclass.1 (accessed in December 2018)

Mutations were also annotated by these meta-predictors:

CAROL (accessed in November 2014)
Condel (accessed in November 2014)
COVEC (ver. 0.4)
Meta-SNP (accessed in July 2015)
APOGEE (ver. 2)

Predictions can be obtained from the following web URLs:

PolyPhen2 - http://genetics.bwh.harvard.edu/pph2
SIFT - http://sift.bii.a-star.edu.sg
FatHmm - http://fathmm.biocompute.org.uk
PROVEAN - http://provean.jcvi.org
MutationAssessor - http://mutationassessor.org
EFIN - http://paed.hku.hk/efin
CADD - http://cadd.gs.washington.edu
VEST, CHASM - http://www.cravat.us
Meta-SNP, PANTHER, PhD-SNP, SNAP - http://snps.biofold.org/meta-snp
COVEC - http://sourceforge.net/projects/covec
CAROL - http://www.sanger.ac.uk/science/tools/carol
Condel - http://bg.upf.edu/fannsdb
MToolBox - https://github.com/mitoNGS/MToolBox/blob/master/MToolBox/data/patho_table.txt
MutationTaster - http://www.mutationtaster.org
SNPdryad - http://snps.ccbr.utoronto.ca:8080/SNPdryad/
Mitoclass.1 - https://github.com/tonomartin2/MITOCLASS.1/

APOGEE 2

APOGEE 2 is a mitochondrially-centered ensemble method. It was built with a 20-fold cross-validation repeated five times: in each iteration, 19 folds of the training set were used to train a KNN RusSmote machine-learning algorithm and tune its hyperparameters, and the remaining fold was used for testing. The best set of hyperparameters was selected with an inner 10-fold grid-search cross-validation, and the performance of the method was assessed afterwards.
An extensive description can be found here.
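
The resampling scheme described above can be sketched in plain Python. This is only an illustration of how the index splits nest: it does not train a KNN RusSmote model, it does not stratify folds by class, and the function names and the sample count of 200 are assumptions made for the example, not part of APOGEE 2.

```python
import random


def kfold_indices(indices, n_splits):
    """Split a list of sample indices into n_splits folds of near-equal size."""
    return [indices[i::n_splits] for i in range(n_splits)]


def leave_one_fold_out(folds):
    """Yield (training, held_out) index lists, holding out each fold in turn."""
    for k, held_out in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, held_out


def repeated_kfold(n_samples, n_splits=20, n_repeats=5, seed=0):
    """Outer loop: a k-fold split repeated n_repeats times with reshuffling."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield from leave_one_fold_out(kfold_indices(shuffled, n_splits))


def inner_splits(training, n_splits=10):
    """Inner loop: split an outer training set again for hyperparameter tuning."""
    yield from leave_one_fold_out(kfold_indices(training, n_splits))


outer = list(repeated_kfold(n_samples=200))
print(len(outer))  # 20 folds x 5 repeats = 100 outer iterations

first_training, first_test = outer[0]
inner = list(inner_splits(first_training))
print(len(first_training), len(first_test), len(inner))
```

In a real pipeline, each inner split would score one hyperparameter combination of the grid, and the outer test fold would be touched only once the best combination is chosen.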

APOGEE 2 aggregates information of the following predictors and features:

EFIN (HumDiv dataset)
EFIN (SwissProt dataset)
SNAP
PROVEAN
MtMam
Mutation Assessor
PhD-SNP
Nucleotide coordinates
Amino acid coordinates (X-axis)
Amino acid coordinates (Y-axis)
Amino acid coordinates (Z-axis)
PANTHER
PhyloP 100V
FatHmm
FatHmm (weighted version)
VEST
PolyPhen2 (HumDiv dataset)
SIFT
ΔΔG
CADD Phred
MutationTaster
PhastCons 100V

APOGEE 2 assigns each variant to one of five pathogenicity classes: benign, likely-benign, VUS, likely-pathogenic, and pathogenic. The class is inferred from a pathogenicity probability provided by APOGEE which, in turn, is calculated from the prediction score of the KNN RusSmote model.
Score/probability thresholds are represented in this figure:

APOGEE 2 scores and probabilities

Please note that:

We have split the probability score range, i.e., P=0.1 - P=0.9, into tertiles, since one may be especially interested in differentiating a high-scoring VUS (i.e., closer to the likely-pathogenic threshold) from a low-scoring VUS (i.e., closer to the likely-benign threshold), with the aim of investigating high- or low-scoring VUSs more deeply and, eventually, moving them to a final category.
Thresholds of pathogenicity will undoubtedly change as more features are added to the predictor, and more clinical or functional evidence is published. Please refer to this website for the most up-to-date thresholds and scores.

Comparison of the performance of classification among meta-predictors

Method	MCC	Precision	auPR curve	auROC curve	Accuracy	Balanced accuracy	Sensitivity	Specificity
MetaSNP	0.323	0.195	0.448	0.883	0.721	0.790	0.871	0.709
CAROL	0.329	0.209	0.235	0.827	0.754	0.785	0.821	0.749
Condel	0.283	0.160	0.249	0.832	0.629	0.767	0.929	0.605
COVEC WMV	0.376	0.239	0.307	0.867	0.787	0.816	0.850	0.782
MToolBox DS	0.277	0.156	0.439	0.889	0.621	0.762	0.929	0.596
APOGEE 1	0.385	0.266	0.573	0.855	0.823	0.802	0.779	0.826
APOGEE 2	0.569 ± 0.041	0.431 ± 0.035	0.716 ± 0.054	0.95 ± 0.016	0.9 ± 0.011	0.888 ± 0.027	0.874 ± 0.053	0.903 ± 0.011
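
For readers who want to reproduce the columns of this table on their own predictions, the threshold-based metrics can be derived from a confusion matrix as sketched below. The function name and the example counts are hypothetical; the formulas are the standard definitions, and auPR and auROC are left out because they require the full ranking of scores rather than four counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "MCC": mcc,
        "Precision": precision,
        "Accuracy": accuracy,
        "Balanced accuracy": balanced_accuracy,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

As a consistency check on the table itself, balanced accuracy is the mean of sensitivity and specificity: for MetaSNP, (0.871 + 0.709) / 2 = 0.790.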

Analysis of variant interaction

Site co-variation

Pairwise co-variation analysis was performed using two alternative methods implemented in I-COMS (http://i-coms.leloir.org.ar). For each pair of subunits of every Respiratory Chain Complex (e.g. CO1 vs. CO2, CO2 vs. CO3, CO1 vs. CO3 for Complex IV), the tool allows users to:

create a concatenated alignment given two co-specific protein reference sequences;
restrict the sequence search on a pre-defined taxon (Mammalia in this case);
calculate two co-variation measures (corrected MI, mfDCA) for each pair of alignment sites.

The top 500 high-scoring site pairs (cutoff suggested by the I-COMS authors) were retained. Pairs whose members are located in two distinct proteins are named inter-protein; pairs whose members both fall in the same queried protein (which was concatenated with ND1 by default) are named intra-protein. Note that a given protein site can have several intra-protein or inter-protein co-varying partners. Furthermore, site co-variation does not necessarily imply any real functional or evolutionary relationship. I-COMS was used here because of its simplicity, completeness and responsiveness.
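
The selection and labelling of site pairs described above can be sketched as follows. This is a minimal illustration, assuming that each scored pair is given as 1-based column positions in the concatenated alignment and that the first protein occupies columns 1 to `first_protein_length`; the function name, the tuple layout and the toy scores are hypothetical and do not reflect the I-COMS output format.

```python
def classify_top_pairs(scored_pairs, first_protein_length, top_n=500):
    """Keep the top_n highest-scoring site pairs and label each one.

    scored_pairs: iterable of (site_i, site_j, score) tuples, where the sites
    are 1-based columns of the concatenated alignment (an assumption).
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[2], reverse=True)
    labelled = []
    for site_i, site_j, score in ranked[:top_n]:
        i_in_first = site_i <= first_protein_length
        j_in_first = site_j <= first_protein_length
        # Both sites on the same side of the junction: same protein.
        kind = "intra-protein" if i_in_first == j_in_first else "inter-protein"
        labelled.append((site_i, site_j, score, kind))
    return labelled


# Toy example: the first protein spans columns 1-100 of the alignment.
pairs = [(10, 50, 0.9), (10, 150, 0.8), (120, 180, 0.7), (5, 6, 0.1)]
for row in classify_top_pairs(pairs, first_protein_length=100, top_n=3):
    print(row)
```

Because the ranking is done per site pair, the same site can appear in several retained pairs, which matches the note that a site may have more than one co-varying partner.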

Raw I-COMS score matrices and protein alignments generated for the current version of MitImpact are available from this link.

Compensated Pathogenic Deviations

CPDs are amino acid substitutions that are reported to be pathogenic in the human population, but occur as wild-type residues in non-human ortholog proteins. We identified mitochondrial CPDs by:

extracting pathogenicity evidence for non-synonymous human variants from the MITOMAP and dbSNP-ClinVar resources (last accessed in March 2020);
identifying homologous positions for those variants that were found in orthologous protein alignments (taxon: Mammalia, alignments available upon request at bioinformatics@css-mendel.it);
analyzing the sequence context (± 5 bp surrounding the investigated positions) and removing CPD candidates if more than 3 proximal sites differ from the human reference;
counting the number of retained sequences carrying the variant out of the total number of aligned sequences.

For each putative CPD, we have then defined the:

reference amino acid (human);
alternative amino acid (clinically significant in human, reference in non-human species);
position of the amino acid in the alignment;
NCBI RefSeq ID, NCBI Taxon ID and scientific name of the species carrying the CPD. See Jordan et al. 2015 for an in-depth investigation of CPDs in nuclear proteins and Azevedo et al. 2017 for methodological details on the CPD search in the mitochondrial genome.
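
The context filter and the final counting step of the CPD search can be illustrated with the sketch below. It assumes that all sequences come from the same alignment and therefore have equal length, that the window of five positions on each side is measured in alignment columns, and that a gap counts as a difference; the function names and the toy sequences are hypothetical.

```python
def context_differences(human, other, position, flank=5):
    """Count sites that differ from the human reference around a position.

    position is a 0-based alignment column; the column itself is excluded.
    """
    start = max(0, position - flank)
    end = min(len(human), position + flank + 1)
    return sum(
        1
        for column in range(start, end)
        if column != position and human[column] != other[column]
    )


def count_cpd_carriers(human, orthologs, position, alt, flank=5, max_diff=3):
    """Return (carriers, total) for an alternative residue at a position.

    An ortholog counts as a carrier if it has the alternative residue and no
    more than max_diff neighbouring sites differ from the human reference.
    """
    carriers = 0
    for sequence in orthologs:
        if sequence[position] != alt:
            continue
        if context_differences(human, sequence, position, flank) <= max_diff:
            carriers += 1
    return carriers, len(orthologs)


human_ref = "ACDEFGHIKLMNP"
aligned = [
    "ACDEFGRIKLMNP",  # carries R in a conserved context: retained
    "ACAAAARAKLMNP",  # carries R but five neighbouring sites differ: removed
    "ACDEFGHIKLMNP",  # identical to human: not a carrier
]
print(count_cpd_carriers(human_ref, aligned, position=6, alt="R"))
```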

Binding affinity

Inter- and intra-protein relationships between co-varying variants were investigated energetically. FoldX 4.0 was used to calculate the free-energy changes upon mutation of residues lying at the interaction interface. Alternative amino acids that caused the ΔΔG of the single mutant to exceed the cutoff suggested by the authors (±0.61 kcal/mol) were tagged as disruptive. Pairs of mutants with ΔΔG conservatively close to zero (within ±0.1 kcal/mol) were considered structurally compensative.
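
The two cutoffs above translate into a simple tagging rule, sketched here under the assumption that ΔΔG values are already available in kcal/mol (for example, parsed from FoldX output by some other step). The constant and function names are hypothetical, and treating both cutoffs as strict inequalities is also an assumption.

```python
DISRUPTIVE_CUTOFF = 0.61    # kcal/mol, single-mutant cutoff quoted in the text
COMPENSATIVE_CUTOFF = 0.1   # kcal/mol, double-mutant cutoff quoted in the text


def is_disruptive(ddg_single):
    """A single mutant is disruptive if |ddG| exceeds the 0.61 kcal/mol cutoff."""
    return abs(ddg_single) > DISRUPTIVE_CUTOFF


def is_compensative(ddg_pair):
    """A mutant pair is compensative if |ddG| stays within 0.1 kcal/mol of zero."""
    return abs(ddg_pair) < COMPENSATIVE_CUTOFF


# Toy values: each single mutant is disruptive, the pair together is not.
print(is_disruptive(1.20), is_disruptive(-0.85), is_compensative(0.04))
```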

In particular, MitImpact reports:

ΔΔG intraP: The free-energy change of folding of pairs of co-varying amino acid variants occurring in the same protein, individually or if considered together;
ΔΔG intraP interface: ΔΔG_bind of interaction energy. Both variants of the pair belong to one of the proteins and are located in their interaction interface.
ΔΔG interP: ΔΔG_bind of interaction energy calculated as the energetic difference between wild-type and mutant proteins. Variants are located in different proteins.

Molecular dynamics simulation

As a pilot study, we have selected all pairs of variants obtained with I-COMS and predicted to be energetically compensative, where at least one variant of the pair was reported as pathogenic in the MITOMAP database. For these pairs, we looked for the corresponding human 3D structures in the Protein Data Bank and investigated the interacting properties of the wild-type complex as well as of the single and double-mutated complexes. We then ran ten replicas of four independent classical molecular dynamics simulations of 50 nanoseconds (cf. methods here).

To understand whether a protein carrying the two mutations of a pair was stable and close to the wild-type structure, the following measures were calculated on the simulation trajectories:

RMSD: The root-mean-square deviation measures the average distance between all heavy atoms (in this case C_α atomic coordinates) with respect to the reference X-ray structure. RMSD of all heavy atoms is shown in the RMSD plot in the result page of the Molecular Dynamics section. This plot shows the RMSD profiles along the simulations of the wild-type, single and double mutants.
RMSF: The root mean square fluctuation measures the deviation over time between the positions of the C_α atomic coordinates of each residue with respect to the reference X-ray structure. As for RMSD, RMSF is plotted in the result page of the Molecular Dynamics section for the A and B chains.
Binding energy: The components of the binding energy were calculated using the MM-PBSA method, except for the entropic term and the energetic contribution of each residue to the binding, which were calculated by means of the energy decomposition scheme. The binding energy plot is shown in the result page of the Molecular Dynamics section. This plot shows the binding energy profiles along the simulations of the wild-type, single and double mutants.
Essential dynamics: Principal Component Analysis was performed to extract the essential dynamical motions of the wild-type, single and double mutants by filtering global slow motions from the fast motions. Essential molecular dynamics are summarized in short clips in the result page of the Molecular Dynamics section. Interface residues are colored according to their binding energy values.
Hydrogen bonds: Per-residue hydrogen bonds were enumerated by the g_hbond GROMACS module, setting an angle cutoff of 30° and a donor-acceptor distance of 3.5 Å. In the result page of the Molecular Dynamics section, all hydrogen bonds (acceptor and donor residue positions) and the corresponding residence times (as the percentage of the total simulation time) are reported for the wild-type, single and double mutants.
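
The first two measures in the list above can be written out explicitly. The sketch below follows the definitions given here (deviation from the reference structure) and assumes that every frame has already been superimposed onto the reference and that coordinates are given as (x, y, z) tuples, one per C_α atom; the function names and the toy coordinates are hypothetical and unrelated to the GROMACS tools used for the actual analysis.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two (x, y, z) points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def rmsd(frame, reference):
    """RMSD of one frame: average over atoms of the deviation from the reference."""
    total = sum(squared_distance(p, q) for p, q in zip(frame, reference))
    return math.sqrt(total / len(reference))


def rmsf(trajectory, reference):
    """Per-atom RMSF: average over frames of the deviation from the reference."""
    fluctuations = []
    for atom in range(len(reference)):
        total = sum(
            squared_distance(frame[atom], reference[atom]) for frame in trajectory
        )
        fluctuations.append(math.sqrt(total / len(trajectory)))
    return fluctuations


# Toy system with two atoms and two frames.
reference_structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
toy_trajectory = [moved_frame, reference_structure]

print(rmsd(moved_frame, reference_structure))
print(rmsf(toy_trajectory, reference_structure))
```

RMSD gives one value per frame (a profile along the simulation), whereas RMSF gives one value per residue, which is why the two are plotted differently in the result page.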

Programmatic access to data

Variants can be searched through a new RESTful interface, either directly in your browser or with curl.
The output is formatted in JSON. An empty result set is returned as the string {"variants": null}.

curl mitimpact.css-mendel.it/api/v2.0/genomic_position/3307
[range query] curl mitimpact.css-mendel.it/api/v2.0/genomic_position/3307-3309
[locus and variant] curl "mitimpact.css-mendel.it/api/v2.0/search_allele/6253/T>A"
curl mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563
[multiple query] curl mitimpact.css-mendel.it/api/v2.0/dbsnp/rs3020563,rs28520706,rs1041870
curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=20&id=MT-ATP6"
[multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10&id=ENSG00000198840,P00414"
[multiple range query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10-12&id=ENSG00000198840,P00414"
[multiple query] curl "mitimpact.css-mendel.it/api/v2.0/protein_position?pos=10,11,13&id=ENSG00000198840,P00414"
curl "mitimpact.css-mendel.it/api/v2.0/pathogenicity?id=ID&min=9", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries: PolyPhen2, SIFT, FatHmm, FatHmmW, PROVEAN, MutationAssessor, EFIN_SP, EFIN_HD, CADD, PANTHER, PhD-SNP, SNAP and MutationTaster.
curl "mitimpact.css-mendel.it/api/v2.0/consensus_pathogenicity?id=ID&min=2", where ID can be any gene or protein identifier in the table above. The parameter "min" specifies the minimum number of pathogenic assessments that a variant in the result set must have. This function queries the meta-predictors: Meta-SNP, CAROL, Condel, COVEC WMV, MToolBox and APOGEE.
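
The same queries can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, the http:// scheme is an assumption (the examples above give no scheme), and so are the percent-encoding of characters such as ">" and the idea that a non-empty response keeps its records under the same "variants" key as the documented empty result.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed scheme; the endpoint paths are taken from the curl examples above.
BASE_URL = "http://mitimpact.css-mendel.it/api/v2.0"


def build_url(endpoint, *path_parts, **params):
    """Assemble a query URL, percent-encoding characters such as '>'."""
    url = f"{BASE_URL}/{endpoint}"
    for part in path_parts:
        url += "/" + quote(str(part), safe=",-")
    if params:
        query = "&".join(
            f"{key}={quote(str(value), safe=',-')}" for key, value in params.items()
        )
        url += "?" + query
    return url


def parse_variants(payload):
    """Return the content of "variants", or an empty list for {"variants": null}."""
    data = json.loads(payload)
    return data.get("variants") or []


def fetch_variants(url):
    """Download and parse one query (requires network access)."""
    with urlopen(url) as response:
        return parse_variants(response.read().decode("utf-8"))


print(build_url("genomic_position", "3307-3309"))
print(build_url("search_allele", 6253, "T>A"))
print(build_url("protein_position", pos="10-12", id="ENSG00000198840,P00414"))
print(parse_variants('{"variants": null}'))
```

Calling `fetch_variants(build_url("dbsnp", "rs3020563"))` would then return the parsed records, or an empty list when nothing matches.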
