MitImpact 3 collects genomic, clinical and functional annotations for all possible human missense variants. The latest release focuses on variant interactions and provides scores of sequence co-variation and effect compensation.
A genomic position and a variant, in the form REF>ALT, can be specified directly in the browser. A valid request redirects to the result page, where the information about the variant is displayed in the first tab.
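As an illustration of what a well-formed request looks like, the following minimal sketch checks a position and a REF>ALT string. The function name `parse_query`, the restriction to the four DNA bases and the range check against the 16,569 bp human mitochondrial reference are assumptions made for this example; the validation actually performed by the MitImpact site is not described here.

```python
import re

# Assumption: positions are checked against the length of the human
# mitochondrial reference sequence (16,569 bp).
MTDNA_LENGTH = 16569

# Assumption: REF and ALT are single DNA bases separated by ">".
VARIANT_RE = re.compile(r"^([ACGT])>([ACGT])$")


def parse_query(position, variant):
    """Return (position, ref, alt) for a well-formed request, else None.

    Hypothetical helper: it only checks the syntax and the position range,
    not whether REF matches the reference base at that position.
    """
    match = VARIANT_RE.match(variant.strip().upper())
    if match is None:
        return None
    ref, alt = match.groups()
    if ref == alt:
        return None  # not a substitution
    if not 1 <= position <= MTDNA_LENGTH:
        return None
    return position, ref, alt


print(parse_query(3243, "A>G"))
print(parse_query(3243, "AG"))
```

A real lookup would additionally have to confirm that REF is the reference base at the given position, which this sketch does not attempt.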
| # | Gene Symbol | Ensembl Gene ID | Ensembl Protein ID | UniProt Name | UniProt ID | NCBI Gene ID | NCBI Protein ID |
The putative effect of missense mutations within the 13 mitochondrially-encoded proteins was calculated by the following missense pathogenicity predictors:
Mutations were also annotated by these meta-predictors:
Predictions can be obtained from the following web URLs:
APOGEE is an LMT-based consensus classifier. A Logistic Model Tree (LMT) is a machine learning technique that combines a decision tree with logistic regression models at the leaves. The model is built from a set of predictor variables, which can be used both for the splitting decisions during tree construction and as selected terms of the logistic models. The difference between a decision tree and an LMT is that the former assigns all the instances belonging to a leaf to the class with the highest frequency in that leaf, whereas an LMT fits a logistic model in the leaf and gives each instance its own probability of belonging to a class.
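The difference between the two kinds of leaf can be sketched in a few lines. This is a toy illustration only, not the APOGEE implementation: the function names, the two-feature input and the weights are invented for the example.

```python
import math
from collections import Counter


def tree_leaf_predict(leaf_labels):
    """Plain decision tree: every instance reaching the leaf receives the
    most frequent class among the training instances in that leaf."""
    return Counter(leaf_labels).most_common(1)[0][0]


def lmt_leaf_predict(features, weights, bias):
    """LMT leaf: a logistic model gives each instance its own probability
    of belonging to the positive class (here, 'pathogenic')."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical leaf content and logistic coefficients.
leaf_labels = ["pathogenic", "pathogenic", "neutral"]
weights, bias = [2.0, 1.5], -2.0

print(tree_leaf_predict(leaf_labels))              # same answer for any instance
print(lmt_leaf_predict([0.9, 0.8], weights, bias))  # instance-specific probability
print(lmt_leaf_predict([0.1, 0.2], weights, bias))  # a different instance, same leaf
```

Two instances falling in the same leaf get the same label from the plain tree but different probabilities from the logistic model, which is the behavior the paragraph above describes.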
APOGEE handles two pathogenicity classes: neutral and pathogenic. Mutations are treated as instances described by the following predictors:
Once the classification function was defined, we implemented and tested a bootstrap strategy, which randomly selected 70% of the pathogenic mutations and the same number of neutral mutations. In brief, we ran this algorithm for 100 iterations:
The LMT models generated during the 100 iterations can be downloaded here.
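The balanced sampling step of the bootstrap strategy can be sketched as follows. This covers only the selection of mutations in each iteration; the training and evaluation steps are not reproduced. Sampling without replacement, drawing the neutral mutations at random, and the name `balanced_bootstrap` are assumptions of this sketch.

```python
import random


def balanced_bootstrap(pathogenic, neutral, fraction=0.7, iterations=100, seed=0):
    """Yield (pathogenic_sample, neutral_sample) pairs of equal size.

    Assumptions: sampling is without replacement and neutral mutations
    are drawn at random; a fixed seed is used only for reproducibility.
    """
    rng = random.Random(seed)
    n = int(round(fraction * len(pathogenic)))
    if n > len(neutral):
        raise ValueError("not enough neutral mutations for a balanced sample")
    for _ in range(iterations):
        yield rng.sample(pathogenic, n), rng.sample(neutral, n)


# Hypothetical mutation identifiers, for illustration only.
pathogenic = [f"P{i}" for i in range(10)]
neutral = [f"N{i}" for i in range(50)]

for path_sample, neut_sample in balanced_bootstrap(pathogenic, neutral, iterations=3):
    print(len(path_sample), len(neut_sample))
```

Each iteration would then fit one model on its balanced sample, which is consistent with one LMT model being produced per iteration.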
Pairwise co-variation analysis was performed using two alternative methods implemented in I-COMS (http://i-coms.leloir.org.ar). For each pair of subunits of every Respiratory Chain Complex (e.g. CO1 vs. CO2, CO2 vs. CO3, CO1 vs. CO3 for Complex IV), the tool allows one to:
The top 500 high-scoring site pairs (cutoff suggested by the I-COMS authors) were retained. Pairs whose members are located in two distinct proteins are named inter-protein; pairs whose members both fall in the same queried protein (which was concatenated with ND1 by default) are named intra-protein. Note that a given protein site can have several intra-protein or inter-protein co-varying partners. Furthermore, site co-variation does not necessarily imply any real functional or evolutionary relationship. I-COMS was used here because of its simplicity, completeness and responsiveness.
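The selection and labelling of site pairs described above can be sketched as follows. The `SitePair` record and the `top_pairs` function are hypothetical and do not reflect the I-COMS output format; the sketch only assumes that each pair carries the protein and position of both sites plus a co-variation score.

```python
from typing import NamedTuple


class SitePair(NamedTuple):
    """Hypothetical record for one scored pair of alignment sites."""
    protein_a: str
    site_a: int
    protein_b: str
    site_b: int
    score: float


def top_pairs(pairs, n=500):
    """Keep the n highest-scoring pairs and split them into
    inter-protein and intra-protein pairs."""
    ranked = sorted(pairs, key=lambda p: p.score, reverse=True)[:n]
    inter = [p for p in ranked if p.protein_a != p.protein_b]
    intra = [p for p in ranked if p.protein_a == p.protein_b]
    return inter, intra


# Invented scores, for illustration only.
pairs = [
    SitePair("CO1", 10, "CO2", 20, 0.9),
    SitePair("CO1", 5, "CO1", 7, 0.8),
    SitePair("CO2", 3, "CO1", 4, 0.1),
]
inter, intra = top_pairs(pairs, n=2)
print(inter)
print(intra)
```

Because the cutoff is applied before the split, the same site can appear in both lists with different partners.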
The raw score matrices and protein alignments generated with I-COMS for the current version of MitImpact are available from this link.
Compensated pathogenic deviations (CPDs) are amino acid substitutions that are reported to be pathogenic in the human population but occur as wild-type residues in non-human orthologous proteins. We identified mitochondrial CPDs by:
For each putative CPD, we then defined the:
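The CPD definition itself (a residue that is pathogenic in human yet wild-type in some orthologs) can be illustrated with a short sketch. It is not the identification pipeline used for MitImpact: the function names and the dictionary mapping species to the residue found at the aligned position are assumptions made for this example.

```python
def is_cpd(human_ref, pathogenic_alt, ortholog_residues):
    """Return True if a human pathogenic substitution occurs as the
    wild-type residue in at least one non-human ortholog.

    ortholog_residues: hypothetical dict {species: amino acid found at
    the aligned position of the orthologous protein}.
    """
    return pathogenic_alt != human_ref and pathogenic_alt in ortholog_residues.values()


def cpd_species(pathogenic_alt, ortholog_residues):
    """List the species whose wild-type residue equals the pathogenic one."""
    return sorted(sp for sp, aa in ortholog_residues.items() if aa == pathogenic_alt)


# Invented alignment column, for illustration only.
column = {"species_1": "T", "species_2": "A", "species_3": "T"}
print(is_cpd("A", "T", column))
print(cpd_species("T", column))
```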
Inter- and intra-protein relationships between co-varying variants were investigated energetically. FoldX 4.0 was used to calculate the free-energy changes upon mutation of residues lying at the interaction interface. Alternative amino acids whose ΔΔG for the single mutant exceeded the cutoff suggested by the authors (±0.61 kcal/mol) were tagged as disruptive. Pairs of mutants with a ΔΔG conservatively close to zero (within ±0.1 kcal/mol) were considered structurally compensative.
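The two thresholds can be expressed as a pair of simple checks. This sketch assumes that both cutoffs apply to the absolute value of ΔΔG (in kcal/mol), which is one reading of the "±" notation above; the function names are illustrative, and the ΔΔG values themselves would come from FoldX, which is not called here.

```python
DISRUPTIVE_CUTOFF = 0.61    # kcal/mol, single-mutant cutoff quoted in the text
COMPENSATION_CUTOFF = 0.1   # kcal/mol, double-mutant cutoff quoted in the text


def is_disruptive(ddg_single):
    """Single mutant: tagged disruptive if |ddG| exceeds the cutoff
    (assumption: the cutoff is symmetric around zero)."""
    return abs(ddg_single) > DISRUPTIVE_CUTOFF


def is_compensative(ddg_double):
    """Double mutant: considered structurally compensative if |ddG|
    stays below the conservative cutoff."""
    return abs(ddg_double) < COMPENSATION_CUTOFF


# Invented values, for illustration only.
print(is_disruptive(1.2), is_disruptive(-0.3))
print(is_compensative(0.05), is_compensative(0.4))
```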
In particular, MitImpact reports:
As a pilot study, we selected all pairs of variants obtained with I-COMS and predicted to be energetically compensative, where at least one variant of the pair was reported as pathogenic in the MITOMAP database. For these pairs, we looked for the corresponding human 3D structures in the Protein Data Bank and investigated the interacting properties of the wild-type complex as well as of the single and double-mutated complexes. We then ran ten replicas of four independent classical molecular dynamics simulations of 50 nanoseconds (cf. methods here).
To assess whether a protein carrying both mutations of a pair was stable and close to the wild-type structure, the following measures were calculated on the simulation trajectories: