MitImpact 3 collects genomic, clinical and functional annotations for all possible human missense variants. The latest release focuses on variant interactions and provides scores of sequence co-variation and effect compensation.
It is thus possible to specify a genomic position and a variant, in the form REF>ALT, directly in the browser. A valid request redirects to the result page, where information about the variant is displayed in the first tab.
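As a rough illustration of what a "valid" request could mean, the sketch below checks a position and a REF>ALT string before any lookup. The function name `parse_query`, the `Query` type, and the validation rules (single-nucleotide alleles, position within the 16,569-bp human mitochondrial reference) are assumptions made for this example, not MitImpact's actual implementation.

```python
import re
from typing import NamedTuple

MTDNA_LENGTH = 16569  # length of the human mitochondrial reference sequence (rCRS)
_VARIANT_RE = re.compile(r"^([ACGT])>([ACGT])$")


class Query(NamedTuple):
    """Hypothetical container for a validated request."""
    position: int
    ref: str
    alt: str


def parse_query(position: str, variant: str) -> Query:
    """Validate a genomic position and a REF>ALT string (assumed rules)."""
    pos = int(position)  # raises ValueError if the position is not an integer
    if not 1 <= pos <= MTDNA_LENGTH:
        raise ValueError(f"position {pos} is outside 1..{MTDNA_LENGTH}")
    match = _VARIANT_RE.match(variant.strip().upper())
    if match is None:
        raise ValueError(f"variant {variant!r} is not in the form REF>ALT")
    ref, alt = match.groups()
    if ref == alt:
        raise ValueError("REF and ALT must differ")
    return Query(pos, ref, alt)


if __name__ == "__main__":
    print(parse_query("3308", "T>C"))
```

The sketch only checks the syntax of the request; whether REF matches the reference base at that position would need the reference sequence itself.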
|#||Gene Symbol||Ensembl Gene ID||Ensembl Protein ID||Uniprot Name||Uniprot ID||Ncbi Gene ID||Ncbi Protein ID|
The putative effect of missense mutations within the 13 mitochondrially-encoded proteins was calculated by the following missense pathogenicity predictors:
Mutations were also annotated by these meta-predictors:
Predictions can be obtained from the following web URLs:
APOGEE is a mitochondrially-centered ensemble method trained with a 20-fold cross-validation repeated five times: in each iteration, 19 folds of the training set were used to train a KNN RusSmote ML algorithm and tune its hyperparameters, and the remaining fold was used for testing. The best set of hyperparameters was selected with an inner 10-fold grid-search cross-validation, and the performance of the method was assessed afterwards.
APOGEE assigns each variant to one of five pathogenicity classes: benign, likely-benign, VUS, likely-pathogenic, and pathogenic. The class is inferred from a pathogenicity probability provided by APOGEE, which is in turn calculated from the prediction score of the KNN RusSmote model.
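The mapping from a probability to one of the five classes can be sketched as a lookup against four ascending cutoffs. The cutoffs APOGEE actually uses are not given here, so the function below takes them as an argument; the values in the usage line are placeholders chosen only to make the example run.

```python
from bisect import bisect_right
from typing import Sequence

CLASSES = ("benign", "likely-benign", "VUS", "likely-pathogenic", "pathogenic")


def classify(probability: float, cutoffs: Sequence[float]) -> str:
    """Map a pathogenicity probability to one of the five classes.

    `cutoffs` must hold four ascending boundaries between adjacent classes.
    They are caller-supplied because the real APOGEE cutoffs are not stated here.
    """
    if len(cutoffs) != len(CLASSES) - 1 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be four ascending values")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # A probability equal to a cutoff falls into the higher class.
    return CLASSES[bisect_right(list(cutoffs), probability)]


if __name__ == "__main__":
    placeholder_cutoffs = (0.2, 0.4, 0.6, 0.8)  # illustrative only, NOT APOGEE's values
    print(classify(0.5, placeholder_cutoffs))
```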
APOGEE aggregates information of the following predictors and features:
|Method||MCC||Precision||auPR curve||auROC curve||Accuracy||Balanced accuracy||Sensitivity||Specificity|
|APOGEE 2||0.569 ± 0.041||0.431 ± 0.035||0.716 ± 0.054||0.95 ± 0.016||0.9 ± 0.011||0.888 ± 0.027||0.874 ± 0.053||0.903 ± 0.011|
Pairwise co-variation analysis was carried out using two alternative methods implemented in I-COMS (http://i-coms.leloir.org.ar). For each pair of subunits of every Respiratory Chain Complex (e.g. CO1 vs. CO2, CO2 vs. CO3, CO1 vs. CO3 for Complex IV), the tool makes it possible to:
The top 500 high-scoring site pairs (cutoff suggested by the I-COMS authors) were retained. Pairs whose members are located in two distinct proteins are named inter-protein; pairs were defined as intra-protein if both sites fell within the same queried protein (which was concatenated with ND1 by default). Note that a given protein site can have different intra-protein or inter-protein co-varying partners. Furthermore, site co-variation does not necessarily imply the existence of any real functional or evolutionary relationship. I-COMS was used here because of its simplicity, completeness and responsiveness.
The raw I-COMS score matrices and protein alignments generated for the current version of MitImpact are available from this link.
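The retention and labelling step can be sketched as follows, assuming the two proteins are concatenated into one alignment and that `boundary` is the last column belonging to the first protein. The `ScoredPair` type, the function name, and the column-based representation are hypothetical; the actual I-COMS output format may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ScoredPair(NamedTuple):
    """Hypothetical record: two 1-based alignment columns and their co-variation score."""
    site_i: int
    site_j: int
    score: float


def label_top_pairs(
    pairs: Iterable[ScoredPair], boundary: int, top_n: int = 500
) -> List[Tuple[ScoredPair, str]]:
    """Keep the `top_n` highest-scoring pairs and label each one.

    Columns <= `boundary` are assumed to belong to the first protein of the
    concatenated alignment, the remaining columns to the second protein.
    """
    ranked = sorted(pairs, key=lambda p: p.score, reverse=True)[:top_n]
    labelled = []
    for pair in ranked:
        same_protein = (pair.site_i <= boundary) == (pair.site_j <= boundary)
        labelled.append((pair, "intra-protein" if same_protein else "inter-protein"))
    return labelled


if __name__ == "__main__":
    demo = [ScoredPair(1, 2, 0.9), ScoredPair(1, 5, 0.8), ScoredPair(4, 5, 0.1)]
    for pair, label in label_top_pairs(demo, boundary=3, top_n=2):
        print(pair, label)
```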
Compensated pathogenic deviations (CPDs) are amino acid substitutions that are reported to be pathogenic in the human population, but occur as wild-type residues in non-human ortholog proteins. We identified mitochondrial CPDs by:
For each putative CPD, we then defined the:
Inter- and intra-protein relationships between co-varying variants were investigated energetically. FoldX 4.0 was used to calculate the free-energy changes upon mutation of residues lying at the interaction interface. Alternative amino acids whose single-mutant ΔΔG exceeded the cutoff suggested by the authors (±0.61 kcal/mol) were tagged as disruptive. Pairs of mutants with a ΔΔG conservatively close to zero (within ±0.1 kcal/mol) were considered structurally compensative.
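The two cutoffs can be expressed as simple checks on ΔΔG values in kcal/mol. This is a minimal sketch: the function names are hypothetical, and it assumes that the ±0.1 kcal/mol criterion applies to the ΔΔG of the double mutant, which is one reading of the description above.

```python
from typing import Dict

SINGLE_MUTANT_CUTOFF = 0.61  # kcal/mol, cutoff for a disruptive single mutant
DOUBLE_MUTANT_CUTOFF = 0.1   # kcal/mol, "conservatively close to zero"


def is_disruptive(ddg_single: float) -> bool:
    """True if a single-mutant ddG lies outside +/-0.61 kcal/mol."""
    return abs(ddg_single) > SINGLE_MUTANT_CUTOFF


def is_compensative(ddg_double: float) -> bool:
    """True if the ddG of the mutant pair lies within +/-0.1 kcal/mol (assumed reading)."""
    return abs(ddg_double) < DOUBLE_MUTANT_CUTOFF


def label_pair(ddg_first: float, ddg_second: float, ddg_double: float) -> Dict[str, bool]:
    """Report the three flags separately; how they are combined is left to the caller."""
    return {
        "first_disruptive": is_disruptive(ddg_first),
        "second_disruptive": is_disruptive(ddg_second),
        "pair_compensative": is_compensative(ddg_double),
    }


if __name__ == "__main__":
    print(label_pair(ddg_first=1.2, ddg_second=-0.3, ddg_double=0.05))
```

The flags are kept separate because the text does not state whether a pair must contain a disruptive single mutant to count as compensative.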
In particular, MitImpact reports:
As a pilot study, we selected all pairs of variants obtained with I-COMS and predicted to be energetically compensative, where at least one member of the pair was reported as pathogenic in the MITOMAP database. For these pairs, we searched the Protein Data Bank for the corresponding human 3D structures and investigated the interaction properties of the wild-type complex as well as of the single- and double-mutated complexes. We then ran ten replicas of four independent classical molecular dynamics simulations of 50 nanoseconds (cf. methods here).
To assess whether a protein carrying both mutations of a pair was stable and close to the wild-type structure, the following measures were calculated on the simulation trajectories: