Isoelectric Point Calculator 2.0
Prediction of isoelectric point and pKa dissociation constants using deep learning


Isoelectric point prediction accuracy on the held-out 25% datasets


a - Protein dataset consisting of 581 proteins (25% randomly chosen proteins, not used for the feature selection, training or optimisation).
b - Peptide dataset consisting of 10,902 peptides (25% randomly chosen peptides, not used for the feature selection, training or optimisation).
c - Outliers are defined using a threshold of 0.5 and 0.25 pH units for the difference between the predicted and experimental pI, for the protein and peptide datasets, respectively.
NA - The PredpI program was designed for peptides only within the 3.7–4.9 pH range; thus, for proteins, it returned 0 and could not be evaluated on the protein dataset.

The DL models developed in this study are in bold and sorted by RMSD.
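The two summary statistics referred to above, RMSD and the number of outliers, can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the example values are hypothetical, and treating a prediction as an outlier only when the absolute difference strictly exceeds the threshold is an assumption.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def count_outliers(predicted, experimental, threshold):
    """Count predictions whose absolute error exceeds the threshold.

    Assumption: the comparison is strict (error > threshold).
    """
    return sum(
        1 for p, e in zip(predicted, experimental) if abs(p - e) > threshold
    )


# Hypothetical pI values, for illustration only.
predicted_pi = [5.0, 7.0, 6.1]
experimental_pi = [5.0, 9.0, 6.0]

protein_rmsd = rmsd(predicted_pi, experimental_pi)
# 0.5 pH units is the protein threshold; 0.25 would be used for peptides.
protein_outliers = count_outliers(predicted_pi, experimental_pi, 0.5)
```

Lower RMSD and fewer outliers both indicate better agreement with the experimental values.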

Conclusions: IPC2 provides two deep learning models, one customized for proteins and one for peptides. Both predict the isoelectric point more accurately and robustly than the other methods compared.

Isoelectric point prediction accuracy on training datasets (with 10-fold cross-validation)


a - Protein dataset consisting of 1,743 proteins (75% randomly chosen proteins, used for the feature selection, training and optimisation).
b - Peptide dataset consisting of 31,427 peptides (75% randomly chosen peptides, used for the feature selection, training and optimisation).
c - Outliers are defined using a threshold of 0.5 and 0.25 pH units for the difference between the predicted and experimental pI, for the protein and peptide datasets, respectively.
NA - The PredpI program was designed for peptides only within the 3.7–4.9 pH range; thus, for proteins, it returned 0 and could not be evaluated on the protein dataset.
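The 75%/25% random split and the 10-fold cross-validation mentioned in these notes can be sketched as below. This is an illustrative sketch only: the function names, the fixed seed, and the interleaved fold assignment are assumptions, and the page does not describe the exact procedure used in the study.

```python
import random


def split_train_test(items, test_fraction=0.25, seed=0):
    """Randomly split items into (train, test); seed is an arbitrary choice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_items, k=10):
    """Yield (training_indices, validation_indices) for each of k folds."""
    # Assumption: folds are formed by interleaving indices 0..n_items-1.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Placeholder identifiers standing in for the 2,324 proteins (1,743 + 581).
proteins = range(2324)
train, test = split_train_test(proteins)

for training_idx, validation_idx in k_fold_indices(len(train), k=10):
    # A model would be fitted on training_idx and scored on validation_idx.
    pass
```

The held-out 25% is set aside before cross-validation, so it never influences feature selection, training or optimisation.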


pKa prediction accuracy on Rosetta pKa dataset (test set)

a - For the validation of pKa, the dataset from Kilambi and Gray (2012) was used (260* residues from 34 proteins). The numbers next to the residue type indicate the number of cases and the average pKa value with standard deviation.
b - Outliers are defined using a threshold of 0.5 pH units for the difference between the predicted and experimental pKa.
* - The dataset consists of 260 instead of 264 residues because of parsing problems: four residues could not be mapped to the protein sequence (wrong residue register).

Conclusions: The pKa prediction accuracy of the IPC2_pKa model is comparable to that of the Rosetta-based approach, yet IPC2_pKa returns predictions within seconds and uses only sequence information.

pKa prediction accuracy on PKAD dataset (1075 charged residues) - train set

All predictions for the individual methods are available as a 7z archive.
Contact: Lukasz P. Kozlowski