Isoelectric Point Calculator 2.0
Prediction of isoelectric point and pKa dissociation constants using deep learning


Feature selection procedure

One of the first steps in machine learning is the identification of informative features. In this project, apart from the amino acid sequence itself, additional features can be used to boost prediction accuracy. For this purpose, the AAindex database was used.

The most informative AAindex features for protein isoelectric point prediction (IPC_protein_75 dataset)

The most informative AAindex features for peptide isoelectric point prediction (IPC2_peptide_75 dataset)
For brevity, the feature selection for the individual pKa datasets (8 similar tables) has been omitted.
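
To illustrate how an AAindex entry can become a sequence-level feature, here is a minimal sketch that averages a per-residue scale over a sequence. It is not the selection procedure used in this project: the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) is chosen only as a familiar example, and the function name `mean_scale_value` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Used purely as an example scale; it is NOT claimed to be one of
# the features selected for IPC 2.0.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_scale_value(sequence, scale):
    """Average a per-residue scale over a sequence.

    Residues missing from the scale (e.g. 'X') are skipped.
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


# One extra numeric feature for a (hypothetical) peptide.
feature = mean_scale_value("ACDK", KYTE_DOOLITTLE)
```

A value like this can be appended to the encoded sequence as one additional input column; whether a given scale is informative still has to be checked against the target dataset.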

Augmentation

The data available for the IPC_protein_100 and IPC2_pKa datasets were highly limited, well below the amount needed for deep learning. Therefore, data augmentation was used (details of the augmentation scheme will be released after manuscript publication).

Deep learning

The amino acid sequences (one-hot encoded) plus the extra features are converted into vectors that can be used for supervised machine learning. A number of architectures were tested. The final model architecture consists of a mixture of convolutional and dense layers with ReLU, Softplus and Softsign activation functions. To avoid overfitting, 10-fold cross-validation and dropout were used. Details of the model will be released after manuscript publication.
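
The encoding and validation steps mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions, not the released model code: the 20-letter alphabet order, the zero-padding to a fixed length, the all-zero row for unknown residues, and the helper names `one_hot_encode` and `k_fold_indices` are all hypothetical choices.

```python
import random

# Assumed alphabet order; the order actually used by IPC 2.0 is not published here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (list of lists) for one sequence.

    Longer sequences are truncated, shorter ones are zero-padded,
    and residues outside the alphabet are left as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence.upper()[:max_len]):
        col = AA_INDEX.get(aa)
        if col is not None:
            matrix[pos][col] = 1
    return matrix


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


encoded = one_hot_encode("ACDK", max_len=6)
splits = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one validation fold, so every data point is used for both training and validation across the 10 runs. Dropout and the layer stack itself would be defined in whichever deep learning framework is used and are not shown here.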




Contact: Lukasz P. Kozlowski