Algorithm

Why do we need to know?

With the explosion of sequence data in public and private databases, and a similar explosion of gene expression data on the way, it is increasingly important to understand how to apply well-established data analysis and classification methods developed in other fields. These methods help make sense of the data, yield biological insights, categorize the data, and put the results to good use in industrial applications.

What is the Algorithm?

The dataset consists of a non-redundant set of 639 antibiotic proteins and a non-redundant set of 602 non-antibiotic proteins obtained from NCBI. To reduce redundancy, the ExPASy sequence alignment tool Decrease Redundancy was used, with the criterion that no two sequences in the dataset share more than 90% sequence identity.
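The idea behind a 90% redundancy cutoff can be illustrated with a greedy filter. This is only a minimal sketch and not the method used by the ExPASy tool: the function names `similarity` and `reduce_redundancy` are hypothetical, and the similarity score from Python's `difflib` is a rough stand-in for a true alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough similarity proxy in [0, 1].

    Assumption: this is NOT an alignment-based sequence identity; it only
    stands in for one so that the filtering logic can be shown.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(sequences, threshold=0.90):
    """Greedily keep a sequence only if no already-kept sequence
    is more than `threshold` similar to it."""
    kept = []
    for seq in sequences:
        if all(similarity(seq, k) <= threshold for k in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(reduce_redundancy(["MKTAYIAKQR", "MKTAYIAKQR", "GGGGGGGGGG"]))
```

A greedy pass like this depends on input order: the first sequence of each near-duplicate group is the one that survives.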

The Support Vector Machine (SVM) package SVMlight, developed by Thorsten Joachims, was used to train a classifier on the dataset. SVMlight is freely available and can be downloaded from http://svmlight.joachims.org
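SVMlight reads training examples from a text file with one example per line, in the form `<target> <feature>:<value> ...`, with feature indices starting at 1 and listed in increasing order. The sketch below shows one way to turn a numeric feature vector into such a line. The helper name `to_svmlight_line`, the +1/-1 labelling of antibiotic versus non-antibiotic proteins, and the six-decimal formatting are assumptions made for illustration, not details taken from this page.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVMlight input line.

    label    -- +1 or -1 (assumed here: +1 antibiotic, -1 non-antibiotic)
    features -- sequence of numeric feature values
    Zero-valued features are omitted, which the sparse format permits.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    parts = [
        f"{index}:{value:.6f}"
        for index, value in enumerate(features, start=1)  # indices start at 1
        if value != 0
    ]
    return " ".join([f"{label:+d}"] + parts)


if __name__ == "__main__":
    print(to_svmlight_line(1, [0.5, 0.0, 0.25]))
```

A file of such lines can then be passed to the package's `svm_learn` program to build a model, and `svm_classify` applies that model to new examples. The kernel and parameter settings used for this tool are not stated here.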

Amino acid composition represents a protein as a vector of 20 dimensions, where each component is the fraction of one amino acid in the protein.
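As a minimal sketch of this calculation, the function below divides the count of each of the 20 standard amino acids by the number of standard residues in the sequence. The function name, the alphabetical ordering of the one-letter codes, and the choice to ignore non-standard characters such as X are assumptions made for illustration.

```python
# The 20 standard amino acids, one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional list with the fraction of each amino acid."""
    # Assumption: characters outside the 20 standard residues are skipped.
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("AACD")
    print(len(vector), vector[:3])
```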

Dipeptide composition represents a protein as a vector of 400 dimensions. It encapsulates information about the fractions of amino acids as well as their local order.
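The 400 dimensions correspond to the 20 x 20 possible ordered pairs of amino acids. A minimal sketch, assuming that overlapping adjacent pairs are counted and that each count is divided by the total number of counted pairs (a common definition, not confirmed by this page), might look like this. The function name and the ordering of the pairs are likewise illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: AA, AC, AD, ..., YY (ordering is an assumption).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list with the fraction of each dipeptide."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("ACAC")
    print(len(vector), vector[DIPEPTIDES.index("AC")])
```

Because adjacent residues are counted together, two proteins with identical amino acid composition but different residue order generally yield different dipeptide vectors.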