A study to find a potent feature by combining the various disulphide bonds of protein using data mining technique

Suprativ Saha,Twinkle Paul,Tanmay Bhattacharya

2021

In bioinformatics, identifying an effective protein feature is essential, because the success of any classification technique relies heavily on informative and distinct features. Several existing classifiers use a single type of disulphide bond (viz. parallel or alternate) as a feature. However, computational efficiency may be improved by identifying an appropriate combination of disulphide bonds to serve as a single feature. Hence, in this paper, the various combinations of disulphide bonds are studied in order to formulate a potent protein feature that can be used in other studies to achieve better protein classification results without incorporating redundant data. A data mining approach is applied to the seven different combinations of the three disulphide bond types (viz. parallel, alternate and quad) to identify the best feature. A statistical analysis in terms of the confusion matrix and various point metrics (such as sensitivity, specificity, recall and precision) shows a high level of accuracy and F score for the feature formed by combining two disulphide bonds, i.e. the alternate and quad bonds. The average F score achieved by this combination is approximately 0.9, and the average accuracy is more than 93%. These values represent an unprecedented level of precision for any individual feature considered so far in any research methodology. In addition, combining two disulphide bonds instead of three requires less computation time. Overall, the analytical results of this study show that the combination of alternate and quad disulphide bonds can be used as an effective feature in any form of protein classification.
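The count of seven combinations and the point metrics named in the abstract can be illustrated with a minimal sketch. It assumes that the seven combinations are the non-empty subsets of the three bond types and that the metrics follow their standard confusion-matrix definitions. The function names and the confusion-matrix counts are hypothetical, chosen for illustration only; they are not the authors' code or data.

```python
from itertools import combinations

# The three disulphide bond types named in the abstract.
BOND_TYPES = ("parallel", "alternate", "quad")


def bond_combinations(bond_types=BOND_TYPES):
    """Return every non-empty subset of the bond types, smallest first."""
    return [
        combo
        for size in range(1, len(bond_types) + 1)
        for combo in combinations(bond_types, size)
    ]


def point_metrics(tp, fp, fn, tn):
    """Standard point metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


combos = bond_combinations()

# Hypothetical counts, NOT results from the paper.
example = point_metrics(tp=45, fp=5, fn=5, tn=45)

if __name__ == "__main__":
    for combo in combos:
        print(" + ".join(combo))
    print(example)
```

Three bond types give 2^3 - 1 = 7 non-empty subsets, which matches the seven combinations mentioned above; the alternate and quad pair is one of the three two-bond subsets.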
