FS4RVDD: A Feature Selection Algorithm for Random Variables with Discrete Distribution
2018
Feature selection is a crucial step for inferring regression and classification models in QSPR (Quantitative Structure–Property Relationship) applied to Cheminformatics. A particularly complex case of QSPR modelling occurs in Polymer Informatics because the features under analysis require the management of uncertainty. In this paper, a novel feature selection method for addressing this special QSPR scenario is presented. The proposed methodology assumes that each feature is characterized by a probability distribution of values associated with the polydispersity of the polymers included in the training dataset. The new algorithm has two sequential steps: ranking of the features, generated by correlation analysis, and iterative subset reduction, obtained by feature redundancy analysis. A prototype of the algorithm has been implemented as a proof of concept. The method's performance has been evaluated on synthetic datasets of different sizes, varying the cardinality of the selected feature subsets. These preliminary results suggest that the chosen mathematical representation and the proposed method are suitable for managing the uncertainty inherent to polymerization. Nevertheless, this research is work in progress, and additional experiments should be conducted in the future to assess the actual benefits and limitations of this methodology.
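The two steps described above (correlation-based ranking, then redundancy-based subset reduction) can be illustrated with a minimal sketch. The abstract does not specify the correlation or redundancy measures that FS4RVDD uses, so everything below is an assumption made for illustration: each feature value is represented as a list of `(value, probability)` pairs, each distribution is collapsed to its expected value, Pearson correlation stands in for both analyses, and the names `select_features` and `redundancy_threshold` are hypothetical. Collapsing to the mean discards most of the uncertainty the paper sets out to manage, so this is not the authors' method.

```python
from math import sqrt

def expected_value(dist):
    """Mean of a discrete distribution given as [(value, probability), ...]."""
    return sum(v * p for v, p in dist)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_features(features, target, k, redundancy_threshold=0.9):
    """Pick up to k feature names.

    features: {name: [distribution for sample 0, distribution for sample 1, ...]}
    target:   one numeric property value per sample
    """
    # Assumption of this sketch: summarize each distribution by its expected value.
    means = {name: [expected_value(d) for d in dists]
             for name, dists in features.items()}

    # Step 1: rank features by absolute correlation with the target, best first.
    ranked = sorted(means,
                    key=lambda name: abs(pearson(means[name], target)),
                    reverse=True)

    # Step 2: walk the ranking and skip any feature that is too strongly
    # correlated with a feature that has already been kept.
    selected = []
    for name in ranked:
        if all(abs(pearson(means[name], means[kept])) < redundancy_threshold
               for kept in selected):
            selected.append(name)
        if len(selected) == k:
            break
    return selected

# Synthetic example: four samples, three features.
features = {
    "a": [[(0.5, 0.5), (1.5, 0.5)],
          [(1.0, 0.25), (2.0, 0.5), (3.0, 0.25)],
          [(3.0, 1.0)],
          [(3.5, 0.5), (4.5, 0.5)]],
    "b": [[(2.0, 1.0)], [(4.0, 1.0)], [(6.0, 1.0)], [(9.0, 1.0)]],
    "c": [[(1.0, 1.0)], [(-1.0, 1.0)], [(1.0, 1.0)], [(2.0, 1.0)]],
}
target = [1.0, 2.0, 3.0, 4.0]
print(select_features(features, target, k=2))
```

In this toy dataset, feature `b` correlates strongly with the target but is nearly a rescaled copy of `a`, so the redundancy step drops it in favour of the less correlated but non-redundant `c`. A version closer to the paper's stated goal would compare the distributions themselves rather than their means.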