Automated granule discovery in continuous data for feature selection
2021
Abstract: Real-world database applications hold massive data collections in different formats, such as continuous, discrete, or nominal. Continuous data complicates analysis because a value can fall anywhere within a range, so granule mining has recently been applied, with techniques such as neighbourhood rough sets, to discover granules in continuous data. This approach has yet to address granule resolution design. This paper therefore presents a novel method, Hierarchical Clustering-based Granulation (HCluG), which improves granule identification in continuous data by combining hierarchical clustering with neighbourhood rough sets. It reduces user involvement in tuning granule resolution parameters and introduces an automated granule discovery method. HCluG includes a feature selection method that evaluates the quality of the granules generated with the proposed granule approximations. Experimental results show that HCluG reduces the number of selected features while improving classification performance. On average, HCluG outperforms the rough sets-based feature selection baselines when used with K-Nearest Neighbours and Radial Basis Function Support Vector Machine classifiers, and it also performs better on average than using the complete feature set. The method can be used in data analysis to achieve high classification performance with fewer features and less user involvement.
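To make the role of the granule resolution parameter concrete, the following is a minimal sketch of the standard delta-neighbourhood granule and the dependency measure commonly used in neighbourhood rough set feature evaluation. It is not the HCluG algorithm: the function names, the Chebyshev distance, the toy data, and the `delta` values are all assumptions chosen for illustration.

```python
# Illustrative sketch only: plain delta-neighbourhood granules, NOT the HCluG method.
# All names, the distance choice, and the toy data below are assumptions.

def neighbourhood(samples, i, features, delta):
    """Indices of samples within Chebyshev distance `delta` of sample i on `features`."""
    return {
        j for j, other in enumerate(samples)
        if max(abs(samples[i][f] - other[f]) for f in features) <= delta
    }

def dependency(samples, labels, features, delta):
    """Fraction of samples whose neighbourhood granule contains a single class."""
    consistent = 0
    for i in range(len(samples)):
        granule = neighbourhood(samples, i, features, delta)
        if all(labels[j] == labels[i] for j in granule):
            consistent += 1
    return consistent / len(samples)

# Hypothetical toy data: two continuous features and a binary class label.
samples = [(0.10, 0.90), (0.15, 0.20), (0.20, 0.85),
           (0.80, 0.30), (0.85, 0.75), (0.90, 0.25)]
labels = [0, 0, 0, 1, 1, 1]

# The score of each feature depends strongly on the hand-picked delta.
for delta in (0.01, 0.30, 0.70):
    print(delta,
          dependency(samples, labels, [0], delta),
          dependency(samples, labels, [1], delta))
```

In this toy setting a very small `delta` makes every granule a single sample, so both features look equally good, while a very large `delta` merges the classes. This sensitivity to a user-chosen resolution is the kind of tuning burden the abstract says HCluG aims to reduce.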