Automatic features generation and selection from external sources: A DBpedia use case

2022 
Abstract: Feature engineering is one of the major challenges of machine learning. While multiple automation solutions have been proposed in recent years, the vast majority focus on extracting features from the analyzed dataset itself rather than from other (external) sources. In this study, we present FGSES, a general framework for automatic feature engineering, and its application to DBpedia. Our framework automatically matches the entities in the analyzed dataset to those of the external data source, and then generates a large and diverse set of candidate features from both structured and unstructured content. To efficiently process the large number of generated features, FGSES uses a meta-learning-based ranking approach. Our evaluation, conducted on 18 tabular datasets with diverse characteristics, shows that FGSES achieves an average error reduction of 16.5%, significantly outperforming the evaluated baselines.