Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features

2021 
Interactions have greatly influenced recent scientific discoveries, but the identification of interactions is challenging in ultra-high dimensions. In this study, we propose an interaction identification method for classification with ultra-high dimensional discrete features. We utilize clique sets to capture interactions among features, where features in a common clique have interactions that can be used for classification. The number of features related to the interaction is the size of the clique. Hence, our method can consider interactions caused by more than two feature variables. We propose a Kullback-Leibler divergence-based approach to correctly identify the clique sets with a probability that tends to 1 as the sample size tends to infinity. A clique screening method is then proposed to filter out clique sets that are useless for classification, and the strong sure screening property can be guaranteed. Finally, a clique naive Bayes classifier is proposed for classification. Numerical studies demonstrate that our proposed approach performs very well.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    22
    References
    0
    Citations
    NaN
    KQI
    []