A sample selection algorithm based on maximum entropy and contribution

2010 
The focus of a sample selection algorithm is to decide which samples to store for generalization. Storing too many samples results in large storage requirements and slow execution, and it can lead to overfitting during prediction. This paper presents a new sample selection algorithm for the nearest neighbor rule. The algorithm defines an evaluation function for samples that combines maximum entropy and the contribution of a sample, and selects the most valuable samples according to this function. The algorithm prefers samples on the decision boundary and achieves good prediction accuracy. Because a certain error rate is allowed on the training data, the algorithm is insensitive to noise. Experiments are conducted on both synthetic and real datasets.
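The abstract does not give the exact form of the evaluation function, so the following is only a minimal sketch of the general idea: score each training sample by an entropy term over its neighbors' class labels (high entropy suggests a boundary sample) plus a "contribution" term measuring how many other samples it would classify correctly as their nearest stored neighbor, then keep the highest-scoring samples. The choice of k, the weighting alpha, and both term definitions are assumptions, not the paper's formulas.

```python
import numpy as np
from collections import Counter


def neighbor_entropy(X, y, k=5):
    """Entropy of class labels among each sample's k nearest neighbors.
    High entropy indicates the sample lies near a class boundary (assumed proxy)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude the sample itself
    h = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        counts = np.array(list(Counter(y[nbrs]).values()), dtype=float)
        p = counts / counts.sum()
        h[i] = -(p * np.log(p)).sum()
    return h


def contribution(X, y):
    """Assumed proxy for 'contribution': the fraction of other samples that a
    candidate would label correctly if it were their nearest stored neighbor."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)                           # each sample's nearest neighbor
    c = np.zeros(n)
    for j in range(n):
        if y[j] == y[nearest[j]]:
            c[nearest[j]] += 1.0 / n                      # candidate classifies j correctly
    return c


def select_samples(X, y, m, k=5, alpha=0.5):
    """Keep the m samples with the highest combined score.
    The linear weighting of the two normalized terms is an assumption."""
    h = neighbor_entropy(X, y, k)
    c = contribution(X, y)
    score = alpha * h / (h.max() + 1e-12) + (1 - alpha) * c / (c.max() + 1e-12)
    return np.argsort(score)[-m:]


# Example: reduce a toy 2-class training set to 20 stored samples.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    kept = select_samples(X, y, m=20)
    print("indices of stored samples:", kept)
```

Selecting by this score concentrates the stored set near the class boundary, which matches the abstract's claim that the algorithm prefers boundary samples; tolerating some misclassified training points (rather than storing every hard sample) is what gives the method its stated noise insensitivity.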