A sample selection algorithm based on maximum entropy and contribution

2010 
The focus of a sample selection algorithm is to decide which samples to store for generalization. Storing too many samples results in large storage requirements and slow execution, and it can lead to overfitting during prediction. This paper presents a new sample selection algorithm for the nearest neighbor rule. The algorithm defines an evaluation function for samples that combines maximum entropy and the contribution of a sample, and selects the most valuable samples according to this function. The algorithm prefers samples on the decision boundary and achieves good prediction accuracy. Because a certain error rate is allowed on the training data, the algorithm is insensitive to noise. Experiments are conducted on both synthetic and real datasets.
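The abstract does not give the exact form of the evaluation function, so the following is only a minimal sketch of the general idea: score each training sample by an entropy term over its neighbors' class labels (high entropy suggests a boundary sample) plus a "contribution" term measuring how many other samples it would classify correctly as their nearest stored neighbor, then keep the highest-scoring samples. The choice of k, the weighting alpha, and both term definitions are assumptions, not the paper's formulas.

```python
import numpy as np
from collections import Counter


def neighbor_entropy(X, y, k=5):
    """Entropy of class labels among each sample's k nearest neighbors.
    High entropy indicates the sample lies near a class boundary (assumed proxy)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude the sample itself
    h = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        counts = np.array(list(Counter(y[nbrs]).values()), dtype=float)
        p = counts / counts.sum()
        h[i] = -(p * np.log(p)).sum()
    return h


def contribution(X, y):
    """Assumed proxy for 'contribution': the fraction of other samples that a
    candidate would label correctly if it were their nearest stored neighbor."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)                           # each sample's nearest neighbor
    c = np.zeros(n)
    for j in range(n):
        if y[j] == y[nearest[j]]:
            c[nearest[j]] += 1.0 / n                      # candidate classifies j correctly
    return c


def select_samples(X, y, m, k=5, alpha=0.5):
    """Keep the m samples with the highest combined score.
    The linear weighting of the two normalized terms is an assumption."""
    h = neighbor_entropy(X, y, k)
    c = contribution(X, y)
    score = alpha * h / (h.max() + 1e-12) + (1 - alpha) * c / (c.max() + 1e-12)
    return np.argsort(score)[-m:]


# Example: reduce a toy 2-class training set to 20 stored samples.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    kept = select_samples(X, y, m=20)
    print("indices of stored samples:", kept)
```

Selecting by this score concentrates the stored set near the class boundary, which matches the abstract's claim that the algorithm prefers boundary samples; tolerating some misclassified training points (rather than storing every hard sample) is what gives the method its stated noise insensitivity.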