NEAREST NEIGHBOR BIAS IN THE SUBSTITUTION OF MISSING VALUES

2021 
We present a simplified illustration of the bias inherent in the general case of the Nearest Neighbor (NN) method for substituting missing values. The presentation makes no assumptions about the geometry of the sampled subjects. The general examples show that the bias arises mainly at the limits of the data range and not necessarily in its central part, although it can also occur in the interior around any significant data gap. Because the NN data domain stretches across an arbitrary subject characteristic rather than across physical space, the bias can be reduced by ensuring that the considered attribute is well represented across its entire range, especially at its upper and lower limits.
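
The edge effect described above can be sketched with a small, deterministic example. This is only an illustration under stated assumptions, not the paper's procedure: it assumes a single attribute x, a hypothetical linear relation y = 2x between the attribute and the value being substituted, and evenly spaced donors. The names `nn_impute`, `mean_bias`, and `truth` are invented for this sketch.

```python
from bisect import bisect_left


def nn_impute(donor_x, donor_y, x):
    """Return the y of the donor whose attribute is closest to x.

    donor_x must be sorted in ascending order (1-D nearest neighbor).
    """
    i = bisect_left(donor_x, x)
    if i == 0:
        return donor_y[0]       # x lies at or below the lowest donor
    if i == len(donor_x):
        return donor_y[-1]      # x lies above the highest donor
    left, right = i - 1, i
    # Choose the closer of the two bracketing donors (ties go left).
    if x - donor_x[left] <= donor_x[right] - x:
        return donor_y[left]
    return donor_y[right]


def mean_bias(donor_x, donor_y, targets, truth):
    """Mean of (substituted value - true value) over the target subjects."""
    errors = [nn_impute(donor_x, donor_y, x) - truth(x) for x in targets]
    return sum(errors) / len(errors)


def truth(x):
    # Hypothetical relation between attribute and value (an assumption).
    return 2.0 * x


# Donors cover only the middle of the attribute range: 0.20 .. 0.80.
donor_x = [i / 100 for i in range(20, 81)]
donor_y = [truth(x) for x in donor_x]

# Subjects with missing values, grouped by position in the full range 0 .. 1.
lower = [i / 200 for i in range(0, 40)]      # below the lowest donor
center = [i / 200 for i in range(80, 121)]   # well inside the donor range
upper = [i / 200 for i in range(161, 201)]   # above the highest donor

bias_lower = mean_bias(donor_x, donor_y, lower, truth)
bias_center = mean_bias(donor_x, donor_y, center, truth)
bias_upper = mean_bias(donor_x, donor_y, upper, truth)

# Donors that represent the whole range, including both limits.
full_x = [i / 100 for i in range(0, 101)]
full_y = [truth(x) for x in full_x]
bias_lower_full = mean_bias(full_x, full_y, lower, truth)

print("lower limit, donors 0.2..0.8:", bias_lower)
print("center,      donors 0.2..0.8:", bias_center)
print("upper limit, donors 0.2..0.8:", bias_upper)
print("lower limit, donors 0.0..1.0:", bias_lower_full)
```

In this setup every subject below the lowest donor receives that donor's value, so substituted values are too high at the lower limit and too low at the upper limit, while the error in the center stays within the donor spacing. Extending the donors to cover both limits removes most of the edge bias, which is the remedy the abstract suggests.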