Nonsingular subsampling for S-estimators with categorical predictors

2012 
Random subsampling is an integral part of many algorithms for S-estimators of linear regression. For problems with only continuous predictors, simple random subsampling reliably generates initial coefficient estimates that can then be refined further. For data with categorical predictors, however, random subsampling often generates singular subsamples. Since these subsamples cannot be used to calculate coefficient estimates, they have to be discarded. This makes random subsampling slow, especially if some levels of the categorical predictors have low frequency, and renders the algorithms infeasible for such problems. It thus limits the use of an otherwise fine estimator and makes the choice of estimator for robust linear regression depend on the type of predictors, which is an unnecessary nuisance in practice. This paper introduces an improved subsampling algorithm, which we call nonsingular subsampling, that generates only nonsingular subsamples. For data with only continuous predictors it is as fast as simple random subsampling, and for data with categorical predictors it is much faster. This is achieved with a modified LU decomposition algorithm that combines generating the subsample with solving the least squares problem.
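The idea of building a subsample that is nonsingular by construction can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps the modified LU decomposition for a simpler incremental Gram-Schmidt independence check, and the function name `nonsingular_subsample`, the tolerance `tol`, and the toy data are all assumptions made for illustration. Observations are visited in random order, and a row is kept only if it is linearly independent of the rows already kept, so no finished subsample ever has to be discarded.

```python
import numpy as np


def nonsingular_subsample(X, y, rng, tol=1e-8):
    """Pick p rows of X that form a nonsingular p x p system and solve it.

    Illustrative sketch only: uses a Gram-Schmidt independence check,
    not the modified LU decomposition described in the abstract.
    Returns (row indices, exact-fit coefficients).
    """
    n, p = X.shape
    chosen = []
    basis = np.zeros((0, p))  # orthonormal basis of the rows kept so far
    for i in rng.permutation(n):  # visit observations in random order
        row = X[i].astype(float)
        norm = np.linalg.norm(row)
        if norm == 0.0:
            continue
        # Remove the component of the row that lies in the current span.
        resid = row - basis.T @ (basis @ row)
        resid_norm = np.linalg.norm(resid)
        if resid_norm > tol * norm:  # row adds a new direction: keep it
            chosen.append(i)
            basis = np.vstack([basis, resid / resid_norm])
            if len(chosen) == p:
                break
    if len(chosen) < p:
        raise ValueError("design matrix does not have full column rank")
    idx = np.array(chosen)
    beta = np.linalg.solve(X[idx], y[idx])  # exact fit through the p points
    return idx, beta


# Toy data (hypothetical): intercept, one continuous predictor, and a
# dummy variable for a rare category level (3 of 50 observations).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
d = np.zeros(n)
d[:3] = 1.0
X = np.column_stack([np.ones(n), x1, d])
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true  # noise-free, so the exact fit recovers beta_true

idx, beta = nonsingular_subsample(X, y, rng)
```

With simple random subsampling, a draw of three rows from this toy data would usually miss all three observations of the rare level and yield a singular system. The sketch instead skips rows that add no new direction, so it always ends with a usable subsample whenever the full design matrix has full column rank.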