Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents.

2015 
Objectives In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability. Material and methods Two improved algorithms denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and other methods including conventional LASSO, Bolasso, stepwise and stability selection models were evaluated using intensive simulation. In addition, methods were compared by using an empirical analysis based on large-scale survey data of hepatitis B infection-relevant factors among Guangdong residents. Results The proposed procedures produced comparable or less biased selection results when compared to conventional variable selection models. In total, the two newly proposed procedures were stable with respect to various scenarios of simulation, demonstrating a higher power and a lower false positive rate during variable selection than the compared methods. In empirical analysis, the proposed procedures yielding a sparse set of hepatitis B infection-relevant factors gave the best predictive performance and showed that the procedures were able to select a more stringent set of factors. The individual history of hepatitis B vaccination, family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents according to the proposed procedures. Conclusions The newly proposed procedures improve the identification of significant variables and enable us to derive a new insight into epidemiological association analysis.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    34
    Citations
    NaN
    KQI
    []