STATISTICAL INFERENCE WITH F-STATISTICS WHEN FITTING SIMPLE MODELS TO HIGH-DIMENSIONAL DATA

2021 
We study linear subset regression in the context of the high-dimensional overall model $y = \vartheta+\theta' z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, independent of $\epsilon$. Here, "high-dimensional" means that the number $d$ of available explanatory variables is much larger than the number $n$ of observations. We consider simple linear sub-models where $y$ is regressed on a set of $p$ regressors given by $x = M'z$, for some $d \times p$ matrix $M$ of full rank $p < n$. The corresponding simple model, i.e., $y=\alpha+\beta' x + e$, can be justified by imposing appropriate restrictions on the unknown parameter $\theta$ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard $F$-test on the surrogate parameter $\beta$, in an appropriate sense, even when the simple model is misspecified.
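To make the objects in the abstract concrete, here is a minimal sketch in plain Python. It computes the classical $F$-statistic for $H_0: \beta = \beta_0$ in the simple model $y=\alpha+\beta' x + e$, using ordinary least squares with an intercept: $F = (\hat\beta-\beta_0)' X_c' X_c (\hat\beta-\beta_0) / (p\,\hat\sigma^2)$, where $X_c$ is the centered $n \times p$ regressor matrix and $\hat\sigma^2 = \mathrm{RSS}/(n-p-1)$. The data-generating choices are hypothetical illustrations, not the paper's setup. We assume that $z$ has i.i.d. standard normal coordinates, that $M$ selects the first $p$ coordinates of $z$, and that $\theta$ is drawn at random. We also take the surrogate parameter $\beta$ to be the best-linear-predictor coefficient of $y$ on $x$, which under these assumptions equals the first $p$ entries of $\theta$. The helper names `solve`, `f_statistic`, and `simulate` are hypothetical.

```python
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):  # back-substitution
        s = aug[r][k] - sum(aug[r][c] * w[c] for c in range(r + 1, k))
        w[r] = s / aug[r][r]
    return w


def f_statistic(X, y, beta0):
    """Classical F-statistic for H0: beta = beta0 in y = alpha + beta'x + e.

    X is an n x p list of rows; OLS includes an intercept, handled by centering.
    Returns (F, beta_hat).
    """
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    # Gram matrix S = Xc'Xc and cross-products Xc'yc
    S = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    s_xy = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta_hat = solve(S, s_xy)
    # Residuals y - alpha_hat - beta_hat'x equal yc - Xc beta_hat
    rss = sum((v - sum(b * xj for b, xj in zip(beta_hat, r))) ** 2
              for r, v in zip(Xc, yc))
    sigma2 = rss / (n - p - 1)
    diff = [b - b0 for b, b0 in zip(beta_hat, beta0)]
    quad = sum(diff[i] * S[i][j] * diff[j] for i in range(p) for j in range(p))
    return quad / (p * sigma2), beta_hat


def simulate(n, d, p, seed=0):
    """Hypothetical design: z ~ N(0, I_d), x = first p coordinates of z."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) / d ** 0.5 for _ in range(d)]  # assumed theta
    vartheta = 1.0  # assumed intercept of the overall model
    Z = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [vartheta + sum(t * zj for t, zj in zip(theta, row)) + rng.gauss(0.0, 1.0)
         for row in Z]
    X = [row[:p] for row in Z]  # M = first p columns of the d x d identity
    beta = theta[:p]  # best-linear-predictor coefficient under this design
    return X, y, beta


if __name__ == "__main__":
    n, d, p = 50, 200, 3  # d much larger than n, and p < n
    X, y, beta = simulate(n, d, p)
    F, beta_hat = f_statistic(X, y, beta)
    print("surrogate beta:", [round(b, 3) for b in beta])
    print("OLS beta_hat:  ", [round(b, 3) for b in beta_hat])
    print("F for H0: beta = surrogate beta:", round(F, 3))
```

The statistic would be compared with an upper quantile of the $F(p, n-p-1)$ distribution (for example, `scipy.stats.f.ppf(0.95, p, n - p - 1)` if SciPy is available). In this particular Gaussian design the sub-model error $e$ is normal and independent of $x$, so the simple model happens to be correctly specified and $F$ has an exact $F(p, n-p-1)$ law. The misspecified case that the paper addresses is not reproduced by this sketch.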