Concentration bounds for the empirical angular measure with statistical learning applications
2021
The angular measure on the unit sphere characterizes the first-order
dependence structure of the components of a random vector in extreme regions
and is defined in terms of standardized margins. Its statistical recovery is an
important step in learning problems involving observations far away from the
center. In the common situation when the components of the vector have
different distributions, the rank transformation offers a convenient and robust
way of standardizing data in order to build an empirical version of the angular
measure based on the most extreme observations. However, the study of the
sampling distribution of the resulting empirical angular measure is
challenging. It is the purpose of the paper to establish finite-sample bounds
for the maximal deviations between the empirical and true angular measures,
uniformly over classes of Borel sets of controlled combinatorial complexity.
The bounds hold with high probability and scale essentially as the inverse
square root of the effective sample size, up to a logarithmic factor. Discarding the
most extreme observations yields a truncated version of the empirical angular
measure for which the logarithmic factor in the concentration bound is replaced
by a factor depending on the truncation level. The bounds are applied to
provide performance guarantees for two statistical learning procedures tailored
to extreme regions of the input space and built upon the empirical angular
measure: binary classification in extreme regions through empirical risk
minimization and unsupervised anomaly detection through minimum-volume sets of
the sphere.
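
As an illustration of the rank-based construction described above, the following is a minimal sketch, not the authors' implementation. It assumes continuous margins (no ties), standardization to the unit-Pareto scale via n / (n + 1 - rank), the sup-norm as the radial norm, and the threshold n / k for selecting extreme points. The function names `rank_standardize` and `empirical_angular_measure`, and the `in_set` argument encoding a Borel set of the sphere as an indicator function, are hypothetical.

```python
import numpy as np

def rank_standardize(X):
    """Rank-transform each column to an approximately unit-Pareto scale.

    Assumption: margins are continuous, so ties are not handled specially.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Double argsort yields 0-based ranks per column; shift to 1..n.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    # The largest observation in a column maps to n, the smallest to 1.
    return n / (n + 1.0 - ranks)

def empirical_angular_measure(X, k, in_set):
    """Estimate the angular mass of a set from the most extreme points.

    X      : (n, d) array of observations
    k      : number controlling the radial threshold n / k
    in_set : hypothetical indicator, maps an angle (d-vector) to True/False
    Returns the estimated mass and the angles of the selected points.
    """
    V = rank_standardize(X)
    n = V.shape[0]
    radii = np.max(np.abs(V), axis=1)   # sup-norm radius (an assumption)
    extreme = radii >= n / k            # keep points beyond the threshold
    angles = V[extreme] / radii[extreme, None]
    hits = np.array([in_set(w) for w in angles], dtype=bool)
    # Normalize by k, the effective sample size, not by the number selected.
    return hits.sum() / k, angles
```

Under the sup-norm the number of selected points can exceed k, so the total estimated mass may be larger than one; the normalization by k is kept here because k plays the role of the effective sample size in the bounds.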