Conservation of the $t$-digest Scale Invariant
2019
A $t$-digest is a compact data structure that allows estimates of quantiles with increased accuracy near $q = 0$ or $q = 1$. This is done by clustering samples from $\mathbb R$ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called $k$-size of the centroid is always $\le 1$. The $k$-size is defined using a scale function that maps quantile $q$ to index $k$. Since the centroids are real numbers, they can be ordered, and thus the quantile range of a centroid can be mapped into an interval in $k$ whose size is the $k$-size of that centroid. The accuracy of quantile estimates made using a $t$-digest depends on this constraint remaining invariant even as new data are added or $t$-digests are merged. This paper provides proofs of this invariance for four practically important scale functions.
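
As a rough illustration of the $k$-size constraint described above, the following Python sketch computes the $k$-size of each centroid from its weight and checks whether every $k$-size is $\le 1$. The abstract does not name the four scale functions, so the arcsine form $k(q) = \frac{\delta}{2\pi}\sin^{-1}(2q - 1)$ used here is an assumption chosen for illustration. The function names and the compression parameter `delta` are likewise hypothetical.

```python
import math


def k1(q, delta):
    """Arcsine scale function (an assumed example, not taken from the abstract).

    Maps a quantile q in [0, 1] to an index k in [-delta/4, delta/4].
    """
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k_sizes(weights, delta):
    """Return the k-size of each centroid.

    `weights` holds the number of points in each centroid, listed in
    increasing order of centroid mean. Each centroid covers the quantile
    range [q_left, q_right]; its k-size is k(q_right) - k(q_left).
    """
    total = sum(weights)
    sizes = []
    cumulative = 0
    for w in weights:
        q_left = cumulative / total
        cumulative += w
        q_right = cumulative / total
        sizes.append(k1(q_right, delta) - k1(q_left, delta))
    return sizes


def satisfies_invariant(weights, delta):
    """True if every centroid has k-size <= 1 (with a small float tolerance)."""
    return all(size <= 1 + 1e-12 for size in k_sizes(weights, delta))


if __name__ == "__main__":
    delta = 10
    # Many unit-weight centroids: every k-size is small.
    print(satisfies_invariant([1] * 100, delta))
    # One heavy centroid in the middle: its k-size exceeds 1.
    print(satisfies_invariant([1, 1, 96, 1, 1], delta))
```

Under this assumed scale function the slope of $k(q)$ is steepest near $q = 0$ and $q = 1$, so a centroid in the tails reaches the $k$-size limit with far fewer points than one near the median. That is the mechanism behind the increased accuracy at the extremes.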