Conservation of the $t$-digest Scale Invariant

2019 
A $t$-digest is a compact data structure that allows estimates of quantiles with increased accuracy near $q = 0$ or $q = 1$. This is done by clustering samples from $\mathbb R$ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called $k$-size of the centroid is always $\le 1$. The $k$-size is defined using a scale function that maps quantile $q$ to index $k$. Since the centroids are real numbers, they can be ordered, and thus the quantile range of a centroid can be mapped into an interval in $k$ whose size is the $k$-size of that centroid. The accuracy of quantile estimates made using a $t$-digest depends on the invariance of this constraint even as new data are added or $t$-digests are merged. This paper provides proofs of this invariance for four practically important scale functions.
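The $k$-size constraint described above can be illustrated with a minimal sketch. It assumes one particular scale function, $k_1(q) = \frac{\delta}{2\pi}\sin^{-1}(2q - 1)$, which appears in the $t$-digest literature; the abstract does not name the four scale functions it covers, so this choice, the compression parameter `delta`, and the helper names `k1`, `k_sizes`, and `satisfies_invariant` are all assumptions made for illustration only.

```python
import math


def k1(q, delta):
    """Assumed scale function: k_1(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def k_sizes(weights, delta=10.0):
    """Return the k-size of each centroid.

    `weights` holds the number of points per centroid, listed in
    increasing order of centroid position. Each centroid covers the
    quantile range [q_left, q_right]; its k-size is
    k(q_right) - k(q_left).
    """
    total = float(sum(weights))
    sizes = []
    cumulative = 0.0
    for w in weights:
        q_left = cumulative / total
        cumulative += w
        # Clamp to 1.0 to guard against floating-point overshoot.
        q_right = min(cumulative / total, 1.0)
        sizes.append(k1(q_right, delta) - k1(q_left, delta))
    return sizes


def satisfies_invariant(weights, delta=10.0, tol=1e-12):
    """Check that every centroid has k-size <= 1 (up to a small tolerance)."""
    return all(size <= 1.0 + tol for size in k_sizes(weights, delta))
```

Under this assumed scale function the $k$-sizes telescope to $k_1(1) - k_1(0) = \delta/2$, so a digest that lumps all points into one centroid violates the constraint whenever $\delta > 2$, while many small centroids satisfy it. The sketch only checks the constraint on a fixed set of centroid weights; it does not show that insertion or merging preserves it, which is what the paper proves.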