Inference of a Dyadic Measure and Its Simplicial Geometry from Binary Feature Data and Application to Data Quality

2019 
We propose a new method for representing data sets with an ordered set of binary features; the representation summarizes both measure-theoretic and topological properties. The method does not require the data to have metric space properties. A data set with an ordered set of binary features is viewed as a dyadic set with a dyadic measure. We prove that dyadic sets with dyadic measures have a canonical set of binary features and determine canonical nerve simplicial complexes. The method computes two related representations: multiscale parameters for the dyadic measure and the Betti numbers of the simplicial complex. It exploits the dyadic product formula representation formulated in previous work. The parameters characterize the relative skewness of the measure at dyadic scales and localities. The more abstract Betti number statistics summarize the simplicial geometry of the support of the measure. We prove that these statistics provide a simple privacy property. Our methods are compared with other results for measures on sets with tree structures, recent multi-resolution theory, and computational topology. We illustrate the method on a data quality data set and propose future research directions.
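As a rough illustration of the multiscale parameters mentioned above, the following Python sketch estimates one skewness parameter per dyadic cell from rows of ordered binary features. It assumes one common convention for a dyadic product formula, in which a cell with mass m splits into a left child (next feature 0) with mass m(1 + a)/2 and a right child (next feature 1) with mass m(1 - a)/2, so that a lies in [-1, 1]. The function name `dyadic_parameters`, the sign convention, and the sample rows are hypothetical and may differ from the paper's normalization. The sketch covers only the measure parameters, not the Betti numbers.

```python
from collections import Counter


def dyadic_parameters(rows):
    """Estimate a skewness parameter for each non-empty dyadic cell.

    rows: list of equal-length tuples of 0/1 values (ordered binary features).
    Returns a dict mapping each feature prefix (a dyadic cell) to
    a = (mass of left child - mass of right child) / mass of cell.
    This convention is an assumption, not taken from the paper.
    """
    depth = len(rows[0])

    # Count how many rows fall in each prefix cell, at every scale.
    counts = Counter()
    for row in rows:
        for k in range(depth + 1):
            counts[tuple(row[:k])] += 1

    params = {}
    for prefix, mass in counts.items():
        if len(prefix) == depth:
            continue  # finest cells have no children
        left = counts.get(prefix + (0,), 0)
        right = counts.get(prefix + (1,), 0)
        params[prefix] = (left - right) / mass
    return params


# Hypothetical sample: four records with two ordered binary features.
sample_rows = [(0, 0), (0, 1), (0, 1), (1, 1)]
sample_params = dyadic_parameters(sample_rows)
```

A value near 0 means the cell's mass is split evenly between its two children, while a value of 1 or -1 means all of the mass lies in one child. Cells with no data are omitted, since the ratio is undefined there.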