Complexity and Similarity for Sequences using LZ77-based conditional information measure

2019 
This work concerns the definition of conditional mutual information in the framework of Algorithmic Information Theory (AIT), which is useful when no probabilistic model of the data is available or one is hard to devise. We introduce a practical way to construct a conditional mutual information quantity that respects the chain rule and the data processing inequality. The proposed implementation, named SALZA, makes it possible to accomplish various information-theoretic tasks on sequences. The algorithmic model of the data used in this work is that of the well-known Lempel-Ziv primitive: we assume that new data is expressed in terms of references to prior data. SALZA enables a flexible specification of prior data and extracts information quantities based on the significance of the references to these prior data. The tool readily implements the computation of an information measure based on LZ77 and a universal classifier based on the Ziv-Merhav relative coder for the universal clustering of sequences. The proposed implementation is illustrated on clustering and causality inference examples.
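The abstract models new data as references to prior data. The sketch below illustrates two parsings of that kind. It is a minimal Python example, not SALZA's implementation and not its significance-based measure.

- `lz_phrase_count` is a greedy LZ77-style phrase count. It deliberately uses a simplified, non-overlapping history, and an optional `prior` string that phrases may reference.
- `zm_phrase_count` is the classical Ziv-Merhav cross-parsing of `y` against `x`.

Both function names are hypothetical. Comparing phrase counts with and without a prior gives only a crude indication of how much the prior "explains" a sequence.

```python
def lz_phrase_count(s: str, prior: str = "") -> int:
    """Greedy LZ77-style parsing of `s` (simplified sketch).

    Each phrase is the longest prefix of the unparsed remainder that already
    occurs in `prior + s[:i]` (non-overlapping history), plus one fresh symbol.
    Supplying `prior` lets phrases refer back to earlier data.
    """
    count, i, n = 0, 0, len(s)
    while i < n:
        history = prior + s[:i]
        length = 0
        # Extend the match while it still occurs somewhere in the history.
        while i + length < n and s[i:i + length + 1] in history:
            length += 1
        i += length + 1  # matched part plus one new symbol
        count += 1
    return count


def zm_phrase_count(y: str, x: str) -> int:
    """Ziv-Merhav cross-parsing: split `y` into phrases that occur in `x`.

    Each phrase is the longest prefix of the remainder of `y` found as a
    substring of `x`; a symbol absent from `x` forms a one-symbol phrase.
    """
    count, i = 0, 0
    while i < len(y):
        length = 0
        while i + length < len(y) and y[i:i + length + 1] in x:
            length += 1
        i += max(length, 1)
        count += 1
    return count


if __name__ == "__main__":
    # Fewer phrases when references to a related prior are allowed.
    print(lz_phrase_count("abab"), lz_phrase_count("abab", prior="abab"))
    print(zm_phrase_count("abab", "ab"), zm_phrase_count("xyz", "ab"))
```

In this toy run, `"abab"` needs 3 phrases alone but 1 when the identical prior is available. A phrase-count gap of this kind is the intuition behind reference-based conditional measures. A practical tool would also weigh the cost of encoding each reference.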