SALZA: Practical Algorithmic Information Theory and Sharper Universal String Similarity

2020 
Practical algorithmic information theory often relies on using off-the-shelf compressors, which both limits the range of target applications and introduces inaccuracies caused by the compressors implementation constraints and the fact that they are only able to handle multiple strings through concatenation. We describe SALZA, a practical implementation of algorithmic information theory based on Lempel-Ziv routines, that is largely immune to the above issues. We focus on providing relationships that hold strictly in practice for strings of arbitrary length. We hope this work contributes to making clearer the links between Lempel-Ziv complexity, string similarity, and the use of compressors for computing information distances. The capabilities of SALZA are highlighted by computing a universal semi-distance on strings, the NSD, and by performing causal inference using the PC algorithm, an application for which off-the-shelf compressors are hardly usable. SALZA was designed to take advantage of multi-core machines.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    1
    Citations
    NaN
    KQI
    []