Achieving the Capacity of the DNA Storage Channel

2020 
Significant advances in biochemical technologies, such as synthesizing and sequencing devices, have made DNA a competitive medium for archival data storage. In this paper, we analyze storage systems based on these macromolecules from an information-theoretic perspective. Using an appropriate channel model for the synthesis and sequencing steps, we study the maximum achievable information density per nucleotide for reliable and error-resilient data storage. The channel model features the main attributes that characterize DNA-based data storage: information is synthesized onto many short DNA strands, and each strand is copied many times. Due to the storage and sequencing methods, the receiver draws strands from these synthesized strands in an uncontrollable manner, so that some strands may be drawn multiple times and others not at all. Additionally, due to imperfections, the obtained strands can contain errors. Here, we prove the achievability of a recently published upper bound on the Shannon capacity of this channel for a large range of parameters by proposing and analyzing a decoder that clusters received strands according to their similarity and then efficiently estimates the original strands based on these clusters.
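The channel and decoder described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the construction analyzed in the paper. It assumes uniform draws with replacement, independent substitution errors only, equal-length strands, and a greedy Hamming-distance clustering followed by a per-position majority vote. The names `simulate_channel`, `cluster_and_estimate`, `sub_prob`, and `radius` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def simulate_channel(strands, num_reads, sub_prob, rng):
    """Draw num_reads strands uniformly with replacement (assumption) and
    flip each base independently with probability sub_prob (assumption)."""
    reads = []
    draws = Counter()  # how often each stored strand was drawn
    for _ in range(num_reads):
        idx = rng.randrange(len(strands))
        draws[idx] += 1
        noisy = []
        for base in strands[idx]:
            if rng.random() < sub_prob:
                # Substitute with one of the three other bases.
                noisy.append(rng.choice([b for b in ALPHABET if b != base]))
            else:
                noisy.append(base)
        reads.append("".join(noisy))
    rng.shuffle(reads)  # the receiver sees the reads in no particular order
    return reads, draws


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_and_estimate(reads, radius):
    """Greedy clustering: a read joins the first cluster whose first member
    is within Hamming distance radius, otherwise it starts a new cluster.
    Each cluster is then reduced to a per-position majority vote."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if hamming(cluster[0], read) <= radius:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return [
        "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))
        for cluster in clusters
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    stored = ["".join(rng.choice(ALPHABET) for _ in range(40)) for _ in range(8)]
    reads, draws = simulate_channel(stored, num_reads=60, sub_prob=0.02, rng=rng)
    estimates = cluster_and_estimate(reads, radius=8)
    missing = [i for i in range(len(stored)) if draws[i] == 0]
    print("strands never drawn:", missing)
    print("strands recovered exactly:", len(set(estimates) & set(stored)))
```

In this sketch, strands that are never drawn cannot be recovered by any decoder, and that loss is one reason the achievable information density per nucleotide is limited. The clustering radius trades off merging distinct strands against splitting noisy copies of the same strand.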