An alternative description of power law correlations in DNA sequences

2019 
Abstract: We analyze the coding sequence of Homo sapiens using a model that naturally accommodates power law correlations (PLC) among the bases in the DNA sequences of living organisms. The model rests on a universal optimization principle, which underlies all statistical arguments, and is associated with a power law distribution for the length of DNA, measured in base pairs (bp). This distribution introduces a PLC parameter through a nonadditive framework, in which the parameter measures the PLC in the DNA sequence. The results show that short-range correlations (SRC), which are always present in coding DNA sequences, are captured by the power law distribution, which adequately describes the cumulative length distribution of DNA bases, in contrast to the traditional exponential statistical model. Using an empirical cumulative distribution function and the protein database compiled by the Ensembl Project, we show that the power law distribution provides the best description of the data. A Bayesian analysis of the data further confirms this result.
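The comparison described in the abstract, an empirical cumulative distribution function checked against a power law model and an exponential model, can be illustrated with a minimal sketch. The abstract does not state the functional form of the power law distribution, so the sketch assumes a Lomax (Pareto II) form, which is equivalent to a q-exponential with q = 1 + 1/alpha; the function names, the grid of scale values, and the synthetic lengths are all hypothetical and stand in for the Ensembl data, which is not used here.

```python
import numpy as np

def ecdf(samples):
    """Return sorted samples and the empirical CDF evaluated at them."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def exponential_cdf(x, mean):
    """CDF of the exponential model with the given mean."""
    return 1.0 - np.exp(-x / mean)

def lomax_cdf(x, alpha, sigma):
    """CDF of the assumed power law (Lomax) model."""
    return 1.0 - (1.0 + x / sigma) ** (-alpha)

def fit_lomax(samples, sigmas):
    """Profile maximum likelihood: for each trial scale sigma the best
    shape alpha has a closed form, so only sigma is searched on a grid."""
    x = np.asarray(samples, dtype=float)
    best = None
    for sigma in sigmas:
        s = np.log1p(x / sigma).sum()
        alpha = x.size / s
        loglik = x.size * np.log(alpha / sigma) - (alpha + 1.0) * s
        if best is None or loglik > best[0]:
            best = (loglik, alpha, sigma)
    return best[1], best[2]

def max_deviation(x_sorted, y_ecdf, model_cdf):
    """Largest gap between ECDF and model CDF, checked at the sample points."""
    return float(np.max(np.abs(y_ecdf - model_cdf(x_sorted))))

# Synthetic lengths in bp (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
lengths = 1000.0 * rng.pareto(3.0, size=5000)

x, y = ecdf(lengths)
mean_len = lengths.mean()  # maximum likelihood estimate for the exponential
alpha, sigma = fit_lomax(lengths, np.geomspace(10.0, 1e5, 200))
q = 1.0 + 1.0 / alpha      # equivalent q-exponential index under the assumed form

d_exp = max_deviation(x, y, lambda t: exponential_cdf(t, mean_len))
d_pl = max_deviation(x, y, lambda t: lomax_cdf(t, alpha, sigma))
print(f"q = {q:.3f}, exponential gap = {d_exp:.3f}, power law gap = {d_pl:.3f}")
```

Because the synthetic lengths are drawn from a heavy-tailed distribution, the power law model should track the ECDF more closely than the exponential one here; that outcome reflects how the toy data were generated and says nothing about the real sequences.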