Poisson approximation of the number of repeats in Markov chains of order m ≥ 1: application to the study of statistical significance in DNA sequences

2006 
Genomes are dynamic and redundant structures that are regularly subject to mutations, deletions, duplications and inversions. To better understand the structure of genomes and their mechanisms of evolution, it is important to analyse the statistical significance of repeats. The goal of this thesis is to study the statistical significance of the number of repeats of a given length t observed in a given sequence, denoted N_t^obs. This study relies on evaluating the distribution of the random count N_t in relevant random sequences, which in turn makes it possible to compute the p-value. We start with the first-order Markov chain model and then treat the general case of Markov chain models of order m ≥ 1. We use the Chen-Stein method to bound the approximation error when the number of repeats of length t is approximated by a Poisson variable, and we show that this error converges to 0. Simulations were carried out to validate the Poisson approximation. The calculation of the p-value has been implemented for several genomes.
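
As a rough illustration of the final step, the p-value under a Poisson approximation can be sketched as below. This is only a minimal sketch, not the implementation used in the thesis. It assumes that a repeat is counted as a pair of positions carrying the same word of length t, and that the Poisson mean `lam` is supplied by the user; in the thesis this mean would follow from the fitted Markov model, which is not reproduced here. The names `count_repeats` and `poisson_p_value` are hypothetical.

```python
import math
from collections import Counter


def count_repeats(seq, t):
    """Count pairs of positions i < j whose length-t words are identical.

    Assumption: this pair-based definition is only one possible reading of
    "repeat of length t"; the thesis may use a different definition.
    """
    words = Counter(seq[i:i + t] for i in range(len(seq) - t + 1))
    # A word seen c times contributes c * (c - 1) / 2 pairs of positions.
    return sum(c * (c - 1) // 2 for c in words.values())


def poisson_p_value(n_obs, lam):
    """Return P(N >= n_obs) for N ~ Poisson(lam).

    Assumption: lam is a placeholder for the expected number of repeats
    under the chosen random-sequence model.
    """
    if n_obs <= 0:
        return 1.0
    term = math.exp(-lam)  # P(N = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term              # accumulate P(N = k)
        term *= lam / (k + 1)    # move on to P(N = k + 1)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    n_obs = count_repeats("ACGTACGT", 4)
    # lam = 0.5 is an arbitrary illustrative value, not a fitted one.
    print(n_obs, poisson_p_value(n_obs, 0.5))
```

A small p-value would indicate that the observed number of repeats is larger than expected under the random-sequence model. For very small tails, summing the upper tail directly would be numerically safer than subtracting the cumulative sum from 1.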