Modelling haplotypes with respect to reference cohort variation graphs

2017 
Current statistical models of haplotypes are limited to cohorts of haplotypes which can be represented by arrays of values at linearly ordered bi- or multiallelic loci. These methods can model neither structural variants nor overlapping or nested variants. A variation graph is a mathematical structure which can encode arbitrarily complex genetic variation. We present the first haplotype model which uses a variation graph representation of haplotypes. We present an algorithm to calculate the likelihood that a haplotype arose from a population through recombinations, and demonstrate time complexity linear in haplotype length and sublinear in population size. We demonstrate mathematical extensions which allow modelling of mutations. Our results provide a starting point for haplotype inference on variation graphs. This is an essential step forward for clinical genomics and genetic epidemiology, since it is the first haplotype model which can represent all types of variation in the population.
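To make the notion of a recombination-only haplotype likelihood concrete, the following is a minimal sketch of a Li and Stephens style copying model on a conventional linear panel, i.e. the array representation the abstract describes as the current state of the art. It is not the variation graph algorithm presented here: the abstract does not specify that algorithm, and this sketch runs in time linear in panel size rather than sublinear. The function name `recombination_likelihood`, the per-site switch probability `rho`, and the uniform choice of a new haplotype to copy from are all assumptions made for illustration.

```python
from typing import Sequence


def recombination_likelihood(
    query: Sequence[int],
    panel: Sequence[Sequence[int]],
    rho: float,
) -> float:
    """Likelihood of `query` as a mosaic of `panel` haplotypes.

    Assumed model (illustrative only): the query copies one panel
    haplotype at each site; between adjacent sites it re-draws the
    copied haplotype uniformly with probability `rho`; there is no
    mutation, so the copied allele must equal the query allele.
    """
    k = len(panel)
    n = len(query)
    if k == 0 or n == 0:
        raise ValueError("panel and query must be non-empty")
    if any(len(h) != n for h in panel):
        raise ValueError("all panel haplotypes must match the query length")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be a probability")

    # forward[j] = P(query[0..i] and copying panel[j] at site i)
    forward = [1.0 / k if panel[j][0] == query[0] else 0.0 for j in range(k)]

    for i in range(1, n):
        total = sum(forward)
        forward = [
            # stay on j without re-drawing, or re-draw and land on j
            ((1.0 - rho) * forward[j] + rho * total / k)
            if panel[j][i] == query[i]
            else 0.0
            for j in range(k)
        ]

    return sum(forward)


if __name__ == "__main__":
    panel = [[0, 0, 0], [1, 1, 1]]
    # A query identical to one panel haplotype needs no recombination.
    print(recombination_likelihood([0, 0, 0], panel, rho=0.1))
    # A query that switches haplotype once is less likely.
    print(recombination_likelihood([0, 1, 1], panel, rho=0.1))
```

Each site costs one pass over the panel, so the sketch is linear in both haplotype length and population size. Arrays of this kind also cannot express structural, overlapping or nested variants, which is the limitation a variation graph representation is meant to remove.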