SVJAM: Joint Analysis of Structural Variants Using Linked Read Sequencing Data

2021 
Linked-read whole-genome sequencing methods, such as 10x Chromium, attach a unique molecular barcode to each high-molecular-weight DNA molecule. The samples are then sequenced using short-read technology. During analysis, reads sharing the same barcode align to neighboring genomic locations. The pattern of barcode sharing between genomic regions enables the discovery of large structural variants (SVs) ranging from 1 kb to a few Mb. Most SV calling methods for these data, such as LongRanger, analyze one sample at a time and often produce inconsistent results for the same genomic location across multiple samples. We developed SVJAM, a method for joint calling of SVs, using data from 152 members of the BXD family of recombinant inbred mouse strains. Our method first collects candidate SV regions from single-sample analyses, such as those produced by LongRanger. We then retrieve barcode-overlap data from all samples for each region. These data are organized as a high-dimensional matrix, whose dimension is then reduced using principal component analysis (PCA). When projected onto the two-dimensional space formed by the first two principal components, the samples form two or three clusters according to their genotype, representing the homozygous reference, homozygous alternative, or heterozygous genotypes. We developed a novel distance measure for hierarchical clustering, and we rotate the axes to find the optimal clustering result. We also developed an algorithm to decide whether the distribution of samples is best fitted by one, two, or three genotypes. For each sample, we calculate a membership score for each genotype. We compared the results produced by SVJAM with those of LongRanger and a few methods that rely on PacBio or Oxford Nanopore data. In a comparison with SVs detected from long-read sequencing data for the DBA/2J strain, we found that SVJAM recovered many SVs missed by LongRanger.
We also found that many SVs called by LongRanger were assigned an incorrect SV type. Our algorithm also consistently identified heterozygous regions.
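The per-region workflow described in the abstract (a sample-by-feature matrix of barcode-overlap data, PCA down to two dimensions, clustering, a decision among one, two, or three genotypes, and per-sample membership scores) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not SVJAM's implementation. Everything here is hypothetical: the synthetic count matrix, all function names, average-linkage Euclidean clustering (a plain stand-in for the authors' novel distance measure and axis rotation, which are not reproduced), the silhouette threshold and minimum cluster size used to choose the genotype count, and the distance-based membership weights.

```python
import numpy as np

# Hypothetical stand-in for one candidate SV region: rows are samples, columns are
# barcode-sharing counts between pairs of genomic windows around the candidate SV.
rng = np.random.default_rng(0)
ref = rng.poisson(5, size=(20, 30))            # 20 samples without the SV
alt = rng.poisson(5, size=(20, 30))
alt[:, :10] = rng.poisson(30, size=(20, 10))   # 20 samples with extra sharing in 10 window pairs
matrix = np.vstack([ref, alt]).astype(float)


def pca_2d(x):
    """Project samples (rows) onto the first two principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def pairwise(points):
    """Euclidean distance matrix between all samples."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


def agglomerate(points, k):
    """Average-linkage agglomerative clustering into k clusters (Euclidean stand-in)."""
    dist = pairwise(points)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge closest pair; b > a keeps index a valid
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; higher means tighter, better separated clusters."""
    dist = pairwise(points)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def choose_genotype_count(points, min_sil=0.5, min_size=3):
    """Hypothetical rule: keep 1 genotype unless a 2- or 3-cluster split is convincing."""
    best_k, best_labels, best_sil = 1, np.zeros(len(points), dtype=int), min_sil
    for k in (2, 3):
        labels = agglomerate(points, k)
        if np.bincount(labels).min() < min_size:
            continue  # reject splits that merely isolate a few outliers
        sil = mean_silhouette(points, labels)
        if sil > best_sil:
            best_k, best_labels, best_sil = k, labels, sil
    return best_k, best_labels


def membership_scores(points, labels):
    """Soft per-sample genotype scores from squared distances to cluster centroids."""
    k = labels.max() + 1
    centroids = np.array([points[labels == c].mean(axis=0) for c in range(k)])
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    scale = d2[np.arange(len(points)), labels].mean() + 1e-12  # mean within-cluster spread
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / scale)  # nearest centroid gets weight 1
    return w / w.sum(axis=1, keepdims=True)


pcs = pca_2d(matrix)
k, labels = choose_genotype_count(pcs)
scores = membership_scores(pcs, labels)
print(k, np.bincount(labels), scores.round(3)[:3])
```

Because BXD strains are inbred, the synthetic data contain only two homozygous groups, which separate cleanly along the first principal component, so this rule selects two genotypes. The silhouette threshold, minimum cluster size, and membership bandwidth are arbitrary illustrative values that would need calibration on real data.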