Detection of Copy Number Variations from Targeted Re-sequencing data

2013 
Copy Number Variations (CNVs) are an important class of genetic alterations and have been associated with several human diseases. Currently, most established methods detect CNVs from microarray and whole-genome re-sequencing experiments. Here we present a novel method for identifying CNVs from targeted re-sequencing screenings, which are widely used for genotyping and for identifying putative pathogenic mutations. Our method exploits differences in read depth between samples. It is based on the assumption that amplified genes have higher coverage in the test sample than in the control, whereas deleted genes have lower coverage. Since sequencing coverage varies within and between samples owing to probe specificity and efficiency, we perform two types of coverage normalization. The first is a median normalization that reduces the coverage variation among targeted regions within a sample. The second is a cross-sample normalization that reduces coverage variation between samples. After coverage normalization, we calculate the log2 ratio between the fold changes of test and control to assess differences in copy number, and we use principal component analysis (PCA) to remove possible false positives caused by particularly low coverage in some genes. To find the PCA confidence interval that minimized false positives, we compared CNVs detected in whole-genome and whole-exome data from the same sample and set the value to 90%. We applied our CNV detection method to mouse and human hepatocellular carcinomas, where it was instrumental in identifying the driver amplification of the JNK pathway in these tumors (see abstract by Iannelli et al.).
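
The normalization and log2 ratio steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the cross-sample normalization is computed, so median centring of the log2 ratios is used here as an assumed stand-in. The function names, the gene names, the coverage values and the calling threshold of 0.58 are all hypothetical, and the PCA-based filtering is left out because its details are not given.

```python
import math
from statistics import median


def median_normalize(coverage):
    """Within-sample step: divide each region's depth by the sample median."""
    sample_median = median(coverage.values())
    if sample_median <= 0:
        raise ValueError("median coverage must be positive")
    return {region: depth / sample_median for region, depth in coverage.items()}


def log2_ratios(test_norm, control_norm):
    """log2(test / control) for regions covered in both samples."""
    ratios = {}
    for region, test_value in test_norm.items():
        control_value = control_norm.get(region, 0.0)
        if test_value > 0 and control_value > 0:  # skip uncovered regions
            ratios[region] = math.log2(test_value / control_value)
    return ratios


def center_on_median(ratios):
    """Assumed cross-sample step: shift the ratios so their median is zero."""
    shift = median(ratios.values())
    return {region: value - shift for region, value in ratios.items()}


def call_cnvs(ratios, threshold=0.58):
    """Label regions beyond a hypothetical log2 threshold as gain or loss."""
    calls = {}
    for region, value in ratios.items():
        if value >= threshold:
            calls[region] = "gain"
        elif value <= -threshold:
            calls[region] = "loss"
    return calls


# Hypothetical read depths per targeted gene. The test sample is sequenced
# twice as deep as the control; GENE_C is amplified and GENE_D is deleted.
control = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 50, "GENE_D": 200, "GENE_E": 100}
test = {"GENE_A": 200, "GENE_B": 600, "GENE_C": 400, "GENE_D": 100, "GENE_E": 200}

ratios = center_on_median(
    log2_ratios(median_normalize(test), median_normalize(control))
)
calls = call_cnvs(ratios)
print(calls)  # {'GENE_C': 'gain', 'GENE_D': 'loss'}
```

In this toy example the median normalization cancels the twofold difference in sequencing depth, so the unaltered genes end up with a log2 ratio of 0 and only GENE_C and GENE_D stand out. The differences in probe efficiency between genes cancel in the test/control ratio because both samples share the same targets.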