On the Minimum Copy Number Generation Problem in Cancer Genomics

2018 
In cancer genomics, due to the fast somatic mutations (mainly random segment duplications and deletions), copy number profiles (CNPs), (i.e., a file containing each of the gene numbers) are used more often than genome themselves. On the other hand, algorithms with performance analysis for processing CNPs are lacking. In a recent CPM'16 paper, Shamir et al. studied the copy number transformation problem, which is to use the minimum number of duplications and deletions (on the CNPs) to convert one CNP to another, and gave a linear time algorithm. In this paper, we consider a slightly different problem which is called Minimum Copy Number Generation (MCNG), namely, given a genome G and a specific CNP C, use the minimum number of duplications and deletions on G to obtain some genome H which has a CNP C. We show that the problem is NP-hard if G is generic (i.e., contains duplicated genes) and when the duplications are tandem. On the other hand, when only tandem duplications are allowed, if G is exemplar (or is a permutation) and all components in C are power of two's, then the problem can be solved in time linear in the length of the input (or $|C|$) plus $O(|G|log |G]|)$ (the cost for sorting $|G|$ elements). That naturally extends to a practical heuristic algorithm for the problem (when G is exemplar and the components in C are arbitrary). We also show that two variations of the MCNG problem are at least as hard as Set Cover in terms of approximability and FPT tractability. For the general Minimum Copy Number Generation problem, i.e., when both (arbitrary) segment duplications and deletions are allowed, we also design a practical greedy algorithm, present some non-trivial cases and discuss the directions for future research.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    31
    References
    5
    Citations
    NaN
    KQI
    []