Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms
Driven by genomic sequence analysis in several types of cancer, genomic data based on copy number profiles (CNPs) have become increasingly popular. A CNP is a vector in which each component is a non-negative integer representing the number of copies of a specific gene or segment of interest. In this paper, we present two streams of results. The first consists of negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem, posed by Qingge et al. in 2018. Qingge et al. showed that the problem is NP-hard if the duplications are tandem, and left open the question of whether it remains NP-hard if arbitrary duplications are used. We answer this question affirmatively; in fact, we prove that it is NP-hard even to obtain a constant-factor approximation. We also prove that the parameterized version is W[1]-hard, answering another open question by Qingge et al. The second result is positive and is based on a new (and more general) problem regarding CNPs. The Copy Number Profile Conforming (CNPC) problem is formally defined as follows: given two CNPs C_1 and C_2, compute two strings S_1 and S_2 with cnp(S_1)=C_1 and cnp(S_2)=C_2 such that the distance between S_1 and S_2, d(S_1,S_2), is minimized. Here, d(S_1,S_2) is a very general term: it could be any genome rearrangement distance (such as reversal, transposition, or tandem duplication). We make the first step by showing that if d(S_1,S_2) is measured by the breakpoint distance, then the problem is polynomially solvable.
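To make the definitions of cnp(S) and the CNPC problem concrete, the following is a minimal Python sketch. It is not the algorithm of the paper: the function names, the fixed gene alphabet, and the exhaustive search are all assumptions made for illustration, and `toy_distance` is a simplified adjacency-based stand-in rather than the paper's breakpoint distance. The search enumerates every string with the given profile, so it is only usable on very small inputs.

```python
from collections import Counter
from itertools import permutations


def cnp(s, alphabet):
    """Copy number profile of string s: one count per gene in alphabet order."""
    counts = Counter(s)
    return [counts[g] for g in alphabet]


def strings_with_cnp(profile, alphabet):
    """All distinct strings whose CNP equals profile (exponential; tiny inputs only)."""
    letters = "".join(g * c for g, c in zip(alphabet, profile))
    return sorted({"".join(p) for p in permutations(letters)})


def adjacencies(s):
    """Multiset of unordered adjacent gene pairs in s."""
    return Counter(tuple(sorted(pair)) for pair in zip(s, s[1:]))


def toy_distance(s1, s2):
    """Hypothetical stand-in distance: adjacencies of s1 that are missing from s2.

    This is NOT claimed to be the breakpoint distance used in the paper.
    """
    return sum((adjacencies(s1) - adjacencies(s2)).values())


def brute_force_cnpc(c1, c2, alphabet, dist):
    """Exhaustively pick S_1, S_2 with cnp(S_i) = C_i minimizing dist(S_1, S_2)."""
    return min(
        ((dist(s1, s2), s1, s2)
         for s1 in strings_with_cnp(c1, alphabet)
         for s2 in strings_with_cnp(c2, alphabet)),
        key=lambda t: t[0],
    )


if __name__ == "__main__":
    alphabet = "abc"
    d, s1, s2 = brute_force_cnpc([1, 1, 1], [2, 1, 1], alphabet, toy_distance)
    print(d, s1, s2)
```

Because the distance is passed in as a parameter, the same search skeleton mirrors the abstract's point that d(S_1,S_2) can be any rearrangement distance; only the choice of `dist` changes.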