On the Complexity of BWT-runs Minimization via Alphabet Reordering

11/08/2019
by Jason Bentley, et al.

We present the first set of results on the computational complexity of minimizing the number of runs in the Burrows-Wheeler Transform (BWT), or BWT-runs, via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time 2^{o(σ)}n unless the Exponential Time Hypothesis fails, where σ is the size of the alphabet and n is the length of the text. Moreover, we show that optimization variants of this problem are subject to strong inapproximability results. In doing so, we relate two previously disparate topics: the size of a path cover of a graph and the number of runs in the BWT of a text. This provides a surprising connection between problems on graphs and string compression. As a result, we prove the following (all assuming P ≠ NP): (i) no PTAS exists if the cost of a solution is defined as exactly the number of runs in excess of σ; (ii) for every δ > 0, no polynomial-time εn^{1/2}-approximation algorithm exists for sufficiently small ε > 0 if the cost of a solution is the number of runs in excess of (1+δ)σ, and in this case the problem is also APX-hard. To the best of our knowledge, these are the first inapproximability results pertaining to the BWT. In addition, by drawing on recent results in the field of dictionary compression, we show that if the cost is defined purely as the number of runs, we obtain a log^2 n-approximation algorithm. Finally, we provide an efficient algorithm for the more restricted problem of finding an optimal ordering of a subset of symbols (each occurring only once) under ordering constraints; it runs in optimal time for small values of σ. We also consider a version of the problem on Wheeler graphs, a recently introduced class of graphs with BWT-like properties. Here, too, we show NP-hardness results for a related problem, which we call Source Ordering.
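
To make the objective concrete, the following is a minimal illustrative sketch (not the authors' code) that computes the BWT under a chosen alphabet ordering, counts its runs, and finds an optimal ordering by exhaustive search. It assumes the text is terminated by a unique sentinel "$" ranked below every alphabet symbol; the names SENTINEL, bwt, count_runs and best_ordering are hypothetical. The rotation sort is naive and the search tries all σ! orderings, so it is only usable for tiny inputs, which is in line with the hardness results above. For example, "banana" gives the BWT "annb$aa" (5 runs) under a < b < n, but "aaannb$" (4 runs) under n < a < b.

```python
from itertools import permutations

# Assumption: the text is terminated by a unique sentinel ranked below every symbol.
SENTINEL = "$"


def bwt(text, order):
    """BWT of text + SENTINEL, sorting rotations by the given alphabet order.

    Naive O(n^2 log n) rotation sort, for illustration only.
    """
    s = text + SENTINEL
    rank = {SENTINEL: -1}
    rank.update({c: i for i, c in enumerate(order)})
    n = len(s)
    # Sort rotation start positions by the rank sequence of each rotation.
    rotations = sorted(range(n), key=lambda i: [rank[c] for c in s[i:] + s[:i]])
    # The BWT is the last character of each sorted rotation.
    return "".join(s[(i - 1) % n] for i in rotations)


def count_runs(s):
    """Number of maximal blocks of equal consecutive characters."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def best_ordering(text):
    """Exhaustively try all sigma! alphabet orderings (exponential in sigma)."""
    alphabet = sorted(set(text))
    best = None
    for order in permutations(alphabet):
        r = count_runs(bwt(text, order))
        if best is None or r < best[0]:
            best = (r, order)
    return best


if __name__ == "__main__":
    print(bwt("banana", "abn"), count_runs(bwt("banana", "abn")))
    print(bwt("banana", "nab"), count_runs(bwt("banana", "nab")))
    print(best_ordering("banana"))
```

For "banana", 4 runs is optimal because each of the four distinct characters (including the sentinel) must form at least one run. Practical BWT construction would use a suffix array rather than sorting explicit rotations.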
