Small space and streaming pattern matching with k edits

06/10/2021
by   Tomasz Kociumaka, et al.
0

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer k, a pattern P of length m, and a text T of length n ≥ m, the task is to find substrings of T that are within edit distance k from P. Our main result is a streaming algorithm that solves the problem in Õ(k^5) space and Õ(k^8) amortised time per character of the text, providing answers correct with high probability. (Hereafter, Õ(·) hides a poly(log n) factor.) This answers a decade-old question: since the discovery of a poly(klog n)-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no poly(klog n)-space algorithm was known even in the simpler semi-streaming model, where T comes as a stream but P is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers n≥ k. For any string of length at most n, the sketch is of size Õ(k^2) and it can be computed with an Õ(k^2)-space streaming algorithm. Given the sketches of two strings, in Õ(k^3) time we can compute their edit distance or certify that it is larger than k. This result improves upon Õ(k^8)-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent Õ(k^3)-size sketches of Jin, Nelson, and Wu [STACS 2021].

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset