Faster Approximate(d) Text-to-Pattern L1 Distance

01/28/2018
by   Przemysław Uznański, et al.
The problem of finding the distance between a pattern of length m and a text of length n is a typical way of generalizing pattern matching to incorporate a dissimilarity score. For both the Hamming and L_1 distances, only a superlinear upper bound of O(n√(m)) is known, which prompts the question of relaxing the problem: either by asking for a (1 ± ε)-approximate distance (every distance is reported up to a multiplicative factor), or for a k-approximated distance (distances exceeding k are reported as ∞). We focus on the L_1 distance, for which we show new algorithms achieving complexities of O(ε^-1 n) and O((m+k√(m)) · n/m), respectively. This is a significant improvement upon the previous algorithms with runtimes O(ε^-2 n) of Lipsky and Porat (Algorithmica 2011) and O(n√(k)) of Amir, Lipsky, Porat and Umanski (CPM 2005). We also provide a series of reductions showing that if our upper bound for the approximate L_1 distance is tight, then so is our upper bound for the k-approximated L_1 distance, and if the latter is tight, then so is the k-approximated Hamming distance upper bound due to Gawrychowski and Uznański (arXiv 2017).
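To make the problem statement concrete, the following is a minimal sketch of the quantities involved. It is the naive O(nm) baseline, not the algorithms of the paper, and it assumes integer sequences as input. The function names (l1_distances, k_approximated, is_eps_approximation) are hypothetical and chosen only for illustration.

```python
from math import inf


def l1_distances(text, pattern):
    """Naive O(n*m) baseline: exact L_1 distance between the pattern and
    every length-m window of the text (one value per alignment)."""
    n, m = len(text), len(pattern)
    return [
        sum(abs(text[i + j] - pattern[j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def k_approximated(distances, k):
    """k-approximated output: distances exceeding k are reported as infinity."""
    return [d if d <= k else inf for d in distances]


def is_eps_approximation(estimates, distances, eps):
    """Check that every estimate is within a (1 +/- eps) multiplicative
    factor of the corresponding exact distance."""
    return all(
        (1 - eps) * d <= e <= (1 + eps) * d
        for e, d in zip(estimates, distances)
    )


if __name__ == "__main__":
    text = [1, 3, 2, 5, 4]
    pattern = [2, 2, 4]
    exact = l1_distances(text, pattern)
    print(exact)                     # [4, 2, 3]
    print(k_approximated(exact, 3))  # [inf, 2, 3]
```

The relaxed variants only change what has to be reported: a (1 ± ε)-approximate algorithm may return any estimates that pass is_eps_approximation, while a k-approximated algorithm must be exact only for alignments whose distance is at most k.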

research
10/03/2018

Approximating Approximate Pattern Matching

Given a text T of length n and a pattern P of length m, the approximate ...
research
02/09/2020

Approximating Text-to-Pattern Distance via Dimensionality Reduction

Text-to-pattern distance is a fundamental problem in string matching, wh...
research
11/10/2017

Hamming distance completeness and sparse matrix multiplication

We investigate relations between (+,) vector products for binary integer...
research
11/16/2019

On q-ary Bent and Plateaued Functions

We obtain the following results. For any prime q the minimal Hamming dis...
research
01/01/2020

Approximating Text-to-Pattern Hamming Distances

We revisit a fundamental problem in string matching: given a pattern of ...
research
07/03/2019

Circular Pattern Matching with k Mismatches

The k-mismatch problem consists in computing the Hamming distance betwee...
research
11/18/2021

Hamming Distance Tolerant Content-Addressable Memory (HD-CAM) for Approximate Matching Applications

We propose a novel Hamming distance tolerant content-addressable memory ...
