Fast Indexes for Gapped Pattern Matching

02/28/2020
by Manuel Cáceres, et al.

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. A VLG pattern consists of two or more subpatterns, with a gap constraint between each adjacent pair that specifies lower and upper bounds on the distance allowed between them. VLG patterns have numerous applications in computational biology (motif search) and information retrieval (e.g., language models, snippet generation, machine translation), and they capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.
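To make the definition concrete, the following is a minimal brute-force sketch of VLG matching. It is not the indexing approach of the paper. It only illustrates what counts as a match, under the assumption that a gap is measured as the number of characters between the end of one subpattern and the start of the next. The names `occurrences` and `find_vlg_matches` are hypothetical, introduced here for illustration.

```python
from bisect import bisect_left, bisect_right


def occurrences(text, sub):
    """Return sorted start positions of all (possibly overlapping) occurrences of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)
    return positions


def find_vlg_matches(text, subpatterns, gaps):
    """Enumerate matches of a VLG pattern by brute force.

    subpatterns: list of two or more strings.
    gaps: list of (lo, hi) pairs, one per adjacent pair of subpatterns.
          Assumption: a gap counts the characters between the end of one
          subpattern and the start of the next, with lo <= gap <= hi.
    Returns a list of tuples, each holding one start position per subpattern.
    """
    if len(subpatterns) < 2:
        raise ValueError("a VLG pattern needs at least two subpatterns")
    if len(gaps) != len(subpatterns) - 1:
        raise ValueError("need exactly one gap constraint per adjacent pair")

    # Sorted occurrence lists, one per subpattern.
    occ = [occurrences(text, p) for p in subpatterns]
    matches = []

    def extend(i, prefix):
        # prefix holds the chosen start positions of subpatterns 0..i-1.
        if i == len(subpatterns):
            matches.append(tuple(prefix))
            return
        prev_end = prefix[-1] + len(subpatterns[i - 1])
        lo, hi = gaps[i - 1]
        # Binary search for starts of subpattern i inside the allowed window.
        left = bisect_left(occ[i], prev_end + lo)
        right = bisect_right(occ[i], prev_end + hi)
        for pos in occ[i][left:right]:
            extend(i + 1, prefix + [pos])

    for pos in occ[0]:
        extend(1, [pos])
    return matches


if __name__ == "__main__":
    text = "abcxxdefabcdef"
    # "abc", then between 0 and 2 arbitrary characters, then "def".
    print(find_vlg_matches(text, ["abc", "def"], [(0, 2)]))
```

This sketch scans the whole text for every subpattern at query time, which is the cost a precomputed index is meant to avoid on large data sets.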


