PHONI: Streamed Matching Statistics with Multi-Genome References

11/11/2020
by Christina Boucher, et al.

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database.
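
For readers unfamiliar with the term, the following is a minimal brute-force sketch of what matching statistics are, under one common definition: for each position i of the pattern, the length of the longest prefix of pattern[i:] that occurs somewhere in the text, together with one position in the text where it occurs. This is not the algorithm of the paper, nor that of Bannai et al. or Rossi et al.; it takes time far beyond what is practical for genomic data and works on an uncompressed text. The function name `matching_statistics` and the example strings are illustrative assumptions.

```python
def matching_statistics(text: str, pattern: str) -> list[tuple[int, int]]:
    """Naive matching statistics (illustrative only, not the PHONI algorithm).

    For each position i in `pattern`, return a pair (pos, length) where
    `length` is the length of the longest prefix of pattern[i:] occurring
    in `text`, and `pos` is the start of one such occurrence in `text`
    (-1 if even the single character pattern[i] does not occur).
    """
    result = []
    for i in range(len(pattern)):
        length = 0
        pos = -1
        # Extend the candidate match one character at a time until the
        # extended substring no longer occurs anywhere in the text.
        while i + length < len(pattern):
            found = text.find(pattern[i:i + length + 1])
            if found == -1:
                break
            length += 1
            pos = found
        result.append((pos, length))
    return result


if __name__ == "__main__":
    # Hypothetical toy input; real inputs would be chromosomes and a
    # compressed multi-genome reference.
    for i, (pos, length) in enumerate(matching_statistics("GATTACA", "TACAG")):
        print(i, pos, length)
```

A run of short lengths in the output signals a stretch of the pattern that shares little with the text, which is the situation the abstract describes as the pattern becoming incompressible relative to the database.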
