Efficient tree-structured categorical retrieval

06/02/2020
by   Djamal Belazzougui, et al.

We study a document retrieval problem in a new framework in which D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs, for example, with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (a level in the category tree), we wish to efficiently retrieve the t categorical units that contain this pattern and belong to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1)) + log D + O(h)) + O(Δ) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ is the size of the alphabet used in the documents, and Δ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1)) + O(log D)) + O(Δ) + O(D log n) bits of space and O(|p| + t log D) query time. Finally, we propose other solutions that are more space-efficient at the expense of a slight increase in query time.
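To make the query concrete, the following is a minimal sketch of a naive baseline for it, not the data structures proposed in the paper. It assumes each document is attached to one node of the category tree and that a "categorical unit" is that node's ancestor at the requested level. The names `CategoryNode`, `ancestor_at_level`, and `categorical_retrieve`, as well as the toy taxonomy, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


class CategoryNode:
    """A node of the category tree (hypothetical helper, not from the paper)."""

    def __init__(self, name: str, parent: Optional["CategoryNode"] = None):
        self.name = name
        self.parent = parent
        # The root sits at level 0; each child is one level deeper.
        self.depth = 0 if parent is None else parent.depth + 1


def ancestor_at_level(node: CategoryNode, level: int) -> Optional[CategoryNode]:
    """Return the ancestor of `node` at depth `level`, or None if `node` is shallower."""
    if node.depth < level:
        return None
    while node.depth > level:
        node = node.parent
    return node


def categorical_retrieve(
    docs: List[Tuple[str, CategoryNode]], pattern: str, level: int
) -> List[str]:
    """Naive baseline: scan every document and report each unit at `level` once.

    Assumption: node names are unique, so they can serve as unit identifiers.
    """
    found = {}  # dict keeps insertion order and removes duplicate units
    for text, node in docs:
        if pattern in text:
            unit = ancestor_at_level(node, level)
            if unit is not None:
                found[unit.name] = None
    return list(found)


# Toy taxonomy with three levels (0: root, 1: class, 2: family).
root = CategoryNode("Life")
mammalia = CategoryNode("Mammalia", root)
aves = CategoryNode("Aves", root)
felidae = CategoryNode("Felidae", mammalia)
canidae = CategoryNode("Canidae", mammalia)
corvidae = CategoryNode("Corvidae", aves)

docs = [
    ("GATTACA", felidae),
    ("TTACAGG", canidae),
    ("CCGATT", corvidae),
    ("AAAA", corvidae),
]

print(categorical_retrieve(docs, "TTAC", 1))  # ['Mammalia']
print(categorical_retrieve(docs, "GATT", 2))  # ['Felidae', 'Corvidae']
```

This scan reads every document on every query, so its cost grows with the total text length n. The solutions summarized above instead answer the same query in O(|p| + t) or O(|p| + t log D) time by building an index over the documents.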


