Optimal-Time Queries on BWT-runs Compressed Indexes
Although many compressed indexes for highly repetitive strings have been proposed, developing compressed indexes that support faster queries remains a challenge. The run-length Burrows-Wheeler transform (RLBWT) is a lossless data compression technique that applies a reversible permutation to an input string and then run-length encodes the result, and it has become a popular research topic in string processing. The r-index [Gagie et al., ACM'20] is an efficient compressed index on the RLBWT whose space usage depends not on the string length but on the number r of runs in the RLBWT of the input string, and it supports locate queries in optimal time with ω(r) words of space. Following this line of research, we present the first compressed index on the RLBWT, which we call r-index-f, that supports various queries, including locate, count, and extract queries, decompression, and prefix search, in optimal time with a smaller working space of O(r) words for small alphabets. We present efficient data structures for computing two important functions, LF and ϕ^-1, in constant time with O(r) words of space, which is a step forward in computation time from the previous best result of O(loglog n) time for string length n and O(r) words of space. Finally, we present algorithms that leverage those two data structures to answer queries on the RLBWT in optimal time with O(r) words of space.
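To make the objects named in the abstract concrete, the following minimal Python sketch builds the BWT of a short string, run-length encodes it to obtain the runs counted by r, and uses a plain LF mapping to invert the transform. It is a naive illustration only: it uses O(n) space and quadratic-time construction, not the O(r)-word, constant-time LF structure of r-index-f. The function names and the `$` sentinel (assumed to be smaller than every character of the input) are assumptions made for this example.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def lf_mapping(last: str) -> list[int]:
    """LF[i] = C[c] + rank_c(last, i), where c = last[i] and C[c] counts
    the characters in last that are strictly smaller than c."""
    counts: dict[str, int] = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    smaller: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        smaller[ch] = total
        total += counts[ch]
    seen: dict[str, int] = {}
    lf = []
    for ch in last:
        lf.append(smaller[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    return lf


def invert_bwt(last: str) -> str:
    """Recover the text by walking LF from row 0 (the sentinel rotation)."""
    lf = lf_mapping(last)
    i = 0
    out = []
    for _ in range(len(last) - 1):
        out.append(last[i])  # characters come out from last to first
        i = lf[i]
    return "".join(reversed(out))


transformed = bwt("banana")
runs = run_length_encode(transformed)
print(transformed)   # annb$aa
print(runs)          # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(len(runs))     # r = 5
print(invert_bwt(transformed))  # banana
```

For highly repetitive inputs, r is typically much smaller than n, which is why indexes whose space is bounded in terms of r rather than n are attractive.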