Data structures to represent sets of k-long DNA sequences

03/29/2019

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying k-mer sets has emerged as a shared underlying component. Sets of k-mers have unique features and applications that, over the last ten years, have resulted in many specialized approaches for their representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query k-mer sets. We hope this survey will not only serve as a resource for researchers in the field but also make the area more accessible to outsiders.
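
To make the notion of a k-mer set concrete for readers outside the field, the following is a minimal sketch of the naive baseline: every length-k substring of the input sequences is collected into a hash set, which then answers membership queries. It is not one of the specialized data structures the survey compares. The function names (`encode_kmer`, `build_kmer_set`, `contains`), the 2-bit integer encoding, and the choice to skip windows containing characters other than A, C, G, T are all assumptions made for illustration.

```python
# Assumed 2-bit encoding of the DNA alphabet (illustrative choice).
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer over {A, C, G, T} into an integer, 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENCODING[base]
    return code


def build_kmer_set(sequences, k):
    """Collect every length-k substring of the input sequences into a set.

    Windows containing a character outside A, C, G, T are skipped
    (an assumption of this sketch, not a rule from the survey).
    """
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if all(base in ENCODING for base in window):
                kmers.add(encode_kmer(window))
    return kmers


def contains(kmers, k, query):
    """Membership query: is `query` one of the stored k-mers?"""
    # The length check matters: without it, "A" and "AA" would both encode to 0.
    if len(query) != k or any(base not in ENCODING for base in query):
        return False
    return encode_kmer(query) in kmers


reads = ["ACGTAC", "GTACG"]  # toy input, not real sequencing data
kmers = build_kmer_set(reads, 3)
print(len(kmers))                  # number of distinct 3-mers
print(contains(kmers, 3, "CGT"))   # present in the first read
print(contains(kmers, 3, "TTT"))   # absent from both reads
```

A plain hash set like this stores each k-mer independently and ignores the overlap between consecutive k-mers of the same sequence.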
