Low-redundancy codes for correcting multiple short-duplication and edit errors

08/03/2022
by Yuanyuan Tang, et al.

Due to its high data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors, including deletions, insertions, duplications, and substitutions, may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most p edits, where a short duplication generates a copy of a substring of length ≤ 3 and inserts the copy immediately after the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to p edits (in addition to duplications) at the additional cost of roughly 8p(log_q n)(1+o(1)) symbols of redundancy, thus achieving the same asymptotic rate, where q ≥ 4 is the alphabet size and p is a constant. Furthermore, when p is a constant, the time complexities of both the encoding and decoding processes are polynomial in the code length.
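To make the error model in the abstract concrete, the following is a minimal Python sketch of the channel operations it describes: a short tandem duplication (a copy of a substring of length at most 3 inserted right after the original) and the three edit types. It also evaluates only the leading term 8p·log_q n of the stated extra redundancy, ignoring the (1+o(1)) factor. All function names are hypothetical and chosen for illustration; this models the errors only and is not the paper's encoder or decoder.

```python
import math


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Copy s[start:start+length] and insert the copy right after the original."""
    if not 1 <= length <= 3:
        raise ValueError("a short duplication copies a substring of length <= 3")
    if start < 0 or start + length > len(s):
        raise ValueError("substring out of range")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def substitute(s: str, pos: int, symbol: str) -> str:
    """Edit: replace the symbol at position pos."""
    return s[:pos] + symbol + s[pos + 1:]


def delete(s: str, pos: int) -> str:
    """Edit: remove the symbol at position pos."""
    return s[:pos] + s[pos + 1:]


def insert(s: str, pos: int, symbol: str) -> str:
    """Edit: insert a symbol before position pos."""
    return s[:pos] + symbol + s[pos:]


def extra_redundancy_leading_term(p: int, n: int, q: int = 4) -> float:
    """Leading term 8 * p * log_q(n) of the additional redundancy, in symbols.

    Assumption: the (1 + o(1)) factor from the abstract is dropped.
    """
    return 8 * p * math.log(n) / math.log(q)


if __name__ == "__main__":
    word = "ACGT"
    duplicated = tandem_duplicate(word, 1, 2)   # duplicates "CG"
    print(duplicated)
    print(substitute(duplicated, 0, "T"))
    print(extra_redundancy_leading_term(p=2, n=1024, q=4))
```

Here the duplication of "CG" in "ACGT" yields "ACGCGT", and a subsequent substitution at position 0 yields "TCGCGT"; a code from the paper would need to recover the original word from such a combination of duplications and up to p edits.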

research
08/18/2020

Error-correcting Codes for Noisy Duplication Channels

Because of its high data density and longevity, DNA is emerging as a pro...
research
11/11/2020

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

Due to its high data density and longevity, DNA is considered a promisin...
research
02/20/2023

Reconstruction of Sequences Distorted by Two Insertions

Reconstruction codes are generalizations of error-correcting codes that ...
research
08/30/2018

Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems

A (tandem) duplication of length k is an insertion of an exact copy of...
research
01/08/2018

Efficient Encoding/Decoding of Irreducible Words for Codes Correcting Tandem Duplications

Tandem duplication is the process of inserting a copy of a segment of DN...
research
10/15/2019

Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage

An indel refers to a single insertion or deletion, while an edit refers ...
research
04/07/2023

Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding

Ever since deoxyribonucleic acid (DNA) was considered as a next-generati...
