Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

01/09/2020
by   Tuan Thanh Nguyen, et al.
0

We propose coding techniques that limit the length of homopolymers runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given ℓ, ϵ > 0, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most ℓ, (ii) GC-content constraint: the GC-content of each codeword is within [0.5-ϵ, 0.5+ϵ], (iii) Error-correction: each codeword is capable of correcting a single deletion, or single insertion, or single substitution error. For practical values of ℓ and ϵ, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro