Designing robust watermark barcodes for multiplex long-read sequencing
A method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing is presented. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates in the order of 11%. It is the first method to specifically address this problem without requiring upstream quality improvement. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10^-7, and are designed to be compatible with chemical constraints imposed by the sequencing process. Software for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, is freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark.