Synthesis Cost-Optimal Targeted Mutant Protein Libraries

by Dimitris Papamichail, et al.

Protein variant libraries produced by site-directed mutagenesis let protein engineers explore variants with potentially improved properties, such as activity and stability. These libraries are commonly built by selecting residue positions and, for each position, a set of alternative beneficial mutations. All possible combinations are then constructed and screened by incorporating degenerate codons at the mutation sites. These degenerate codons often encode additional unwanted amino acids or even STOP codons. Our study aims to take advantage of annealing-based recombination of oligonucleotides during synthesis and to use multiple degenerate codons per mutation site to produce targeted protein libraries devoid of unwanted variants. Toward this goal, we created an algorithm that calculates the minimum number of degenerate codons necessary to specify any given amino acid set, and a dynamic programming method that uses this algorithm to optimally partition a DNA target sequence with degeneracies into overlapping oligonucleotides, such that the total synthesis cost of the target mutant protein library is minimized. Computational experiments show that, for a modest increase in DNA synthesis costs, beneficial variant yields in the produced mutant libraries increase by orders of magnitude, an effect particularly pronounced in large combinatorial libraries.
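
To make the degenerate-codon problem concrete, the following is a minimal sketch in Python. It is not the authors' algorithm: it is an exhaustive search, assuming the standard genetic code and the IUPAC nucleotide ambiguity codes, and it is practical only for small amino acid sets. The names `encoded_amino_acids` and `min_degenerate_codons` are hypothetical and chosen for illustration.

```python
from itertools import combinations, product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(codon):
    """Return the set of amino acids ('*' = STOP) a degenerate codon encodes."""
    expansions = product(*(IUPAC[symbol] for symbol in codon))
    return frozenset(CODON_TABLE["".join(bases)] for bases in expansions)


def min_degenerate_codons(target):
    """Brute-force a smallest list of degenerate codons that together encode
    exactly `target`, with no unwanted amino acids and no STOP codons.
    Illustrative only: the search is exponential in the worst case."""
    target = frozenset(target)
    # Keep one representative codon per distinct encoded set that stays
    # inside the target (so it introduces nothing unwanted).
    candidates = {}
    for symbols in product(IUPAC, repeat=3):
        codon = "".join(symbols)
        encoded = encoded_amino_acids(codon)
        if encoded <= target:
            candidates.setdefault(encoded, codon)
    sets = list(candidates)
    # Try covers of increasing size; the first hit is a minimum cover.
    for k in range(1, len(target) + 1):
        for combo in combinations(sets, k):
            if frozenset().union(*combo) == target:
                return [candidates[s] for s in combo]
    return []


if __name__ == "__main__":
    print(sorted(encoded_amino_acids("NNK")))  # includes '*' because of TAG
    print(min_degenerate_codons({"A", "V"}))   # one codon is enough
    print(min_degenerate_codons({"M", "W"}))   # two codons are required
```

The `{"M", "W"}` case shows why a single degenerate codon can fall short: any codon spanning both ATG and TGG must also admit TTG (Leu) and AGG (Arg), so excluding unwanted residues forces a second codon at that site.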

