Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search

by Hongwu Peng, et al.

Molecular similarity search is widely used in drug discovery to rapidly identify structurally similar compounds in large molecular databases. As chemical libraries grow, there is increasing interest in efficiently accelerating large-scale similarity search. Existing work mainly uses CPUs and GPUs to accelerate computation of the Tanimoto coefficient, which measures the pairwise similarity between molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design for both exhaustive and approximate search algorithms. For exhaustive search using BitBound folding, we analyze how the similarity cutoff and folding level affect search speedup and accuracy, and propose a scalable on-the-fly query engine on FPGAs that reduces resource utilization and the pipeline interval. We achieve a processing throughput of 450 million compounds per second for a single query engine. For approximate search, we use hierarchical navigable small world (HNSW), a popular algorithm with high recall and query speed, and propose an FPGA-based graph traversal engine that uses a high-throughput, register-array-based priority queue and a fine-grained distance calculation engine to increase processing capability. Experimental results show that the proposed FPGA-based HNSW implementation achieves 103,385 queries per second (QPS) on the ChEMBL database with 0.92 recall, an average 35x speedup over the existing CPU implementation. To the best of our knowledge, ours is the first attempt to accelerate molecular similarity search algorithms on FPGAs, and it has the highest performance among existing approaches.
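For readers unfamiliar with the exhaustive search described in the abstract, the following is a minimal software sketch of Tanimoto scoring with a BitBound-style popcount filter. It is not the paper's FPGA design. It assumes fingerprints are stored as Python integers used as bit vectors, and the names `popcount`, `tanimoto`, `passes_bitbound`, and `search` are hypothetical helpers introduced only for illustration. The filter relies on the standard bound that a fingerprint with b set bits can reach a Tanimoto score of at least t against a query with a set bits only if a·t ≤ b ≤ a/t. Folding is not shown.

```python
def popcount(fp):
    """Number of set bits in a fingerprint stored as an int."""
    return bin(fp).count("1")


def tanimoto(a, b):
    """Tanimoto coefficient: |a AND b| / |a OR b| over the set bits."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return popcount(a & b) / union


def passes_bitbound(query_bits, cand_bits, cutoff):
    """Popcount pre-filter: keep a candidate only if a*t <= b <= a/t."""
    return cand_bits >= query_bits * cutoff and cand_bits * cutoff <= query_bits


def search(query, database, cutoff):
    """Exhaustive scan that skips candidates the popcount bound rules out."""
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(database):
        if not passes_bitbound(q_bits, popcount(fp), cutoff):
            continue  # cannot reach the cutoff, so skip the full comparison
        score = tanimoto(query, fp)
        if score >= cutoff:
            hits.append((idx, score))
    # Highest similarity first
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    db = [0b11110000, 0b11100000, 0b00001111, 0b11111111, 0b1]
    print(search(0b11110000, db, cutoff=0.7))
```

In a real system the database would typically be pre-sorted by popcount so that the filter selects a contiguous range rather than testing every entry. The per-entry test is used here only to keep the sketch short.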




