LUT-NN: Towards Unified Neural Network Inference by Table Lookup

by Xiaohu Tang et al.

DNN inference requires substantial system-development effort and resource cost. This motivates LUT-NN, the first attempt to perform deep neural network (DNN) inference by table lookup, which eliminates the diverse computation kernels and reduces running cost. Exploiting the similarity of features within each layer, LUT-NN learns typical features of each layer, called centroids, from the training data, precomputes them against the model weights, and stores the results in tables. For a new input, the precomputed results of the centroids closest to the input features are read directly from the tables as an approximation of the layer output. We propose a novel centroid learning technique for DNNs that learns centroids through backpropagation and applies three levels of approximation to minimize model loss. With this technique, LUT-NN achieves accuracy comparable to the original models (<5% difference) on real, complex datasets, including CIFAR, ImageNet, and GLUE. LUT-NN reduces the computing operators to only two: closest-centroid search and table lookup. We implement both for Intel and ARM CPUs. Model size is reduced by up to 3.5x for CNN models and 7x for BERT. In terms of latency, the measured speedup of LUT-NN is up to 7x for BERT and 2x for ResNet, well below the theoretical speedup because current hardware is not designed for efficient table lookup. We expect first-class hardware support for table lookup in the future to unleash the full potential of LUT-NN.
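
The centroid-and-table idea above can be illustrated with a minimal sketch of a single linear layer. It assumes a product-quantization-style scheme in which each input vector is split into equal-length sub-vectors, each with its own small codebook of centroids. The centroid values are hand-picked toy numbers, not learned through backpropagation as in LUT-NN, and all function names (`build_tables`, `closest_centroid`, `lut_forward`, `exact_forward`) are hypothetical. Inference then uses only the two operators named above, closest-centroid search and table lookup, plus additions.

```python
"""Toy sketch of approximating a linear layer y = x @ W by table lookup.

Assumption (not taken from the paper): the input is split into equal-length
sub-vectors, one codebook of centroids per sub-vector. Centroids here are
hand-picked; LUT-NN learns them via backpropagation, which is not shown.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (in_dim, out_dim)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_tables(centroids: List[List[Vector]], weights: Matrix) -> List[List[Vector]]:
    """Offline step: tables[c][k][j] = centroid k of codebook c dotted with
    the slice of weight column j that codebook c covers."""
    sub = len(centroids[0][0])
    out_dim = len(weights[0])
    tables = []
    for c, codebook in enumerate(centroids):
        rows = weights[c * sub:(c + 1) * sub]  # input dims covered by codebook c
        tables.append([[dot(cen, [row[j] for row in rows]) for j in range(out_dim)]
                       for cen in codebook])
    return tables


def closest_centroid(x_sub: Vector, codebook: List[Vector]) -> int:
    """Operator 1: index of the centroid with the smallest squared L2 distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(x_sub, cen)) for cen in codebook]
    return dists.index(min(dists))


def lut_forward(x: Vector, centroids: List[List[Vector]], tables: List[List[Vector]]) -> Vector:
    """Online step: no multiplications with W, only search, lookup and add."""
    sub = len(centroids[0][0])
    out_dim = len(tables[0][0])
    y = [0.0] * out_dim
    for c, codebook in enumerate(centroids):
        k = closest_centroid(x[c * sub:(c + 1) * sub], codebook)
        for j, v in enumerate(tables[c][k]):  # Operator 2: table lookup
            y[j] += v
    return y


def exact_forward(x: Vector, weights: Matrix) -> Vector:
    """Reference dense computation x @ W, for comparison."""
    return [dot(x, [row[j] for row in weights]) for j in range(len(weights[0]))]


if __name__ == "__main__":
    # 4-dim input, 2 outputs; 2 codebooks, each with 2 two-dimensional centroids.
    W = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
    C = [[[1.0, 1.0], [-1.0, 0.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    T = build_tables(C, W)
    x = [0.9, 1.1, 0.1, 0.8]
    print("approx:", lut_forward(x, C, T))
    print("exact: ", exact_forward(x, W))
```

The approximation is exact when every sub-vector coincides with a centroid and degrades as inputs drift away from their nearest centroids, which is why the quality of the learned centroids matters. Storage shifts from the `in_dim × out_dim` weight matrix to `num_codebooks × centroids_per_codebook × out_dim` table entries, so any size saving depends on those hyperparameters.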


Please sign up or login with your details

Forgot password? Click here to reset