Keyword Spotting System and Evaluation of Pruning and Quantization Methods on Low-power Edge Microcontrollers
Keyword spotting (KWS) is beneficial for voice-based user interactions with low-power devices at the edge. The edge devices are usually always-on, so edge computing brings bandwidth savings and privacy protection. The devices typically have limited memory spaces, computational performances, power and costs, for example, Cortex-M based microcontrollers. The challenge is to meet the high computation and low-latency requirements of deep learning on these devices. This paper firstly shows our small-footprint KWS system running on STM32F7 microcontroller with Cortex-M7 core @216MHz and 512KB static RAM. Our selected convolutional neural network (CNN) architecture has simplified number of operations for KWS to meet the constraint of edge devices. Our baseline system generates classification results for each 37ms including real-time audio feature extraction part. This paper further evaluates the actual performance for different pruning and quantization methods on microcontroller, including different granularity of sparsity, skipping zero weights, weight-prioritized loop order, and SIMD instruction. The result shows that for microcontrollers, there are considerable challenges for accelerate unstructured pruned models, and the structured pruning is more friendly than unstructured pruning. The result also verified that the performance improvement for quantization and SIMD instruction.
READ FULL TEXT