Developing a Compressed Object Detection Model based on YOLOv4 for Deployment on Embedded GPU Platform of Autonomous System
Recent CNN-based object detection models are highly accurate but require a high-performance GPU to run in real time. They remain too heavy, in both memory footprint and inference speed, for an embedded system with limited memory. Because object detection for an autonomous system runs on an embedded processor, it is preferable to make the detection network as light as possible while preserving detection accuracy. Several popular lightweight detection models exist, but their accuracy is too low for safe driving applications. This paper therefore proposes a new object detection model, referred to as YOffleNet, which is compressed at a high ratio while minimizing accuracy loss, targeting real-time, safe driving applications on an autonomous system. The backbone network architecture is based on YOLOv4, but the network is compressed substantially by replacing the computationally heavy CSP DenseNet with the lighter modules of ShuffleNet. Experiments on the KITTI dataset show that the proposed YOffleNet is compressed by a factor of 4.7 relative to YOLOv4-s and runs at up to 46 FPS on an embedded GPU system (NVIDIA Jetson AGX Xavier). Despite the high compression ratio, the accuracy is reduced only slightly, to 85.8, which shows a high potential for deployment on the embedded system of an autonomous system for real-time, accurate object detection.
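The ShuffleNet modules mentioned above rely on a channel shuffle step, which interleaves channels across groups so that information can flow between grouped convolutions. The following is a minimal sketch of that operation alone, written in plain Python over a list of channel labels. It is an illustration under stated assumptions, not the YOffleNet implementation: the function name `channel_shuffle` and the toy six-channel input are hypothetical, and the abstract does not state which ShuffleNet variant or group count the model uses.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten).

    `channels` is a flat list of per-channel items (labels here, feature
    maps in a real network). Hypothetical helper for illustration only.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # View the flat list as a (groups, per_group) grid, one row per group.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Read the grid column by column, which is the transpose-and-flatten step.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


# Toy input: six channel labels split into two groups of three.
shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)
```

After the shuffle, each consecutive pair of channels contains one channel from each original group, so a following grouped convolution sees inputs from every group.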