Frame Fusion with Vehicle Motion Prediction for 3D Object Detection
In LiDAR-based 3D detection, historical point clouds contain rich temporal information that helps predict future frames. By the same reasoning, historical detections should contribute to future detections. In this paper, we propose a detection enhancement method, FrameFusion, which improves 3D object detection results by fusing historical frames. In FrameFusion, we "forward" historical frames to the current frame and apply weighted Non-Maximum Suppression to the resulting dense set of bounding boxes, obtaining a fused frame with merged boxes. To "forward" a frame, we use vehicle motion models to estimate the future pose of each bounding box. The commonly used constant velocity model inherently fails on turning vehicles, so we explore two vehicle motion models to address this issue. On the Waymo Open Dataset, FrameFusion consistently improves the performance of various 3D detectors by about 2 points of vehicle Level 2 APH with negligible latency, and it slightly enhances the performance of the temporal fusion method MPPNet. We also conduct extensive experiments on motion model selection.
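The forward-then-fuse idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Box` fields, the function names, the use of a constant-turn-rate (unicycle) model as one example of a turning-aware motion model, and the center-distance grouping that stands in for the rotated-box IoU a real weighted NMS would use.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical bird's-eye-view box state; field names are illustrative.
    x: float         # center x (m)
    y: float         # center y (m)
    heading: float   # yaw (rad)
    speed: float     # speed along the heading (m/s)
    yaw_rate: float  # turn rate (rad/s)
    score: float     # detection confidence, assumed > 0


def forward_constant_velocity(box: Box, dt: float) -> Box:
    """Move the box along its current heading; the heading never changes."""
    return Box(
        box.x + box.speed * math.cos(box.heading) * dt,
        box.y + box.speed * math.sin(box.heading) * dt,
        box.heading, box.speed, box.yaw_rate, box.score,
    )


def forward_unicycle(box: Box, dt: float) -> Box:
    """Move the box along a circular arc (constant speed and turn rate)."""
    if abs(box.yaw_rate) < 1e-6:
        # Nearly straight motion: the arc degenerates to a line.
        return forward_constant_velocity(box, dt)
    new_heading = box.heading + box.yaw_rate * dt
    radius = box.speed / box.yaw_rate
    return Box(
        box.x + radius * (math.sin(new_heading) - math.sin(box.heading)),
        box.y - radius * (math.cos(new_heading) - math.cos(box.heading)),
        new_heading, box.speed, box.yaw_rate, box.score,
    )


def weighted_merge(boxes: list[Box], dist_thresh: float = 1.0) -> list[Box]:
    """Simplified weighted NMS: group boxes around the highest-scoring one
    (by center distance, not IoU) and average each group by score."""
    remaining = sorted(boxes, key=lambda b: b.score, reverse=True)
    fused = []
    while remaining:
        top = remaining[0]
        group, rest = [], []
        for b in remaining:
            near = math.hypot(b.x - top.x, b.y - top.y) <= dist_thresh
            (group if near else rest).append(b)
        total = sum(b.score for b in group)
        fused.append(Box(
            sum(b.score * b.x for b in group) / total,
            sum(b.score * b.y for b in group) / total,
            # Circular mean avoids wrap-around errors near +/- pi.
            math.atan2(sum(b.score * math.sin(b.heading) for b in group),
                       sum(b.score * math.cos(b.heading) for b in group)),
            top.speed, top.yaw_rate, top.score,
        ))
        remaining = rest
    return fused
```

In this sketch, a fused frame would be obtained by forwarding each historical box by its time offset to the current frame and passing those boxes, together with the current detections, to `weighted_merge`. For a vehicle in a turn, `forward_constant_velocity` extrapolates along the old heading while `forward_unicycle` follows the arc, which is the failure case the abstract describes.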