Measuring and alleviating the discrepancies between the synthetic (sourc...
3D dense captioning aims to describe individual objects by natural langu...
3D scene understanding is a relatively emerging research field. In this
...
Medical imaging technologies, including computed tomography (CT) or ches...
Compared with the visual grounding in 2D images, the natural-language-gu...