Instruction tuning large language models (LLMs) on image-text pairs has
ac...
We present a vision and language model named MultiModal-GPT to conduct
m...
One-to-one label assignment in object detection has successfully obviate...
In this paper, we aim to design an efficient real-time object detector t...
In this study, we dive deep into the unique challenges in semi-supervise...
End-to-end object detection has progressed rapidly since the emergence of...
The feature pyramid has been an efficient method to extract features at diff...