Large-scale vision-language models (VLMs) like CLIP successfully find
co...
Despite their impressive capabilities, diffusion-based text-to-image (T2...
We investigate the problem of reducing mistake severity for fine-grained...
High-quality calibrated uncertainty estimates are crucial for numerous
r...
The goal of open-world compositional zero-shot learning (OW-CZSL) is to
...
Multi-view Detection (MVD) is highly effective for occlusion reasoning a...
Class imbalance and noisy labels are the norm rather than the exception ...
There has been increasing interest in building deep hierarchy-aware
clas...
Multi-object tracking has seen considerable progress recently, albeit with
s...
Recent works have proposed several long-term tracking benchmarks and
hig...