The aim of audio-visual segmentation (AVS) is to precisely differentiate...
We present TransNormerLLM, the first linear attention-based Large Langua...
Relative positional encoding is widely used in vanilla and linear
transf...
Sequence modeling has important applications in natural language process...
We explore a new task for audio-visual-language modeling called fine-gra...
Linear transformers aim to reduce the quadratic space-time complexity of...
Vision Transformers have achieved impressive performance in video
classi...