In this work, we propose a new transformer-based regularization to bette...
We present TransNormerLLM, the first linear attention-based Large Langua...
Relative positional encoding is widely used in vanilla and linear
transf...
Composed image retrieval aims to find an image that best matches a given...
Sequence modeling has important applications in natural language process...
The Segment Anything Model (SAM) has demonstrated exceptional performanc...
Composed image retrieval searches for a target image based on a multi-mo...
Self-supervised audio-visual source localization aims to locate sound-so...
Linear transformers aim to reduce the quadratic space-time complexity of...
Vision Transformers have achieved impressive performance in video
classi...
Recently, numerous efficient Transformers have been proposed to reduce t...
Vision transformers have shown great success on numerous computer vision...
Transformer has shown great successes in natural language processing,
co...
Pixel-wise clean annotation is necessary for fully-supervised semantic
s...