Layered architectures have been widely used in robot systems. The majori...
We propose a distributed cooperative positioning algorithm using the ext...
Motivated by the efficiency and rapid convergence of pre-trained models ...
In recent years, low-carbon transportation has become an indispensable p...
We present SegGPT, a generalist model for segmenting everything in conte...
Large-scale text-to-image diffusion models achieve unprecedented success...
Contrastive language-image pre-training, CLIP for short, has gained incr...
We launch EVA-02, a next-generation Transformer-based visual representat...
This paper proposes a unified diffusion framework (dubbed UniDiffuser) t...
Nowadays, the behavior tree is gaining popularity as a representation fo...
A large-scale deep model pre-trained on massive labeled or unlabeled dat...
Large deep learning models have achieved remarkable success in many scen...
In-context learning, as a new paradigm in NLP, allows the model to rapid...
We launch EVA, a vision-centric foundation model to explore the limits o...
Frozen pretrained models have become a viable alternative to the pretrai...
Vision transformers (ViT) have shown promise in various vision tasks inc...
Real-world search and recommender systems usually adopt a multi-stage ran...
An important goal of self-supervised learning is to enable model pre-tra...
Masked image modeling (MIM) learns representations with remarkably good ...
Masked image modeling (MIM) as pre-training is shown to be effective for...
Rich user behavior data has been proven to be of great value for Click-T...
Image classification, which classifies images by pre-defined categories,...
Full-reference (FR) image quality assessment (IQA) evaluates the visual ...
Robustness and discrimination power are two fundamental requirements in ...
Studies show that developers' answers to mobile app users' feedback...
Building accurate and robust artificial intelligence systems for medical...
Recently, zero-shot image classification by vision-language pre-training...
This paper presents SimMIM, a simple framework for masked image modeling...
We present techniques for scaling Swin Transformer up to 3 billion param...
We introduce MixTraining, a new training paradigm for object detection t...
In this paper, to reduce the congestion rate at the city center and incr...
The vision community is witnessing a modeling shift from CNNs to Transfo...
Designing novel protein sequences for a desired 3D topological fold is a...
We are witnessing a modeling shift from CNN to Transformers in computer ...
This paper presents a new vision Transformer, called Swin Transformer, t...
The success of deep denoisers on real-world color photographs usually re...
We propose ParaSCI, the first large-scale paraphrase dataset in the scie...
The Non-Local Network (NLNet) presents a pioneering approach for capturi...
Contrastive learning methods for unsupervised visual representation lear...
Recent years have witnessed the great success of deep convolutional neur...
We investigate the task of learning blind image denoising networks from ...
Verification and regression are two general methodologies for prediction...
The interpretation of medical images is a challenging task, often compli...
This paper presents parametric instance classification (PIC) for unsuper...
The non-local block is a popular module for strengthening the context mo...
This paper reviews the NTIRE 2020 challenge on real image denoising with...
How do humans recognize an object in a piece of video? Due to the deteri...
This paper introduces a negative margin loss to metric learning based fe...
A well-known issue of Batch Normalization is its significantly reduced e...
Structural information about protein-protein interactions, often missing...