Building artificial intelligence (AI) systems on top of a set of foundat...
Training an image captioner without annotated image-sentence pairs has g...
Large language models (LLMs) are increasingly adopted for a variety of t...
Online movie review platforms are providing crowdsourced feedback for th...
Neural Radiance Fields (NeRF) have been widely adopted as practical and
...
We propose PAniC-3D, a system to reconstruct stylized 3D character heads...
Many top-down architectures for instance segmentation achieve significan...
Video-Language Pre-training models have recently significantly improved
...
Neural Radiance Fields (NeRF) methods have proved effective as compact,
...
Digital neuron reconstruction from 3D microscopy images is an essential
...
Twitter bot detection has become an increasingly important task to comba...
In audio-visual navigation (AVN), an intelligent agent needs to navigate...
Dense captioning in 3D point clouds is an emerging vision-and-language t...
Open-world instance segmentation is the task of grouping pixels into obj...
The support set is a key to providing conditional prior for fast adaptio...
We introduce PyTorchVideo, an open-source deep-learning library that pro...
Conventional video models rely on a single stream to capture the complex...
Attention Mechanism is a widely used method for improving the performanc...
Current state-of-the-art object detection and segmentation methods work ...
The standard way of training video models entails sampling at each itera...
We present a convolution-free approach to video classification built
exc...
Traditional change detection methods usually follow the image differenci...
We present a Chinese judicial reading comprehension (CJRC) dataset which...
Image-to-image translation is significant to many computer vision and ma...
In this paper, we introduce CAIL2019-SCM, Chinese AI and Law 2019 Simila...
Video classification methods often divide the video into short clips, do...
Motion is a salient cue to recognize actions in video. Modern action
rec...
Current fully-supervised video datasets consist of only a few hundred
th...
Group convolution has been shown to offer great computational savings in...
Chinese word usage errors often occur in non-native Chinese learners'
wr...
The goal of knowledge representation learning is to embed entities and
r...
In this paper, we give an overview of the Legal Judgment Prediction (LJP...
In this paper, we introduce the Chinese AI and Law
challenge dataset (CA...
We propose a lightweight neural network model, Deformable Volume Network...
This paper describes a procedure for the creation of large-scale video
d...
In this paper, we propose a model using generative adversarial net (GAN)...
In this paper we discuss several forms of spatiotemporal convolutions fo...
In a streaming environment, there is often a need for statistical predic...
This paper introduces a state-of-the-art video representation and applie...
Common statistical prediction models often require and assume stationari...