Person Search aims to simultaneously localize and recognize a target per...
Objective geometry quality assessment of point clouds is essential to ev...
Knowledge distillation (KD) is essentially a process of transferring a t...
Given a long untrimmed video and natural language queries, video groundi...
Instance segmentation aims to delineate each individual object of intere...
A Transformer-based deep direct sampling method is proposed for solving ...
Automated social behaviour analysis of mice has become an increasingly p...
A transformed primal-dual (TPD) flow is developed for a class of nonline...
Recently, increasing efforts have been focused on Weakly Supervised Scen...
Brain tumor segmentation remains a challenge in medical image segmentati...
Nearly all existing scene graph generation (SGG) models have overlooked ...
Distinctive Image Captioning (DIC) – generating distinctive captions tha...
We investigate the problem of video Referring Expression Comprehension (...
Given an image and a reference caption, the image caption editing task a...
The deployment of the sensor nodes (SNs) always plays a decisive role in...
Data Augmentation (DA) – generating extra training samples beyond origin...
Speaker identification (SID) in the household scenario (e.g., for smart ...
Finite element de Rham complexes and finite element Stokes complexes wit...
Understanding how events described or shown in multimedia content relate...
Two-dimensional finite element complexes with various smoothness, includ...
Maxwell interface problems are of great importance in many electromagnet...
This report, commissioned by the WTW research network, investigates the ...
Reasoning about causal and temporal event relations in videos is a new d...
Learning neural ODEs often requires solving very stiff ODE systems, prim...
We propose a novel framework to learn 3D point cloud semantics from 2D m...
We study multimodal few-shot object detection (FSOD) in this paper, usin...
Few-shot object detection (FSOD), with the aim to detect novel objects u...
Universal Lesion Detection (ULD) in computed tomography plays an essenti...
Temporal Sentence Grounding in Videos (TSGV), which aims to ground a nat...
Finite element methods for Maxwell's equations are highly sensitive to t...
A unified construction of div-conforming finite element tensors, includi...
Grounded Situation Recognition (GSR), i.e., recognizing the salient acti...
Today's VidSGG models are all proposal-based methods, i.e., they first g...
Recently C^m-conforming finite elements on simplexes in arbitrary dimens...
The beet cyst nematode (BCN) Heterodera schachtii is a plant pest respon...
Today's VQA models still tend to capture superficial linguistic correlat...
Given an untrimmed video and a natural language query, Natural Language ...
Video grounding aims to localize the temporal segment corresponding to a...
Video Visual Relation Detection (VidVRD) has received significant atten...
Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize ...
We propose FMMformers, a class of efficient and flexible transformers in...
This article presents an immersed virtual element method for solving a c...
Several div-conforming and divdiv-conforming finite elements for symmetr...
A finite element elasticity complex on tetrahedral meshes is devised. Th...
Speaker identification in the household scenario (e.g., for smart speake...
Centralized Training with Decentralized Execution (CTDE) has been a popu...
Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,...
We investigate what grade of sensor data is required for training an imi...
In this work, we present a simple end-to-end trainable machine learning ...
The prevailing framework for matching multimodal inputs is based on a tw...