Di Hu

research

∙ 09/13/2023

Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer

Never having seen an object and heard its sound simultaneously, can the ...

0 Yaoting Wang, et al. ∙

research

∙ 09/12/2023

Enhancing Multi-modal Cooperation via Fine-grained Modality Valuation

One primary topic of multi-modal learning is to jointly incorporate hete...

0 Yake Wei, et al. ∙

research

∙ 08/10/2023

Progressive Spatio-temporal Perception for Audio-Visual Question Answering

Audio-Visual Question Answering (AVQA) task aims to answer questions abo...

0 Guangyao Li, et al. ∙

research

∙ 06/15/2023

Towards Long Form Audio-visual Video Understanding

We live in a world filled with never-ending streams of multimodal inform...

0 Wenxuan Hou, et al. ∙

research

∙ 06/06/2023

Supervised Knowledge May Hurt Novel Class Discovery Performance

Novel class discovery (NCD) aims to infer novel categories in an unlabel...

0 Ziyun Li, et al. ∙

research

∙ 05/29/2023

Multi-Scale Attention for Audio Question Answering

Audio question answering (AQA), acting as a widely used proxy task to ex...

0 Guangyao Li, et al. ∙

research

∙ 04/16/2023

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

Cross-modal distillation has been widely used to transfer knowledge acro...

0 Wenke Xia, et al. ∙

research

∙ 03/09/2023

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

Audio-visual learning helps to comprehensively understand the world by f...

0 Ruize Xu, et al. ∙

research

∙ 02/14/2023

Balanced Audiovisual Dataset for Imbalance Analysis

The imbalance problem is widespread in the field of machine learning, wh...

0 Wenke Xia, et al. ∙

research

∙ 02/07/2023

Revisiting Pre-training in Audio-Visual Learning

Pre-training technique has gained tremendous success in enhancing model ...

0 Ruoxuan Feng, et al. ∙

research

∙ 09/19/2022

A Closer Look at Novel Class Discovery from the Labeled Set

Novel class discovery (NCD) aims to infer novel categories in an unlabel...

0 Ziyun Li, et al. ∙

research

∙ 08/20/2022

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

Sight and hearing are two senses that play a vital role in human communi...

2 Yake Wei, et al. ∙

research

∙ 08/10/2022

Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction

Both visual and auditory information are valuable to determine the salie...

2 Yingzi Fan, et al. ∙

research

∙ 03/29/2022

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by in...

0 Xiaokang Peng, et al. ∙

research

∙ 03/26/2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

In this paper, we focus on the Audio-Visual Question Answering (AVQA) ta...

10 Guangyao Li, et al. ∙

research

∙ 03/25/2022

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

Recent years have witnessed the success of deep learning on the visual s...

8 Xinchi Zhou, et al. ∙

research

∙ 03/09/2022

Inadequately Pre-trained Models are Better Feature Extractors

Pre-training has been a popular learning paradigm in deep learning era, ...

0 Andong Deng, et al. ∙

research

∙ 02/13/2022

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

The task of audio-visual sound source localization has been well studied...

5 Xian Liu, et al. ∙

research

∙ 12/22/2021

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Audiovisual scenes are pervasive in our daily life. It is commonplace fo...

0 Di Hu, et al. ∙

research

∙ 10/11/2021

Parsing Data Formats of the Inputs and Outputs of Geographic Models with Code Analysis

Model web services provide an approach for implementing and facilitating...

0 Xinghua Cheng, et al. ∙

research

∙ 10/11/2021

Integrating Structural Description of Data Format Information into Programming to Auto-generate File Reading Programs

File reading is the basis for data sharing and scientific computing. How...

0 Xinghua Cheng, et al. ∙

research

∙ 08/02/2021

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Many current deep learning approaches make extensive use of backbone net...

3 Konrad Heidler, et al. ∙

research

∙ 06/02/2021

Not All Knowledge Is Created Equal

Mutual knowledge distillation (MKD) improves a model by distilling knowl...

15 Ziyun Li, et al. ∙

research

∙ 04/27/2021

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Unsupervised domain adaptation (UDA) methods for person re-identificatio...

0 Zechen Bai, et al. ∙

research

∙ 04/05/2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation

There are rich synchronized audio and visual events in our daily life. I...

0 Yapeng Tian, et al. ∙

research

∙ 12/14/2020

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Temporal relational modeling in video is essential for human action unde...

0 Dong Wang, et al. ∙

research

∙ 10/16/2020

Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement

Fine-tuning deep neural networks pre-trained on large scale datasets is ...

11 Xingjian Li, et al. ∙

research

∙ 10/12/2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Discriminatively localizing sounding objects in cocktail-party, i.e., mi...

0 Di Hu, et al. ∙

research

∙ 07/13/2020

Multiple Sound Sources Localization from Coarse to Fine

How to visually localize multiple sound sources in unconstrained videos ...

0 Rui Qian, et al. ∙

research

∙ 05/18/2020

Cross-Task Transfer for Multimodal Aerial Scene Recognition

Aerial scene recognition is a fundamental task in remote sensing and has...

8 Di Hu, et al. ∙

research

∙ 05/18/2020

Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition

Aerial scene recognition is a fundamental task in remote sensing and has...

0 Di Hu, et al. ∙

research

∙ 05/14/2020

Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions

Visual crowd counting has been recently studied as a way to enable peopl...

0 Di Hu, et al. ∙

research

∙ 01/26/2020

Curriculum Audiovisual Learning

Associating sound and its producer in complex audiovisual scene is a cha...

6 Di Hu, et al. ∙

research

∙ 04/19/2019

Listen to the Image

Visual-to-auditory sensory substitution devices can assist the blind in ...

0 Di Hu, et al. ∙

research

∙ 10/08/2018

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than single on...

0 Di Hu, et al. ∙

research

∙ 10/08/2018

Deep LDA Hashing

The conventional supervised hashing methods based on classification do n...

0 Di Hu, et al. ∙

research

∙ 07/09/2018

Deep Co-Clustering for Unsupervised Audiovisual Learning

The seen birds twitter, the running cars accompany with noise, people ta...

2 Di Hu, et al. ∙

research

∙ 08/19/2017

Image2song: Song Retrieval via Bridging Image Content and Lyric Words

Image is usually taken for expressing some kinds of emotions or purposes...

0 Xuelong Li, et al. ∙

research

∙ 08/17/2017

Deep Binary Reconstruction for Cross-modal Hashing

With the increasing demand of massive multimodal data storage and organi...

0 Xuelong Li, et al. ∙
