Haizhou Li

research

∙ 09/21/2023

AceGPT, Localizing Large Language Models in Arabic

This paper explores the imperative need and methodology for developing a...

0 Huang Huang, et al. ∙

research

∙ 09/21/2023

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Current speaker recognition systems primarily rely on supervised approac...

0 Shuai Wang, et al. ∙

research

∙ 09/21/2023

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Text-based speech editing (TSE) techniques are designed to enable users ...

0 Rui Liu, et al. ∙

research

∙ 09/21/2023

Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech

Prosodic phrasing is crucial to the naturalness and intelligibility of e...

0 Rui Liu, et al. ∙

research

∙ 09/19/2023

USED: Universal Speaker Extraction and Diarization

Speaker extraction and diarization are two crucial enabling techniques f...

0 Junyi Ao, et al. ∙

research

∙ 09/18/2023

Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks

Brain-inspired spiking neural networks (SNNs) have demonstrated great po...

0 Zeyang Song, et al. ∙

research

∙ 09/15/2023

Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech

Target speaker extraction aims to extract the speech of a specific speak...

0 Junjie Li, et al. ∙

research

∙ 09/14/2023

A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems

Conversational recommender systems (CRS) generate recommendations throug...

0 Chuang Li, et al. ∙

research

∙ 09/13/2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

It is common in everyday spoken communication that we look at the turnin...

0 Qinghua Liu, et al. ∙

research

∙ 08/28/2023

EEG-Derived Voice Signature for Attended Speaker Detection

Objective: Conventional EEG-based auditory attention detection (AAD) is ...

0 Hongxu Zhu, et al. ∙

research

∙ 08/25/2023

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-term Sequential Modelling

The identification of sensory cues associated with potential opportuniti...

0 Shimin Zhang, et al. ∙

research

∙ 08/17/2023

CMB: A Comprehensive Medical Benchmark in Chinese

Large Language Models (LLMs) provide a possibility to make a great break...

0 Xidong Wang, et al. ∙

research

∙ 07/26/2023

GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning

Grammatical error correction aims to correct ungrammatical sentences aut...

0 Yaxin Fan, et al. ∙

research

∙ 07/21/2023

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

The remarkable capabilities of large-scale language models, such as Chat...

0 Lingyi Yang, et al. ∙

research

∙ 07/14/2023

Long Short-term Memory with Two-Compartment Spiking Neuron

The identification of sensory cues associated with potential opportuniti...

0 Shimin Zhang, et al. ∙

research

∙ 06/29/2023

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units

The goal of Automatic Voice Over (AVO) is to generate speech in sync wit...

0 Junchen Lu, et al. ∙

research

∙ 06/06/2023

Constant Sequence Extension for Fast Search Using Weighted Hamming Distance

Representing visual data using compact binary codes is attracting increa...

0 Zhenyu Weng, et al. ∙

research

∙ 05/26/2023

A Hybrid Neural Coding Approach for Pattern Recognition with Spiking Neural Networks

The biological neural systems evolved to adapt to ecological environment...

0 Xinyi Chen, et al. ∙

research

∙ 05/25/2023

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Audio Deepfake Detection (ADD) aims to detect the fake audio generated b...

0 Rui Liu, et al. ∙

research

∙ 05/24/2023

HuatuoGPT, towards Taming Language Model to Be a Doctor

In this paper, we present HuatuoGPT, a large language model (LLM) for me...

0 Hongbo Zhang, et al. ∙

research

∙ 05/24/2023

Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark

Topic segmentation and outline generation strive to divide a document in...

0 Feng Jiang, et al. ∙

research

∙ 05/23/2023

ADD 2023: the Second Audio Deepfake Detection Challenge

Audio deepfake detection is an emerging topic in the artificial intellig...

0 Jiangyan Yi, et al. ∙

research

∙ 05/23/2023

Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing

Discourse parsing, the task of analyzing the internal rhetorical structu...

0 Feng Jiang, et al. ∙

research

∙ 05/22/2023

Target Active Speaker Detection with Audio-visual Cues

In active speaker detection (ASD), we would like to detect whether an on...

0 Yidi Jiang, et al. ∙

research

∙ 05/20/2023

Dynamic Transformers Provide a False Sense of Efficiency

Despite much success in natural language processing (NLP), pre-trained l...

0 Yiming Chen, et al. ∙

research

∙ 05/15/2023

Ripple sparse self-attention for monaural speech enhancement

The use of Transformer represents a recent success in speech enhancement...

0 Qiquan Zhang, et al. ∙

research

∙ 04/20/2023

Phoenix: Democratizing ChatGPT across Languages

This paper presents our efforts to democratize ChatGPT across language. ...

0 Zhihong Chen, et al. ∙

research

∙ 03/29/2023

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

Talking face generation, also known as speech-to-lip generation, reconst...

0 Jiadong Wang, et al. ∙

research

∙ 12/18/2022

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

Chatbots are expected to be knowledgeable across multiple domains, e.g. ...

0 Chen Zhang, et al. ∙

research

∙ 12/17/2022

Relational Sentence Embedding for Flexible Semantic Matching

We present Relational Sentence Embedding (RSE), a new paradigm to furthe...

0 Bin Wang, et al. ∙

research

∙ 11/20/2022

Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation

Model-based deep learning has achieved astounding successes due in part ...

0 Jiawei Du, et al. ∙

research

∙ 11/18/2022

Self-Transcriber: Few-shot Lyrics Transcription with Self-training

The current lyrics transcription approaches heavily rely on supervised l...

0 Xiaoxue Gao, et al. ∙

research

∙ 10/31/2022

ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting

The speaker extraction technique seeks to single out the voice of a targ...

0 Zexu Pan, et al. ∙

research

∙ 10/30/2022

Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework

Most sentence embedding techniques heavily rely on expensive human-annot...

0 Yiming Chen, et al. ∙

research

∙ 10/30/2022

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

Self-supervised pre-training has been successful in both text and speech...

0 Xianghu Yue, et al. ∙

research

∙ 10/28/2022

Speaker recognition with two-step multi-modal deep cleansing

Neural network-based speaker recognition has achieved significant improv...

0 Ruijie Tao, et al. ∙

research

∙ 10/27/2022

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

We study a novel neural architecture and its training strategies of spea...

0 Ruijie Tao, et al. ∙

research

∙ 10/27/2022

Explicit Intensity Control for Accented Text-to-speech

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...

0 Rui Liu, et al. ∙

research

∙ 10/27/2022

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

Conversational Text-to-Speech (TTS) aims to synthesis an utterance with ...

0 Yifan Hu, et al. ∙

research

∙ 10/27/2022

Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities

Multimodal emotion recognition leverages complementary information acros...

0 Haolin Zuo, et al. ∙

research

∙ 10/25/2022

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

Recent model-based reference-free metrics for open-domain dialogue evalu...

0 Chen Zhang, et al. ∙

research

∙ 10/25/2022

Mixed Emotion Modelling for Emotional Voice Conversion

Emotional voice conversion (EVC) aims to convert the emotional state of ...

0 Kun Zhou, et al. ∙

research

∙ 10/21/2022

Analyzing and Evaluating Faithfulness in Dialogue Summarization

Dialogue summarization is abstractive in nature, making it suffer from f...

0 Bin Wang, et al. ∙

research

∙ 10/10/2022

Training Spiking Neural Networks with Local Tandem Learning

Spiking neural networks (SNNs) are shown to be more biologically plausib...

0 Qu Yang, et al. ∙

research

∙ 10/08/2022

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Speech is the surface form of a finite set of phonetic units, which can ...

0 Chutong Meng, et al. ∙

research

∙ 09/24/2022

A Focused Study on Sequence Length for Dialogue Summarization

Output length is critical to dialogue summarization systems. The dialogu...

0 Bin Wang, et al. ∙

research

∙ 09/23/2022

The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022

This technical report describes our system for track 1, 2 and 4 of the V...

0 Qutang Cai, et al. ∙

research

∙ 09/22/2022

Controllable Accented Text-to-Speech Synthesis

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...

0 Rui Liu, et al. ∙

research

∙ 09/05/2022

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Audio and visual signals complement each other in human speech perceptio...

0 Jiadong Wang, et al. ∙

research

∙ 08/11/2022

Speech Synthesis with Mixed Emotions

Emotional speech synthesis aims to synthesize human voices with various ...

0 Kun Zhou, et al. ∙
