MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of GAN models, it produces speech with higher intelligibility than WORLD, a rule-based MFCC-based speech synthesizer. We evaluate the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that our proposed system outperforms Librosa MFCC inversion (by an increase of about 26 [...] rise of about 10 [...] with the conventional rule-based vocoder WORLD that is used in the CycleGAN-VC family. However, WORLD needs additional data such as F0. Finally, using a STOI-based perceptual loss in the discriminators could further improve the quality. WebMUSHRA-based subjective tests also show the quality of the proposed approach.
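
To illustrate why recovering speech from MFCCs is hard, the following minimal sketch may help. It is not the MFCCGAN model and not Librosa's implementation; it only relies on the conventional definition of MFCCs as the DCT of log-mel energies with the higher coefficients discarded. The toy log-mel frame, the function names, and the choice of 40 mel bands and 13 coefficients are assumptions made for illustration.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def mfcc_from_log_mel(log_mel, n_mfcc):
    """Keep only the first n_mfcc cepstral coefficients of one frame."""
    return dct2(log_mel)[:n_mfcc]

def log_mel_from_mfcc(mfcc, n_mels):
    """Approximate inversion: zero-pad the discarded coefficients, then invert."""
    padded = list(mfcc) + [0.0] * (n_mels - len(mfcc))
    return idct2(padded)

def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical log-mel frame: a smooth envelope plus fine spectral detail.
N_MELS = 40
log_mel = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(N_MELS)]

# Keeping all coefficients is lossless; keeping 13 recovers only a smoothed envelope.
full = log_mel_from_mfcc(mfcc_from_log_mel(log_mel, N_MELS), N_MELS)
truncated = log_mel_from_mfcc(mfcc_from_log_mel(log_mel, 13), N_MELS)

err_full = max_abs_error(log_mel, full)
err_truncated = max_abs_error(log_mel, truncated)
print("error with all coefficients:", err_full)
print("error with 13 coefficients:", err_truncated)
```

Even before phase and pitch are considered, the truncated coefficients return only a smoothed spectral envelope, so any MFCC-to-waveform synthesizer has to supply the missing detail by some other means.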

