Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

by Sunil Rudresh, et al.
Indian Institute of Science

Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, and learning and hearing aids. These applications call for computationally efficient algorithms that can be implemented in real time. In this paper, we propose a high-quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure instants (GCIs), or epochs, in speech signals. The proposed algorithm, termed epoch-synchronous overlap-add time/pitch-scaling (ESOLA-TS/PS), segments the speech signal into overlapping short-time frames, aligns adjacent frames with respect to the epochs, and overlap-adds the aligned frames to synthesize time-scale modified speech. Pitch scaling is achieved by resampling the time-scaled speech by the desired sampling factor. We also propose the concept of epoch embedding into speech signals, which facilitates identifying and time-stamping the samples that correspond to epochs. The embedded epochs can then be reused for time/pitch-scaling to multiple scaling factors whenever desired, which makes the implementation faster and more efficient. The results of perceptual evaluation tests reported in this paper indicate the superiority of ESOLA over state-of-the-art techniques. ESOLA significantly outperforms the conventional pitch synchronous overlap-add (PSOLA) techniques in terms of perceptual quality and intelligibility of the modified speech. Unlike the waveform similarity overlap-add (WSOLA) or synchronous overlap-add (SOLA) techniques, ESOLA can perform exact, high-quality time-scaling of speech to any desired modification factor in the range 0.5 to 2. Compared to synchronous overlap-add with fixed synthesis (SOLAFS), ESOLA is computationally advantageous and at least three times faster.
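The frame-align-and-add idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`time_scale`, `pitch_scale`, `first_epoch_offset`), the Hann window, the 50% synthesis overlap, the 400-sample frame length, and the linear-interpolation resampler are all assumptions made for illustration, and the epoch locations are assumed to be supplied by some external GCI detector. The alignment rule used here (shift each analysis frame so that its first epoch sits at the same offset as the first epoch in the natural continuation of the previous frame) is a simplification of the epoch alignment the abstract describes.

```python
import numpy as np

def first_epoch_offset(epochs, pos):
    """Distance from pos to the first epoch at or after pos (None if there is none)."""
    i = np.searchsorted(epochs, pos)
    return None if i == len(epochs) else int(epochs[i]) - pos

def time_scale(x, epochs, alpha, frame_len=400):
    """Simplified epoch-aligned overlap-add (illustrative sketch only).

    x      : 1-D array of speech samples, with len(x) >= frame_len
    epochs : sorted integer sample indices of epochs (assumed given)
    alpha  : time-scale factor (>1 lengthens, <1 shortens)
    """
    x = np.asarray(x, dtype=float)
    epochs = np.asarray(epochs)
    hop_out = frame_len // 2                  # fixed synthesis hop (50% overlap)
    n_out = int(round(len(x) * alpha))        # output length fixed by alpha
    win = np.hanning(frame_len)
    y = np.zeros(n_out + frame_len)
    wsum = np.zeros_like(y)
    last = len(x) - frame_len                 # last valid frame start in x
    start = 0
    for m, out in enumerate(range(0, n_out, hop_out)):
        if m > 0:
            p = int(round(out / alpha))       # nominal analysis position
            c = start + hop_out               # continuation of the previous frame
            d_p = first_epoch_offset(epochs, p)
            d_c = first_epoch_offset(epochs, c)
            if d_p is not None and d_c is not None:
                p = p + d_p - d_c             # match epoch offsets of both frames
            start = min(max(p, 0), last)
        y[out:out + frame_len] += win * x[start:start + frame_len]
        wsum[out:out + frame_len] += win
    # Normalize by the accumulated window so overlapping frames sum to unit gain.
    return y[:n_out] / np.maximum(wsum[:n_out], 1e-8)

def pitch_scale(x, epochs, beta, frame_len=400):
    """Pitch scaling as time scaling by beta followed by resampling by beta."""
    y = time_scale(x, epochs, beta, frame_len)
    t = np.arange(len(x)) * beta              # read the stretched signal beta times faster
    return np.interp(t, np.arange(len(y)), y)

# Toy usage: a 100 Hz tone at 8 kHz with one hypothetical "epoch" per period.
fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
tone_epochs = np.arange(0, fs, fs // 100)
slower = time_scale(tone, tone_epochs, 1.5)   # 1.5x duration, same pitch
higher = pitch_scale(tone, tone_epochs, 1.5)  # same duration, 1.5x pitch
```

Because the output length is fixed in advance as `round(len(x) * alpha)` and only the analysis frame positions move, the sketch always produces exactly the requested duration, which mirrors the exact time-scaling property the abstract claims for ESOLA. Real speech would additionally need handling of unvoiced regions, where epochs are not well defined; the sketch simply falls back to the nominal frame position when no epoch is found.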




Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

We present a neural vocoder designed with low-powered Alternative and Au...

High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

This Ph.D. thesis focuses on developing a system for high-quality speech...

StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

In recent years, neural vocoders have surpassed classical speech generat...

Variational Auto-Encoder based Mandarin Speech Cloning

Speech cloning technology is becoming more sophisticated thanks to the a...

Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is compu...

Image scaling by de la Vallée-Poussin filtered interpolation

We present a new image scaling method both for downscaling and upscaling...
