Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
An F0 and voicing-status estimation algorithm for speech analysis/synthesis is proposed. Instead of modeling speech signals directly, the proposed algorithm models the behavior of feature extractors under additive noise using a bank of Gaussian mixture models (GMMs) trained on artificial data generated from Monte Carlo simulations. The conditional distributions of F0 predicted by the GMMs are combined to generate a likelihood map, which is then smoothed by a Viterbi search to give the final F0 trajectory. The voicing decision is based on the peak F0 likelihood. The proposed method achieves an average F0 gross error of 0.30
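The post-processing pipeline described above (combine GMM outputs into a likelihood map, smooth it with a Viterbi search, then make a voicing decision from the peak likelihood) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the GMM outputs are combined, what transition model the Viterbi search uses, or how the voicing threshold is set. The sketch therefore assumes (1) the per-GMM log-likelihood maps are summed, i.e. treated as independent, (2) the transition cost is proportional to the pitch jump in octaves, and (3) a frame is voiced when its peak log-likelihood exceeds a fixed threshold. All function names and parameters (`combine_log_maps`, `viterbi_f0`, `voicing_decisions`, `estimate_f0`, `jump_penalty`, `threshold`) are hypothetical.

```python
import math
from typing import List

# A likelihood map: log_lik[t][k] is the log-likelihood of F0 bin k at frame t.
LogMap = List[List[float]]


def combine_log_maps(maps: List[LogMap]) -> LogMap:
    """Combine per-GMM log-likelihood maps by elementwise summation.

    Assumption (hypothetical): the GMM outputs are treated as independent,
    so their densities multiply, i.e. their log-likelihoods add.
    """
    return [[sum(vals) for vals in zip(*frames)] for frames in zip(*maps)]


def viterbi_f0(log_lik: LogMap, f0_bins: List[float],
               jump_penalty: float = 20.0) -> List[int]:
    """Return the most likely F0-bin index for every frame.

    Assumption (hypothetical): the log transition score between bins i and j
    is -jump_penalty * |log2(f0_bins[j] / f0_bins[i])|, penalizing octave jumps.
    """
    if not log_lik:
        return []
    n_bins = len(f0_bins)
    trans = [[-jump_penalty * abs(math.log2(f0_bins[j] / f0_bins[i]))
              for j in range(n_bins)] for i in range(n_bins)]
    score = list(log_lik[0])  # best path score ending in each bin
    back = []                 # back-pointers, one list per frame after the first
    for frame in log_lik[1:]:
        new_score, ptr = [], []
        for j in range(n_bins):
            best_i = max(range(n_bins), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + frame[j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final bin.
    k = max(range(n_bins), key=lambda i: score[i])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


def voicing_decisions(log_lik: LogMap, threshold: float) -> List[bool]:
    """Mark a frame voiced when its peak F0 log-likelihood exceeds threshold."""
    return [max(frame) > threshold for frame in log_lik]


def estimate_f0(log_lik: LogMap, f0_bins: List[float], threshold: float,
                jump_penalty: float = 20.0) -> List[float]:
    """Return F0 in Hz per frame, with 0.0 marking unvoiced frames."""
    path = viterbi_f0(log_lik, f0_bins, jump_penalty)
    voiced = voicing_decisions(log_lik, threshold)
    return [f0_bins[k] if v else 0.0 for k, v in zip(path, voiced)]
```

For example, with bins at 100, 200, and 400 Hz, a single frame that slightly prefers 200 Hz between two frames that clearly favor 100 Hz is smoothed back to 100 Hz under the default `jump_penalty`, while setting `jump_penalty=0.0` lets the track follow the per-frame maximum. Working in the log domain keeps the Viterbi recursion a sum of scores and avoids numerical underflow over long utterances.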