Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

by   Suyoun Kim, et al.

Measuring automatic speech recognition (ASR) system quality is critical for creating user-satisfying voice-driven applications. Word Error Rate (WER) has traditionally been used to evaluate ASR system quality; however, it sometimes correlates poorly with user perception of transcription quality. This is because WER weighs every word equally and does not consider semantic correctness, which has a higher impact on user perception. In this work, we propose evaluating the quality of ASR output hypotheses with SemDist, a metric that measures semantic correctness as the distance between semantic vectors of the reference and the hypothesis, extracted from a pre-trained language model. Our experimental results on 71K and 36K user-annotated ASR outputs show that SemDist correlates more strongly with user perception than WER does. We also show that SemDist has higher correlation with downstream NLU tasks than WER.
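The core idea above can be sketched in a few lines: embed the reference and the hypothesis with a pre-trained encoder and take a vector distance between the two embeddings. This is a minimal illustration, not the paper's exact implementation; the `embed` callable is a placeholder for any pooled sentence encoder, and cosine distance is one reasonable choice of distance function.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def sem_dist(reference, hypothesis, embed):
    """Semantic distance between a reference transcript and an ASR
    hypothesis. `embed` maps a sentence to a fixed-size semantic
    vector (e.g. the pooled output of a pre-trained language model);
    it is a placeholder here, not an API from the paper."""
    return cosine_distance(embed(reference), embed(hypothesis))
```

Unlike WER, which counts word-level edits, a semantic distance of this form can score a hypothesis as near-perfect when a substituted word preserves the sentence's meaning, and penalize heavily when a single-word error changes the intent.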

