OxfordVGG Submission to the EGO4D AV Transcription Challenge

07/18/2023
by Jaesung Huh, et al.

This report presents the technical details of the OxfordVGG team's submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two publicly available text normalisers. Our final submission obtained a Word Error Rate (WER) of 56.0% on the challenge leaderboard. All baseline code and models are available at https://github.com/m-bain/whisperX.
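Text normalisation matters because speech transcription is typically scored with word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below illustrates that general idea only. The `normalise` function is a hypothetical, deliberately simple stand-in (lowercasing and punctuation stripping) and is not one of the two normalisers from the report, and `word_error_rate` is a textbook Levenshtein computation rather than the challenge's official scoring script.

```python
import re


def normalise(text: str) -> list[str]:
    """Hypothetical minimal normaliser (an assumption, not the report's):
    lowercase, replace anything other than a-z, 0-9, apostrophe or space
    with a space, then split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance after normalisation."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With this sketch, `word_error_rate("Hello, world!", "hello world")` is zero because casing and punctuation are normalised away before comparison; without normalisation the same pair would count as two substitutions. This is why the choice of normaliser can noticeably shift a reported WER.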
