Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

by Son Quoc Tran, et al.

Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems in Vietnamese Machine Reading Comprehension. This reliance stems from the limited number of high-quality works on developing Vietnamese language models. To encourage more work in this research field, we present a comprehensive analysis of the language weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and suggest an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we introduce a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification improves the quality of unanswerable questions, making them more difficult for Machine Reading Comprehension systems to solve.




