RuBQ: A Russian Dataset for Question Answering over Wikidata

05/21/2020
by Vladislav Korablinov, et al.

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, and a sample of Wikidata triples containing entities with Russian labels. Dataset creation started with a large collection of question-answer pairs from online quizzes. The data then underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and in-house verification of those queries.
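
To make the components listed above concrete, here is a minimal Python sketch of what a single KBQA entry of this kind might look like. The class name, field names, and the helper `one_hop_query` are hypothetical and are not the dataset's actual schema. The example question is invented for illustration and is not taken from RuBQ. The Wikidata identifiers used are Q142 (France), P36 (capital), and Q90 (Paris).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KBQARecord:
    """Hypothetical record layout; field names are illustrative, not the RuBQ schema."""
    question_ru: str          # original Russian question
    question_en: str          # English translation of the question
    sparql: str               # SPARQL query over Wikidata
    answer_ids: List[str] = field(default_factory=list)  # reference answers as Wikidata IDs


def one_hop_query(subject_id: str, property_id: str) -> str:
    """Build a simple one-hop SPARQL query using Wikidata's wd:/wdt: prefixes."""
    return f"SELECT ?answer WHERE {{ wd:{subject_id} wdt:{property_id} ?answer }}"


# Invented example, not a dataset entry.
record = KBQARecord(
    question_ru="Какой город является столицей Франции?",
    question_en="Which city is the capital of France?",
    sparql=one_hop_query("Q142", "P36"),  # Q142 = France, P36 = capital
    answer_ids=["Q90"],                   # Q90 = Paris
)

print(record.sparql)
```

Questions of higher complexity would need more than one triple pattern in the query; the one-hop form is only the simplest case.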
