Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

by Wilhelmina Nekoto, et al.

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem that goes beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under




