Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

11/29/2022
by Devansh Mehta, et al.

The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report on the deployment of technology-driven data collection methods to create a corpus of more than 60,000 translations from Hindi to Gondi, a vulnerable, low-resource language spoken by around 2.3 million tribal people in south and central India. In the process, we help expand information access in Gondi along two dimensions: (a) creating linguistic resources that the community can use, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass-awareness platform; and (b) enabling the use of Gondi in the digital domain by developing a Hindi-Gondi machine translation model, compressed nearly fourfold so that it can be deployed on low-resource edge devices and in areas with little to no internet connectivity. We also present preliminary evaluations of using this machine translation model to assist volunteers who are collecting more data in the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations, which was used to build the translation model, but also engaged nearly 850 community members who can help take Gondi onto the internet.
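The abstract does not say how the translation model was compressed. One common way to get a reduction of roughly 4x is post-training 8-bit quantization, which stores each 32-bit float weight as a single signed byte plus a shared scale factor. The sketch below illustrates that idea under that assumption. It is not the authors' method, and the weight values are made up for illustration.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative sketch only)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 127; avoid dividing by zero for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest step and clamp to the int8 range [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return array("f", (v * scale for v in q))


# Hypothetical weight tensor stored as 32-bit floats (array typecode "f").
weights = array("f", [0.12, -0.5, 0.33, 0.98, -0.77, 0.0, 0.45, -0.21])
q, scale = quantize_int8(weights)

# Storage for the weights alone: 4 bytes per float32 versus 1 byte per int8.
fp32_bytes = weights.itemsize * len(weights)
int8_bytes = q.itemsize * len(q)
print(f"fp32: {fp32_bytes} B, int8: {int8_bytes} B, ratio: {fp32_bytes / int8_bytes:.1f}x")

# Rounding error per weight is bounded by about half a quantization step.
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

The weight storage alone shrinks by exactly 4x. The per-tensor scale factors, and any parts of a model left unquantized, add a small overhead. This is one reason why schemes of this kind typically come in slightly under 4x overall.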
