Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

by Xuan-Phi Nguyen et al.

Large language models (LLMs) are known to perform tasks effectively after observing only a few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, so unsupervised techniques may be necessary. Moreover, the competent generative capabilities of LLMs are observed only in high-resource languages, while their performance in under-represented languages falls behind due to pre-training data imbalance. To elicit LLMs' abilities in low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated by our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.
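The core idea — assembling exemplars from several high-resource languages into one prompt so the model translates an input in *any* language into English — can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the exemplar data, helper name `build_diverse_prompt`, and prompt template are all hypothetical.

```python
# Hypothetical synthetic exemplars from diverse high-resource languages,
# each paired with its English translation.
EXEMPLARS = [
    ("French", "Bonjour, comment allez-vous ?", "Hello, how are you?"),
    ("Spanish", "El clima es agradable hoy.", "The weather is nice today."),
    ("German", "Ich lese gern Bücher.", "I like reading books."),
]

def build_diverse_prompt(target_text: str) -> str:
    """Assemble a few-shot translate-to-English prompt whose exemplars
    span multiple high-resource languages, then append an input sentence
    in an arbitrary (possibly low-resource) language."""
    lines = []
    for lang, src, eng in EXEMPLARS:
        lines.append(f"{lang}: {src}")
        lines.append(f"English: {eng}")
        lines.append("")
    # The target language is deliberately left unnamed so the model
    # generalizes beyond the exemplar languages.
    lines.append(f"Sentence: {target_text}")
    lines.append("English:")
    return "\n".join(lines)

prompt = build_diverse_prompt("Habari yako?")  # e.g. a Swahili input
print(prompt)
```

The completed English translations produced by such prompts could then serve as intra-lingual exemplars for downstream tasks in the target language, as the abstract describes.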


Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

Numerous recent works on unsupervised machine translation (UMT) imply t...

FewShotTextGCN: K-hop neighborhood regularization for few-shot learning on graphs

We present FewShotTextGCN, a novel method designed to effectively utiliz...

Using fine-tuning and min lookahead beam search to improve Whisper

The performance of Whisper in low-resource languages is still far from p...

Improving Captioning for Low-Resource Languages by Cycle Consistency

Improving the captioning performance on low-resource languages by levera...

Resources and Few-shot Learners for In-context Learning in Slavic Languages

Despite the rapid recent progress in creating accurate and compact in-co...

Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation

We propose a two-stage training approach for developing a single NMT mod...

Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Large language models (LLMs) have shown surprisingly good performance in...
