Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages

by   Jesujoba O. Alabi, et al.

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks on both high resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT) – fine-tuning a multilingual PLM on monolingual texts of a language using the same pre-training objective. However, African languages with large monolingual texts are few, and adapting to each of them individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning (MAFT) on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent – English, French, and Arabic to encourage cross-lingual transfer learning. Additionally, to further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that corresponds to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive to applying LAFT on individual languages while requiring significantly less disk space. Finally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter efficient fine-tuning methods.


page 1

page 2

page 3

page 4


Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment

Pre-trained vision and language models such as CLIP have witnessed remar...

Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages

Pre-trained multilingual language models underpin a large portion of mod...

MasakhaNEWS: News Topic Classification for African languages

African languages are severely under-represented in NLP research due to ...

Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast

Existing research has shown that a multilingual pre-trained language mod...

Multilingual Relation Classification via Efficient and Effective Prompting

Prompting pre-trained language models has achieved impressive performanc...

On the Prunability of Attention Heads in Multilingual BERT

Large multilingual models, such as mBERT, have shown promise in crosslin...

Nearest Neighbour Few-Shot Learning for Cross-lingual Classification

Even though large pre-trained multilingual models (e.g. mBERT, XLM-R) ha...

Please sign up or login with your details

Forgot password? Click here to reset