AceGPT, Localizing Large Language Models in Arabic

by   Huang Huang, et al.

This paper explores the imperative need and methodology for developing a localized Large Language Model (LLM) tailored for Arabic, a language with unique cultural characteristics that are not adequately addressed by current mainstream models like ChatGPT. Key concerns additionally arise when considering cultural sensitivity and local values. To this end, the paper outlines a packaged solution, including further pre-training with Arabic texts, supervised fine-tuning (SFT) using native Arabic instructions and GPT-4 responses in Arabic, and reinforcement learning with AI feedback (RLAIF) using a reward model that is sensitive to local culture and values. The objective is to train culturally aware and value-aligned Arabic LLMs that can serve the diverse application-specific needs of Arabic-speaking communities. Extensive evaluations demonstrated that the resulting LLM called `AceGPT' is the SOTA open Arabic LLM in various benchmarks, including instruction-following benchmark (i.e., Arabic Vicuna-80 and Arabic AlpacaEval), knowledge benchmark (i.e., Arabic MMLU and EXAMs), as well as the newly-proposed Arabic cultural & value alignment benchmark. Notably, AceGPT outperforms ChatGPT in the popular Vicuna-80 benchmark when evaluated with GPT-4, despite the benchmark's limited scale. (NLU) benchmark (i.e., ALUE) Codes, data, and models are in


page 1

page 2

page 3

page 4


The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

In this paper, we explore the effects of language variants, data sizes, ...

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

Are language models culturally biased? It is important that language mod...

Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches

Poetry holds immense significance within the cultural and traditional fa...

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

Large language models (LLMs) finetuned to follow human instructions have...

ORCA: A Challenging Benchmark for Arabic Language Understanding

Due to their crucial role in all NLP, several benchmarks have been propo...

The Ghost in the Machine has an American accent: value conflict in GPT-3

The alignment problem in the context of large language models must consi...

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

We present Dolphin, a novel benchmark that addresses the need for an eva...

Please sign up or login with your details

Forgot password? Click here to reset