Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models

by Fangkai Jiao, et al.

This project focuses on enhancing open-source large language models through instruction tuning and on providing comprehensive evaluations of their performance. We explore how training-data factors such as quantity, quality, and linguistic distribution influence the performance of instruction-tuned models. These models are trained on publicly accessible, high-quality instruction datasets in both English and Chinese. Our goal is to supplement evaluation with quantitative analyses that offer insights for the continued advancement of open-source chat models. Our model, data, and code are publicly available for others to use and build upon.


Chinese Open Instruction Generalist: A Preliminary Release

Instruction tuning is widely recognized as a key technique for building ...

INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Instruction-tuned large language models have revolutionized natural lang...

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

In the realm of Large Language Models, the balance between instruction d...

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for...

Poisoning Language Models During Instruction Tuning

Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetun...

ExpertPrompting: Instructing Large Language Models to be Distinguished Experts

The answering quality of an aligned large language model (LLM) can be dr...
