FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models

08/19/2023
by Liwen Zhang, et al.

Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, yet their efficacy in more challenging, domain-specific tasks remains largely unexplored. This paper presents FinEval, a benchmark designed specifically to evaluate the financial domain knowledge of LLMs. FinEval is a collection of high-quality multiple-choice questions covering Finance, Economy, Accounting, and Certificate, comprising 4,661 questions that span 34 academic subjects. To evaluate model performance comprehensively, FinEval employs a range of prompt types: zero-shot and few-shot prompts, as well as answer-only and chain-of-thought prompts. We evaluate state-of-the-art Chinese and English LLMs on FinEval, and only GPT-4 achieves an accuracy close to 70% across the different prompt settings, indicating substantial room for LLMs to improve their financial domain knowledge. Our work offers a more comprehensive benchmark for evaluating financial knowledge, drawing on data from mock exams and covering a wide range of LLMs.
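The abstract describes two prompting axes: zero-shot vs. few-shot, and answer-only vs. chain-of-thought. The sketch below shows one way these settings could be assembled for a single multiple-choice item and how a predicted option letter could be scored. It is a minimal illustration under stated assumptions, not FinEval's actual evaluation code: the `Question` fields, the English prompt wording, and the letter-extraction heuristic are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record layout for one multiple-choice item; field names are illustrative.
    subject: str
    question: str
    choices: dict[str, str]  # option letter -> option text, e.g. {"A": "...", "B": "..."}
    answer: str              # gold option letter
    explanation: str = ""    # worked reasoning, used only in chain-of-thought exemplars


def format_item(q: Question) -> str:
    """Render the question stem followed by its options in letter order."""
    lines = [q.question]
    lines += [f"{label}. {text}" for label, text in sorted(q.choices.items())]
    return "\n".join(lines)


def build_prompt(target: Question, shots: list[Question], cot: bool) -> str:
    """Zero-shot if `shots` is empty, few-shot otherwise; answer-only or chain-of-thought."""
    header = f"The following are multiple-choice questions about {target.subject}."
    if cot:
        header += " Think step by step, then give the answer."
    parts = [header]
    for ex in shots:
        body = format_item(ex)
        if cot:
            body += f"\nReasoning: {ex.explanation}"
        body += f"\nAnswer: {ex.answer}"
        parts.append(body)
    # The target item ends with an open cue that the model is expected to complete.
    cue = "Reasoning:" if cot else "Answer:"
    parts.append(f"{format_item(target)}\n{cue}")
    return "\n\n".join(parts)


# An option letter counts only if it is not part of a longer ASCII word,
# so the "A" in "Answer" is skipped while "答案是C" still yields "C".
_CHOICE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_choice(response: str) -> str | None:
    """Return the last standalone option letter in the output, or None if there is none."""
    matches = _CHOICE.findall(response)
    return matches[-1] if matches else None


def accuracy(preds: list[str | None], golds: list[str]) -> float:
    """Fraction of predictions equal to the gold letter; unparsed (None) counts as wrong."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

Passing an empty `shots` list with `cot=False` gives the zero-shot answer-only setting, while supplying exemplars with `cot=True` gives few-shot chain-of-thought. Taking the last standalone letter from A to D is a deliberately simple heuristic; it can be fooled by an English article "A" near the end of a response, so a real harness would likely use a stricter answer format.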



Code Repositories

FinEval

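FinEval is a collection of high-quality multiple-choice questions covering fields such as finance, economics, accounting, and certificates.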
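Because the questions are grouped into four domains, reporting accuracy per domain is a natural way to summarize a model's results. The following is a minimal sketch, assuming graded results are available as `(subject, is_correct)` pairs; the subject-to-category mapping is hypothetical, since FinEval's actual 34 subject names and its official scoring script are not reproduced on this page.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical subject -> category mapping, for illustration only;
# these are not FinEval's real subject identifiers.
CATEGORY_OF = {
    "banking": "Finance",
    "macroeconomics": "Economy",
    "financial_accounting": "Accounting",
    "securities_exam": "Certificate",
}


def category_accuracy(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Pool graded questions by category and return accuracy per category.

    `results` holds one (subject, is_correct) pair per question.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for subject, is_correct in results:
        category = CATEGORY_OF[subject]
        total[category] += 1
        correct[category] += int(is_correct)
    scores = {cat: correct[cat] / total[cat] for cat in total}
    # Overall accuracy pools every question, so categories with more questions weigh more.
    scores["Overall"] = sum(correct.values()) / sum(total.values())
    return scores
```

Pooling questions (micro-averaging) is only one choice; averaging per-subject accuracies instead would give each subject equal weight regardless of its size.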


