MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

06/23/2023
by Chaoyou Fu, et al.

Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks, and recent studies have shown striking emergent abilities, such as writing poems based on an image. However, such case studies cannot fully reflect the performance of an MLLM, and a comprehensive evaluation has been lacking. In this paper, we fill this gap by presenting MME, the first MLLM evaluation benchmark. It measures both perception and cognition abilities across a total of 14 subtasks. To avoid the data leakage that may arise from directly using public datasets for evaluation, all instruction-answer pair annotations are manually designed. The concise instruction design allows us to compare MLLMs fairly, instead of struggling with prompt engineering. Moreover, such instructions make it easy to carry out quantitative statistics. A total of 10 advanced MLLMs are comprehensively evaluated on MME, which not only shows that existing MLLMs still have large room for improvement, but also reveals potential directions for subsequent model optimization.
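The abstract notes that the concise instruction design makes quantitative statistics easy. A minimal sketch of what such scoring could look like, assuming each instruction expects a short "yes"/"no" style answer (the subtask names, the `score_by_subtask` helper, and the sample records below are illustrative, not MME's actual data or code):

```python
# Hypothetical scoring sketch for a benchmark with concise yes/no instructions.
# Subtask names and records are made up for illustration.
from collections import defaultdict

def score_by_subtask(records):
    """records: iterable of (subtask, model_answer, ground_truth) tuples.
    Returns a dict mapping each subtask to its accuracy."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for subtask, answer, truth in records:
        totals[subtask] += 1
        # Concise instructions make parsing trivial: match the leading yes/no.
        if answer.strip().lower().startswith(truth.strip().lower()):
            hits[subtask] += 1
    return {t: hits[t] / totals[t] for t in totals}

records = [
    ("existence", "Yes, there is a dog.", "yes"),
    ("existence", "no", "no"),
    ("color", "No", "yes"),
    ("color", "yes", "yes"),
]
print(score_by_subtask(records))  # {'existence': 1.0, 'color': 0.5}
```

Because answers are constrained to a fixed short form, no per-model prompt engineering or free-text grading is needed; accuracy per subtask follows directly.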


Related research

06/07/2023 - INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Instruction-tuned large language models have revolutionized natural lang...

09/04/2023 - Are Emergent Abilities in Large Language Models just In-Context Learning?
Large language models have exhibited emergent abilities, demonstrating e...

06/23/2023 - A Survey on Multimodal Large Language Models
Multimodal Large Language Model (MLLM) recently has been a new rising re...

06/15/2023 - LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Large Vision-Language Models (LVLMs) have recently played a dominant rol...

07/07/2023 - GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Instruction tuning large language model (LLM) on image-text pairs has ac...

08/25/2023 - SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Recently, there has been growing interest in using Large Language Models...

07/31/2023 - FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
In this paper, we propose FinVis-GPT, a novel multimodal large language ...
