OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

12/08/2022
by Jinze Bai, et al.

Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. While they hopefully offer an alternative path toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference, and it also facilitates multi-task training over diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% performance on average with only 16% of the parameters of 15 task-finetuned models, showcasing the performance reliability of multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
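To make the "single line of code" claim concrete, the sketch below shows what such declarative multi-modal instructions look like and how a system could compile them into task plans. The instruction strings follow the slot syntax described in the paper (e.g. `[IMAGE:img] ... -> [TEXT:cap]`), but the `register_task` helper and the ASR/VQA instruction wordings are illustrative assumptions, not the actual OFASys API.

```python
import re

# One-line multi-modal instructions in the paper's slot syntax:
# each [MODALITY:slot] declares an input or output, and "->" separates
# the input plan from the output plan. The exact prompts below are
# illustrative examples, not presets shipped with OFASys.
caption_task = "[IMAGE:img] what does the image describe? -> [TEXT:cap]"
asr_task     = "[AUDIO:wav] what is the person saying? -> [TEXT:transcript]"
vqa_task     = "[IMAGE:img] [TEXT:question] -> [TEXT:answer]"


def register_task(name: str, instruction: str) -> dict:
    """Hypothetical helper: parse one multi-modal instruction into a task spec.

    A real system would additionally bind each slot to a modality-specific
    pre/post-processor and an IO adapter of the shared model; here we only
    recover the declarative structure (input vs. output slots).
    """
    slots = re.findall(r"\[([A-Z]+):(\w+)\]", instruction)
    inputs_part, outputs_part = instruction.split("->")
    return {
        "name": name,
        "inputs":  [(m, s) for m, s in slots if f"[{m}:{s}]" in inputs_part],
        "outputs": [(m, s) for m, s in slots if f"[{m}:{s}]" in outputs_part],
    }


print(register_task("caption", caption_task))
# {'name': 'caption', 'inputs': [('IMAGE', 'img')], 'outputs': [('TEXT', 'cap')]}
```

The point of the declarative interface is exactly this separation: the instruction fixes *what* the task consumes and produces, while the system decides *how* each slot is encoded, decoded, and scheduled during multi-task training.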
