CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

by Hongguang Zhu, et al.

Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets. Given the ever-growing nature of real-world data, such an offline training paradigm on ever-expanding data is unsustainable, because models lack the continual learning ability to accumulate knowledge constantly. However, most continual learning studies are limited to uni-modal classification, and existing multi-modal datasets cannot simulate continual non-stationary data stream scenarios. To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset, P9D, which contains over one million product image-text pairs from 9 industries. The data from each industry, treated as an independent task, support continual learning and conform to the real-world long-tail distribution to simulate pretraining on web data. We comprehensively study the characteristics and challenges of VLCP, and propose a new algorithm: Compatible momentum contrast with Topology Preservation, dubbed CTP. The compatible momentum model absorbs the knowledge of the current-task and previous-task models to flexibly update the multi-modal features. Moreover, topology preservation transfers the knowledge of embeddings across tasks while preserving the flexibility of feature adjustment. The experimental results demonstrate that our method not only achieves superior performance compared with other baselines but also avoids an expensive training burden. Dataset and codes are available at https://github.com/KevinLight831/CTP.
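As a rough illustration of the idea behind the compatible momentum model, the sketch below shows an exponential-moving-average (EMA) update whose target blends the current online model with a frozen previous-task model, so the momentum encoder absorbs knowledge from both. All names and coefficients (`m`, `alpha`, the function itself) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: a momentum (EMA) update whose target mixes the current
# online weights with frozen previous-task weights, loosely mirroring the
# "compatible momentum" idea. Parameter names and values are assumptions.

def compatible_momentum_update(momentum_w, online_w, prev_task_w,
                               m=0.995, alpha=0.5):
    """Return updated momentum-encoder weights.

    momentum_w  : momentum (teacher) parameters, list of floats
    online_w    : current online (student) parameters
    prev_task_w : frozen parameters from the previous task's model
    m           : EMA decay rate (assumed value)
    alpha       : mixing weight between new and old knowledge (assumed)
    """
    updated = []
    for mw, ow, pw in zip(momentum_w, online_w, prev_task_w):
        # Blend current-task and previous-task knowledge into one target,
        # then move the momentum weight a small step toward that target.
        target = alpha * ow + (1.0 - alpha) * pw
        updated.append(m * mw + (1.0 - m) * target)
    return updated
```

With `alpha = 1.0` this reduces to the standard MoCo-style EMA update toward the online model; smaller `alpha` keeps the momentum encoder anchored to the previous task, which is one plausible way to trade plasticity against forgetting.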




