The CLEAR Benchmark: Continual LEArning on Real-World Imagery

by   Zhiqiu Lin, et al.

Continual learning (CL) is widely regarded as crucial challenge for lifelong AI. However, existing CL benchmarks, e.g. Permuted-MNIST and Split-CIFAR, make use of artificial temporal variation and do not align with or generalize to the real-world. In this paper, we introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). We build CLEAR from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. Our pipeline makes use of pretrained vision-language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning. We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms that only utilize fully-supervised data. Our analysis also reveals that mainstream CL evaluation protocols that train and test on iid data artificially inflate performance of CL system. To address this, we propose novel "streaming" protocols for CL that always test on the (near) future. Interestingly, streaming protocols (a) can simplify dataset curation since today's testset can be repurposed for tomorrow's trainset and (b) can produce more generalizable models with more accurate estimates of performance since all labeled data from each time-period is used for both training and testing (unlike classic iid train-test splits).


page 29

page 30

page 31

page 32

page 34

page 35

page 36

page 37


Self-Supervised Training Enhances Online Continual Learning

In continual learning, a system must incrementally learn from a non-stat...

Hypernetworks for Continual Semi-Supervised Learning

Learning from data sequentially arriving, possibly in a non i.i.d. way, ...

Continual Learning with Deep Streaming Regularized Discriminant Analysis

Continual learning is increasingly sought after in real world machine le...

A Distinct Unsupervised Reference Model From The Environment Helps Continual Learning

The existing continual learning methods are mainly focused on fully-supe...

ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

Continual learning usually assumes the incoming data are fully labeled, ...

International Workshop on Continual Semi-Supervised Learning: Introduction, Benchmarks and Baselines

The aim of this paper is to formalize a new continual semi-supervised le...

Learning on non-stationary data with re-weighting

Many real-world learning scenarios face the challenge of slow concept dr...

Please sign up or login with your details

Forgot password? Click here to reset