DeVLBert: Learning Deconfounded Visio-Linguistic Representations

by   Shengyu Zhang, et al.

In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of downstream data on which the pretrained model will be fine-tuned. Existing methods for this problem are purely likelihood-based, leading to the spurious correlations and hurt the generalization ability when transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the conditional probability of one token (object or word) given another one can be high (due to the dataset biases) without robust (causal) relationships between them. To mitigate such dataset biases, we propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning. We borrow the idea of the backdoor adjustment from the research field of causality and propose several neural-network based architectures for Bert-style out-of-domain pretraining. The quantitative results on three downstream tasks, Image Retrieval (IR), Zero-shot IR, and Visual Question Answering, show the effectiveness of DeVLBert by boosting generalization ability.


Are we pretraining it right? Digging deeper into visio-linguistic pretraining

Numerous recent works have proposed pretraining generic visio-linguistic...

A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank

We show that the choice of pretraining languages affects downstream cros...

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media

Recent studies on domain-specific BERT models show that effectiveness on...

Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization

The proliferation of pretrained models, as a result of advancements in p...

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

The mixture proportions of pretraining data domains (e.g., Wikipedia, bo...

Towards Generalizable Graph Contrastive Learning: An Information Theory Perspective

Graph contrastive learning (GCL) emerges as the most representative appr...

Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

For unsupervised pretraining, mask-reconstruction pretraining (MRP) appr...

Please sign up or login with your details

Forgot password? Click here to reset