FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization

by   Junyang Wang, et al.

Vision-Language Pre-training (VLP) models such as CLIP have gained popularity in recent years. However, many works have found that the social biases hidden in CLIP easily manifest in downstream tasks, especially image retrieval, where they can have harmful effects on society. In this work, we propose FairCLIP to eliminate the social bias in CLIP-based image retrieval without degrading retrieval performance, achieving compatibility between the debiasing effect and retrieval accuracy. FairCLIP consists of two steps: Attribute Prototype Learning (APL) and Representation Neutralization (RN). In the first step, we extract the concepts needed for debiasing in CLIP, using a query with learnable word-vector prefixes as the extraction structure. In the second step, we first divide attributes into target attributes and bias attributes; our analysis shows that both kinds contribute to the bias. We therefore eliminate the bias with a Re-Representation Matrix (RRM) that neutralizes the representation. We compare the debiasing effect and retrieval performance of FairCLIP against other methods, and experiments demonstrate that it achieves the best compatibility between the two. Although FairCLIP is applied here to image retrieval, the representation neutralization it performs is common to all CLIP downstream tasks, so FairCLIP can serve as a general debiasing method for other fairness issues related to CLIP.
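The neutralization step can be illustrated with a minimal sketch. The abstract does not specify the exact form of the Re-Representation Matrix (in the paper it is learned); the version below is a hypothetical stand-in that builds the matrix as an orthogonal projection removing the subspace spanned by the learned bias-attribute prototypes, then re-normalizes the representations for cosine-similarity retrieval. All function names and the projection form `M = I - B Bᵀ` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def re_representation_matrix(bias_protos):
    """Build a re-representation matrix from bias-attribute prototypes.

    bias_protos: (k, d) array of prototype vectors spanning the bias subspace.
    Returns a (d, d) matrix M = I - B B^T, where B is an orthonormal basis of
    the prototype span, so that applying M removes the bias directions.
    (A simple orthogonal-projection stand-in for FairCLIP's learned RRM.)
    """
    B, _ = np.linalg.qr(bias_protos.T)        # (d, k) orthonormal basis
    d = bias_protos.shape[1]
    return np.eye(d) - B @ B.T

def neutralize(reps, rrm):
    """Apply the re-representation matrix and re-normalize for cosine retrieval."""
    out = reps @ rrm.T                         # rrm is symmetric here
    return out / np.linalg.norm(out, axis=-1, keepdims=True)
```

After neutralization, the image representations carry no component along the bias-attribute prototypes, so ranking by cosine similarity against a text query cannot exploit those directions.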

