COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features

by Jia-Qi Yang, et al.

With the development of the multimedia internet, visual characteristics have become an important factor affecting user interests. Incorporating visual features is therefore a promising direction for further improving click-through rate (CTR) prediction. However, we found that simply injecting image embeddings trained with established pre-training methods yields only marginal improvements. We attribute this failure to two reasons. First, existing pre-training methods are designed for well-defined computer vision tasks that concentrate on semantic features, so they cannot learn personalized interests in recommendation. Second, pre-trained image embeddings that contain only semantic information provide little information gain, because the CTR prediction task already takes semantic features such as categories and item titles as inputs. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that learns visual features from user click histories. Specifically, we propose a user interest reconstruction module that mines visual features related to user interests from behavior histories. We further propose a contrastive training method to prevent the embedding vectors from collapsing. Extensive experiments verify that our method learns users' visual interests: it achieves a 0.46% improvement in offline AUC and a 0.88% improvement in Taobao online GMV, with p-value<0.01.
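The two components named in the abstract, reconstruction of user interest from click history and a contrastive objective that discourages embedding collapse, can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the attention-based `reconstruct` function, the InfoNCE-style `contrastive_loss`, the use of other users' clicked items in the batch as negatives, and both temperature values. It uses only the Python standard library and plain lists in place of real image embeddings.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(history, target, temperature=1.0):
    """Hypothetical reconstruction step: an attention-weighted sum of the
    history embeddings, using the target item embedding as the query."""
    weights = softmax([dot(h, target) / temperature for h in history])
    dim = len(target)
    return [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]


def contrastive_loss(histories, targets, temperature=0.1):
    """Hypothetical InfoNCE-style loss: each user's reconstruction should be
    closer to that user's own clicked item than to the other users' items."""
    targets = [normalize(t) for t in targets]
    losses = []
    for i, (hist, tgt) in enumerate(zip(histories, targets)):
        hist = [normalize(h) for h in hist]
        rec = normalize(reconstruct(hist, tgt))
        probs = softmax([dot(rec, t) / temperature for t in targets])
        losses.append(-math.log(probs[i]))
    return sum(losses) / len(losses)


# Toy batch: two users whose histories point in different directions.
histories = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
targets = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(histories, targets))
```

In this toy batch each clicked item matches its user's history, so the loss is small; swapping the two targets makes it much larger. A loss of this shape also penalizes the degenerate case where all embeddings are identical, since the positive then cannot be distinguished from the negatives.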




UPRec: User-Aware Pre-training for Recommender Systems

Existing sequential recommendation methods rely on large amounts of trai...

Contrastive Pre-training for Sequential Recommendation

Sequential recommendation methods play a crucial role in modern recommen...

Personalized Prompts for Sequential Recommendation

Pre-training models have shown their power in sequential recommendation....

UserBERT: Contrastive User Model Pre-training

User modeling is critical for personalized web applications. Existing us...

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

E-commerce search engines comprise a retrieval phase and a ranking phase...

What Remains of Visual Semantic Embeddings

Zero shot learning (ZSL) has seen a surge in interest over the decade fo...

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

A key goal for the advancement of AI is to develop technologies that ser...
