Disentangled Self-Attentive Neural Networks for Click-Through Rate Prediction
Click-through rate (CTR) prediction, which aims to predict the probability that a user will click on an item, is an essential task for many online applications. Because CTR data are sparse and high-dimensional, a key to effective prediction is modeling high-order interactions among feature fields. To model such interactions explicitly, stacking multi-head self-attentive neural networks is an efficient approach that has achieved promising performance. However, the vanilla self-attentive network couples two terms in the computation of the self-attention score: a whitened pairwise interaction term and a unary term. The pairwise term contributes to learning the importance score of each feature interaction, while the unary term models the impact of one feature on all other features. We identify two factors that impede the learning of both terms: coupled gradient computation and shared transformations. To solve this problem, we present a novel Disentangled Self-Attentive neural Network (DSAN) model for CTR prediction, which disentangles the two terms to facilitate the learning of feature interactions. We conduct extensive experiments on two real-world benchmark datasets. The results show that DSAN retains computational efficiency and also improves performance over state-of-the-art baselines.
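The coupling of a whitened pairwise term and a unary term can be checked with a short numerical sketch. It rests on an assumed decomposition, taken from the general disentangled-attention idea rather than from DSAN's own equations. The attention logit for query field i and key field j expands as q_i·k_j = (q_i − μ_q)·(k_j − μ_k) + μ_q·k_j + (q_i − μ_q)·μ_k, where μ_q and μ_k are means over the feature fields. The last term is constant across j, so the softmax over j removes it. The code assumes queries and keys are already-projected field embeddings; all function names are hypothetical. `disentangled_attention` is only one illustrative way to separate the terms, with one softmax per term, and DSAN's actual formulation may differ.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vec(vectors):
    # Column-wise mean over the feature fields.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_attention(queries, keys):
    # Coupled score: softmax over fields j of q_i . k_j.
    return [softmax([dot(q, k) for k in keys]) for q in queries]


def pairwise_and_unary_logits(queries, keys):
    mu_q, mu_k = mean_vec(queries), mean_vec(keys)
    keys_c = [[a - b for a, b in zip(k, mu_k)] for k in keys]
    # Whitened pairwise term: (q_i - mu_q) . (k_j - mu_k), one row per query i.
    pairwise = [
        [dot([a - b for a, b in zip(q, mu_q)], kc) for kc in keys_c]
        for q in queries
    ]
    # Unary term: mu_q . k_j, identical for every query i.
    unary = [dot(mu_q, k) for k in keys]
    return pairwise, unary


def decomposed_attention(queries, keys):
    # Re-sum the two terms. The dropped term (q_i - mu_q) . mu_k is constant
    # across j, so softmax removes it and the weights match vanilla_attention.
    pairwise, unary = pairwise_and_unary_logits(queries, keys)
    return [softmax([p + u for p, u in zip(row, unary)]) for row in pairwise]


def disentangled_attention(queries, keys, unary_logits):
    # Hypothetical disentangled variant: each term gets its own softmax, so
    # their gradients no longer share one normalization. unary_logits stands
    # in for scores from a separate (assumed) unary projection.
    pairwise, _ = pairwise_and_unary_logits(queries, keys)
    unary_w = softmax(unary_logits)
    return [[p + u for p, u in zip(softmax(row), unary_w)] for row in pairwise]


random.seed(0)
num_fields, dim = 4, 3
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_fields)]
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_fields)]

vanilla = vanilla_attention(queries, keys)
decomposed = decomposed_attention(queries, keys)
max_diff = max(
    abs(a - b) for r1, r2 in zip(vanilla, decomposed) for a, b in zip(r1, r2)
)
print(f"max |vanilla - decomposed| = {max_diff:.2e}")
```

The printed difference should be at floating-point noise level. This confirms that the vanilla score is exactly the sum of the two terms passed through a single shared softmax. Each row of the separated variant sums to 2 because it adds two independent softmax distributions. This is a design choice of the sketch, not something the abstract specifies.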