A Constant-time Adaptive Negative Sampling

12/31/2020
by Shabnam Daghaghi, et al.

Softmax classifiers with a very large number of classes arise naturally in applications such as natural language processing and information retrieval. Computing the full softmax is expensive in both computation and energy. A variety of sampling approaches, popularly known as negative sampling (NS), have been proposed to overcome this challenge. Ideally, NS should sample negative classes from a distribution that depends on the input data, the current parameters, and the correct positive class. Unfortunately, because the parameters and data samples change every iteration, no existing sampling scheme is both truly adaptive and able to draw negative classes in constant time per iteration. Instead, heuristics such as random sampling, static frequency-based sampling, or learning-based biased sampling are adopted, each trading off either the sampling cost or the adaptivity of the samples. In this paper, we show a class of distributions for which the sampling scheme is truly adaptive and provably generates negative samples in constant time. Our C++ implementation on a commodity CPU is significantly faster, in wall-clock time, than the most optimized TensorFlow implementations of standard softmax and other sampling approaches on modern GPUs (V100s).
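
The abstract does not spell out the proposed constant-time adaptive scheme, so the sketch below only illustrates the non-adaptive baseline it improves upon: a sampled-softmax loss with uniformly drawn negative classes, which approximates full softmax cross-entropy over a small candidate set. The function name negative_sampling_loss, the uniform sampler, and all dimensions are illustrative assumptions, not the paper's method.

import numpy as np

def negative_sampling_loss(x, W, pos_class, num_neg=5, rng=None):
    # Generic sampled-softmax / negative-sampling loss (illustrative only;
    # this is NOT the paper's constant-time adaptive sampler).
    #   x         : (d,) input embedding
    #   W         : (num_classes, d) output weight matrix
    #   pos_class : index of the correct (positive) class
    #   num_neg   : number of negative classes drawn uniformly at random
    if rng is None:
        rng = np.random.default_rng()
    num_classes = W.shape[0]

    # Non-adaptive baseline: negatives are drawn uniformly, independent of
    # x, W, and pos_class. An adaptive scheme would bias this draw instead.
    neg = rng.choice(num_classes, size=num_neg, replace=False)
    neg = neg[neg != pos_class]

    # Compute logits only for the positive class and the sampled negatives,
    # avoiding the O(num_classes) cost of the full softmax.
    logits = np.concatenate(([W[pos_class] @ x], W[neg] @ x))
    logits -= logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                        # positive class sits at index 0

# Toy usage: 10,000 classes, 64-dimensional embeddings.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((10_000, 64))
x = rng.standard_normal(64)
print(negative_sampling_loss(x, W, pos_class=42, num_neg=5, rng=rng))

Because only num_neg + 1 rows of W are touched per example, the per-iteration cost is constant in the number of classes; the open question the paper addresses is how to keep that constant cost while making the negative distribution adaptive rather than uniform or static.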

Related research

07/24/2019  Sampled Softmax with Random Fourier Features
The computational cost of training with softmax cross entropy loss grows...

02/15/2020  Extreme Classification via Adversarial Softmax Approximation
Training a classifier over a large number of classes, known as 'extreme ...

03/22/2018  Unbiased scalable softmax optimization
Recent neural network and language models rely on softmax distributions ...

03/30/2023  Efficient distributed representations beyond negative sampling
This article describes an efficient method to learn distributed represen...

10/31/2022  DUEL: Adaptive Duplicate Elimination on Working Memory for Self-Supervised Learning
In Self-Supervised Learning (SSL), it is known that frequent occurrences...

05/27/2021  Rethinking InfoNCE: How Many Negative Samples Do You Need?
InfoNCE loss is a widely used loss function for contrastive model traini...

04/10/2020  Efficient Sampled Softmax for Tensorflow
This short paper discusses an efficient implementation of sampled softma...
