In this short note we consider random fully connected ReLU networks of w...
Privacy noise may negate the benefits of using adaptive optimizers in
di...
This paper studies the curious phenomenon for machine learning models wi...
Adaptive optimization methods have become the default solvers for many
m...
In contrast to SGD, adaptive gradient methods like Adam allow robust tra...
Distillation is the technique of training a "student" model based on exa...
Federated learning is a challenging optimization problem due to the
hete...
Transformer networks use pairwise attention to compute contextual embedd...
Knowledge distillation is a technique for improving the performance of a...
Modern retrieval problems are characterised by training sets with potent...
Attention based Transformer architecture has enabled significant advance...
Despite the widespread adoption of Transformer models for NLP tasks, the...
While stochastic gradient descent (SGD) is still the de facto algorithm ...
Federated learning is a key scenario in modern large-scale machine learn...
Privacy preserving machine learning algorithms are crucial for learning
...
Several recently proposed stochastic optimization methods that have been...
Adaptive methods such as Adam and RMSProp are widely used in deep learni...
We consider the problem of retrieving the most relevant labels for a giv...
A central challenge to using first-order methods for optimizing nonconve...
In this paper, we present two new communication-efficient methods for
di...
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum
opt...
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth
fin...
We study nonconvex finite-sum problems and analyze stochastic variance
r...
We analyze a fast incremental aggregated gradient method for optimizing
...
Nonparametric two sample testing is a decision theoretic problem that
in...
We study optimization algorithms based on variance reduction for stochas...
Nonparametric two sample testing deals with the question of consistently...
We consider the problem of selecting a subset of alternatives given nois...