We continue the investigation into the power of smaller Transformer-base...
In this paper, we investigate the impact of stochasticity and large step...
SGD (with momentum) and AdamW are the two most used optimizers for fine-...
Training computer vision models usually requires collecting and labeling...
We provide a detailed evaluation of various image classification archite...
We propose a synthetic task, LEGO (Learning Equality and Group Operation...
Data augmentation is a cornerstone of the machine learning pipeline, yet...
We study the function space characterization of the inductive bias resul...
Understanding generalization in deep learning is arguably one of the mos...
We provide a detailed asymptotic study of gradient flow trajectories and...
We present a direct (primal only) derivation of Mirror Descent as a "par...
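The entry above concerns a derivation of Mirror Descent. As a generic illustration of the algorithm itself (not the paper's derivation), here is a minimal sketch of mirror descent under the negative-entropy mirror map on the probability simplex, better known as exponentiated gradient; the function `exponentiated_gradient`, the cost vector, and all parameter values are illustrative assumptions:

```python
import numpy as np

# Mirror descent with the negative-entropy mirror map on the simplex
# (exponentiated gradient). Minimizing a linear loss <c, x> over the
# probability simplex concentrates the mass on the argmin of c.
def exponentiated_gradient(c, steps=500, lr=0.1):
    x = np.full(len(c), 1.0 / len(c))  # start at the uniform distribution
    for _ in range(steps):
        x = x * np.exp(-lr * c)        # multiplicative (dual-space) update
        x = x / x.sum()                # Bregman projection = renormalization
    return x

c = np.array([3.0, 1.0, 2.0])
x = exponentiated_gradient(c)
# x places essentially all probability mass on index 1, the smallest cost
```

The multiplicative form is exactly what the entropy mirror map induces: the additive gradient step happens in the dual (log) space, and projecting back onto the simplex reduces to renormalization.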
A recent line of work studies overparametrized neural networks in the "k...
Normalization methods such as batch normalization are commonly used in o...
A recent line of work studies overparametrized neural networks in the "k...
With an eye toward understanding complexity control in deep learning, we...
We study the interplay between sequential decision making and avoiding d...
We show that gradient descent on full-width linear convolutional network...
The implicit bias of gradient descent is not fully understood even in si...
We study the bias of generic optimization methods, including Mirror Desc...
We study implicit regularization when optimizing an underdetermined quad...
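The entry above concerns implicit regularization on an underdetermined quadratic. A standard, verifiable instance of this phenomenon (offered as a generic sketch, not as the paper's analysis) is that gradient descent on underdetermined least squares, initialized at zero, converges to the minimum-Euclidean-norm interpolating solution, i.e. the pseudoinverse solution; the dimensions, step size, and iteration count below are illustrative assumptions:

```python
import numpy as np

# Underdetermined least squares: 10 unknowns, 3 equations, so infinitely
# many w satisfy A @ w = b. Which one does gradient descent pick?
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))
b = rng.standard_normal(3)

# Gradient descent on f(w) = 0.5 * ||A w - b||^2 from zero initialization.
# Starting in the row space of A, the iterates never leave it.
w = np.zeros(10)
lr = 0.01
for _ in range(20000):
    w -= lr * A.T @ (A @ w - b)

# The minimum-norm interpolant is the Moore-Penrose pseudoinverse solution.
w_min_norm = np.linalg.pinv(A) @ b
print(np.allclose(w, w_min_norm, atol=1e-6))
```

The implicit bias here comes entirely from the geometry of the updates: every gradient lies in the row space of `A`, so the limit is the unique interpolant in that subspace, which is the minimum-norm one.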
We propose a novel and efficient algorithm for the collaborative prefere...
This work proposes a new algorithm for automated and simultaneous phenot...
In this paper, we present a unified analysis of matrix completion under ...
We consider the matrix completion problem of recovering a structured mat...
We address the collective matrix completion problem of jointly recoverin...