The Adam optimizer is the standard choice in deep learning applications....
We identify empirical scaling laws for the cross-entropy loss in four do...
Recent results in Reinforcement Learning (RL) have shown that agents wit...
We present label gradient alignment, a novel algorithm for semi-supervis...