Diffusion probabilistic models have quickly become a major approach for
...
We introduce GAUDI, a generative model capable of capturing the distribu...
There has been a lot of interest in the scaling properties of Transform...
Recent developments in large-scale machine learning suggest that by scal...
There remain many open questions pertaining to the scaling behaviour of
...
We focus on the problem of domain adaptation when the goal is shifting t...
Transformers do not scale very well to long sequence lengths largely bec...
Having the right inductive biases can be crucial in many tasks or scenar...
In the Transformer model, "self-attention" combines information from att...
The lack of annotated data in many languages is a well-known challenge w...
In this paper, we define and apply representational stability analysis
(...
Language-brain encoding experiments evaluate the ability of language mod...
Any system which performs goal-directed continual learning must not only...
We evaluate 8 different word embedding models on their usefulness for
pr...