Learning useful representations for shifting tasks and distributions

by Jianyu Zhang et al.

Does the dominant approach to learning representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are "richer" than those obtained with a single optimization episode. This is supported by a collection of empirical results obtained with an apparently naïve ensembling technique: concatenating the representations obtained from multiple training episodes using the same data, model, algorithm, and hyper-parameters, but different random seeds. These independently trained networks perform similarly. Yet, in a number of scenarios involving new distributions, the concatenated representation performs substantially better than an equivalently sized network trained from scratch. This proves that the representations constructed by multiple training episodes are in fact different. Although their concatenation carries little additional information about the training task under the training distribution, it becomes substantially more informative when tasks or distributions change. Meanwhile, a single training episode is unlikely to yield such a redundant representation, because the optimization process has no reason to accumulate features that do not incrementally improve the training performance.
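The ensembling technique described above can be sketched in a few lines. The snippet below is a minimal numpy illustration, not the authors' implementation: the `features` function is a toy stand-in for a network trained in one episode (here just a seeded random projection with a ReLU), and `cat_features` concatenates the representations produced by several independently seeded "episodes" so that a new head can be trained on the wider feature vector.

```python
import numpy as np

def features(x, seed, width=16):
    # Toy stand-in for one training episode: a feature extractor whose
    # weights depend only on the random seed. In the paper's setting this
    # would be the penultimate layer of a fully trained network.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(x.shape[-1], width))
    return np.maximum(x @ w, 0.0)  # ReLU features

def cat_features(x, seeds, width=16):
    # The ensembling step: concatenate the representations obtained with
    # the same data/model/recipe but different random seeds.
    return np.concatenate([features(x, s, width) for s in seeds], axis=-1)

x = np.ones((4, 8))          # a small batch of 8-dimensional inputs
phi = cat_features(x, seeds=[0, 1, 2])
# phi has 3 * 16 = 48 features; a linear head is then trained on phi
# for the downstream task or distribution.
```

The point of the sketch is that although each seed's features solve the training task about equally well, they are not identical, so the concatenation is a strictly richer representation for a downstream head to draw on.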


Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods

This thesis focuses on representation learning for sequence data over ti...

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

Multi-Task Learning (MTL) is widely-accepted in Natural Language Process...

Learning Internal Representations (PhD Thesis)

Most machine learning theory and practice is concerned with learning a s...

Learning about an exponential amount of conditional distributions

We introduce the Neural Conditioner (NC), a self-supervised machine able...

One Deep Music Representation to Rule Them All? : A comparative analysis of different representation learning strategies

Inspired by the success of deploying deep learning in the fields of Comp...

Structured (De)composable Representations Trained with Neural Networks

The paper proposes a novel technique for representing templates and inst...
