
CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

by Gašper Beguš, et al.
University of Washington

How can deep neural networks encode information that corresponds to words in human speech into raw acoustic data? This paper proposes two neural network architectures for modeling unsupervised lexical learning from raw acoustic inputs, ciwGAN (Categorical InfoWaveGAN) and fiwGAN (Featural InfoWaveGAN), which combine a DCGAN architecture for audio data (WaveGAN; arXiv:1705.07904) with InfoGAN (arXiv:1606.03657), and proposes a new latent space structure that can model featural learning simultaneously with higher-level classification. The architectures introduce a network that learns to retrieve latent codes from generated audio outputs. Lexical learning is thus modeled as emergent from an architecture that forces a deep neural network to output data such that unique information is retrievable from its acoustic outputs. Networks trained on lexical items from TIMIT learn to encode unique information corresponding to lexical items in the form of categorical variables; by manipulating these variables, the network outputs specific lexical items. Innovative outputs suggest that the phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech: a fiwGAN network trained on 'suit' and 'dark' outputs innovative 'start', even though it never saw 'start' or even a [st] sequence in the training data. We also argue that setting latent featural codes to values well beyond the training range results in almost categorical generation of prototypical lexical items and reveals the underlying value of each latent code. Probing deep neural networks trained on well-understood dependencies in speech has implications for latent space interpretability and for understanding how deep neural networks learn meaningful representations, as well as potential for unsupervised text-to-speech generation in the GAN framework.
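The key structural difference between the two architectures is how the generator's latent vector is composed: ciwGAN prepends a one-hot categorical code (one variable per lexical item), while fiwGAN prepends a binary featural code in which each bit is an independent "feature", so n bits can encode up to 2^n lexical classes. The sketch below illustrates this latent-code construction with NumPy; the function names, the 100-dimensional total latent size, and the specific class/feature counts are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def ciwgan_latent(n_classes, noise_dim, rng):
    """ciwGAN-style latent: one-hot categorical code + uniform noise.
    The Q-network is trained to recover which class bit was set."""
    code = np.zeros(n_classes)
    code[rng.integers(n_classes)] = 1.0          # exactly one active class
    z = rng.uniform(-1.0, 1.0, noise_dim)        # unstructured noise part
    return np.concatenate([code, z])

def fiwgan_latent(n_features, noise_dim, rng):
    """fiwGAN-style latent: binary featural code + uniform noise.
    n_features independent bits can encode up to 2**n_features classes,
    so featural learning and classification share one representation."""
    code = rng.integers(0, 2, n_features).astype(float)  # each bit 0 or 1
    z = rng.uniform(-1.0, 1.0, noise_dim)
    return np.concatenate([code, z])

rng = np.random.default_rng(0)
zc = ciwgan_latent(8, 92, rng)    # 8 lexical classes, 100-dim total latent
zf = fiwgan_latent(3, 97, rng)    # 3 feature bits cover up to 8 classes
```

In both cases the auxiliary network sees only the generated waveform and must recover the code, which is what forces the generator to encode lexical information in its acoustic output. The "values well beyond training range" manipulation mentioned above amounts to scaling the code part (e.g. setting a bit to 5 instead of 1) at generation time while leaving the noise part untouched.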




Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks

Training deep neural networks on well-understood dependencies in speech ...

Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication

Identity-based patterns for which a computational model needs to output ...

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Computational models of syntax are predominantly text-based. Here we pro...

Approaching an unknown communication system by latent space exploration and causal inference

This paper proposes a methodology for discovering meaningful properties ...

Articulation GAN: Unsupervised modeling of articulatory learning

Generative deep neural networks are widely used for speech synthesis, bu...

Artificial sound change: Language change and deep convolutional neural networks in iterative learning

This paper proposes a framework for modeling sound change that combines ...