Diet deep generative audio models with structured lottery

by Philippe Esling, et al.

Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computational cost. This aspect is almost always overlooked when evaluating the quality of proposed models, yet models should not be evaluated without taking their complexity into account. This is especially critical in audio applications, which rely heavily on specialized embedded hardware with real-time constraints. In this paper, we build on recent observations that deep models are highly overparameterized by studying the lottery ticket hypothesis on deep generative audio models. This hypothesis states that extremely efficient small sub-networks exist in deep models and would provide higher accuracy than larger models if trained in isolation. However, lottery tickets are found by relying on unstructured masking, which means that the resulting models do not provide any gain in either disk size or inference time. Instead, we develop here a method aimed at performing structured trimming. We show that this requires relying on global selection, and we introduce a specific criterion based on mutual information. First, we confirm the surprising result that smaller models provide higher accuracy than their large counterparts. We further show that we can remove up to 95% of the weights without significant degradation in accuracy. Hence, we can obtain very light models for generative audio across popular methods such as Wavenet, SING or DDSP, that are up to 100 times smaller with commensurate accuracy. We study the theoretical bounds for embedding these models on Raspberry Pi and Arduino, and show that we can obtain generative models on CPU with quality equivalent to that of large GPU models. Finally, we discuss the possibility of implementing deep generative audio models on embedded platforms.
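The contrast between unstructured masking and structured trimming drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the method of the paper: it uses a plain magnitude criterion (absolute value for single weights, L2 norm for whole units) in place of the mutual-information criterion, it treats a single pair of dense layers instead of performing global selection across a network, and the function names `unstructured_mask` and `structured_trim` are hypothetical. It only shows why masking leaves the stored matrices at their original size while trimming actually shrinks them.

```python
import numpy as np


def unstructured_mask(weight, keep_ratio):
    """Zero the smallest-magnitude entries; the matrix shape is unchanged."""
    k = max(1, int(round(keep_ratio * weight.size)))
    # Magnitude of the k-th largest entry serves as the keep threshold.
    threshold = np.sort(np.abs(weight), axis=None)[-k]
    return weight * (np.abs(weight) >= threshold)


def structured_trim(w_in, w_out, keep_ratio):
    """Remove whole hidden units: rows of w_in and matching columns of w_out."""
    n_keep = max(1, int(round(keep_ratio * w_in.shape[0])))
    # Assumed criterion for this sketch: L2 norm of each unit's incoming weights.
    scores = np.linalg.norm(w_in, axis=1)
    kept = np.sort(np.argsort(scores)[-n_keep:])
    return w_in[kept], w_out[:, kept]


rng = np.random.default_rng(0)
w_in = rng.normal(size=(64, 32))   # hidden units x inputs
w_out = rng.normal(size=(16, 64))  # outputs x hidden units

masked = unstructured_mask(w_in, 0.25)
small_in, small_out = structured_trim(w_in, w_out, 0.25)

print(masked.shape, np.count_nonzero(masked))  # same shape, mostly zeros
print(small_in.shape, small_out.shape)         # genuinely smaller matrices
```

In the masked case the dense matrix still has to be stored and multiplied in full unless a sparse format and sparse kernels are used, whereas the trimmed matrices reduce both storage and multiply-accumulate operations directly.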




Problems using deep generative models for probabilistic audio source separation

Recent advancements in deep generative modeling make it possible to lear...

Ultra-light deep MIR by trimming lottery tickets

Current state-of-the-art results in Music Information Retrieval are larg...

Expectation-Propogation for the Generative Aspect Model

The generative aspect model is an extension of the multinomial model for...

Continuous descriptor-based control for deep audio synthesis

Despite significant advances in deep models for music generation, the us...

Performing Structured Improvisations with pre-trained Deep Learning Models

The quality of outputs produced by deep generative models for music have...

EBJR: Energy-Based Joint Reasoning for Adaptive Inference

State-of-the-art deep learning models have achieved significant performa...

A Multi-Objective Approach for Sustainable Generative Audio Models

In recent years, the deep learning community has largely focused on the ...
