Musika! Fast Infinite Waveform Music Generation

by Marco Pasini, et al.

Fast and user-controllable music generation could enable novel ways of composing or performing music. However, state-of-the-art music generation systems require large amounts of data and computational resources for training, and are slow at inference. This makes them impractical for real-time interactive use. In this work, we introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU. We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders, then training a Generative Adversarial Network (GAN) on this representation for a particular music domain. A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time. We perform quantitative evaluations to assess the quality of the generated samples and showcase options for user control in piano and techno music generation. We release the source code and pretrained autoencoder weights, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.
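The abstract's key idea for infinite-length output is that each latent excerpt is generated from its own coordinate plus a shared global context vector, so excerpts can be produced in parallel and concatenated. The toy sketch below illustrates only that structure, not the authors' actual model: all names (`generate_chunk`, `LATENT_DIM`, the sinusoidal coordinate embedding, the random linear "generator") are illustrative assumptions standing in for a trained GAN.

```python
import numpy as np

# Hypothetical sketch of Musika-style parallel latent generation.
# The real system uses a trained GAN; here a fixed random linear map
# stands in for the generator, purely to show the data flow.

LATENT_DIM = 64   # dimensionality of each latent time step (assumed)
CHUNK_LEN = 32    # latent time steps per generated excerpt (assumed)

rng = np.random.default_rng(0)
# deterministic stand-in "generator weights" so the sketch is reproducible
W = np.random.default_rng(42).standard_normal((LATENT_DIM, LATENT_DIM))

def generate_chunk(coord, style, noise):
    """Map a latent coordinate, a global style vector, and per-chunk
    noise to one latent excerpt (stand-in for the GAN generator)."""
    # simple sinusoidal coordinate embedding (an assumption of this sketch)
    pos = np.sin(coord + np.arange(CHUNK_LEN))[:, None]
    return np.tanh((noise + style) @ W) + 0.1 * pos

def generate_sequence(num_chunks, style):
    """Each chunk depends only on its own coordinate and the shared
    style vector, so chunks could be generated in parallel; here we
    just concatenate them along the time axis."""
    chunks = [
        generate_chunk(c, style, rng.standard_normal((CHUNK_LEN, LATENT_DIM)))
        for c in range(num_chunks)
    ]
    return np.concatenate(chunks, axis=0)

style = rng.standard_normal(LATENT_DIM)  # one global context vector per piece
latents = generate_sequence(8, style)
print(latents.shape)  # (256, 64) -- length scales with the number of chunks
```

In the actual system these latents would then be decoded back to spectrogram magnitudes and phases by the pretrained adversarial autoencoder; the global `style` vector is what keeps distant chunks stylistically coherent.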


