Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem

by Emmanouil Karystinaios, et al.

This paper targets the perceptual task of separating the different interacting voices, i.e., monophonic melodic streams, in a polyphonic musical piece. We focus on symbolic music, where notes are explicitly encoded, and model this task as a Multi-Trajectory Tracking (MTT) problem from discrete observations, i.e., notes in a pitch-time space. Our approach builds a graph from a musical piece by creating one node for every note, and separates the melodic trajectories by predicting a link between two notes if they are consecutive in the same voice/stream. This kind of local, greedy prediction is made possible by node embeddings created by a heterogeneous graph neural network that can capture inter- and intra-trajectory information. Furthermore, we propose a new regularization loss that encourages the output to respect the MTT premise of at most one incoming and one outgoing link for every node, favouring monophonic (voice) trajectories; this loss function might also be useful in other general MTT scenarios. Our approach does not use domain-specific heuristics, scales to longer sequences and a higher number of voices, and can handle complex cases such as voice inversions and overlaps. We reach new state-of-the-art results for the voice separation task in classical music of different styles.
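To make the "at most one incoming and one outgoing link" premise concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that notes are indexed in time order, that a model has already produced a probability for each candidate link, and that links only point forward in time. The function names `degree_penalty` and `decode_voices`, the hinge-style penalty, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify the exact form of the regularization loss or of the decoding step.

```python
def degree_penalty(link_probs, num_notes):
    """Illustrative penalty (an assumption, not the paper's loss):
    for every note, penalise total outgoing or incoming link
    probability in excess of 1."""
    out_mass = [0.0] * num_notes
    in_mass = [0.0] * num_notes
    for (src, dst), p in link_probs.items():
        out_mass[src] += p
        in_mass[dst] += p
    return sum(max(0.0, m - 1.0) for m in out_mass + in_mass)


def decode_voices(link_probs, num_notes, threshold=0.5):
    """Greedily keep the most probable forward links so that each note
    has at most one successor and one predecessor, then follow the
    kept links to read off monophonic voices."""
    successor = {}
    has_predecessor = set()
    ranked = sorted(link_probs.items(), key=lambda kv: -kv[1])
    for (src, dst), p in ranked:
        if p < threshold:
            break  # remaining links are all below the threshold
        if src >= dst:
            continue  # assumption: links only go forward in time
        if src in successor or dst in has_predecessor:
            continue  # would break the one-in / one-out premise
        successor[src] = dst
        has_predecessor.add(dst)

    voices = []
    for start in range(num_notes):
        if start in has_predecessor:
            continue  # not the first note of a voice
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        voices.append(chain)
    return voices


# Hypothetical link probabilities for four notes forming two voices.
probs = {(0, 2): 0.9, (1, 3): 0.8, (0, 3): 0.4, (1, 2): 0.3}
penalty = degree_penalty(probs, 4)
voices = decode_voices(probs, 4)
```

In this toy input, notes 0 and 1 each spread probability over two successors, so the penalty is positive, while decoding keeps only the two strongest compatible links and returns one chain of note indices per voice.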




