A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

by Z. Guo et al.

Following the success of the transformer architecture in natural language processing, transformer-like architectures have recently been widely applied to symbolic music. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval), and the relative attributes shape human perception of musical motifs. Existing symbolic music modeling methods nevertheless mostly ignore these relative attributes, mainly because they lack a musically meaningful embedding space in which both the absolute and relative embeddings of symbolic music tokens can be efficiently represented. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding within which both absolute and relative attributes can be embedded and fundamental musical properties (e.g., translational invariance) are explicitly preserved. Building on the FME, we further propose a novel attention mechanism based on relative index, pitch, and onset embeddings (RIPO attention), so that musical domain knowledge can be fully utilized for symbolic music modeling. Experimental results show that our proposed model, the RIPO transformer, which utilizes FME and RIPO attention, outperforms state-of-the-art transformers (i.e., music transformer, linear transformer) in a melody completion task. Moreover, when the RIPO transformer is used in a downstream music generation task, we observe that the notorious degeneration phenomenon no longer exists, and the music it generates outperforms that of state-of-the-art transformer models in both subjective and objective evaluations.
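The translational invariance mentioned above can be illustrated with a minimal sketch. It assumes a plain sinusoidal encoding with a constant per-dimension bias; the abstract does not give the exact FME formulation, so the function name `sinusoidal_embedding`, the dimension, the base, and the random bias are all hypothetical choices made for illustration. Under these assumptions, the distance between two pitch embeddings depends only on the interval between the pitches, not on their absolute values.

```python
import numpy as np

def sinusoidal_embedding(value, dim=8, base=10000.0, bias=None):
    """Embed a scalar attribute (e.g., a MIDI pitch) as sin/cos features.

    Hypothetical stand-in for a bias-adjusted sinusoidal encoding:
    a constant per-dimension bias is added to the sin/cos features.
    """
    k = np.arange(dim // 2)
    freqs = base ** (-2.0 * k / dim)  # one frequency per sin/cos pair
    emb = np.concatenate([np.sin(freqs * value), np.cos(freqs * value)])
    if bias is not None:
        emb = emb + bias  # the constant bias cancels in embedding differences
    return emb

# Assumed bias: fixed random values here, standing in for learned parameters.
rng = np.random.default_rng(0)
bias = rng.normal(size=8)

def distance(p, q):
    """Euclidean distance between the embeddings of two pitches."""
    return np.linalg.norm(
        sinusoidal_embedding(p, bias=bias) - sinusoidal_embedding(q, bias=bias)
    )

# A major third (4 semitones) at two different absolute pitch levels.
same_interval = np.isclose(distance(60, 64), distance(72, 76))
print(same_interval)  # True: the distance depends only on the interval
```

The squared distance reduces to a sum of terms of the form 2 - 2cos(ω_k (p - q)), which is why transposing both pitches by the same amount leaves it unchanged.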


From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation

Subword tokenization has been widely successful in text-based natural la...

PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

Definitive embeddings remain a fundamental challenge of computational mu...

Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings

Structure perception is a fundamental aspect of music cognition in human...

Byte Pair Encoding for Symbolic Music

The symbolic music modality is nowadays mostly represented as discrete a...

ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production

Recent work in the field of symbolic music generation has shown value in...

An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation

Pitch and meter are two fundamental music features for symbolic music ge...

Embedding Calibration for Music Semantic Similarity using Auto-regressive Transformer

One of the advantages of using natural language processing (NLP) technol...
