GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

10/11/2020
by Qiuqiang Kong, et al.

Symbolic music datasets are important for music information retrieval and musical analysis. However, there is a lack of large-scale symbolic datasets for classical piano music. In this article, we create the GiantMIDI-Piano dataset, which contains 10,854 unique piano solo pieces composed by 2,786 composers. The dataset is collected as follows. First, we extract music piece names and composer names from the International Music Score Library Project (IMSLP). Second, we search for and download the corresponding audio recordings from the internet. Third, we apply a convolutional neural network to detect piano solo pieces. Finally, we transcribe those piano solo recordings to Musical Instrument Digital Interface (MIDI) files using our recently proposed high-resolution piano transcription system. Each transcribed MIDI file contains the onset, offset, pitch and velocity attributes of piano notes, and the onset and offset attributes of sustain pedals. GiantMIDI-Piano contains 34,504,873 transcribed notes, along with metadata for each music piece. To our knowledge, GiantMIDI-Piano is the largest classical piano MIDI dataset so far. We analyze the statistics of GiantMIDI-Piano, including the nationalities of composers and the number and duration of their works. We present the chroma, interval, trichord and tetrachord frequencies of six composers from different eras to demonstrate that GiantMIDI-Piano can be used for musical analysis. Our piano solo detection system achieves an accuracy of 89%, and the piano note transcription achieves an onset F1 of 96.72% evaluated on the MAESTRO dataset. GiantMIDI-Piano achieves an alignment error rate (ER) of 0.154 relative to manually input MIDI files, compared to MAESTRO, which has an alignment ER of 0.061 relative to manually input MIDI files. We release the source code for acquiring the GiantMIDI-Piano dataset at https://github.com/bytedance/GiantMIDI-Piano.
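As an illustration of the kind of musical analysis mentioned above, the following is a minimal sketch of how chroma and interval frequencies might be computed from a sequence of transcribed MIDI pitches. It is not the authors' code. The function names, the toy note list, and the choice to measure intervals as signed semitone differences between consecutive notes (assumed to be sorted by onset) are assumptions made for illustration; the abstract does not state the exact definitions used in the paper.

```python
from collections import Counter

# Names of the 12 pitch classes, indexed by MIDI pitch modulo 12.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_frequencies(pitches):
    """Return the relative frequency of each pitch class in a list of MIDI pitches."""
    counts = Counter(p % 12 for p in pitches)
    total = len(pitches)
    return {
        name: (counts.get(i, 0) / total if total else 0.0)
        for i, name in enumerate(PITCH_CLASSES)
    }


def interval_frequencies(pitches):
    """Return the relative frequency of signed semitone intervals between
    consecutive notes (assumption: the notes are already sorted by onset)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    counts = Counter(intervals)
    total = len(intervals)
    return {iv: n / total for iv, n in sorted(counts.items())}


# Hypothetical toy input: an ascending C major scale from middle C (MIDI 60).
notes = [60, 62, 64, 65, 67, 69, 71, 72]
chroma = chroma_frequencies(notes)
intervals = interval_frequencies(notes)
```

In practice the pitch list would come from parsing a transcribed MIDI file rather than a hand-written list, and trichord or tetrachord statistics would need a rule for grouping notes into chords, which this sketch does not attempt.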


