The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

by Lara Orlandic, et al.

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.


