Spectrum-BERT: Pre-training of Deep Bidirectional Transformers for Spectral Classification of Chinese Liquors

by Yansong Wang, et al.

Spectral detection technology, a non-invasive method for rapid detection of substances, has been widely used in food detection in combination with deep learning algorithms. In real scenarios, however, acquiring and labeling spectral data is extremely labor-intensive, so there is rarely enough high-quality labeled data to train effective supervised deep learning models. To make better use of limited samples, we apply the pre-training and fine-tuning paradigm to spectral detection for the first time and propose a pre-training method of deep bidirectional transformers for spectral classification of Chinese liquors, abbreviated as Spectrum-BERT. First, to retain the model's sensitivity to characteristic peak positions and to local information in the spectral curve, we partition the curve into multiple blocks and compute an embedding for each block, which serves as the feature input to the subsequent computation. Second, for the pre-training stage we design two pre-training tasks, Next Curve Prediction (NCP) and Masked Curve Model (MCM), so that the model can use unlabeled samples to capture latent knowledge in spectral data. This reduces the dependence on labeled samples and improves the applicability and performance of the model in practical scenarios. Finally, we conduct extensive experiments on a real liquor spectral dataset. In the comparative experiments, the proposed Spectrum-BERT significantly outperforms the baselines on multiple metrics, and the advantage is more pronounced on the imbalanced dataset. In the parameter sensitivity experiment, we also analyze model performance under different parameter settings to provide a reference for subsequent research.
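The block partitioning and the masking idea behind MCM can be illustrated with a minimal sketch. The abstract does not give the block size, the masking ratio, the mask value, or how block embeddings are computed, so `partition_curve`, `mask_blocks`, and every parameter below are hypothetical choices made for illustration only, not the authors' implementation. NCP is not shown.

```python
import numpy as np

def partition_curve(curve, block_size):
    """Split a 1-D spectral curve into consecutive, non-overlapping blocks.

    Assumption: blocks have equal length and an incomplete trailing
    block is dropped. The paper may handle this differently.
    """
    curve = np.asarray(curve, dtype=float)
    n_blocks = len(curve) // block_size
    return curve[: n_blocks * block_size].reshape(n_blocks, block_size)

def mask_blocks(blocks, mask_ratio=0.15, mask_value=0.0, rng=None):
    """Replace a random subset of blocks with a constant mask value.

    Assumption: masking is done per block with a fixed ratio, in the
    spirit of BERT-style masked modelling. Returns the masked copy and
    a boolean array marking which blocks were masked.
    """
    rng = np.random.default_rng(rng)
    n_blocks = blocks.shape[0]
    n_masked = max(1, int(round(mask_ratio * n_blocks)))
    idx = rng.choice(n_blocks, size=n_masked, replace=False)
    is_masked = np.zeros(n_blocks, dtype=bool)
    is_masked[idx] = True
    masked = blocks.copy()
    masked[is_masked] = mask_value
    return masked, is_masked

# Synthetic stand-in for a spectral curve (not real liquor data).
curve = np.sin(np.linspace(0.0, 6.0, 103))
blocks = partition_curve(curve, block_size=10)
masked, is_masked = mask_blocks(blocks, mask_ratio=0.2, rng=0)
```

Under these assumptions, a model pre-trained on such input would be asked to reconstruct the original values of the blocks flagged in `is_masked` from the unmasked blocks around them, which requires no class labels.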




