A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks

04/16/2021
by Amir Hadifar, et al.

In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully because their datasets are limited in size and noisy. While prior research has demonstrated the potential of adapting large open-domain pretrained models to domain-specific tasks, the appropriate (pre)training strategies have not yet been rigorously evaluated in social media customer service settings, especially under multilingual conditions. We address this gap by collecting a multilingual social media corpus of customer service conversations (865k tweets), comparing various pipelines of pretraining and finetuning approaches, and applying them to 5 different end tasks. We show that pretraining a generic multilingual transformer model on our in-domain dataset, before finetuning on specific end tasks, consistently boosts performance, especially in non-English settings.

research
10/02/2020

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media

Recent studies on domain-specific BERT models show that effectiveness on...
research
07/03/2023

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

The computational analysis of poetry is limited by the scarcity of tools...
research
06/09/2023

FPDM: Domain-Specific Fast Pre-training Technique using Document-Level Metadata

Pre-training Transformers has shown promising results on open-domain and...
research
09/14/2021

MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model

Domain adaptive pretraining, i.e. the continued unsupervised pretraining...
research
08/25/2018

Churn Intent Detection in Multilingual Chatbot Conversations and Social Media

We propose a new method to detect when users express the intent to leave...
research
11/24/2021

Improving Customer Service Chatbots with Attention-based Transfer Learning

With growing societal acceptance and increasing cost efficiency due to m...
research
06/22/2023

Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict

The ongoing Russo-Ukrainian conflict has been a subject of intense media...
