Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks

07/19/2020
by Diego de Vargas Feijo, et al.

BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models that can later be fine-tuned for a variety of Natural Language Understanding tasks. These methods have been applied to a number of such tasks (mostly in English), achieving results that outperform the state of the art. In this paper, our contribution is twofold. First, we make available our trained BERT and ALBERT models for Portuguese. Second, we compare our monolingual models with the standard multilingual ones in experiments on semantic textual similarity, recognizing textual entailment, textual category classification, sentiment analysis, offensive comment detection, and fake news detection, in order to assess the effectiveness of the generated language representations. The results suggest that both monolingual and multilingual models achieve state-of-the-art results and that the advantage of training a single-language model, if any, is small.
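
To make the fine-tuning setup described above concrete, the sketch below fine-tunes a pre-trained encoder on a binary classification task such as offensive comment detection, using the Hugging Face transformers and datasets libraries. This is a minimal illustration, not the authors' pipeline: the checkpoint name, file paths, and column names are placeholders chosen for the example.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder checkpoint: any Portuguese BERT or ALBERT model could be used here.
checkpoint = "path/to/portuguese-bert"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Hypothetical CSV files with "text" and "label" columns.
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # Truncate/pad each comment to a fixed length before feeding the encoder.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)
trainer.train()
print(trainer.evaluate())

The same recipe would apply to the other tasks in the paper by changing num_labels and, for sentence-pair tasks such as textual entailment or semantic similarity, by tokenizing the two text columns together.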

research
04/19/2022

Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi

Transformers are the most eminent architectures used for a vast range of...
research
04/18/2022

Exploring Dimensionality Reduction Techniques in Multilingual Transformers

Both in scientific literature and in industry, semantic and context-awa...
research
05/17/2019

Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

The paper introduces methods of adaptation of multilingual masked langua...
research
11/15/2019

Evaluating robustness of language models for chief complaint extraction from patient-generated text

Automated classification of chief complaints from patient-generated text...
research
03/24/2021

Czert – Czech BERT-like Model for Language Representation

This paper describes the training process of the first Czech monolingual...
research
05/06/2021

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

For many (minority) languages, the resources needed to train large model...
research
01/09/2022

Semantic and sentiment analysis of selected Bhagavad Gita translations using BERT-based language framework

It is well known that translations of songs and poems not only break rh...
