Adapting BigScience Multilingual Model to Unseen Languages

04/11/2022
by Zheng Xin Yong, et al.

We benchmark different strategies for adding new languages (German and Korean) to BigScience's pretrained multilingual language model with 1.3 billion parameters, which currently supports 13 languages. We investigate the factors that affect the model's language adaptability and the trade-offs between computational cost and expected performance.
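As a rough illustration of one such adaptation strategy, the sketch below shows vocabulary extension followed by continued causal-language-modeling training with Hugging Face Transformers. The checkpoint id ("gpt2" is only a runnable stand-in), the added tokens, and the training text are illustrative assumptions, not the authors' exact configuration; in practice one would load the BigScience 1.3B multilingual checkpoint and a German or Korean corpus.

    # Minimal sketch of one adaptation strategy: extend the vocabulary with
    # new-language tokens, then continue pretraining on new-language text.
    # "gpt2" is a runnable stand-in for the BigScience 1.3B multilingual
    # checkpoint; the tokens and text below are illustrative placeholders.
    from torch.optim import AdamW
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "gpt2"  # stand-in; swap in the 1.3B multilingual model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # 1) Add subword tokens frequent in the new language and grow the
    #    embedding matrix so the new ids get trainable vectors.
    new_tokens = ["Sprachmodell", "anpassung"]  # illustrative German pieces
    tokenizer.add_tokens(new_tokens)
    model.resize_token_embeddings(len(tokenizer))

    # 2) Continue causal language modeling on new-language text.
    texts = ["Dies ist ein Beispielsatz auf Deutsch."]  # replace with a corpus
    optimizer = AdamW(model.parameters(), lr=1e-5)
    model.train()
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Lighter-weight alternatives (for example, training only adapter modules or embeddings while freezing the rest of the model) follow the same loop with most parameters excluded from the optimizer.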


Related research

05/27/2022
HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models
This paper describes an approach to detect idiomaticity only from the co...

10/09/2020
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
A channel corresponds to a viewpoint or transformation of an underlying ...

10/09/2019
Is Multilingual BERT Fluent in Language Generation?
The multilingual BERT model is trained on 104 languages and meant to ser...

07/03/2023
ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis
The computational analysis of poetry is limited by the scarcity of tools...

01/13/2023
Multilingual Detection of Check-Worthy Claims using World Languages and Adapter Fusion
Check-worthiness detection is the task of identifying claims, worthy to ...

05/22/2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
This paper details the process of developing the first native large gene...

05/14/2021
Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language
A widely used standard for portable multilingual data analysis pipelines...
