FlauBERT: Unsupervised Language Model Pre-training for French

by Hang Le et al.

Language models have become a key step toward achieving state-of-the-art results in many Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled text now available, they provide an efficient way to pre-train continuous word representations, along with their contextualization at the sentence level, that can be fine-tuned for a downstream task. This has been widely demonstrated for English using contextualized word representations such as OpenAI GPT (Radford et al., 2018), BERT (Devlin et al., 2019), or XLNet (Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. We apply our French language models to complex NLP tasks (natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches. Different versions of FlauBERT, as well as a unified evaluation protocol for the downstream tasks, are shared with the research community for further reproducible experiments in French NLP.



GREEK-BERT: The Greeks visiting Sesame Street

Transformer-based language models, such as BERT and its variants, have a...

Alternative Weighting Schemes for ELMo Embeddings

ELMo embeddings (Peters et al., 2018) had a huge impact on the NLP commu...

What do you learn from context? Probing for sentence structure in contextualized word representations

Contextualized representation models such as ELMo (Peters et al., 2018a)...

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Recent progress in pre-trained neural language models has significantly ...

When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting

Since the seminal work of Mikolov et al. (2013a) and Bojanowski et al. (...

The role of context in neural pitch accent detection in English

Prosody is a rich information source in natural language, serving as a m...
