We introduce three simple randomized variants of byte pair encoding (BPE...
This preprint describes work in progress on LR-Sum, a new permissively-l...
We present a new corpus of Twitter data annotated for codeswitching and ...
This work presents a new resource for borrowing identification and analy...
This preprint describes work in progress on ParaNames, a multilingual pa...
In this position paper, we describe our perspective on how meaningful re...
We present a new multilingual corpus containing text in 44 languages, ma...
This paper summarizes the main findings of the ADoBo 2021 shared task, p...
To address what we believe is a looming crisis of unreproducible evaluat...
While traditional corpus-level evaluation metrics for machine translatio...
This work supports further development of language technology for the la...
We propose the Tough Mentions Recall (TMR) metrics to supplement traditi...
We take a step towards addressing the under-representation of the Africa...
This paper evaluates the performance of several modern subword segmentat...