This work introduces Zambezi Voice, an open-source multilingual speech
r...
Neural machine translation (NMT) systems exhibit limited robustness in
h...
We present BIG-C (Bemba Image Grounded Conversations), a large multimoda...
The wide accessibility of social media has provided linguistically
under...
Despite the major advances in NLP, significant disparities in NLP system...
Knowing the language of an input text/audio is a necessary first step fo...
This report describes GMU's sentiment analysis system for the SemEval-20...
The Perso-Arabic scripts are a family of scripts that are widely adopted...
One of the major challenges that under-represented and endangered langua...
There has been recent interest in improving optical character recognitio...
An ongoing challenge in current natural language processing is how its m...
Pretrained language models (PLMs) often fail to fairly represent target ...
One of the challenges of language teaching is how to organize the rules
...
Large pretrained multilingual models, trained on dozens of languages, ha...
Mapuzugun is the language of the Mapuche people. Due to political and
hi...
The Universal Morphology (UniMorph) project is a collaborative effort
pr...
Each language has its own complex systems of word, phrase, and sentence
...
Recent work by Søgaard (2020) showed that, treebank size aside, overlap
...
As language technologies become more ubiquitous, there are increasing ef...
Much of the existing linguistic data in many languages of the world is l...
Natural language processing (NLP) systems have become a central technolo...
Question answering (QA) systems are now available through numerous comme...
Human knowledge is collectively encoded in the roughly 6500 languages sp...
Transliteration is very common on social media, but transliterated text ...
As neural machine translation (NMT) systems become an important part of
...
Automated source code summarization is a popular software engineering
re...
Question answering (QA) in English has been widely explored, but multili...
Models pre-trained on multiple languages have shown significant promise ...
Text generation systems are ubiquitous in natural language processing
ap...
Predicting user intent and detecting the corresponding slots from text a...
We present a preprocessed, ready-to-use automatic speech recognition cor...
There is little to no data available to build natural language processin...
Active learning (AL) uses a data selection algorithm to select useful
tr...
Spelling normalization for low resource languages is a challenging task
...
Language models (LMs) have proven surprisingly successful at capturing
f...
As machine translation (MT) systems progress at a rapid pace, questions ...
Creating a descriptive grammar of a language is an indispensable step fo...
The COVID-19 pandemic is the worst pandemic to strike the world in over ...
A broad goal in natural language processing (NLP) is to develop a system...
The performance of neural machine translation systems is commonly evalua...
Given the complexity of combinations of tasks, languages, and domains in...
Despite recent advances in natural language processing and other languag...
We propose a method of curating high-quality comparable training data fo...
We introduce a new resource, AlloVera, which provides mappings from 218
...
Back-translation has proven to be an effective method to utilize monolin...
We present the first resource focusing on the verbal inflectional morpho...
Multilingual models can improve language processing, particularly for lo...
Current grammatical error correction (GEC) models typically consider the...
Toxic content detection aims to identify content that can offend or harm...
We present a resource for computational experiments on Mapudungun, a
pol...