Joint Optimization of Tokenization and Downstream Model

05/26/2021
by Tatsuya Hiraoka, et al.

Traditional tokenizers are isolated from the downstream task and model, so they cannot produce a tokenization tailored to either, even though recent studies suggest that an appropriate tokenization improves performance. In this paper, we propose a novel method that finds an appropriate tokenization for a given downstream model by jointly optimizing the tokenizer and the model. The only requirement of the proposed method is that the tokenizer is trained with loss values computed by the downstream model; it can therefore be applied to any NLP task. Moreover, the method can be used as post-processing to explore an appropriate tokenization for an already trained model, making it applicable to a wide range of situations. We evaluated whether our method improves performance on text classification in three languages and on machine translation in eight language pairs. Experimental results show that the proposed method improves performance by finding appropriate tokenizations.
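The abstract only sketches the core idea at a high level: the tokenizer is trained solely from loss values the downstream model computes. Below is a minimal, illustrative sketch of one way that signal can flow, not the authors' implementation. A tokenizer with learnable per-token scores defines a preference over candidate tokenizations of a sentence; weighting each candidate's downstream loss by that preference and minimizing the expected loss updates both modules with one gradient. The class names, toy vocabulary, and bag-of-tokens classifier are all assumptions made for this sketch.

```python
# Hypothetical sketch of joint tokenizer/model optimization (not the paper's code).
import torch
import torch.nn as nn

VOCAB = {"un": 0, "related": 1, "unrelated": 2, "work": 3}

class UnigramTokenizer(nn.Module):
    """Learnable log-score per vocabulary token; a tokenization is scored by summing them."""
    def __init__(self, vocab_size):
        super().__init__()
        self.log_score = nn.Parameter(torch.zeros(vocab_size))

    def score(self, token_ids):
        return self.log_score[token_ids].sum()

class BagOfTokensClassifier(nn.Module):
    """Toy downstream model: mean token embedding followed by a linear layer."""
    def __init__(self, vocab_size, dim, n_classes):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, n_classes)

    def forward(self, token_ids):
        return self.out(self.emb(token_ids).mean(dim=0))

tokenizer = UnigramTokenizer(len(VOCAB))
model = BagOfTokensClassifier(len(VOCAB), dim=16, n_classes=2)
opt = torch.optim.Adam(list(tokenizer.parameters()) + list(model.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Two candidate tokenizations of the same input, plus a gold label.
candidates = [
    torch.tensor([VOCAB["un"], VOCAB["related"], VOCAB["work"]]),
    torch.tensor([VOCAB["unrelated"], VOCAB["work"]]),
]
label = torch.tensor([1])

for step in range(50):
    # Tokenizer's preference over the candidate tokenizations.
    scores = torch.stack([tokenizer.score(c) for c in candidates])
    weights = torch.softmax(scores, dim=0)
    # Downstream loss for each candidate tokenization.
    losses = torch.stack([loss_fn(model(c).unsqueeze(0), label) for c in candidates])
    # Minimizing the expected loss trains the model and also shifts the
    # tokenizer's probability mass toward low-loss tokenizations.
    expected_loss = (weights * losses).sum()
    opt.zero_grad()
    expected_loss.backward()
    opt.step()
```

Under this framing, the post-processing use mentioned in the abstract would correspond to freezing the downstream model's parameters and updating only the tokenizer's scores, so that the search is over tokenizations alone.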


Related research

04/21/2023 · Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing
This paper proposes a method to optimize tokenization for the performanc...

08/18/2023 · Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
Retrieval augmentation, which enhances downstream models by a knowledge ...

04/17/2022 · Global-Supervised Contrastive Loss and View-Aware-Based Post-Processing for Vehicle Re-Identification
In this paper, we propose a Global-Supervised Contrastive loss and a vie...

02/20/2021 · An Attention Ensemble Approach for Efficient Text Classification of Indian Languages
The recent surge of complex attention-based deep learning architectures ...

10/01/2019 · Machine Translation for Machines: the Sentiment Classification Use Case
We propose a neural machine translation (NMT) approach that, instead of ...

11/25/2017 · Complex Structure Leads to Overfitting: A Structure Regularization Decoding Method for Natural Language Processing
Recent systems on structured prediction focus on increasing the level of...

04/13/2022 · A Novel Approach to Train Diverse Types of Language Models for Health Mention Classification of Tweets
Health mention classification deals with the disease detection in a give...
