Investigation of Large-Margin Softmax in Neural Language Modeling

05/20/2020
by Jingjing Huo, et al.

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods have been developed and are widely applied in the face recognition community. Introducing the large-margin concept into the softmax is reported to bring desirable properties such as enhanced discriminative power, reduced overfitting, and well-defined geometric intuition. Nowadays, language modeling is commonly approached with neural networks using softmax and cross entropy. In this work, we investigate whether introducing large margins into neural language models improves perplexity and, consequently, word error rate in automatic speech recognition. Specifically, we first implement and test various types of conventional margins following previous work in face recognition. To account for the distribution of natural language data, we then compare different strategies for word vector norm-scaling. We subsequently apply the best norm-scaling setup in combination with various margins and conduct neural language model rescoring experiments in automatic speech recognition. We find that although perplexity degrades slightly, neural language models with large-margin softmax can yield a word error rate similar to that of the standard softmax baseline. Finally, the expected margins are analyzed through visualization of word vectors, showing that syntactic and semantic relationships are preserved.
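To illustrate the kind of loss studied here, the following is a minimal sketch of a large-margin softmax output layer for a neural language model, assuming an additive cosine margin (CosFace-style) with a fixed norm scale. The class name, parameter names, and default values are illustrative assumptions, not the paper's exact configuration, which compares several margin variants and norm-scaling strategies.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginSoftmaxLoss(nn.Module):
    """Additive-margin (CosFace-style) softmax cross entropy.

    Hypothetical sketch: subtracts a fixed margin from the
    target-word logit on the unit hypersphere, then rescales
    by a fixed norm scale before cross entropy.
    """

    def __init__(self, hidden_dim, vocab_size, margin=0.35, scale=30.0):
        super().__init__()
        # Output word embedding matrix (one row per vocabulary word).
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_dim))
        nn.init.xavier_uniform_(self.weight)
        self.margin = margin
        self.scale = scale

    def forward(self, hidden, target):
        # Cosine similarity between LM hidden states and word vectors.
        logits = F.linear(F.normalize(hidden, dim=-1),
                          F.normalize(self.weight, dim=-1))
        # Subtract the margin from the target-class logit only.
        one_hot = F.one_hot(target, logits.size(-1)).to(logits.dtype)
        logits = self.scale * (logits - self.margin * one_hot)
        return F.cross_entropy(logits, target)

# Example usage with illustrative dimensions:
loss_fn = LargeMarginSoftmaxLoss(hidden_dim=512, vocab_size=10000)
hidden = torch.randn(8, 512)            # batch of LM hidden states
target = torch.randint(0, 10000, (8,))  # next-word targets
loss = loss_fn(hidden, target)
```

With margin set to zero and the normalization and scaling removed, this reduces to the standard softmax cross entropy baseline against which the large-margin variants are compared.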
