Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition

by Bowen Cheng et al.

We study in this paper how to initialize the parameters of multinomial logistic regression (a fully connected layer followed by softmax and cross-entropy loss), which is widely used in deep neural network (DNN) models for classification problems. Because logistic regression is widely known not to have a closed-form solution, it is usually randomly initialized. This leads to several deficiencies, especially in transfer learning, where all layers except the last, task-specific layer are initialized from a pre-trained model. The deficiencies include slow convergence, the possibility of getting stuck in a local minimum, and the risk of over-fitting. To address these deficiencies, we first study the properties of logistic regression and propose a closed-form approximate solution named the regularized Gaussian classifier (RGC). We then use this approximate solution to initialize the task-specific linear layer and demonstrate superior performance over random initialization in terms of both accuracy and convergence speed on various tasks and datasets. For example, for image classification, our approach can reduce the training time by a factor of 10 and achieve 3.2 … detection, our approach can also be 10 times faster in training for the same accuracy, or 5 … training.
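The abstract does not give the RGC formula, so the following is only a hedged illustration of why a Gaussian classifier yields a closed-form initialization for a softmax layer. If each class k is modeled as a Gaussian with mean mu_k, a shared covariance Sigma, and prior pi_k, the posterior p(y=k|x) is exactly a softmax over linear logits with W_k = Sigma^{-1} mu_k and b_k = -1/2 mu_k^T Sigma^{-1} mu_k + log pi_k, because the quadratic term in x cancels across classes. The sketch below assumes the "regularized" part is a ridge term lam * I added to the pooled covariance; the names `gaussian_init`, `softmax_probs`, `solve`, and `lam` are hypothetical, and the paper's actual RGC may regularize or scale differently.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gaussian_init(
    features: Sequence[Vector], labels: Sequence[int], num_classes: int, lam: float = 1e-2
) -> Tuple[Matrix, Vector]:
    """Hypothetical RGC-style init: shared-covariance Gaussian classifier with ridge lam * I."""
    n, d = len(features), len(features[0])
    counts = [0] * num_classes
    means = [[0.0] * d for _ in range(num_classes)]
    for x, y in zip(features, labels):
        counts[y] += 1
        for j in range(d):
            means[y][j] += x[j]
    if min(counts) == 0:
        raise ValueError("every class needs at least one sample")
    means = [[v / counts[k] for v in means[k]] for k in range(num_classes)]

    # Pooled within-class covariance, regularized by lam * I so it is invertible.
    cov = [[0.0] * d for _ in range(d)]
    for x, y in zip(features, labels):
        diff = [x[j] - means[y][j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / n
    for i in range(d):
        cov[i][i] += lam

    weights, biases = [], []
    for k in range(num_classes):
        w = solve(cov, means[k])  # W_k = Sigma^{-1} mu_k
        quad = sum(mu * wi for mu, wi in zip(means[k], w))  # mu_k^T Sigma^{-1} mu_k
        weights.append(w)
        biases.append(-0.5 * quad + math.log(counts[k] / n))  # plus log prior pi_k
    return weights, biases


def softmax_probs(weights: Matrix, biases: Vector, x: Vector) -> Vector:
    """Class probabilities of the initialized linear + softmax layer for one input."""
    logits = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, biases)]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy stand-in for pre-trained features: two well-separated 2-D clusters.
    feats = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
    labs = [0, 0, 0, 1, 1, 1]
    W, b = gaussian_init(feats, labs, num_classes=2)
    print(softmax_probs(W, b, [0.1, 0.0]))
    print(softmax_probs(W, b, [3.0, 3.0]))
```

In a transfer-learning setting, `features` would be the penultimate-layer activations of the pre-trained network on the target training set, and `W`, `b` would be copied into the task-specific linear layer before fine-tuning instead of drawing them at random. Solving one linear system per class keeps the sketch dependency-free; a real implementation would use a linear-algebra library and a Cholesky factorization of the regularized covariance.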

