On Accelerated Perceptrons and Beyond

10/17/2022
by Guanghui Wang, et al.

The classical Perceptron algorithm of Rosenblatt can be used to find a linear threshold function that correctly classifies n linearly separable data points, assuming the classes are separated by some margin γ > 0. A foundational result is that Perceptron converges after O(1/γ^2) iterations. Several recent works have improved this rate by a quadratic factor, to O(√(log n)/γ), using more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through optimistic online learning. We then show that the proposed framework also leads to improved results for a series of problems beyond the standard Perceptron setting. Specifically, a) for the margin maximization problem, we improve the state-of-the-art result from O(log t/t^2) to O(1/t^2), where t is the number of iterations; b) we provide the first result identifying the implicit bias property of the classical Nesterov's accelerated gradient descent (NAG) algorithm, and show that NAG can maximize the margin at an O(1/t^2) rate; c) for the classical p-norm Perceptron problem, we provide an algorithm with an O(√((p-1)log n)/γ) convergence rate, whereas existing algorithms suffer an O((p-1)/γ^2) convergence rate.
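For context, the classical Rosenblatt Perceptron that the abstract takes as its starting point fits in a few lines: repeatedly sweep the data and, on each misclassified point, add (or subtract) that point to the weight vector. The mistake bound behind the O(1/γ^2) rate says this loop makes at most (R/γ)^2 updates, where R bounds the data norm. The sketch below is illustrative only; the toy dataset and function name are assumptions, not code from the paper.

```python
import numpy as np

def perceptron(X, y, max_passes=1000):
    """Classical Perceptron: find w with sign(w @ x_i) == y_i for all i,
    assuming the data are linearly separable with some margin gamma > 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_passes):
        made_update = False
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:      # misclassified (or on the boundary)
                w += yi * xi            # the Perceptron update
                made_update = True
        if not made_update:             # a full pass with no mistakes: done
            return w
    return w                            # fallback if max_passes is exhausted

# Toy linearly separable data in 2D (assumed for illustration)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(all(yi * (w @ xi) > 0 for xi, yi in zip(X, y)))  # True
```

The accelerated methods the paper unifies replace this greedy update with iterates derived from a min-max (primal-dual) formulation of the separation problem, which is where the quadratic speedup in 1/γ comes from.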


