Solving Regularized Exp, Cosh and Sinh Regression Problems

03/28/2023
by Zhihang Li, et al.

In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study the exponential regression problem, which is inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression is non-convex. We study the regularized version of the exponential regression problem, which is a convex problem, and use an approximate Newton method to solve it in input sparsity time. Formally, in this problem, one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and one of the functions $\exp$, $\cosh$ and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \|f(Ax) - b\|_2^2 + 0.5 \|\mathrm{diag}(w) A x\|_2^2$. The straightforward method is to use the naive Newton method. Let $\mathrm{nnz}(A)$ denote the number of nonzero entries in matrix $A$, let $\omega$ denote the exponent of matrix multiplication (currently $\omega \approx 2.373$), and let $\epsilon$ denote the accuracy error. In this paper, we make use of the input sparsity and propose an algorithm that uses $\log(\|x_0 - x^*\|_2 / \epsilon)$ iterations and $O(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
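
The abstract names the objective and the Newton baseline but gives no formulas. Writing $u = Ax$ and $L(x) = 0.5 \|f(Ax) - b\|_2^2 + 0.5 \|\mathrm{diag}(w) A x\|_2^2$, the chain rule gives $\nabla L(x) = A^\top r$ and $\nabla^2 L(x) = A^\top \mathrm{diag}(D) A$, where $r_i = (f(u_i) - b_i) f'(u_i) + w_i^2 u_i$ and $D_i = f'(u_i)^2 + (f(u_i) - b_i) f''(u_i) + w_i^2$. For $f = \exp$, $D_i = 2e^{2u_i} - b_i e^{u_i} + w_i^2 \ge w_i^2 - b_i^2/8$, so taking $w_i^2 > b_i^2/8$ with a full-column-rank $A$ keeps the Hessian positive definite. The minimal pure-Python sketch below implements only the naive (damped) Newton method mentioned in the abstract, not the authors' approximate Newton algorithm: it forms the Hessian densely, so it does not achieve the $O(\mathrm{nnz}(A) + d^{\omega})$ per-iteration cost. All function names, the backtracking constants (0.25, halving, the $10^{-12}$ rounding slack) and the toy data are illustrative assumptions.

```python
# Illustrative sketch with hypothetical helper names; this is the naive damped
# Newton baseline, NOT the paper's approximate (input-sparsity) Newton method.
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Fn = Callable[[float], float]

# (f, f', f'') for each activation: exp' = exp, cosh' = sinh, sinh' = cosh.
ACTIVATIONS: Dict[str, Tuple[Fn, Fn, Fn]] = {
    "exp": (math.exp, math.exp, math.exp),
    "cosh": (math.cosh, math.sinh, math.cosh),
    "sinh": (math.sinh, math.cosh, math.sinh),
}


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense product u = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def loss(A: Matrix, b: Vector, w: Vector, x: Vector, name: str) -> float:
    """0.5*||f(Ax) - b||^2 + 0.5*||diag(w) A x||^2, or inf if f overflows."""
    f = ACTIVATIONS[name][0]
    try:
        u = matvec(A, x)
        return 0.5 * sum((f(ui) - bi) ** 2 + (wi * ui) ** 2
                         for ui, bi, wi in zip(u, b, w))
    except OverflowError:
        return math.inf


def gradient_and_hessian(A: Matrix, b: Vector, w: Vector, x: Vector,
                         name: str) -> Tuple[Vector, Matrix]:
    """Gradient A^T r and dense Hessian A^T diag(D) A at x."""
    f, df, d2f = ACTIVATIONS[name]
    u = matvec(A, x)
    r = [(f(ui) - bi) * df(ui) + wi * wi * ui for ui, bi, wi in zip(u, b, w)]
    D = [df(ui) ** 2 + (f(ui) - bi) * d2f(ui) + wi * wi
         for ui, bi, wi in zip(u, b, w)]
    d = len(x)
    grad = [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(d)]
    hess = [[sum(row[j] * Di * row[k] for row, Di in zip(A, D)) for k in range(d)]
            for j in range(d)]
    return grad, hess


def solve(H: Matrix, g: Vector) -> Vector:
    """Solve H z = g by Gaussian elimination with partial pivoting."""
    d = len(g)
    M = [row[:] + [gi] for row, gi in zip(H, g)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            factor = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * d
    for r in range(d - 1, -1, -1):
        z[r] = (M[r][d] - sum(M[r][k] * z[k] for k in range(r + 1, d))) / M[r][r]
    return z


def newton_solve(A: Matrix, b: Vector, w: Vector, name: str = "exp",
                 iters: int = 100, tol: float = 1e-10) -> Vector:
    """Damped Newton from x = 0 with Armijo backtracking on the loss."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        grad, hess = gradient_and_hessian(A, b, w, x, name)
        if math.sqrt(sum(gj * gj for gj in grad)) < tol:
            break
        step = solve(hess, grad)  # Newton step H^{-1} g
        slope = sum(gj * sj for gj, sj in zip(grad, step))
        if slope <= 0:
            raise ValueError("Newton step is not a descent direction; try larger w")
        base, t = loss(A, b, w, x, name), 1.0
        for _ in range(60):
            cand = [xj - t * sj for xj, sj in zip(x, step)]
            # Small relative slack absorbs rounding once decreases reach ~1e-16.
            if loss(A, b, w, cand, name) <= base - 0.25 * t * slope + 1e-12 * base:
                break
            t *= 0.5
        x = cand
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 2.5]
    x = newton_solve(A, b, [1.0, 1.0, 1.0], "exp")
    print(x, loss(A, b, [1.0, 1.0, 1.0], x, "exp"))
```

Running the script on the toy data prints the computed minimizer and its loss. For $\cosh$ and $\sinh$ the diagonal $D_i$ has a different form (for example, at $u_i = 0$ the $\cosh$ case gives $D_i = 1 - b_i + w_i^2$), so the weights needed for positive definiteness change; if they are too small, `newton_solve` may raise `ValueError`.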
