ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation

04/06/2021
by   Bei Li, et al.

It has been found that residual networks can be interpreted as an Euler discretization of the solution to an Ordinary Differential Equation (ODE). In this paper, we explore a deeper relationship between the Transformer and numerical methods for ODEs. We show that a residual block of layers in the Transformer can be described as a higher-order solution to an ODE. This leads us to design a new architecture, called ODE Transformer, which is analogous to the Runge-Kutta method and is well motivated from the ODE perspective. As a natural extension of the Transformer, ODE Transformer is easy to implement and parameter-efficient. Our experiments on three WMT tasks demonstrate the generality of this model and show large improvements in performance over several strong baselines. It achieves 30.76 and 44.11 BLEU on the WMT'14 En-De and En-Fr test sets, respectively, setting a new state of the art on the WMT'14 En-Fr task.
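To make the analogy concrete, the sketch below shows how a standard residual (Euler-style) block, y + F(y), can be upgraded to a second-order Runge-Kutta (Heun) update in which the same layer F is evaluated twice per block. This is a minimal PyTorch illustration of the idea described in the abstract, not the paper's exact formulation: the class name RK2Block, the weight sharing across the two evaluations, and the feed-forward layer used as F in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RK2Block(nn.Module):
    """Second-order Runge-Kutta (Heun-style) residual block.

    A plain residual block computes y + F(y), i.e. one Euler step of
    y' = F(y). Here the same layer F is evaluated twice and the two
    slope estimates are averaged, giving a higher-order update while
    reusing the same parameters (illustrative assumption).
    """

    def __init__(self, layer: nn.Module):
        super().__init__()
        self.F = layer  # e.g. a Transformer sublayer (attention or FFN)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f1 = self.F(y)               # first slope estimate (Euler / ResNet step)
        f2 = self.F(y + f1)          # slope re-evaluated at the Euler endpoint
        return y + 0.5 * (f1 + f2)   # Heun (RK2) update


if __name__ == "__main__":
    # Toy usage: a simple feed-forward layer stands in for the ODE function F.
    ff = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
    block = RK2Block(ff)
    x = torch.randn(8, 16, 512)      # (batch, sequence length, model dim)
    print(block(x).shape)            # torch.Size([8, 16, 512])
```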


Related research

10/22/2020 · N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
We use neural ordinary differential equations to formulate a variant of ...

11/05/2022 · Discovering ordinary differential equations that govern time-series
Natural laws are often described through differential equations yet find...

10/17/2019 · Fully Quantized Transformer for Improved Translation
State-of-the-art neural machine translation methods employ massive amoun...

12/12/2022 · A Neural ODE Interpretation of Transformer Layers
Transformer layers, which use an alternating pattern of multi-head atten...

06/10/2019 · ANODEV2: A Coupled Neural ODE Evolution Framework
It has been observed that residual networks can be viewed as the explici...

09/27/2021 · Abstraction, Reasoning and Deep Learning: A Study of the "Look and Say" Sequence
The ability to abstract, count, and use System 2 reasoning are well-know...

06/06/2019 · Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
The Transformer architecture is widely used in natural language processi...
