research ∙ 09/30/2020
Learning Hard Retrieval Cross Attention for Transformer
The Transformer translation model based on the multi-head attention...
          
research ∙ 07/13/2020
Transformer with Depth-Wise LSTM
Increasing the depth of models allows neural models to model complicated...
          
research ∙ 06/25/2020
Learning Source Phrase Representations for Neural Machine Translation
The Transformer translation model (Vaswani et al., 2017) based on a mult...
          
research ∙ 05/05/2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models....
          
research ∙ 03/21/2020
Analyzing Word Translation of Transformer Layers
The Transformer translation model is popular for its effective paralleli...
          
research ∙ 11/08/2019
Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization
The Transformer translation model employs residual connection and layer ...
          
research ∙ 08/09/2019
UdS Submission for the WMT 19 Automatic Post-Editing Task
In this paper, we describe our submission to the English-German APE shar...
          
research ∙ 03/18/2019
     
             
  
  
     
                             