Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

by Brando Miranda et al.

Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franca of programming languages. A set of heuristics relating various typed lambda calculi to effective neural architectures would provide a systematic method for mapping language features (e.g., polymorphism, subtyping, inheritance) to architecture choices. Second, transformer models are widely used in deep learning architectures applied to code, but their design and hyperparameter space is large and relatively unexplored in programming language applications. We therefore propose a benchmark that allows us to explore exactly this through perhaps the simplest and most fundamental property of a programming language: the relationship between terms and types. Consequently, we begin this inquiry into transformer architectures for typed lambda calculi by exploring the effect of warm-up and optimizer selection on the task of type inference: i.e., predicting the types of lambda calculus terms using only transformers. We find that the optimization landscape is difficult even in this simple setting. One notable experimental finding is that Adafactor converges much faster than Adam and RAdam. We conjecture that this difference in optimizer performance may be related to the difficulty of generalizing over a formally generated dataset.
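
To make the benchmark's target concrete, here is a minimal sketch of type inference for the simply typed lambda calculus, i.e., the term-to-type mapping the transformer is trained to predict. This is an illustrative implementation, not the paper's code: all names (Base, Arrow, Var, Lam, App, infer) are our own, and binders are assumed to carry type annotations so inference is purely syntax-directed.

```python
# Minimal, syntax-directed type inference for the simply typed lambda
# calculus (STLC). Illustrative only; not the paper's implementation.
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Types: a base type (e.g. "o") or an arrow type t1 -> t2.
@dataclass(frozen=True)
class Base:
    name: str

@dataclass(frozen=True)
class Arrow:
    src: "Type"
    dst: "Type"

Type = Union[Base, Arrow]

# Terms: variables, annotated lambda abstractions, and applications.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def infer(term: Term, ctx: Optional[Dict[str, Type]] = None) -> Type:
    """Return the type of `term` under context `ctx`, or raise TypeError."""
    ctx = ctx or {}
    if isinstance(term, Var):
        if term.name not in ctx:
            raise TypeError(f"unbound variable {term.name}")
        return ctx[term.name]
    if isinstance(term, Lam):
        # Extend the context with the annotated binder, type the body.
        body_type = infer(term.body, {**ctx, term.param: term.param_type})
        return Arrow(term.param_type, body_type)
    if isinstance(term, App):
        fn_type = infer(term.fn, ctx)
        arg_type = infer(term.arg, ctx)
        if not (isinstance(fn_type, Arrow) and fn_type.src == arg_type):
            raise TypeError("ill-typed application")
        return fn_type.dst
    raise TypeError("unknown term")

# Example: the identity function \x:o. x has type o -> o.
print(infer(Lam("x", Base("o"), Var("x"))))
```

For example, the identity function λx:o. x infers to the arrow type o → o; the benchmark asks a transformer to reproduce such judgments from the term's token sequence alone.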

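The two experimental knobs named in the abstract, warm-up and optimizer selection, amount to a small amount of training-loop plumbing. The sketch below shows linear learning-rate warm-up with a swappable optimizer; the model, loss, and hyperparameter values are placeholders rather than the paper's actual configuration.

```python
# Sketch of learning-rate warm-up and optimizer selection for transformer
# training. Placeholder model, data, and hyperparameters throughout.
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR

model = nn.TransformerEncoderLayer(d_model=64, nhead=4)  # stand-in model
base_lr, warmup_steps = 1e-3, 1000

# Swap in torch.optim.RAdam, or Adafactor from the Hugging Face
# `transformers` package, to compare optimizers as the paper does.
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

# Linear warm-up: scale the learning rate from ~0 up to base_lr over
# `warmup_steps` steps, then hold it constant.
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(5):                   # training loop skeleton
    x = torch.randn(10, 2, 64)          # (seq_len, batch, d_model) dummy batch
    loss = model(x).pow(2).mean()       # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Swapping the optimizer is a one-line change: torch.optim.RAdam ships with PyTorch, and Adafactor is available from the Hugging Face transformers package (note that Adafactor's relative-step mode manages the learning rate internally, in which case the external warm-up schedule would be dropped).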

