BenchCLAMP: A Benchmark for Evaluating Language Models on Semantic Parsing

06/21/2022
by Subhro Roy, et al.

We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing, which produces semantic outputs by constrained decoding from a prompted or fine-tuned language model analyzing input text. Developers of pretrained language models currently benchmark on classification, span extraction, and free-text generation tasks. Semantic parsing is neglected in language model evaluation because of the complexity of handling task-specific architectures and representations. Recent work has shown that generation from a prompted or fine-tuned language model can perform well at semantic parsing when the output is constrained to be a valid semantic representation. BenchCLAMP includes context-free grammars for six semantic parsing datasets with varied output meaning representations, as well as a constrained decoding interface to generate outputs covered by these grammars. We provide low, medium, and high resource splits for each dataset, allowing accurate comparison of language models under different data regimes. Our benchmark supports both prompt-based learning and fine-tuning, and provides an easy-to-use toolkit for language model developers to evaluate on semantic parsing.
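The core idea of grammar-constrained decoding can be illustrated with a minimal, hypothetical sketch: at each generation step, the model's token scores are masked so that only tokens keeping the partial output a valid prefix of the grammar survive. The toy grammar, score dictionary, and function names below are illustrative assumptions, not BenchCLAMP's actual interface, which incrementally tracks valid prefixes of a full context-free grammar rather than enumerating complete outputs.

```python
from typing import Dict, List, Set, Tuple

# Toy "grammar": the set of complete valid outputs. A real system
# derives valid prefixes incrementally from a context-free grammar.
VALID_OUTPUTS = {
    "( call find_event )",
    "( call find_person )",
}

def valid_prefixes() -> Set[Tuple[str, ...]]:
    """All token-level prefixes of the valid outputs."""
    prefixes: Set[Tuple[str, ...]] = set()
    for s in VALID_OUTPUTS:
        toks = s.split()
        for i in range(1, len(toks) + 1):
            prefixes.add(tuple(toks[:i]))
    return prefixes

PREFIXES = valid_prefixes()

def constrained_decode(scores: Dict[str, float], max_len: int = 10) -> str:
    """Greedy decoding where only grammar-valid tokens are considered
    at each step (a stand-in for masking the LM's logits)."""
    out: List[str] = []
    for _ in range(max_len):
        # Keep only tokens that extend the output to a valid prefix.
        allowed = [t for t in scores if tuple(out + [t]) in PREFIXES]
        if not allowed:
            break
        out.append(max(allowed, key=lambda t: scores[t]))
        if " ".join(out) in VALID_OUTPUTS:
            break
    return " ".join(out)

# Unconstrained greedy decoding would emit "hello" first; the grammar
# mask forces a well-formed program instead.
scores = {"hello": 0.9, "(": 0.5, "call": 0.4, "find_event": 0.3,
          "find_person": 0.2, ")": 0.1}
print(constrained_decode(scores))  # → ( call find_event )
```

With real language models, the same filtering is applied to the logits over the vocabulary at every decoding step, so invalid continuations receive zero probability regardless of how fluent they look to the unconstrained model.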


