research ∙ 08/14/2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
Fine-grained, span-level human evaluation has emerged as a reliable and ...
research ∙ 05/23/2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-3.5) are uniquely capable of producing ...
research ∙ 12/19/2022