research ∙ 08/14/2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
Fine-grained, span-level human evaluation has emerged as a reliable and ...
          
research ∙ 05/23/2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-3.5) are uniquely capable of producing ...
          
research ∙ 12/19/2022