xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
Every natural text is written in some style. The style is formed by a complex combination of different stylistic factors, including formality markers, emotions, metaphors, etc. Some factors implicitly reflect the author's personality, while others are explicitly controlled by the author's choices in order to achieve some personal or social goal. One cannot form a complete understanding of a text and its author without considering these factors. The factors combine and co-vary in complex ways to form styles. Studying the nature of the covarying combinations sheds light on stylistic language in general, sometimes called cross-style language understanding. This paper provides a benchmark corpus (xSLUE) with an online platform (http://xslue.com) for cross-style language understanding and evaluation. The benchmark contains text in 15 different styles and 23 classification tasks. For each task, we provide the fine-tuned classifier for further analysis. Our analysis shows that some styles are highly dependent on each other (e.g., impoliteness and offense), and some domains (e.g., tweets, political debates) are stylistically more diverse than others (e.g., academic manuscripts). We discuss the technical challenges of cross-style understanding and potential directions for future research: cross-style modeling which shares the internal representation for low-resource or low-performance styles and other applications such as cross-style generation.
READ FULL TEXT