Extending Text Informativeness Measures to Passage Interestingness Evaluation (Language Model vs. Word Embedding)

Standard informativeness measures used to evaluate Automatic Text Summarization mostly rely on n-gram overlapping between the automatic summary and the reference summaries. These measures differ from the metric they use (cosine, ROUGE, Kullback-Leibler, Logarithm Similarity, etc.) and the bag of terms they consider (single words, word n-grams, entities, nuggets, etc.). Recent word embedding approaches offer a continuous alternative to discrete approaches based on the presence/absence of a text unit. Informativeness measures have been extended to Focus Information Retrieval evaluation involving a user's information need represented by short queries. In particular for the task of CLEF-INEX Tweet Contextualization, tweet contents have been considered as queries. In this paper we define the concept of Interestingness as a generalization of Informativeness, whereby the information need is diverse and formalized as an unknown set of implicit queries. We then study the ability of state of the art Informativeness measures to cope with this generalization. Lately we show that with this new framework, standard word embeddings outperforms discrete measures only on uni-grams, however bi-grams seems to be a key point of interestingness evaluation. Lastly we prove that the CLEF-INEX Tweet Contextualization 2012 Logarithm Similarity measure provides best results.


page 1

page 2

page 3

page 4


Text classification with word embedding regularization and soft similarity measure

Since the seminal work of Mikolov et al., word embeddings have become th...

Toward Word Embedding for Personalized Information Retrieval

This paper presents preliminary works on using Word Embedding (word2vec)...

Enhancing Translation Language Models with Word Embedding for Information Retrieval

In this paper, we explore the usage of Word Embedding semantic resources...

Better Summarization Evaluation with Word Embeddings for ROUGE

ROUGE is a widely adopted, automatic evaluation measure for text summari...

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Since the amount of information on the internet is growing rapidly, it i...

Word Embedding based Edit Distance

Text similarity calculation is a fundamental problem in natural language...

ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models

Over the past decade, analogies, in the form of word-level analogies, ha...

Please sign up or login with your details

Forgot password? Click here to reset