Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents

by   Sun Kim, et al.

The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents where no or only few words from a query are found. The semantic analysis methods such as LSA (latent semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to address the issue, but their performance is not superior compared to common IR approaches. Here we present a query-document similarity measure motivated by the Word Mover's Distance. Unlike other similarity measures, the proposed method relies on neural word embeddings to compute the distance between words. This process helps identify related words when no direct matches are found between a query and a document. Our method is efficient and straightforward to implement. The experimental results on TREC Genomics data show that our approach outperforms the BM25 ranking function by an average of 12 average precision. Furthermore, for a real-world dataset collected from the PubMed search logs, we combine the semantic measure with BM25 using a learning to rank method, which leads to improved ranking scores by up to 25 experiment demonstrates that the proposed approach and BM25 nicely complement each other and together produce superior performance.


page 1

page 2

page 3

page 4


Beyond [CLS] through Ranking by Generation

Generative models for Information Retrieval, where ranking of documents ...

Affect Enriched Word Embeddings for News Information Retrieval

Distributed representations of words have shown to be useful to improve ...

Toward a Deep Neural Approach for Knowledge-Based IR

This paper tackles the problem of the semantic gap between a document an...

Understanding Differential Search Index for Text Retrieval

The Differentiable Search Index (DSI) is a novel information retrieval (...

A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU

The Word Movers Distance (WMD) measures the semantic dissimilarity betwe...

An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance

The Word Mover's Distance (WMD) is a metric that measures the semantic d...

Consistency and Variation in Kernel Neural Ranking Model

This paper studies the consistency of the kernel-based neural ranking mo...

Please sign up or login with your details

Forgot password? Click here to reset