Representation Learning for Recommender Systems with Application to the Scientific Literature

by Robin Brochier, et al.

The scientific literature is a large information network linking various actors (laboratories, companies, institutions, etc.). The vast amount of data generated by this network constitutes a dynamic heterogeneous attributed network (HAN), in which new information is constantly produced and from which it is increasingly difficult to extract content of interest. In this article, I present the first results of my thesis work, carried out in partnership with an industrial company, Digital Scientific Research Technology. The latter offers a scientific watch tool, Peerus, that addresses various issues, such as the real-time recommendation of newly published papers or the search for active experts to start new collaborations. To tackle this diversity of applications, a common approach consists of learning representations of the nodes and attributes of this HAN and using them as features for a variety of recommendation tasks. However, most works on attributed network embedding pay too little attention to textual attributes and do not take full advantage of recent natural language processing techniques. Moreover, existing methods that jointly learn node and document representations do not provide a way to effectively infer representations for new documents for which network information is missing, a capability that is crucial in real-time recommender systems. Finally, the interplay between textual and graph data in text-attributed heterogeneous networks remains an open research direction.




