SVP-CF: Selection via Proxy for Collaborative Filtering Data

07/11/2021
by Noveen Sachdeva, et al.

We study the practical consequences of dataset sampling strategies on the performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. These samples are often taken in a naive or ad-hoc fashion, e.g., by sampling a dataset uniformly at random or by selecting users or items with many interactions. As we demonstrate, commonly used data sampling schemes can have significant consequences for algorithm performance: compared to models trained on the complete dataset, they can mask performance deficiencies in algorithms or alter the relative performance of algorithms. Motivated by this observation, this paper makes two main contributions: (1) characterizing the effect of sampling on algorithm performance in terms of algorithm and dataset characteristics (e.g., sparsity characteristics, sequential dynamics, etc.); and (2) designing SVP-CF, a data-specific sampling strategy that aims to preserve the relative performance of models after sampling and is especially suited to long-tail interaction data. Detailed experiments show that SVP-CF is more accurate than commonly used sampling schemes at retaining the relative ranking of different recommendation algorithms.
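
To make the setup concrete, the sketch below illustrates the two naive sampling schemes the abstract mentions and one plausible way to quantify whether a sample retains the relative ranking of algorithms. It is not the SVP-CF method, which the abstract does not specify. The function names, the `(user, item)` tuple representation, and the choice of Kendall's tau as the rank-agreement measure are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def sample_random_interactions(interactions, fraction, seed=0):
    """Keep a uniformly random fraction of (user, item) interactions.

    Hypothetical helper: `interactions` is assumed to be a list of tuples.
    """
    rng = random.Random(seed)
    k = int(len(interactions) * fraction)
    return rng.sample(interactions, k)


def sample_head_users(interactions, fraction):
    """Keep every interaction of the most active `fraction` of users.

    Hypothetical helper illustrating "selecting users with many interactions".
    """
    counts = Counter(user for user, _ in interactions)
    n_keep = max(1, int(len(counts) * fraction))
    keep = {user for user, _ in counts.most_common(n_keep)}
    return [(user, item) for user, item in interactions if user in keep]


def kendall_tau(scores_full, scores_sample):
    """Kendall's tau between two {algorithm_name: metric} dicts.

    Assumed measure of ranking agreement; tied pairs are ignored.
    Returns 1.0 when the sample orders all algorithms as the full data does,
    and -1.0 when the order is fully reversed.
    """
    concordant = discordant = 0
    for a, b in combinations(sorted(scores_full), 2):
        sign = (scores_full[a] - scores_full[b]) * (scores_sample[a] - scores_sample[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0
```

Under these assumptions, one would train each algorithm on the full dataset and on each sample, collect a metric per algorithm into two dictionaries, and compare them with `kendall_tau`; a sampling scheme that scores closer to 1.0 better preserves the ranking of algorithms.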

research
01/13/2022

On Sampling Collaborative Filtering Datasets

We study the practical consequences of dataset sampling strategies on th...
research
07/16/2018

A Distributed Collaborative Filtering Algorithm Using Multiple Data Sources

Collaborative Filtering (CF) is one of the most commonly used recommenda...
research
07/27/2021

A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models

At the present time, sequential item recommendation models are compared ...
research
09/05/2019

Assessing Fashion Recommendations: A Multifaceted Offline Evaluation Approach

Fashion is a unique domain for developing recommender systems (RS). Pers...
research
07/26/2020

Exploring Data Splitting Strategies for the Evaluation of Recommendation Models

Effective methodologies for evaluating recommender systems are critical,...
research
06/08/2023

Safe Collaborative Filtering

Excellent tail performance is crucial for modern machine learning tasks,...
research
10/14/2020

Robust Ranking of Equivalent Algorithms via Relative Performance

In scientific computing, it is common that one target computation can be...
