The Impact of Feature Quantity on Recommendation Algorithm Performance: A Movielens-100K Case Study

by Lukas Wegmeth, et al.

Recent model-based Recommender Systems (RecSys) algorithms emphasize the use of features, also called side information, in their design, similar to algorithms in Machine Learning (ML). In contrast, some of the most popular and traditional RecSys algorithms focus solely on a given user-item-rating relation without including side information. The goal of this case study is to provide a performance comparison and assessment of RecSys and ML algorithms when side information is included. We chose the Movielens-100K data set because it is a standard benchmark for comparing RecSys algorithms. We generated six feature sets with varying quantities of features from the baseline data and evaluated them on a total of 19 algorithms, spanning RecSys algorithms, baseline ML algorithms, Automated Machine Learning (AutoML) pipelines, and state-of-the-art RecSys algorithms that incorporate side information. The results show that additional features benefit all algorithms we evaluated. However, the correlation between feature quantity and performance is not monotonic for AutoML and RecSys algorithms. In these categories, an analysis of feature importance revealed that the quality of features matters more than their quantity. Throughout our experiments, the average performance on the feature set with the lowest number of features is about 6% worse, in terms of Root Mean Squared Error (RMSE), than on the feature set with the highest number of features. An interesting observation is that AutoML outperforms matrix factorization-based RecSys algorithms when additional features are used. Almost all algorithms that can include side information perform better when using the highest quantity of features; in the other cases, the performance difference is negligible (<1%). The results show a positive trend for the effect of feature quantity, as well as the important effect of feature quality, on the evaluated algorithms.
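The abstract states its headline comparison in terms of RMSE between the smallest and the largest feature set. As an illustration only, the sketch below computes RMSE for two sets of rating predictions and expresses the gap as a percentage. The function names `rmse` and `relative_gap`, the percentage definition (relative to the better RMSE), and the toy ratings are all assumptions for illustration. They are not the authors' evaluation code and not data from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error between observed and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_gap(rmse_worse: float, rmse_better: float) -> float:
    """Percent by which rmse_worse exceeds rmse_better.

    Assumption: the gap is measured relative to the better (lower) RMSE;
    the study may define its percentage differently.
    """
    return 100.0 * (rmse_worse - rmse_better) / rmse_better


if __name__ == "__main__":
    # Hypothetical toy ratings, not taken from Movielens-100K.
    observed = [4, 3, 5, 2]
    pred_few_features = [3, 4, 4, 3]        # every prediction off by 1.0
    pred_many_features = [3.5, 3.5, 4.5, 2.5]  # every prediction off by 0.5

    err_few = rmse(observed, pred_few_features)
    err_many = rmse(observed, pred_many_features)
    print(err_few, err_many, relative_gap(err_few, err_many))
```

Because RMSE is an error measure, a higher value means worse predictions, so "X% worse" corresponds to a positive `relative_gap`.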


Facilitating Machine Learning Model Comparison and Explanation Through A Radial Visualisation

Building an effective Machine Learning (ML) model for a data set is a di...

Quality of Data in Machine Learning

A common assumption exists according to which machine learning models im...

MDE for Machine Learning-Enabled Software Systems: A Case Study and Comparison of MontiAnna & ML-Quadrat

In this paper, we propose to adopt the MDE paradigm for the development ...

Bi-convolution matrix factorization algorithm based on improved ConvMF

With the rapid development of information technology, "information overl...

Improving Recommender Systems Beyond the Algorithm

Recommender systems rely heavily on the predictive accuracy of the learn...

Insect cyborgs: Biological feature generators improve machine learning accuracy on limited data

Despite many successes, machine learning (ML) methods such as neural net...

Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow

Context: The number of TV series offered nowadays is very high. Due to i...
