Field-wise Learning for Multi-field Categorical Data

12/01/2020
by Zhibin Li, et al.

We propose a new method for learning with multi-field categorical data. Multi-field categorical data are usually collected over many heterogeneous groups, and these groups can be reflected in the categories under a field. Existing methods try to learn a universal model that fits all data, which is challenging and inevitably results in a complex model. In contrast, we propose a field-wise learning method that leverages the natural structure of the data to learn simple yet efficient one-to-one field-focused models with appropriate constraints. In doing so, the models can be fitted to each category and thus better capture the underlying differences in the data. We present a model that uses linear models with variance and low-rank constraints, which help it generalize better and reduce the number of parameters. The model is also interpretable in a field-wise manner. As the dimensionality of multi-field categorical data can be very high, models applied to such data are mostly over-parameterized. Our theoretical analysis can potentially explain the effect of over-parameterization on the generalization of our model. It also supports the variance constraints in the learning objective. The experimental results on two large-scale datasets show the superior performance of our model, the trend of the generalization error bound, and the interpretability of learning outcomes. Our code is available at https://github.com/lzb5600/Field-wise-Learning.
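
To make the idea of one-to-one field-focused linear models more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). It assumes that each category under a field selects its own linear model over the one-hot input, that the per-field weight matrix is factorized as a product of two small matrices (the low-rank constraint), and that the variance constraint can be approximated by penalizing how far each category's weights deviate from the field mean. All names (`field_sizes`, `rank`, `U`, `V`, `category_weights`, `predict_logit`, `variance_penalty`) and the toy sizes are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical toy setup: 3 fields with 4, 3, and 5 categories each.
field_sizes = [4, 3, 5]
d = sum(field_sizes)                      # one-hot dimensionality
offsets = [sum(field_sizes[:f]) for f in range(len(field_sizes))]
rank = 2                                  # assumed low-rank size

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Assumption: for field f, W_f = U_f V_f holds one linear model (row) per category.
U = [rand_matrix(n, rank) for n in field_sizes]
V = [rand_matrix(rank, d) for _ in field_sizes]

def category_weights(f, c):
    """Row c of W_f = U_f V_f: the linear model tied to category c of field f."""
    return [sum(U[f][c][k] * V[f][k][j] for k in range(rank)) for j in range(d)]

def predict_logit(sample):
    """Sum the outputs of the models selected by the sample's category in each field."""
    # Indices of the 1s in the sample's one-hot encoding.
    active = [offsets[f] + c for f, c in enumerate(sample)]
    return sum(category_weights(f, c)[j] for f, c in enumerate(sample) for j in active)

def variance_penalty():
    """Mean squared deviation of each field's category models from their field mean."""
    total = 0.0
    for f, n in enumerate(field_sizes):
        W = [category_weights(f, c) for c in range(n)]
        mean = [sum(row[j] for row in W) / n for j in range(d)]
        total += sum((row[j] - mean[j]) ** 2 for row in W for j in range(d)) / n
    return total

sample = [1, 0, 4]                        # one category index per field
prob = 1.0 / (1.0 + math.exp(-predict_logit(sample)))
```

In a training loop, a term like `variance_penalty()` would be added to the classification loss with some weight. The factorization stores `rank * (n + d)` numbers per field instead of `n * d`, which is where the parameter saving in this sketch comes from.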


research
01/02/2023

Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data

In this paper, we learn a diffusion model to generate 3D data on a scene...
research
02/26/2020

Supervised Categorical Metric Learning with Schatten p-Norms

Metric learning has been successful in learning new metrics adapted to n...
research
10/13/2022

Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors

Deep models often fail to generalize well in test domains when the data ...
research
10/06/2020

Gaussian Process Models with Low-Rank Correlation Matrices for Both Continuous and Categorical Inputs

We introduce a method that uses low-rank approximations of cross-correla...
research
09/13/2019

Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

Low-rank metric learning aims to learn better discrimination of data sub...
research
02/12/2018

Augment and Reduce: Stochastic Inference for Large Categorical Distributions

Categorical distributions are ubiquitous in machine learning, e.g., in c...
research
11/12/2019

FLEN: Leveraging Field for Scalable CTR Prediction

Click-Through Rate (CTR) prediction has been an indispensable component ...
