OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

04/29/2022
by Xinyang Zhang, et al.

Automatic extraction of product attributes from textual descriptions is essential to the online shopping experience. One inherent challenge of this task is the constantly evolving nature of e-commerce products: new product types, each with its own set of new attributes, appear all the time. Most prior work mines new values for a fixed set of known attributes but cannot handle new attributes that arise from constantly changing data. In this work, we study the attribute mining problem in an open-world setting, extracting both novel attributes and their values. Instead of providing comprehensive training data, the user only needs to provide a few examples for a few known attribute types as weak supervision. We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes. The candidate generation step probes a pre-trained language model to extract phrases from product titles. Then, an attribute-aware fine-tuning method optimizes a multitask objective and shapes the language model representation to be attribute-discriminative. Finally, we discover new attributes and values through the self-ensemble of our framework, which handles the open-world challenge. We run extensive experiments on a large distantly annotated development set and a gold-standard human-annotated test set that we collected. Our model significantly outperforms strong baselines and generalizes to unseen attributes and product types.
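
The two-stage shape described above (generate value candidates, then group them into attributes using a few seed examples) can be illustrated with a toy sketch. This is not the authors' implementation: the delimiter-based splitter stands in for language-model probing, character-trigram counts stand in for fine-tuned representations, and the function names, the seed dictionary, and the 0.3 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def generate_candidates(title):
    """Stand-in for candidate generation: split a title on commas,
    pipes, and space-delimited hyphens (no language model involved)."""
    parts = re.split(r"\s*[,|]\s*|\s+-\s+", title)
    return [p.strip().lower() for p in parts if p.strip()]


def embed(phrase):
    """Stand-in representation: counts of character trigrams."""
    padded = f"  {phrase} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_candidates(candidates, seeds, threshold=0.3):
    """Assign each candidate to the most similar seeded attribute.
    Candidates below the (hypothetical) threshold are left unassigned,
    as potential values of attributes not covered by the seeds."""
    known = {attr: [] for attr in seeds}
    unassigned = []
    for cand in candidates:
        vec = embed(cand)
        best_attr, best_sim = None, 0.0
        for attr, examples in seeds.items():
            sim = max(cosine(vec, embed(e)) for e in examples)
            if sim > best_sim:
                best_attr, best_sim = attr, sim
        if best_attr is not None and best_sim >= threshold:
            known[best_attr].append(cand)
        else:
            unassigned.append(cand)
    return known, unassigned


# Weak supervision: one example value for each of two known attributes.
seeds = {"roast": ["medium roast"], "size": ["16 oz"]}
candidates = generate_candidates("Dark Roast, Whole Bean, 12 oz")
known, unassigned = group_candidates(candidates, seeds)
print(known)
print(unassigned)
```

In this toy run, the phrase that resembles no seed stays in the unassigned bucket, which is where an open-world method would look for new attributes. The framework in the abstract instead clusters candidates using fine-tuned language-model representations and a self-ensemble, none of which this sketch attempts.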

