Learning Visual Features from Snapshots for Web Search

10/19/2017
by Yixing Fan, et al.

When learning-to-rank algorithms are applied to Web search, a large number of features are usually designed to capture relevance signals. Most of these features are computed from extracted textual elements, link analysis, and user logs. However, Web pages are not merely linked text; they have a structured layout that organizes a wide variety of elements in different styles. This layout can itself convey useful visual information about the relevance of a Web page. For example, the query-independent layout (i.e., the raw page layout) can help identify page quality, while the query-dependent layout (i.e., the page rendered with the matched query words) can further reveal rich structural information (e.g., size, position, and proximity) about the matching signals. However, such visual layout information has seldom been used in Web search. In this work, we propose to learn rich visual features automatically from the layout of Web pages (i.e., Web page snapshots) for relevance ranking. Both query-independent and query-dependent snapshots are used as new inputs. We then propose a novel visual perception model, inspired by human visual search behavior during page viewing, to extract the visual features. This model can be learned end-to-end together with traditional human-crafted features. We also show that such visual features can be acquired efficiently in the online setting with an extended inverted indexing scheme. Experiments on benchmark collections demonstrate that learning visual features from Web page snapshots can significantly improve relevance ranking performance in ad-hoc Web retrieval tasks.
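
The abstract does not spell out the extended inverted indexing scheme, so the following is only a minimal sketch of one way such an index might work, not the authors' design. It assumes that each posting stores, alongside the document ID, the bounding boxes where a term is rendered on the page, so that a query-dependent mask can be assembled at query time without re-rendering. The class name `VisualInvertedIndex`, the grid-cell coordinate convention, and the `query_mask` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed convention: (x, y, width, height) of a rendered term, in grid cells.
Box = Tuple[int, int, int, int]


class VisualInvertedIndex:
    """Hypothetical inverted index whose postings carry on-page bounding boxes."""

    def __init__(self) -> None:
        # term -> doc_id -> list of boxes where the term is rendered
        self.postings: Dict[str, Dict[str, List[Box]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add(self, doc_id: str, term: str, box: Box) -> None:
        """Record one rendered occurrence of a term on a page."""
        self.postings[term.lower()][doc_id].append(box)

    def query_mask(self, doc_id: str, query: str, rows: int, cols: int) -> List[List[int]]:
        """Return a rows x cols grid with 1 wherever a query term is rendered."""
        mask = [[0] * cols for _ in range(rows)]
        for term in query.lower().split():
            # .get avoids creating empty postings for unseen terms or documents
            for x, y, w, h in self.postings.get(term, {}).get(doc_id, []):
                # Clip each box to the grid before marking its cells
                for r in range(max(y, 0), min(y + h, rows)):
                    for c in range(max(x, 0), min(x + w, cols)):
                        mask[r][c] = 1
        return mask


# Usage: two terms indexed for one page, then a query-dependent mask is built.
index = VisualInvertedIndex()
index.add("d1", "visual", (1, 0, 2, 1))
index.add("d1", "search", (0, 2, 3, 1))
mask = index.query_mask("d1", "visual search", rows=3, cols=4)
```

In this sketch the mask could be overlaid on a precomputed query-independent snapshot to approximate a query-dependent one; whether the paper combines them this way is not stated in the abstract.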

research
08/17/2022

TangibleGrid: Tangible Web Layout Design for Blind Users

We present TangibleGrid, a novel device that allows blind users to under...
research
03/07/2019

ViTOR: Learning to Rank Webpages Based on Visual Features

The visual appearance of a webpage carries valuable information about it...
research
10/22/2018

An Efficient Bandit Algorithm for Realtime Multivariate Optimization

Optimization is commonly employed to determine the content of web pages,...
research
04/27/2018

An Element Sensitive Saliency Model with Position Prior Learning for Web Pages

Understanding human visual attention is important for multimedia applica...
research
06/14/2016

Using Fuzzy Logic to Leverage HTML Markup for Web Page Representation

The selection of a suitable document representation approach plays a cru...
research
02/14/2018

Web-Scale Responsive Visual Search at Bing

In this paper, we introduce a web-scale general visual search system dep...
research
09/03/2021

Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals

This paper presents a computational method of analysis that draws from m...
