Making Changes in Webpages Discoverable: A Change-Text Search Interface for Web Archives

04/30/2023
by Lesley Frew, et al.

Webpages change over time, and web archives hold copies of historical versions of webpages. Users of web archives, such as journalists, want to find and view changes on webpages over time. However, the current search interfaces for web archives do not support this task. For the web archives that include a full-text search feature, multiple versions of the same webpage that match the search query are shown individually without enumerating changes, or are grouped together in a way that hides changes. We present a change-text search engine that allows users to find changes in webpages. We describe the implementation of the search engine backend and frontend, including a tool that allows users to view the changes between two webpage versions in context as an animation. We evaluate the search engine with U.S. federal environmental webpages that changed between 2016 and 2020. The change-text search results page can clearly show when terms and phrases were added to or removed from webpages. The inverted index can also be queried to identify salient and frequently deleted terms in a corpus.
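
To make the idea of indexing change text concrete, the following is a minimal sketch and not the authors' implementation, which the abstract does not detail. It assumes each webpage is available as a time-ordered list of plain-text captures, uses Python's standard `difflib` to find tokens added or deleted between consecutive captures, and stores them in a small in-memory inverted index. The names `tokenize`, `change_terms`, `build_change_index`, and `most_deleted_terms` are hypothetical and chosen only for illustration.

```python
import difflib
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def change_terms(old_text, new_text):
    """Return (added, deleted) token lists between two versions of a page."""
    old, new = tokenize(old_text), tokenize(new_text)
    added, deleted = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added, deleted


def build_change_index(versions):
    """Build an inverted index over change text (an illustrative assumption).

    `versions` maps a URL to a list of (timestamp, text) pairs sorted by time.
    The index maps (change_type, term) to a list of
    (url, old_timestamp, new_timestamp) postings.
    """
    index = defaultdict(list)
    for url, captures in versions.items():
        for (t_old, old_text), (t_new, new_text) in zip(captures, captures[1:]):
            added, deleted = change_terms(old_text, new_text)
            for term in sorted(set(added)):
                index[("added", term)].append((url, t_old, t_new))
            for term in sorted(set(deleted)):
                index[("deleted", term)].append((url, t_old, t_new))
    return index


def most_deleted_terms(index, n=10):
    """Rank terms by the number of version pairs in which they were deleted."""
    counts = Counter({
        term: len(postings)
        for (kind, term), postings in index.items()
        if kind == "deleted"
    })
    return counts.most_common(n)
```

Under these assumptions, looking up `index[("deleted", "climate")]` would list the version pairs in which the term disappeared, and `most_deleted_terms(index)` gives a rough corpus-level view of frequently deleted terms. A production system would more likely rely on a dedicated search engine and handle phrases and term salience, which this sketch does not attempt.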

