Making Changes in Webpages Discoverable: A Change-Text Search Interface for Web Archives

by Lesley Frew, et al.

Webpages change over time, and web archives hold copies of historical versions of webpages. Users of web archives, such as journalists, want to find and view changes on webpages over time. However, the current search interfaces for web archives do not support this task. In the web archives that include a full-text search feature, multiple versions of the same webpage that match the search query are either shown individually without enumerating changes or grouped together in a way that hides changes. We present a change-text search engine that allows users to find changes in webpages. We describe the implementation of the search engine backend and frontend, including a tool that allows users to view the changes between two webpage versions in context as an animation. We evaluate the search engine with U.S. federal environmental webpages that changed between 2016 and 2020. The change-text search results page can clearly show when terms and phrases were added to or removed from webpages. The inverted index can also be queried to identify salient and frequently deleted terms in a corpus.
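
As a rough illustration of the idea of indexing changes rather than full page text, the following minimal sketch builds an in-memory inverted index of terms added or removed between consecutive versions of a page, then queries it for the most frequently deleted terms. It is not the authors' implementation: the tokenizer, the function names (`term_changes`, `build_change_index`, `most_deleted`), the posting layout, and the sample URLs and texts are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import re


def terms(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_changes(old_text, new_text):
    """Return (added, removed) term sets between two versions of a page."""
    old, new = set(terms(old_text)), set(terms(new_text))
    return new - old, old - new


def build_change_index(versions):
    """Build an inverted index over changes.

    versions: {url: [(timestamp, text), ...]} with each list sorted by timestamp.
    Returns {term: [(url, old_ts, new_ts, "added" | "removed"), ...]}.
    """
    index = defaultdict(list)
    for url, snaps in versions.items():
        # Compare each version with the one that follows it.
        for (old_ts, old_text), (new_ts, new_text) in zip(snaps, snaps[1:]):
            added, removed = term_changes(old_text, new_text)
            for t in added:
                index[t].append((url, old_ts, new_ts, "added"))
            for t in removed:
                index[t].append((url, old_ts, new_ts, "removed"))
    return index


def most_deleted(index, n=3):
    """Return up to n (term, count) pairs ranked by number of removals."""
    counts = Counter()
    for term, postings in index.items():
        counts[term] = sum(1 for p in postings if p[3] == "removed")
    return [(t, c) for t, c in counts.most_common(n) if c > 0]


# Hypothetical sample data, invented for illustration only.
versions = {
    "example.gov/a": [
        ("2016", "climate change affects water quality"),
        ("2020", "weather affects water quality"),
    ],
    "example.gov/b": [
        ("2016", "climate science resources"),
        ("2020", "science resources"),
    ],
}
index = build_change_index(versions)
print(index["weather"])        # postings recording where "weather" was added
print(most_deleted(index, 1))  # the term removed from the most version pairs
```

A production system would more likely store such postings in a full-text search backend and index phrases as well as single terms; the dictionary here only shows the shape of the data.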



Random Forest Classifier based Scheduler Optimization for Search Engine Web Crawlers

The backbone of every search engine is the set of web crawlers, which go...

Adaptive technique for web page change detection using multi-threaded crawlers

World Wide Web is getting dense as many new web pages and resources are ...

An Alternate Approach for Designing a Domain Specific Image Search Prototype Using Histogram

Everyone knows that thousand of words are represented by a single image....

Accessibility or Usability of InteractSE? A Heuristic Based Approach to Evaluate Proposed Search Engine for the Visually Impaired Users

Internet is the main source of information nowadays. The search engines ...

The ROOTS Search Tool: Data Transparency for LLMs

ROOTS is a 1.6TB multilingual text corpus developed for the training of ...

Release Early, Release Often: Predicting Change in Versioned Knowledge Organization Systems on the Web

The Semantic Web is built on top of Knowledge Organization Systems (KOS)...

Abstractive Snippet Generation

An abstractive snippet is an originally created piece of text to summari...
