Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

03/09/2020
by Sarah Bird, et al.

As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (≥94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts, we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.
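The idea of grouping scripts by API access and surfacing unlabeled neighbors of known fingerprinters can be illustrated with a minimal sketch. This is not the authors' pipeline: the cosine similarity, the greedy grouping, the 0.7 threshold, the script names, and the example traces are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_scripts(traces, threshold):
    """Greedy grouping (an assumed stand-in for a real clustering step).

    Each script joins the first cluster whose seed script is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    vectors = {name: Counter(calls) for name, calls in traces.items()}
    clusters = []
    for name in sorted(vectors):
        for cluster in clusters:
            if cosine(vectors[name], vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def surface_candidates(clusters, known_fingerprinters):
    """Return unlabeled scripts that share a cluster with a known fingerprinter."""
    candidates = set()
    for cluster in clusters:
        if any(name in known_fingerprinters for name in cluster):
            candidates.update(n for n in cluster if n not in known_fingerprinters)
    return candidates


# Hypothetical execution traces: script name -> list of API symbols accessed.
traces = {
    "fp_known.js": [
        "CanvasRenderingContext2D.fillText",
        "HTMLCanvasElement.toDataURL",
        "Navigator.userAgent",
        "Screen.width",
    ],
    "fp_variant.js": [
        "CanvasRenderingContext2D.fillText",
        "HTMLCanvasElement.toDataURL",
        "Navigator.userAgent",
        "Navigator.plugins",
    ],
    "carousel.js": [
        "Document.querySelector",
        "Element.classList",
        "Window.requestAnimationFrame",
    ],
}

clusters = group_scripts(traces, threshold=0.7)
candidates = surface_candidates(clusters, known_fingerprinters={"fp_known.js"})
print(candidates)
```

Here the two canvas-reading scripts share three of four API accesses, so they land in the same group even though their traces do not match exactly, and the unlabeled one is surfaced as a candidate for manual review.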

