Learning from Limited Heterogeneous Training Data: Meta-Learning for Unsupervised Zero-Day Web Attack Detection across Web Domains

by Peiyang Li, et al.
Tsinghua University

Recently, unsupervised machine-learning-based systems have been developed to detect zero-day Web attacks, and they can effectively enhance existing Web Application Firewalls (WAFs). However, prior work only detects attacks on specific domains, training a dedicated detection model for each domain. These systems require a large amount of training data, which makes model training and deployment slow. In this paper, we propose RETSINA, a novel meta-learning-based framework that enables zero-day Web attack detection across different domains in an organization with limited training data. Specifically, it uses meta-learning to share knowledge across these domains, e.g., the relationship between HTTP requests in heterogeneous domains, to train detection models efficiently. Moreover, we develop an adaptive preprocessing module to facilitate semantic analysis of Web requests across different domains, and we design a multi-domain representation method that captures semantic correlations between domains for cross-domain model training. We conduct experiments using four real-world datasets from different domains with a total of 293M Web requests. The experimental results demonstrate that RETSINA outperforms existing unsupervised Web attack detection methods when training data is limited, e.g., RETSINA needs only 5 minutes of training data to achieve detection performance comparable to existing methods that train separate models for different domains using 1 day of training data. We also deploy RETSINA in a real-world setting at an Internet company, where it captures on average 126 and 218 zero-day attack requests per day in two domains, respectively, over one month.
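
The idea of sharing knowledge across domains so that a new domain needs only a little data can be illustrated with a toy sketch. This is not RETSINA's algorithm, which the abstract does not spell out. It assumes a first-order meta-learning rule (Reptile) and a deliberately simple unsupervised "model", namely a single center vector per domain with distance to the center as the anomaly score. All names here (`loss`, `adapt`, `reptile`, the synthetic feature vectors) are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss(center, batch):
    """Mean squared distance from each feature vector to the center."""
    return float(np.mean(np.sum((batch - center) ** 2, axis=1)))

def adapt(center, batch, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one domain's (benign) data."""
    c = center.copy()
    for _ in range(steps):
        grad = 2.0 * (c - batch.mean(axis=0))  # gradient of loss w.r.t. c
        c -= lr * grad
    return c

def reptile(domains, dim, meta_lr=0.5, rounds=50, seed=0):
    """Outer loop (Reptile): move the shared initialization toward the
    parameters obtained after adapting to a randomly sampled domain."""
    rng = np.random.default_rng(seed)
    meta = np.zeros(dim)
    for _ in range(rounds):
        data = domains[rng.integers(len(domains))]
        batch = data[rng.choice(len(data), size=16, replace=False)]
        meta += meta_lr * (adapt(meta, batch) - meta)
    return meta

# Synthetic stand-in for request embeddings: three domains whose benign
# traffic clusters around a shared pattern plus a small per-domain offset.
rng = np.random.default_rng(1)
shared = np.array([3.0, -2.0, 1.0, 0.5])
domains = [shared + rng.normal(0, 0.3, 4) + rng.normal(0, 1.0, (200, 4))
           for _ in range(3)]

meta = reptile(domains, dim=4)

# A new domain with only a handful of samples: start from the shared
# initialization and adapt with a few gradient steps.
few = shared + rng.normal(0, 0.3, 4) + rng.normal(0, 1.0, (8, 4))
adapted = adapt(meta, few)

def anomaly_score(x, center=adapted):
    """Larger distance from the adapted center means more anomalous."""
    return float(np.sum((x - center) ** 2))
```

In this sketch the shared initialization `meta` plays the role of cross-domain knowledge, and `adapt` is the cheap per-domain step that a small amount of data can support. A real detector would replace the center vector with a learned model of request semantics.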


Meta-Learning for Low-Resource Unsupervised Neural Machine Translation

Unsupervised machine translation, which utilizes unpaired monolingual co...

Discriminative Adversarial Domain Generalization with Meta-learning based Cross-domain Validation

The generalization capability of machine learning models, which refers t...

Few Shot Dialogue State Tracking using Meta-learning

Dialogue State Tracking (DST) forms a core component of automated chatbo...

From Zero-Shot Machine Learning to Zero-Day Attack Detection

The standard ML methodology assumes that the test samples are derived fr...

A Student-Teacher Architecture for Dialog Domain Adaptation under the Meta-Learning Setting

Numerous new dialog domains are being created every day while collecting...

Programmable Neural Network Trojan for Pre-Trained Feature Extractor

Neural network (NN) trojaning attack is an emerging and important attack...

A Study of Newly Observed Hostnames and DNS Tunneling in the Wild

The domain name system (DNS) is a crucial backbone of the Internet and m...
