JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection

by Chika Komiya, et al.

Machine learning is often used for malicious website detection, but to the best of our knowledge, an approach that incorporates WebAssembly as a feature has not been explored because too few samples are available. In this paper, we propose JABBERWOCK (JAvascript-Based Binary EncodeR by WebAssembly Optimization paCKer), a tool that generates pseudo WebAssembly datasets from JavaScript. Loosely speaking, JABBERWOCK automatically gathers real-world JavaScript code, converts it into WebAssembly, and then outputs vectors of the WebAssembly as samples for malicious website detection. We evaluate JABBERWOCK experimentally in three respects: the processing time for dataset generation, a comparison of the generated samples with actual WebAssembly samples gathered from the Internet, and an application to malicious website detection. Regarding processing time, we show that JABBERWOCK constructs a dataset in 4.5 seconds per sample regardless of the number of samples. Next, comparing 10,000 samples output by JABBERWOCK with 168 gathered WebAssembly samples, we believe that the samples generated by JABBERWOCK are similar to those in the real world. We then show that malicious website detection with JABBERWOCK achieves a 99% F1-score, which we attribute to the gap that JABBERWOCK creates between benign and malicious samples. We also confirm that JABBERWOCK can be combined with an existing malicious website detection tool to improve F1-scores. JABBERWOCK is publicly available via GitHub (https://github.com/c-chocolate/Jabberwock).
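To make the final "outputs vectors" step and the per-sample timing concrete, the following is a minimal sketch rather than JABBERWOCK's actual implementation. The abstract does not say how WebAssembly is vectorized, so the byte-frequency histogram here is an assumed stand-in feature, and the helper names `is_wasm`, `byte_histogram`, and `estimated_seconds` are hypothetical. The JavaScript gathering and conversion stages are omitted.

```python
from collections import Counter

# The first four bytes of every WebAssembly binary ("\0asm").
WASM_MAGIC = b"\x00asm"


def is_wasm(binary: bytes) -> bool:
    """Return True if the buffer starts with the WebAssembly magic number."""
    return binary[:4] == WASM_MAGIC


def byte_histogram(binary: bytes) -> list[float]:
    """Map a binary to a normalised 256-dimensional byte-frequency vector.

    Assumption: this feature is illustrative only; the tool's real
    vectorization may differ.
    """
    if not binary:
        return [0.0] * 256
    counts = Counter(binary)  # iterating bytes yields ints in 0..255
    total = len(binary)
    return [counts.get(value, 0) / total for value in range(256)]


def estimated_seconds(n_samples: int, seconds_per_sample: float = 4.5) -> float:
    """Linear cost model implied by the reported 4.5 seconds per sample."""
    return n_samples * seconds_per_sample


# Smallest valid module: magic number plus version 1, with no sections.
empty_module = WASM_MAGIC + b"\x01\x00\x00\x00"
vector = byte_histogram(empty_module)
```

Under this linear model, the 10,000 samples used in the comparison would take roughly 45,000 seconds (about 12.5 hours) to generate, assuming the per-sample figure holds on comparable hardware.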




