ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation

03/13/2019
by   Ethan M. Rudd, et al.

Malware detection is a popular application of Machine Learning for Information Security (ML-Sec), in which an ML classifier is trained to predict whether a given file is malware or benignware. Parameters of this classifier are typically optimized such that outputs from the model over a set of input samples most closely match the samples' true malicious/benign (1/0) target labels. However, there are often a number of other sources of contextual metadata for each malware sample, beyond an aggregate malicious/benign label, including multiple labeling sources and malware type information (e.g., ransomware, trojan, etc.), which we can feed to the classifier as auxiliary prediction targets. In this work, we fit deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss. We find that incorporating multiple auxiliary loss terms yields a marked improvement in performance on the main detection task. We also demonstrate that these gains likely stem from a more informed neural network representation and are not due to a regularization artifact of multi-target learning. Our auxiliary loss architecture yields a significant reduction in detection error rate (false negatives) of 42.6% at a false positive rate (FPR) of 10^-3 when compared to a similar model with only one target, and a decrease of 53.8%...
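As a rough illustration of how a main detection loss can be combined with the three auxiliary terms named in the abstract, here is a minimal pure-Python sketch for a single sample. It is not the authors' implementation. The dictionary keys, the weight values, and the choice of a Poisson negative log-likelihood for the count term are assumptions made for illustration; the abstract says only that a count loss on multi-source detections is used.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def poisson_nll(rate, count):
    """Poisson negative log-likelihood of an observed count given a predicted rate > 0.

    Assumption: the abstract does not say which count loss is used.
    """
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def combined_loss(pred, target, weights):
    """Weighted sum of the main loss and three auxiliary losses for one sample.

    All key names ("malware", "vendors", "count", "tags") are hypothetical.
    """
    main = bce(pred["malware"], target["malware"])
    # Multi-source loss: one malicious/benign output per labeling source.
    vendors = sum(bce(p, y) for p, y in zip(pred["vendors"], target["vendors"]))
    # Count loss: how many sources flagged the sample.
    count = poisson_nll(pred["count"], target["count"])
    # Tag loss: one output per semantic attribute tag (e.g. ransomware, trojan).
    tags = sum(bce(p, y) for p, y in zip(pred["tags"], target["tags"]))
    return (weights["malware"] * main
            + weights["vendors"] * vendors
            + weights["count"] * count
            + weights["tags"] * tags)


# Hypothetical model outputs and targets for one malicious sample.
pred = {"malware": 0.9, "vendors": [0.8, 0.7, 0.2], "count": 2.5, "tags": [0.9, 0.1]}
target = {"malware": 1, "vendors": [1, 1, 0], "count": 2, "tags": [1, 0]}
weights = {"malware": 1.0, "vendors": 0.1, "count": 0.1, "tags": 0.1}  # illustrative values

loss = combined_loss(pred, target, weights)
```

Setting every auxiliary weight to zero reduces this to the single-target baseline that the abstract compares against. At inference time only the main malicious/benign output would be needed.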

research
05/16/2019

Learning from Context: Exploiting and Interpreting File Path Information for Better Malware Detection

Machine learning (ML) used for static portable executable (PE) malware d...
research
12/14/2020

SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection

In this paper we describe the SOREL-20M (Sophos/ReversingLabs-20 Million...
research
05/15/2019

SMART: Semantic Malware Attribute Relevance Tagging

With the rapid proliferation and increased sophistication of malicious s...
research
10/28/2022

A Deep Dive into VirusTotal: Characterizing and Clustering a Massive File Feed

Online scanners analyze user-submitted files with a large number of secu...
research
06/15/2021

Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery

The use of Machine Learning has become a significant part of malware det...
research
12/05/2022

Efficient Malware Analysis Using Metric Embeddings

In this paper, we explore the use of metric learning to embed Windows PE...
research
08/20/2022

Quo Vadis: Hybrid Machine Learning Meta-Model based on Contextual and Behavioral Malware Representations

We propose a hybrid machine learning architecture that simultaneously em...
