Shell Language Processing: Unix command parsing for Machine Learning

07/06/2021
by Dmitrijs Trizna, et al.

In this article, we present the Shell Language Preprocessing (SLP) library, which implements tokenization and encoding aimed at parsing Unix and Linux shell commands. We explain why a new approach is needed, with specific examples of cases where conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against tokenization techniques widely accepted in information and communications technology (ICT), improving the F1-score from 0.392 to 0.874.
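
To illustrate the kind of failure described above, the following minimal sketch contrasts a generic word-level tokenizer with a shell-aware one. It does not use the SLP library or reproduce its API: the function names `naive_nlp_tokenize` and `shell_aware_tokenize` and the sample command are hypothetical, and Python's standard-library `shlex` module (Python 3.8 or later) stands in for a shell-aware parser.

```python
import re
import shlex


def naive_nlp_tokenize(text):
    """Generic word-level tokenizer: lowercase, keep alphanumeric runs only."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def shell_aware_tokenize(command):
    """Split a command using POSIX shell quoting and operator rules (stdlib shlex)."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and shell operators only
    return list(lexer)


# Hypothetical sample command, chosen only for illustration.
command = "curl -s http://10.0.0.1/a.sh | bash -c 'echo done' > /dev/null"

naive_tokens = naive_nlp_tokenize(command)
shell_tokens = shell_aware_tokenize(command)

print(naive_tokens)
print(shell_tokens)
```

In this sketch the generic tokenizer discards the pipe and redirection operators, strips the dash from the `-s` flag, and breaks the URL and the quoted argument into fragments, while the shell-aware split keeps flags, paths, operators, and the quoted string as separate, intact tokens.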
