Attack vs Benign Network Intrusion Traffic Classification
Intrusion detection systems (IDS) are used to monitor networks or systems for attack activity or policy violations. Such a system should be able to reliably identify anomalous deviations from normal traffic behavior. Here we discuss a machine learning approach to building an anomaly-based IDS using the CSE-CIC-IDS2018 dataset. Since the publication of this dataset, a relatively large number of papers have appeared, most of them presenting IDS architectures and results based on complex machine learning methods, such as deep neural networks, gradient boosting classifiers, or hidden Markov models. Here we show that similar results can be obtained with a very simple nearest neighbor classification approach, which avoids the inherent complications of training such complex models. The advantages of the nearest neighbor algorithm are: (1) it is very simple to implement; (2) it is extremely robust; (3) it has no parameters, and therefore it cannot overfit the data. This result also shows a current trend in the machine learning community toward over-engineered solutions. Such solutions rely on complex methods, such as deep neural networks, without even considering baselines built on simple but efficient methods.
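To make the nearest neighbor idea concrete, here is a minimal sketch of a 1-nearest-neighbor attack/benign classifier in pure Python. The abstract does not say which distance metric, feature set, or preprocessing the authors used. This sketch therefore assumes numeric per-flow feature vectors, min-max scaling fitted on the reference set, and Euclidean distance. The three toy features are hypothetical stand-ins for CSE-CIC-IDS2018 flow features: duration in seconds, packets per second, and mean packet size in bytes. "Training" only stores the scaled reference flows. A query flow receives the label of its single closest stored flow, so there are no weights to learn.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_scaler(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and range for min-max scaling (assumed preprocessing)."""
    mins = [min(col) for col in zip(*X)]
    maxs = [max(col) for col in zip(*X)]
    # A constant feature has range 0; use 1.0 so scaling never divides by zero.
    ranges = [(hi - lo) or 1.0 for lo, hi in zip(mins, maxs)]
    return mins, ranges


def scale(x: Sequence[float], mins: List[float], ranges: List[float]) -> List[float]:
    """Map each feature to [0, 1] relative to the reference set."""
    return [(v - lo) / r for v, lo, r in zip(x, mins, ranges)]


class NearestNeighbor:
    """Hypothetical 1-NN classifier: predicts the label of the closest stored flow."""

    def __init__(self) -> None:
        self.X: List[List[float]] = []
        self.y: List[str] = []
        self.mins: List[float] = []
        self.ranges: List[float] = []

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestNeighbor":
        # No optimization: just remember the scaled reference flows and their labels.
        self.mins, self.ranges = fit_scaler(X)
        self.X = [scale(x, self.mins, self.ranges) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        q = scale(x, self.mins, self.ranges)
        # Linear scan for the index of the reference flow closest to the query.
        best = min(range(len(self.X)), key=lambda i: euclidean(q, self.X[i]))
        return self.y[best]

    def predict(self, X: List[List[float]]) -> List[str]:
        return [self.predict_one(x) for x in X]


if __name__ == "__main__":
    # Toy flows (hypothetical): [duration_s, packets_per_s, mean_packet_bytes]
    train_X = [[0.5, 10.0, 500.0], [1.2, 15.0, 800.0],
               [0.01, 5000.0, 60.0], [0.02, 4500.0, 64.0]]
    train_y = ["Benign", "Benign", "Attack", "Attack"]
    clf = NearestNeighbor().fit(train_X, train_y)
    print(clf.predict([[0.8, 12.0, 600.0], [0.015, 4800.0, 62.0]]))  # ['Benign', 'Attack']
```

The brute-force linear scan keeps the sketch short and dependency-free. On a dataset the size of CSE-CIC-IDS2018, a practical version would likely use vectorized distances or a spatial index instead. The "no parameters" property refers to the absence of trained weights. The choice of distance metric and feature scaling still affects the result.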