What are the attackers doing now? Automating cyber threat intelligence extraction from text on pace with the changing threat landscape: A survey

09/14/2021
by Md. Rayhanur Rahman, et al.

Cybersecurity researchers have contributed to the automated extraction of cyberthreat intelligence (CTI) from textual sources, such as threat reports and online articles, where cyberattack strategies, procedures, and tools are described. The goal of this article is to help cybersecurity researchers understand the current techniques used for CTI extraction from text through a survey of relevant studies in the literature. We systematically collect "CTI extraction from text"-related studies from the literature and categorize the CTI extraction purposes. We propose a CTI extraction pipeline abstracted from these studies. We identify the data sources, techniques, and CTI sharing formats utilized in the context of the proposed pipeline. Our work finds ten types of extraction purposes, such as the extraction of indicators of compromise, TTPs (tactics, techniques, and procedures of attack), and cybersecurity keywords. We also identify seven types of textual sources for CTI extraction; textual data obtained from hacker forums, threat reports, social media posts, and online news articles have been used by almost 90% of the studies. Natural language processing, along with both supervised and unsupervised machine learning techniques such as named entity recognition, topic modelling, dependency parsing, supervised classification, and clustering, is used for CTI extraction. We observe technical challenges in these studies related to obtaining clean, labelled data, the availability of which could assure replication, validation, and further extension of the studies. Because the studies we find focus on extracting CTI information from text, we advocate for building upon the current CTI extraction work to help cybersecurity practitioners with proactive decision making, such as threat prioritization and automated threat modelling, that utilizes knowledge from past cybersecurity incidents.
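
As a rough illustration of one extraction purpose named above, indicators of compromise, the following minimal Python sketch pulls a few indicator types out of free text with regular expressions. It is not a method taken from the surveyed studies: the `IOC_PATTERNS` table, the `extract_iocs` function, and the three chosen indicator types (IPv4 address, MD5 hash, CVE identifier) are assumptions made for this example, and a practical extractor would need to handle defanged indicators, more indicator types, and false positives.

```python
import re

# Hypothetical pattern table for illustration only; real extractors cover far more cases.
IOC_PATTERNS = {
    # Dotted-quad IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # 32 hexadecimal characters, the length of an MD5 digest.
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    # CVE identifier such as CVE-2021-44228.
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a dict mapping each indicator type to its sorted unique matches in text."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in IOC_PATTERNS.items()
    }


if __name__ == "__main__":
    sample = (
        "The dropper at 192.168.1.10 exploited CVE-2021-44228 and wrote a file "
        "with MD5 d41d8cd98f00b204e9800998ecf8427e."
    )
    for ioc_type, values in extract_iocs(sample).items():
        print(ioc_type, values)
```

Pattern matching of this kind suits only rigidly formatted indicators. Less regular targets mentioned in the abstract, such as TTPs, are where the listed techniques like named entity recognition and supervised classification come in.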


