On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions

by Reza Fayyazi, et al.

The volume, variety, and velocity of change in vulnerabilities and exploits have made incident threat analysis challenging with human expertise and experience alone. The MITRE ATT&CK framework employs Tactics, Techniques, and Procedures (TTPs) to describe how and why attackers exploit vulnerabilities. However, a TTP description written by one security professional can be interpreted very differently by another, leading to confusion in cybersecurity operations or even in business, policy, and legal decisions. Meanwhile, advancements in AI have led to the increasing use of Natural Language Processing (NLP) algorithms to assist with various tasks in cyber operations. With the rise of Large Language Models (LLMs), NLP tasks have significantly improved because of LLMs' semantic understanding and scalability. This leads us to question how well LLMs can interpret TTP or general cyberattack descriptions. We propose and analyze the direct use of LLMs as well as training BaseLLMs with ATT&CK descriptions to study their capability in predicting ATT&CK tactics. Our results reveal that the BaseLLMs with supervised training provide a more focused and clearer differentiation between the ATT&CK tactics (if such differentiation exists). On the other hand, LLMs offer a broader interpretation of cyberattack techniques. Despite the power of LLMs, inherent ambiguity exists within their predictions. We thus summarize the existing challenges and recommend research directions on LLMs to deal with the inherent ambiguity of TTP descriptions.
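The supervised setting the abstract describes, mapping free-text TTP descriptions to ATT&CK tactic labels, is at its core a text-classification task. The minimal sketch below illustrates that framing with a toy bag-of-words nearest-centroid classifier standing in for the paper's fine-tuned BaseLLMs; the tactic names follow MITRE ATT&CK, but the training descriptions are illustrative examples, not taken from the framework or the paper.

```python
# Toy sketch: supervised tactic prediction from TTP descriptions.
# A bag-of-words nearest-centroid classifier stands in for the
# fine-tuned BaseLLMs discussed in the abstract; descriptions below
# are illustrative, not from MITRE ATT&CK itself.
from collections import Counter
import math

TRAIN = [
    ("adversary sends spearphishing email with malicious attachment", "Initial Access"),
    ("attacker exploits public-facing application to gain a foothold", "Initial Access"),
    ("malware establishes registry run key to survive reboot", "Persistence"),
    ("scheduled task created to relaunch implant at startup", "Persistence"),
    ("data compressed and sent to attacker-controlled server", "Exfiltration"),
    ("files staged and transferred over encrypted channel to c2", "Exfiltration"),
]

def vectorize(text):
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Training": build one centroid count vector per tactic label.
centroids = {}
for text, tactic in TRAIN:
    centroids.setdefault(tactic, Counter()).update(vectorize(text))

def predict_tactic(description):
    """Return the tactic whose centroid is most similar to the description."""
    v = vectorize(description)
    return max(centroids, key=lambda t: cosine(v, centroids[t]))

print(predict_tactic("implant adds a registry run key to persist"))  # → Persistence
```

Ambiguity, the central concern of the paper, shows up even in this toy model: a description touching terms from several tactics yields near-tied similarity scores, which is the single-label analogue of the overlapping interpretations the abstract attributes to LLM predictions.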




