Model Leeching: An Extraction Attack Targeting LLMs

09/19/2023

Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced-parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively, for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from a model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% ...
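The abstract reports results as Exact Match (EM) and F1 scores. As a rough illustration of what those metrics measure, here is a minimal sketch of SQuAD-style answer scoring. It assumes the commonly used normalization (lowercasing, stripping punctuation and English articles); the function names are hypothetical and this is not the authors' evaluation code. Scoring an extracted model's answers against the target model's answers would give a similarity figure, while scoring against ground-truth answers would give an accuracy figure.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Averaging either function over a dataset of question-answer pairs yields a percentage of the kind quoted above. EM rewards only exact agreement after normalization, whereas F1 gives partial credit for overlapping tokens, which is why F1 is typically the higher of the two.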
