Model Leeching: An Extraction Attack Targeting LLMs

09/19/2023
by Lewis Birch, et al.

Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced-parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75 respectively for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from a model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11
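
The abstract reports results as Exact Match (EM) and F1 scores. As a rough illustration of what those two metrics measure, here is a minimal sketch modeled on the common SQuAD-style evaluation: answers are lowercased, stripped of punctuation and English articles, and then compared either as whole strings (EM) or as token bags (F1). The function names are illustrative assumptions, and the paper's exact evaluation code may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, an EM "similarity" between two models would be the mean of `exact_match(extracted_answer, target_answer)` over a shared question set, while EM and F1 "accuracy" would compare each model's answers against the dataset's ground-truth answers.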
