An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic
Two new research papers highlight vulnerabilities in current defenses against AI model extraction attacks. One paper proposes a simple yet effective detector that analyzes traffic window distributions to identify deviations from normal API usage, achieving high detection rates with low false positives. The second paper demonstrates that existing defenses, which often assume single-client attacks, can be bypassed by coordinated, multi-client strategies, rendering them ineffective against sophisticated adversaries.
AI IMPACT: Highlights critical security gaps in LLM deployment, necessitating new defense architectures beyond single-client assumptions.
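
To make the idea of a window-distribution detector concrete, here is a minimal Python sketch. It is not the paper's method, which this summary does not specify. It assumes, purely for illustration, that each API request has already been labeled with a coarse category, that a sample of known-benign traffic is available as a baseline, and that Jensen-Shannon divergence is a reasonable way to measure deviation. The names `distribution`, `js_divergence`, and `flag_windows`, the window size, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def distribution(items, categories):
    """Add-one smoothed relative frequency of each category in items."""
    counts = Counter(i for i in items if i in categories)
    total = sum(counts.values()) + len(categories)
    return [(counts[c] + 1) / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def flag_windows(baseline, traffic, categories, window=50, threshold=0.1):
    """Return (start_index, score) for each non-overlapping window whose
    category distribution diverges from the baseline by more than threshold.

    Assumption: window and threshold are illustrative values, not tuned ones.
    """
    ref = distribution(baseline, categories)
    flagged = []
    for start in range(0, len(traffic) - window + 1, window):
        win = traffic[start:start + window]
        score = js_divergence(ref, distribution(win, categories))
        if score > threshold:
            flagged.append((start, score))
    return flagged


# Hypothetical request categories and synthetic traffic.
categories = ["chat", "code", "translate"]
baseline = ["chat"] * 70 + ["code"] * 20 + ["translate"] * 10
normal = ["chat"] * 35 + ["code"] * 10 + ["translate"] * 5
probing = ["translate"] * 50  # one narrow query type repeated at volume
print(flag_windows(baseline, normal + probing, categories))
```

A per-client version of this check illustrates the weakness described in the second paper: if the same probing queries are spread across many coordinated clients, each client's windows can stay close to the baseline, so a detector along these lines would likely need to aggregate traffic across clients as well.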