New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%,
Researchers have identified a new vulnerability called "AI Bleeding" that amplifies inference costs by sending queries in out-of-distribution languages. This method, demonstrated on Ollama, can significantly increase time-to-first-token and compute costs, with potential amplification factors of over 17x. The technique evades standard detection methods and poses a particular risk to budget-constrained AI deployments, such as public sector chatbots and pay-per-use APIs. AI
IMPACT This research highlights a novel attack vector that could significantly increase operational costs for LLM deployments, particularly those with fixed budgets or pay-per-use models.