Brief · PulseAugur

TOOL · Mastodon — mastodon.social English(EN) · 2h

New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%,

Researchers have identified a new vulnerability called "AI Bleeding" that amplifies inference costs by sending queries in out-of-distribution languages. This method, demonstrated on Ollama, can significantly increase time-to-first-token and compute costs, with potential amplification factors of over 17x. The technique evades standard detection methods and poses a particular risk to budget-constrained AI deployments, such as public sector chatbots and pay-per-use APIs. AI

IMPACT This research highlights a novel attack vector that could significantly increase operational costs for LLM deployments, particularly those with fixed budgets or pay-per-use models.

Ollama
Farsi
AI Bleeding