New 'AI Bleeding' attack amplifies LLM inference costs via OOD languages

By PulseAugur Editorial · [1 source] · 2026-06-02 10:21

Researchers have identified a new vulnerability called "AI Bleeding" that amplifies inference costs by sending queries in out-of-distribution (OOD) languages, such as Grecanico or Farsi. In the author's measurements, the method, demonstrated on Ollama, raised time-to-first-token (TTFT) by 59.8% and compute cost by 2.8%, both reported as statistically significant. The reported worst-case amplification factor exceeds 17x. Because the payload contains no exploit and has no volumetric signature, the technique evades standard detection methods. This poses a particular risk to budget-constrained AI deployments, such as public-sector chatbots and pay-per-use APIs.
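A percentage increase and an amplification factor are two views of the same ratio between attack-condition and baseline cost. The minimal Python sketch below assumes you have collected per-request measurements, such as TTFT in seconds, for a baseline query set and an OOD-language query set. The function names and sample values are hypothetical illustrations. They are not the preprint's method or data.

```python
from statistics import mean


def amplification_factor(baseline: list[float], attack: list[float]) -> float:
    """Ratio of the mean attack-condition cost to the mean baseline cost.

    Hypothetical helper for illustration; not taken from the preprint.
    """
    base = mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return mean(attack) / base


def percent_increase(baseline: list[float], attack: list[float]) -> float:
    """The same ratio expressed as a percentage increase over baseline."""
    return (amplification_factor(baseline, attack) - 1.0) * 100.0


# Hypothetical TTFT samples in seconds (illustrative only, not measured data).
baseline_ttft = [0.50, 0.52, 0.48]
ood_ttft = [0.80, 0.82, 0.78]

factor = amplification_factor(baseline_ttft, ood_ttft)
pct = percent_increase(baseline_ttft, ood_ttft)
print(f"amplification: {factor:.2f}x, increase: +{pct:.1f}%")
```

On this basis, the post's TTFT figure of +59.8% corresponds to a factor of about 1.6x. A 17x amplification would correspond to an increase of roughly 1,600%.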

IMPACT This research highlights a novel attack vector that could significantly increase operational costs for LLM deployments, particularly those with fixed budgets or pay-per-use models.

RANK_REASON The cluster describes a new research paper detailing a novel vulnerability and its technical implications.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-06-02 10:21

New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection. Worst ca…