Brief

last 24h

[4/4] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · LessWrong (AI tag) English(EN) · 10h

Character-trained models can struggle to generalise

Researchers found that models fine-tuned on chat data to express a specific persona struggle to maintain it in agentic settings. When these character-trained models were prompted to write emails as part of a simulated agentic task, their persona expression degraded significantly. This suggests that persona training, often done via SFT or DPO on chat data, does not generalize well to other output formats or task contexts.

IMPACT Persona training in chat formats may not transfer to agentic tasks, limiting the reliability of character-consistent AI agents.
TOOL · arXiv cs.CL English(EN) · 1w

Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

Researchers studying KV cache eviction in large language models found that structural protection matters more than the choice of scoring algorithm. Their study on transformer models showed that, without protection, existing eviction policies degrade significantly. Reserving a small portion of the cache for structural protection lets models recover a substantial share of their original quality, even with limited cache sizes.

IMPACT This research highlights that structural protection in KV cache eviction is more impactful than scoring algorithms, potentially improving LLM efficiency and performance.
- KV cache
- Mistral-7B
- LRU
- Gemma-3-4B
- QUEST
- Qwen2.5-3B
- LongBench
- transformer models
- Ada-KV
- Phi-3.5
- StreamingLLM
- SnapKV
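The KV eviction item above turns on one mechanism: part of a capped cache budget is reserved for protected entries, and only the remainder is allocated by score. A minimal Python sketch of that idea follows. The summary does not say which entries the paper protects, so the choice here (the first `n_sink` and last `n_recent` token positions, in the spirit of StreamingLLM's attention sinks), the function name `select_kept_positions`, and the flat per-token `scores` list are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Set


def select_kept_positions(
    scores: List[float],
    budget: int,
    n_sink: int = 0,
    n_recent: int = 0,
) -> List[int]:
    """Return the sorted token positions kept in a cache capped at `budget` entries.

    `scores` holds one importance score per cached token (hypothetical input,
    e.g. accumulated attention). `n_sink` and `n_recent` are assumed protection
    parameters: the earliest and latest positions are kept regardless of score.
    """
    n = len(scores)
    budget = max(budget, 0)
    if budget >= n:
        return list(range(n))

    # Assumed protected set: earliest positions first, then the most recent ones.
    protected: List[int] = list(range(min(n_sink, n)))
    protected += [p for p in range(max(n - n_recent, 0), n) if p not in protected]
    protected = protected[:budget]  # protection can never exceed the cap

    kept: Set[int] = set(protected)

    # Fill the remaining slots with the highest-scoring unprotected positions.
    candidates = sorted(
        (p for p in range(n) if p not in kept),
        key=lambda p: scores[p],
        reverse=True,
    )
    kept.update(candidates[: budget - len(kept)])
    return sorted(kept)


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]
print(select_kept_positions(scores, budget=3))                        # [1, 2, 4]
print(select_kept_positions(scores, budget=3, n_sink=1, n_recent=1))  # [0, 1, 5]
```

With no protection, the cap is filled purely by score and the first and last tokens are evicted. With one sink and one recent slot reserved, only a single slot is left for scoring. Whether this trade-off helps in practice is the paper's claim, not something this toy example demonstrates.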
TOOL · arXiv cs.CV English(EN) · 1w

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

Researchers have developed a decoding method called Wasserstein Equilibrium Decoding to improve the reliability of medical visual question answering (VQA) systems, particularly for smaller models. The approach uses a semantically aware Wasserstein stopping criterion to reach consensus among similar answers, avoiding issues with lexical ordering. It showed consistent improvements on medical VQA datasets such as VQA-RAD and PathVQA, improving accuracy and inference efficiency for models such as Qwen3-VL-2B and Gemma-3-4B.

IMPACT Enhances the accuracy and efficiency of medical VQA systems, enabling more reliable clinical deployment of smaller AI models.
SIGNIFICANT · Together AI blog German(DE) · 3mo · [2 sources]

Fine

Together AI has expanded its fine-tuning platform to support a wider range of large language models, including recent releases from DeepSeek, Qwen, and Meta, alongside OpenAI's gpt-oss. The platform now offers longer context lengths, up to 131k tokens for some models, at no additional cost, which helps with tasks such as long-document processing and complex code editing. Separately, Together AI researchers probed LLM behavior with minimal, topic-neutral prompts to uncover inherent model preferences, finding that gpt-oss favors programming and math, Llama leans literary, DeepSeek often produces religious content, and Qwen tends toward multiple-choice questions.

IMPACT Together AI's platform updates enable developers to fine-tune a broader range of large models with extended context, potentially lowering costs and improving performance on complex tasks.
- OpenAI
- Meta
- Llama 3.1-8B
- DeepSeek
- Together AI
- Qwen
- gpt-oss
- Llama 4 Maverick
- DeepSeek-R1
- Gemma 3-4B
- Qwen3-235B
- Llama
