Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 4d

Hölder Policy Optimisation

Researchers have introduced HölderPO, a framework for optimizing large language models that unifies token-level probability aggregation through the Hölder mean. The approach gives continuous control over the trade-off between gradient concentration and variance, addressing the limitations of fixed aggregation mechanisms, which can lead to training collapse or suboptimal performance. A dynamic annealing algorithm schedules the Hölder mean parameter across the training lifecycle, and the authors report superior stability and convergence. Extensive evaluations show HölderPO achieving state-of-the-art accuracy on mathematical benchmarks and a high success rate on ALFWorld.

IMPACT Introduces a new optimization framework that improves LLM stability and performance on mathematical and reasoning tasks.
- ALFWorld
- GRPO
- Yuxiang Chen
- HölderPO
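
For readers unfamiliar with the Hölder (power) mean mentioned in the summary above, the sketch below shows how a single exponent `p` moves an aggregate of token probabilities between the harmonic, geometric, and arithmetic means. It is a generic illustration of the textbook definition, not HölderPO's actual objective: the function names, the linear schedule, and the start and end values of `p` are all assumptions made for this example.

```python
import math


def holder_mean(probs, p):
    """Hölder (power) mean of strictly positive values with exponent p.

    p = 1 gives the arithmetic mean, p -> 0 the geometric mean,
    and p = -1 the harmonic mean.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    if any(x <= 0 for x in probs):
        raise ValueError("probs must be strictly positive")
    n = len(probs)
    if p == 0:
        # The limit p -> 0 of the power mean is the geometric mean.
        return math.exp(sum(math.log(x) for x in probs) / n)
    return (sum(x ** p for x in probs) / n) ** (1.0 / p)


def annealed_exponent(step, total_steps, p_start, p_end):
    """Hypothetical linear schedule for p over training (an assumption,
    not the annealing algorithm described in the paper)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


if __name__ == "__main__":
    token_probs = [0.25, 1.0]  # made-up per-token probabilities
    for step in (0, 5, 10):
        p = annealed_exponent(step, 10, p_start=1.0, p_end=-1.0)
        print(step, p, holder_mean(token_probs, p))
```

Lower values of `p` pull the aggregate toward the smallest token probability, while higher values weight all tokens more evenly, which is one way to picture the concentration versus variance trade-off the summary describes.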
RESEARCH · arXiv cs.CL English(EN) · 6d · [2 sources]

Training Language Agents to Learn from Experience

Researchers have developed a framework called In-context Training (ICT) that enables language agents to learn and improve from past experience across different tasks. The approach trains a "reflector" model to generate system prompts that improve an "actor" model's performance on future, unseen tasks. In experiments on ALFWorld and MiniHack, agents trained with this method improved on various task families, and some generalized to entirely new environments.

IMPACT Introduces a method for agents to generalize learning across tasks, potentially improving adaptability and efficiency in complex AI systems.
RESEARCH · Qwen tech blog English(EN) · 10mo · [142 sources]

Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All

Researchers are developing new benchmarks and methods to evaluate and improve the memory capabilities of AI agents. These efforts address limitations in current systems, which struggle with long-term recall, interference between memories, and reasoning over complex, evolving information. New benchmarks such as LongMINT, EvoMemBench, and SocialMemBench test agents in more realistic scenarios, including social settings and multimodal data. In addition, memory architectures such as FORGE, RecMem, DimMem, H-Mem, and MeMo are proposed to improve efficiency, reduce token costs, and prevent catastrophic forgetting.

IMPACT Advances in agent memory systems are crucial for developing more capable and reliable AI assistants across diverse applications.
- LatentRAG
- Qwen3-Reranker
- AgenticRAG
- BeliefMem
- MemReranker
- ALFWorld
- Gemini-3-Flash
- GPT-4o-mini
- LLM
- BRIGHT
- SIRA
- MemReread
- InterLV-Search
- SuperIntelligent Retrieval Agent (SIRA)
- AI agents
- Gemini 2.5 Flash
- Grok-4-Fast
- Llama-4-Maverick
- Qwen3-235B
- MeMo
- H-Mem
- EvoMemBench
- DimMem
- SocialMemBench
- LongMINT
- RecMem
