PulseAugur / Brief

Brief

last 24h
[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Hölder Policy Optimisation

    Researchers have introduced HölderPO, a framework for optimizing large language models that unifies token-level probability aggregation under the Hölder mean. The approach gives continuous control over the trade-off between gradient concentration and variance, addressing the limitations of fixed aggregation mechanisms, which can lead to training collapse or suboptimal performance. A dynamic annealing algorithm schedules the Hölder mean parameter over the course of training, which the authors report improves stability and convergence. Evaluations show HölderPO achieving state-of-the-art accuracy on mathematical benchmarks and a high success rate on ALFWorld.

    IMPACT Introduces a new optimization framework that improves LLM stability and performance on mathematical and reasoning tasks.
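
    The Hölder (power) mean mentioned in the summary can be illustrated with a minimal sketch. The summary does not give HölderPO's actual objective or annealing schedule, so the function names, the linear schedule, and the sample probabilities below are all assumptions chosen for illustration. The sketch only shows how a single exponent p moves the aggregate of per-token probabilities between the arithmetic mean (p = 1), the geometric mean (p → 0), and values dominated by the smallest token probability (p negative).

```python
import math


def holder_mean(probs, p):
    """Hölder (power) mean of strictly positive values with exponent p.

    Illustrative helper, not the HölderPO implementation.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    if any(x <= 0 for x in probs):
        raise ValueError("probs must be strictly positive")
    n = len(probs)
    if abs(p) < 1e-12:
        # The limit p -> 0 of the power mean is the geometric mean.
        return math.exp(sum(math.log(x) for x in probs) / n)
    return (sum(x ** p for x in probs) / n) ** (1.0 / p)


def linear_anneal(step, total_steps, p_start, p_end):
    """Hypothetical linear schedule for p; the real schedule is not given."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


if __name__ == "__main__":
    token_probs = [0.9, 0.5, 0.1]  # made-up per-token probabilities
    for step in (0, 5, 10):
        p = linear_anneal(step, 10, p_start=1.0, p_end=-1.0)
        print(f"step={step} p={p:+.1f} mean={holder_mean(token_probs, p):.4f}")
```

    Because the power mean is non-decreasing in p, lowering p shifts weight toward the lowest-probability tokens, which is one way to read the "gradient concentration versus variance" trade-off described above.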

  2. Training Language Agents to Learn from Experience

    Researchers have developed a framework called In-context Training (ICT) that lets language agents learn from past experience and improve across different tasks. The approach trains a "reflector" model to generate system prompts that improve an "actor" model's performance on future, unseen tasks. In experiments on ALFWorld and MiniHack, agents trained with this method improved on various task families, and some generalized to entirely new environments.

    IMPACT Introduces a method for agents to generalize learning across tasks, potentially improving adaptability and efficiency in complex AI systems.

  3. Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All

    Researchers are developing new benchmarks and methods to evaluate and improve the memory capabilities of AI agents. These efforts address limitations in current systems, which struggle with long-term recall, interference between memories, and reasoning over complex, evolving information. New benchmarks such as LongMINT, EvoMemBench, and SocialMemBench test agents in more realistic scenarios, including social settings and multimodal data. In addition, memory architectures such as FORGE, RecMem, DimMem, H-Mem, and MeMo have been proposed to improve efficiency, reduce token costs, and prevent catastrophic forgetting.

    IMPACT Advances in agent memory systems are crucial for developing more capable and reliable AI assistants across diverse applications.