PulseAugur / Brief
EN
LIVE 22:22:20

Brief

last 24h
[1/1] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Holder Policy Optimisation

    Researchers have introduced HölderPO, a novel framework for optimizing large language models by unifying token-level probability aggregation through the Hölder mean. This approach offers continuous control over the trade-off between gradient concentration and variance, addressing limitations of fixed aggregation mechanisms that can lead to training collapse or suboptimal performance. A dynamic annealing algorithm is employed to schedule the Hölder mean parameter across the training lifecycle, demonstrating superior stability and convergence. Extensive evaluations show HölderPO achieving state-of-the-art accuracy on mathematical benchmarks and a high success rate on ALFWorld. AI

    IMPACT Introduces a new optimization framework that improves LLM stability and performance on mathematical and reasoning tasks.