PulseAugur / Brief
Brief

last 24h
[50/158] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Dan McAteer (@daniel_mac8) claims that a general-purpose reasoning model, not a math-specific system, has created new proofs, emphasizing that AI can indeed generate new knowledge. This sparks anticipation for next-generation reasoning capabilities at the GPT-6 level. https://x

    A general-purpose AI reasoning model has reportedly generated novel mathematical proofs, suggesting AI's capability to create new knowledge beyond specialized systems. This development sparks anticipation for next-generation AI reasoning, potentially on par with future models like GPT-6. The claim highlights AI's emerging ability to produce original insights in complex domains. AI

    IMPACT Demonstrates AI's potential for genuine knowledge creation, moving beyond pattern recognition to novel discovery.

  2. vLLM V0 to V1: Correctness Before Reinforcement Learning https://huggingface.co/blog/ServiceNow-AI/correctness-before-corrections ※AI-generated auto-post (headline + link) #AI #GenerativeAI #LLM #AIGenerated

    A Hugging Face blog post from ServiceNow-AI covers the move from vLLM V0 to V1, with a focus on establishing correctness before reinforcement learning. vLLM is an inference and serving engine rather than a model, so V0 and V1 refer to versions of the engine. AI

    IMPACT Details advancements in vLLM's accuracy, potentially influencing the development and deployment of large language models.

  3. A Faster and Cheaper Model for #AI Agents and Codin - https://kensbookinfo.blogspot.com/p/ai.html#34 # Art Cure by Daisy Fancourt review – is culture the - h

    A new, more efficient model has been developed for AI agents and coding tasks, promising faster and cheaper performance. Separately, discussions are ongoing regarding the potential impact of AI on human agency and the future of autonomous agents. The news also touches on unrelated topics such as sports, international relations, and public health. AI

    IMPACT A new, more efficient model for AI agents and coding could accelerate development and deployment in these areas.

  4. Blog Update: Tried out the AI video generation model "Ojami Omni" announced at Google I/O 2026 https://kanoayu.cloudfree.jp/2026/05/21/%ef%bd%b8%ef%be%9e%ef%bd%b8%ef%be%9e%ef%be%9a%ef%bd%b6%ef%bd%bdi-

    Google announced Gemini Omni at Google I/O 2026, a new AI model capable of generating video. Early users have begun experimenting with the model, sharing their initial experiences and results. The model's capabilities are being explored by the community following its unveiling. AI

    IMPACT Sets a new benchmark for AI video generation capabilities, potentially influencing future creative tools and media production.

  5. CVPR 2026 (June 3-7, Denver) is approaching. This year's highlight is YOLO26, a lightweight edge AI that handles object detection, segmentation, and pose estimation in one model. The day when real-time inference becomes a reality on manufacturing inspection lines is just around the corner. RUNTEC's MOD supports quality inspection in the manufacturing industry with object detection AI. CV

    The upcoming CVPR 2026 conference in Denver will feature YOLO26, a new lightweight edge AI model capable of object detection, segmentation, and pose estimation. This advancement is expected to enable real-time inference for quality inspection lines in manufacturing settings. RUNTEC's MOD product already supports quality inspection in manufacturing using object detection AI, and the company anticipates new technology announcements at CVPR 2026. AI

    IMPACT YOLO26's real-time inference capabilities could significantly enhance manufacturing quality control and efficiency.

  6. Logan Kilpatrick (@OfficialLoganK) Internal message that Gemini 3.5 will be a new turning point for the Gemini product line. The model itself is the product, and they have been preparing infrastructure, products, and teams for the past 2.5 years, and are now actively collecting user feedback. https://x.

    Google's Gemini 3.5 is poised to be a significant advancement for the Gemini product line, according to internal messages from Logan Kilpatrick. Kilpatrick highlighted that the model itself is now the product, with extensive preparation in infrastructure, product development, and team readiness over the past 2.5 years. The company is now actively seeking user feedback to further refine the model. AI

    IMPACT Signals a new product-centric phase for Google's Gemini models, emphasizing user feedback for iterative development.

  7. There is a new technique to speed up token generation called MTP. It predicts several future tokens, then the main model verifies them in parallel. There is a c

    A new method called MTP (Multi-Token Prediction) has been developed to accelerate token generation in AI models. This technique involves predicting multiple future tokens simultaneously and then having the main model verify them in parallel. However, MTP requires a significant increase in VRAM, which can lead to slower generation or reduced context size on GPUs with limited memory. The technique does not appear to reduce model hallucinations. AI

    IMPACT This technique could speed up AI inference but requires more VRAM, potentially limiting its use on consumer hardware.
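
The draft-then-verify loop described in this item can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular MTP system: it uses a greedy acceptance rule, and `verify_draft` and `toy_main` are hypothetical names introduced here. A real engine would run the verification checks as one batched forward pass rather than a Python loop.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def verify_draft(context: List[Token], draft: List[Token],
                 main_next: NextTokenFn) -> List[Token]:
    """Return the tokens accepted in one draft-and-verify step (greedy variant)."""
    accepted: List[Token] = []
    for proposed in draft:
        # Sequential here for clarity; conceptually these checks are parallel.
        expected = main_next(context + accepted)
        if proposed != expected:
            # First mismatch: keep the main model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the main model adds one extra token.
    accepted.append(main_next(context + accepted))
    return accepted


def toy_main(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the main model: next token is last + 1."""
    return ctx[-1] + 1


full_match = verify_draft([1, 2], [3, 4, 5], toy_main)
partial_match = verify_draft([1, 2], [3, 9, 5], toy_main)
```

When the draft is fully correct, one step yields four tokens instead of one; when it diverges early, the step still makes progress by at least one token, which is why the speedup depends on how often the draft agrees with the main model.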

  8. #Cohere launches #CommandA+: Fast and multimodal #AI https://gadgetflux.eu/cohere-lanseaza-command-a-ai-de-top/

    Cohere has released CommandA+, a new multimodal AI model designed for speed and advanced capabilities. This model aims to enhance user interaction and processing power within AI applications. Further details on its specific features and performance benchmarks are expected. AI

    IMPACT Introduces a new multimodal model, potentially enhancing AI capabilities in speed and interaction.

  9. OpenAI o3 disproves an Erdős conjecture with 125 pages of reasoning, while OpenAI files for IPO at 850B valuation and Cohere returns with an open-weights MoE mo

    OpenAI's latest model, o3, has reportedly disproven an Erdős conjecture through extensive reasoning. Concurrently, OpenAI is rumored to be preparing for an IPO with a valuation of $850 billion. In related news, Cohere has released a new open-weights Mixture-of-Experts (MoE) model. AI

    IMPACT Potential IPO signals massive market confidence in AI, while new models and research breakthroughs push the frontier.

  10. How the New Hermes Agent Release Unlocks Free DeepSeek V4 and Native Windows Support The latest Hermes Agent Foundation Release, as detailed by World of AI, bri

    The latest release of the Hermes Agent Foundation provides access to the DeepSeek V4 model and introduces native Windows support. This update aims to improve accessibility and usability for users. The release details were shared by World of AI. AI

    IMPACT Enhances accessibility to open-source models like DeepSeek V4 for a wider user base.

  11. Google recasts Gemini Read GPS brief. www.global-political-spotlight.com/articles/gps-summaries/daily/2026-05-21-google-pivots-gemini-to-agentic-platform-at-i-o

    Google is recasting Gemini as an agentic platform. The pivot was announced at the Google I/O conference and signals a new direction for the model's development and application. AI

    IMPACT Signals a shift in AI development towards more autonomous agentic capabilities, potentially impacting future product integrations and user interactions.

  12. A multi-agent LLM where each agent learns when to defer to a human, trained with GRPO on a cost-aware reward. Each defer event becomes SFT data, so the model gr

    Researchers have developed a multi-agent large language model that learns to defer to human input. The model is trained using GRPO on a reward system that accounts for costs, and each instance of deferral is used as supervised fine-tuning data. This allows the model to gradually incorporate human expertise, with a tunable cost parameter enabling a trade-off between accuracy and the budget for human intervention during deployment. AI

    IMPACT Introduces a novel training methodology for multi-agent LLMs, enabling adaptive collaboration with human experts.
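
One way to picture a cost-aware reward combined with GRPO's group-relative normalization is the sketch below. The reward shape is an assumption made for illustration (a deferral earns `1 - defer_cost`, on the further assumption that the human answers correctly); the summary does not give the paper's actual reward. The group-relative advantage, reward minus group mean divided by group standard deviation, is the standard GRPO normalization. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def cost_aware_reward(correct: bool, deferred: bool, defer_cost: float) -> float:
    """Assumed reward: deferring pays 1 - defer_cost, answering pays 1 or 0."""
    if deferred:
        return 1.0 - defer_cost
    return 1.0 if correct else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same prompt: right, wrong, and deferred.
rewards = [
    cost_aware_reward(correct=True, deferred=False, defer_cost=0.25),
    cost_aware_reward(correct=False, deferred=False, defer_cost=0.25),
    cost_aware_reward(correct=False, deferred=True, defer_cost=0.25),
]
advantages = group_relative_advantages(rewards)
```

Under this assumed reward, raising `defer_cost` pushes the deferred response's advantage toward that of a wrong answer, which is one way a single parameter could trade accuracy against the human-intervention budget.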

  13. What new features announced at Google I/O 2026 are already available? Organized chronologically https://pc.watch.impress.co.jp/docs/news/2110624.html #impress #market #AI #Gemini

    Google I/O 2026 showcased numerous new features and updates, with a focus on AI integration across its product suite. Many of these advancements, particularly those related to Gemini AI, are already being rolled out or are available in preview. The event highlighted Google's commitment to making AI more accessible and useful in everyday applications. AI

    IMPACT Highlights Google's strategy to integrate advanced AI across its services, potentially impacting user experience and competition.

  14. Ricoh develops a high-performance Japanese large language model equivalent to GPT-5 with enhanced inference performance | Ricoh Co., Ltd. https://www.yayafa.com/2804982/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #

    Ricoh has developed a new Japanese large language model that matches GPT-5's performance, particularly in reasoning capabilities. This advanced model is designed to enhance AI applications and services. Separately, Needswell has introduced a new introductory training program for Microsoft 365 Copilot. AI

    IMPACT Ricoh's new Japanese LLM could advance AI capabilities in the region, while Needswell's training program aims to boost adoption of Microsoft's AI assistant.

  15. SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

    Researchers have introduced SMoA, a novel Spectrum Modulation Adapter designed to enhance parameter-efficient fine-tuning (PEFT) for large language models. Unlike traditional methods like Low-Rank Adaptation (LoRA) which face limitations in representational capacity with decreasing rank, SMoA aims to broaden the spectrum of adaptable updates within a smaller parameter budget. By partitioning layers into spectral blocks and applying modulated low-rank branches, SMoA demonstrates improved performance over existing LoRA-style baselines on various tasks. AI

    IMPACT Introduces a more efficient method for adapting large language models, potentially reducing computational costs for fine-tuning.
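
The rank versus budget trade-off that motivates this work can be made concrete with a parameter count for plain LoRA. The sketch covers only the standard LoRA factorization (an update of shape `d_out x d_in` written as the product of a `d_out x rank` and a `rank x d_in` matrix); it does not model SMoA's spectral blocks, whose details are not given here. Function names and the 4096 layer width are illustrative assumptions.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` LoRA update B @ A."""
    return rank * (d_out + d_in)


# Illustrative square projection layer.
d = 4096
full = full_update_params(d, d)
lora_r8 = lora_update_params(d, d, rank=8)
lora_r1 = lora_update_params(d, d, rank=1)
reduction_r8 = full // lora_r8
```

Lowering the rank shrinks the budget linearly, but the update can then span at most `rank` directions, which is the capacity limit the summary refers to.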

  16. UniT: Unified Geometry Learning with Group Autoregressive Transformer

    Researchers have introduced UniT, a novel unified model designed to advance geometry perception by integrating various capabilities into a single framework. This model utilizes a Group Autoregressive Transformer, treating groups of sensor observations as autoregressive units to predict point maps in an anchor-free and scale-adaptive manner. UniT effectively unifies diverse view configurations for both online and offline settings, incorporates a KV caching mechanism for long-horizon scalability, and employs a scale-adaptive geometry loss for improved metric-scale generalization. The model demonstrates state-of-the-art performance across ten benchmarks and seven representative tasks. AI

    IMPACT Establishes a unified framework for diverse geometry perception tasks, potentially improving efficiency and performance in 3D reconstruction and sensor data analysis.

  17. Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

    Researchers have introduced Linear-DPO, a novel method for aligning text-to-image generative models. This approach generalizes the Direct Preference Optimization objective to encompass both diffusion and flow-matching models within a unified framework. By replacing the standard sigmoid-based utility function with a linear one and incorporating an EMA-updated reference model, Linear-DPO demonstrates superior performance over existing methods on diffusion models like SD1.5 and SDXL, as well as the flow-matching model SD3-Medium. AI

    IMPACT Introduces a more effective alignment technique for text-to-image models, potentially improving their adherence to user prompts.
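
The difference between a sigmoid-based and a linear utility can be sketched on a single scalar preference margin. This is an assumption-laden illustration: the standard DPO loss is `-log(sigmoid(beta * margin))`, while the linear form shown here (`-beta * margin`) and the EMA rule for the reference model are plausible readings of the summary, not the paper's confirmed formulas. How the margin is computed for diffusion or flow-matching models is left abstract.

```python
import math
from typing import List


def sigmoid_dpo_loss(margin: float, beta: float = 1.0) -> float:
    """Standard DPO loss -log(sigmoid(beta * margin)), as a stable softplus."""
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def linear_utility_loss(margin: float, beta: float = 1.0) -> float:
    """Assumed linear utility: the loss decreases linearly with the margin."""
    return -beta * margin


def ema_update(ref_params: List[float], policy_params: List[float],
               decay: float = 0.99) -> List[float]:
    """Assumed EMA rule: move the reference a small step toward the policy."""
    return [decay * r + (1.0 - decay) * p
            for r, p in zip(ref_params, policy_params)]


loss_at_zero = sigmoid_dpo_loss(0.0)
linear_loss = linear_utility_loss(2.0, beta=0.5)
new_ref = ema_update([0.0], [1.0], decay=0.9)
```

The sigmoid loss flattens for large margins, so well-separated pairs contribute little gradient; a linear utility keeps a constant gradient. A slowly moving reference is one plausible way to keep an unbounded objective from drifting, though the summary does not state that motivation.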

  18. ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation

    Researchers have developed ROAR-3D, a novel method to enhance 3D generation from multiple images. This approach allows pretrained single-view 3D models to effectively utilize an arbitrary number of unposed images without requiring external reconstruction modules. ROAR-3D employs a token-wise view router and a dual-stream attention mechanism to manage 2D-to-3D correspondences and geometric enrichment, introducing minimal trainable parameters and inference overhead. AI

    IMPACT Enables more accurate and flexible 3D generation from multiple images, potentially improving applications in virtual reality and content creation.

  19. HORST: Composing Optimizer Geometries for Sparse Transformer Training

    Researchers have developed HORST, a novel optimizer designed to improve the training of sparse transformers. Standard optimizers struggle to balance the need for sparsity with training stability. HORST addresses this by composing optimizer steps as non-commutative operators, integrating hyperbolic geometry to achieve both stability and L1 sparsity bias. Experiments show HORST significantly outperforms AdamW baselines, especially at higher sparsity levels, across vision and language tasks. AI

    IMPACT Enables more efficient training of sparse transformer models, potentially leading to smaller and faster AI systems.

  20. Reviving Error Correction in Modern Deep Time-Series Forecasting

    Researchers have developed a new method to combat error accumulation in deep time-series forecasting models. Their Universal Error Corrector with Seasonal-Trend Decomposition (UEC-STD) is an architecture-agnostic model that can be added to existing forecasters without retraining. By separately adjusting trend and seasonal components, UEC-STD significantly enhances prediction accuracy and robustness across various models and datasets, offering a practical solution for long-term forecasting challenges. AI

    IMPACT Enhances long-term prediction accuracy for deep learning models, offering a practical tool for time-series forecasting applications.
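
Adjusting trend and seasonal components separately presupposes a decomposition step, which can be sketched with a simple moving average. This is a generic seasonal-trend split, assumed here for illustration; the UEC-STD corrector itself is not specified in the summary, so `apply_correction` just adds externally supplied per-component offsets. All names are hypothetical.

```python
from typing import List, Tuple


def moving_average(series: List[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def decompose(series: List[float], window: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


def apply_correction(trend: List[float], seasonal: List[float],
                     trend_delta: List[float],
                     seasonal_delta: List[float]) -> List[float]:
    """Recombine the components after adjusting each one independently."""
    return [t + dt + s + ds
            for t, dt, s, ds in zip(trend, trend_delta, seasonal, seasonal_delta)]


forecast = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
trend, seasonal = decompose(forecast)
zeros = [0.0] * len(forecast)
unchanged = apply_correction(trend, seasonal, zeros, zeros)
shifted = apply_correction(trend, seasonal, [1.0] * len(forecast), zeros)
```

Because the corrector only post-processes a forecaster's output, a split like this can sit behind any base model without retraining it, which matches the architecture-agnostic claim.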

  21. Towards Understanding Self-Pretraining for Sequence Classification

    Researchers have investigated the effectiveness of self-pretraining (SPT) for Transformer models in sequence classification tasks. Their work replicates and ablates previous findings, suggesting that SPT improves optimization by enabling models to learn useful attention patterns. Specifically, the study highlights that SPT helps models learn proximity interactions, transforming absolute positional encodings into attention scores that bias towards nearby elements. This approach proves more effective than standard supervised training in certain Transformer configurations, as label supervision can overlook beneficial attention directions that masked reconstruction can detect. AI

    IMPACT Enhances Transformer performance on sequence classification by improving attention mechanisms and overcoming limitations of standard supervised training.

  22. Grounding Driving VLA via Inverse Kinematics

    Researchers have developed a new approach to improve the visual grounding of driving Vision-Language-Action (VLA) models by framing trajectory prediction as an inverse kinematics problem. This method requires the model to predict both the current and future visual states, addressing a limitation in existing models that primarily rely on ego status and text commands. By incorporating a next visual state prediction objective and a dedicated Inverse Kinematics Network, a 0.5B-scale model achieved trajectory planning performance comparable to much larger VLAs, particularly in dynamic driving scenarios. AI

    IMPACT Novel method enhances visual grounding in driving models, potentially improving performance in complex scenarios.

  23. Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

    A new research paper introduces DODOCO, a tool designed to diagnose overhead in dispatch operations for Mixture-of-Experts (MoE) models. The study found that common assumptions about workload characteristics and the effectiveness of existing mitigation strategies do not hold true for production routing. Specifically, the research indicates that scaling expert parallelism has minimal impact on routing imbalance, and mock-token benchmarks overestimate routing disparities compared to real text data. AI

    IMPACT Reveals critical performance bottlenecks in MoE models, potentially guiding future interconnect and dispatch design.

  24. Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

    Researchers have developed a new method called InterRS to enable AI to generate speech while simultaneously performing complex reasoning, mimicking human communication. This approach precisely interleaves reasoning steps within natural speech flow, requiring specially aligned data and a novel training pipeline. The method improves performance on logic and math benchmarks by 13% and produces more natural, fluent responses compared to existing techniques. AI

    IMPACT Enables more human-like AI interaction by allowing real-time speech generation alongside complex reasoning.

  25. DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

    Researchers have developed DASH, a novel differentiable architecture search framework designed to rapidly discover efficient hybrid attention mechanisms for large language models. Unlike previous methods that required extensive computational resources, DASH significantly reduces search time and token usage by relaxing discrete operator placement into continuous logits and freezing model weights. This approach consistently yields superior results compared to existing baselines and even surpasses some released models, demonstrating that high-quality hybrid attention architectures can be found in minutes on a single GPU. AI

    IMPACT Enables rapid, efficient discovery of optimized LLM attention mechanisms, potentially accelerating model development.

  26. Winfree Oscillatory Neural Network

    Researchers have introduced the Winfree Oscillatory Neural Network (WONN), a novel dynamical architecture that leverages generalized Winfree dynamics for computation. This model represents data on a torus through structured oscillatory interactions, combining phase-based inductive biases with flexible interaction mechanisms. WONN has demonstrated competitive performance on image recognition and complex reasoning tasks, including ImageNet and Sudoku, while showing significant parameter efficiency compared to existing models. AI

    IMPACT Introduces a novel, parameter-efficient architecture that scales to challenging benchmarks, potentially offering an alternative to conventional neural networks.

  27. Strategy-Induct: Task-Level Strategy Induction for Instruction Generation

    Researchers have developed Strategy-Induct, a new framework for generating effective task-level instructions for large language models. This method bypasses the need for labeled answers by first prompting the model to create reasoning strategies for example questions. These strategy-question pairs are then used to induce a task instruction, which has shown superior performance compared to existing question-only approaches on various tasks and model scales. AI

    IMPACT This new method for instruction generation could reduce the cost and complexity of fine-tuning LLMs by eliminating the need for labeled answers.

  28. SynCB: A Synergy Concept-Based Model with Dynamic Routing Between Concepts and Complementary Neural Branches

    Researchers have developed a new framework called SynCB, which integrates concept-based models with standard neural networks. This hybrid approach uses a trainable routing module to dynamically select between a concept-based branch for interpretability and a complementary neural branch for performance. The two branches are learned jointly, allowing for information sharing and improved responsiveness to human interventions during testing. SynCB has demonstrated superior accuracy and intervention performance across multiple datasets compared to existing methods. AI

    IMPACT Introduces a novel hybrid architecture that balances model interpretability with performance, potentially influencing future research in explainable AI.

  29. HDMoE: A Hierarchical Decoupling-Fusion Mixture-of-Experts Framework for Multimodal Cancer Survival Prediction

    Researchers have developed a new framework called HDMoE to improve multimodal cancer survival prediction. This hierarchical decoupling-fusion mixture-of-experts approach aims to better integrate data from sources like whole slide images and genomic profiles. The framework addresses limitations in existing methods by reducing redundant information before feature decoupling and by modeling fine-grained relationships within and between modalities. AI

    IMPACT Introduces a novel framework for integrating diverse medical data, potentially improving diagnostic accuracy and patient outcomes in oncology.

  30. CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

    Researchers have developed CAdam, a new framework for generative distillation in 3D Gaussian Splatting that addresses limitations in adaptive densification. CAdam reinterprets densification as a signal verification problem, using gradient moments to distinguish consistent geometric signals from generative noise. This approach significantly reduces the number of Gaussian primitives needed while maintaining perceptual quality, improving memory efficiency in generative 3D tasks. AI

    IMPACT Improves memory efficiency and representation quality in 3D generative models by reducing redundant primitives.

  31. Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models

    Researchers have developed new activation-free backbone architectures for vision models, utilizing polynomial functions instead of traditional pointwise nonlinearities like ReLU or GELU. These novel modules, integrated into the MetaFormer framework, demonstrate competitive or superior performance compared to activation-based models on tasks such as ImageNet classification and semantic segmentation. The study also shows these polynomial variants outperform prior specialized polynomial networks while requiring less computational cost. AI

    IMPACT Introduces a new architectural approach for vision models that could lead to more efficient and robust image recognition systems.

  32. Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor

    A recent study re-evaluated the effectiveness of Transformer model modifications, finding that most still do not yield significant improvements when scaled to 1-3 billion parameters. Researchers tested 20 modifications introduced after 2021, using downstream evaluation metrics and controlling for variables like data, compute, and training recipes. The findings largely echo a 2021 study, with only a couple of modifications showing benefits, and one of those proving unstable at the larger scale. The research emphasizes the need for rigorous reporting, downstream evaluation, and cross-scale stability testing for architecture comparisons. AI

    IMPACT Confirms that architectural innovations in large language models often fail to scale effectively, suggesting a need for more robust evaluation methods.

  33. Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

    Researchers have developed Tunable MAGMAX, a new framework for continual learning that allows for preference-aware model merging. This method enables control over task-specific performance in merged models, adapting them to different deployment needs and user preferences. By using a preference vector and leveraging target environment data, the system can automatically construct optimal vectors without manual input. Experiments show Tunable MAGMAX effectively manages task-wise performance and adapts merged models to various environments, outperforming or matching baseline methods. AI

    IMPACT Enables more flexible deployment of continual learning models by allowing customization of task performance.

  34. Interaction Locality in Hierarchical Recursive Reasoning

    Researchers have introduced a new framework called interaction locality to measure how information flows within AI models during spatial reasoning tasks. This framework analyzes whether computations remain confined to nearby areas or semantic segments, or if they cross these boundaries. The study applied this to models like HRM, TRM, and MTU3D, finding that high-level states in recursive models tend to write information locally, accumulating into broader structures, while embodied models concentrate causal spatial structure at module boundaries. AI

    IMPACT Introduces a novel measurement framework for analyzing spatial reasoning in AI, potentially leading to more efficient and interpretable models.

  35. Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

    Researchers have developed a new framework called REPA-P to improve the accuracy and robustness of physics-informed diffusion models. This method aligns intermediate model representations with physical states during training by using lightweight projection heads that are removed during inference, thus adding no computational overhead. Experiments across four different physics tasks demonstrated that REPA-P can accelerate convergence, reduce physics residuals, and enhance out-of-distribution performance. AI

    IMPACT Enhances the accuracy and robustness of scientific diffusion models, potentially improving their application in fields like fluid dynamics and electromagnetism.

  36. Google Significantly Updates Movie Production Tool "Flow" and Music Production Tool "Flow Music", Introducing Gemini Omni, Adding AI Agents, Custom Tool Creation Features, and a New Mobile App https://fed.brid.gy/r/https://gigazine.net/news/20260520-fl

    Google DeepMind has announced Gemini Omni, a new family of multimodal generative models, integrated into its AI-powered creative tools Flow and Flow Music. The updates to Flow include AI agents for creative assistance, the ability to create custom tools using natural language, and enhanced video generation and editing capabilities with Gemini Omni. Flow Music also receives updates for finer music editing and music video generation, with both tools now available as mobile applications. AI

    IMPACT Enhances creative workflows by integrating advanced AI agents and models for video and music production.

  37. Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

    Researchers have introduced a new metric, $d_{\text{NTP}}$, to evaluate the effectiveness of task vectors in large language models by measuring the discrepancy in next-token probabilities between task vector-based and in-context learning inference. This metric serves as a proxy for performance, correlating negatively with downstream accuracy. Based on this, they developed the Linear Task Vector (LTV) method, which uses a closed-form linear mapping to minimize $d_{\text{NTP}}$, outperforming existing baselines by an average of 9.2% in accuracy across various benchmarks and LLMs while reducing inference latency. The study also demonstrated that task vectors extracted from larger models can improve smaller models' performance by 6.4%, indicating potential for cross-model scale transferability. AI

    IMPACT Improves LLM inference efficiency and accuracy by optimizing task vector design, potentially reducing computational costs.
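
    A minimal sketch of the kind of quantity a next-token discrepancy such as $d_{\text{NTP}}$ could measure. The summary does not say which divergence the paper uses, so KL divergence is an assumption here, and the function name `ntp_discrepancy` and the toy probabilities are hypothetical.

```python
import math

def ntp_discrepancy(p_icl, p_tv, eps=1e-12):
    """KL(p_icl || p_tv) over a shared vocabulary.

    Assumption: KL divergence stands in for the paper's discrepancy measure,
    which the summary does not specify.
    """
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_icl, p_tv)
        if p > 0
    )

# Toy next-token distributions over a three-token vocabulary (made-up values).
p_icl = [0.7, 0.2, 0.1]        # in-context learning inference
close_tv = [0.65, 0.25, 0.10]  # task vector that tracks ICL closely
far_tv = [0.1, 0.2, 0.7]       # task vector that diverges from ICL

d_close = ntp_discrepancy(p_icl, close_tv)
d_far = ntp_discrepancy(p_icl, far_tv)
print(d_close, d_far)
```

    Under this assumption, a task vector whose predictions track the in-context distribution yields the smaller value, which matches the reported negative correlation between the metric and downstream accuracy.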

  38. Graph Navier Stokes Networks

    Researchers have introduced Graph Navier Stokes Networks (GNSN), a new architecture designed to address the oversmoothing problem in Graph Neural Networks. Unlike traditional diffusion-based methods, GNSN incorporates convection to create a dynamic velocity field for more efficient message propagation. This approach allows GNSN to better handle datasets with varying homophily and has demonstrated superior performance on multiple real-world classification tasks. AI

    IMPACT Introduces a novel architecture to improve GNN performance and address oversmoothing, potentially enhancing graph-based machine learning tasks.
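
    A toy illustration of the oversmoothing problem mentioned above, not of GNSN itself: repeated mean aggregation over neighbors (plain diffusion-style message passing) pulls all node features toward a single value. The graph, the feature values, and the helper names are made up for this sketch.

```python
def mean_aggregate(features, neighbors):
    """One propagation step: each node averages itself and its neighbors."""
    return [
        sum(features[j] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
        for i in range(len(features))
    ]

def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)

# Path graph 0 - 1 - 2 - 3 whose two halves start with distinct features.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = [0.0, 0.0, 1.0, 1.0]

features = initial
for _ in range(50):
    features = mean_aggregate(features, neighbors)

# The spread collapses toward zero, so the two halves become indistinguishable.
print(spread(initial), spread(features))
```

    After enough diffusion steps the nodes can no longer be told apart by their features, which is the failure mode that a convection term is meant to counteract.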

  39. Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting

    Researchers have introduced Dynamic TMoE, a novel framework designed to improve time series forecasting for non-stationary data. This approach addresses limitations in existing Mixture-of-Experts models by dynamically creating and removing experts based on detected distribution shifts. A temporal memory router further enhances stability by using recurrent states and an anomaly repository for context-aware expert selection, leading to significant performance gains. AI

    IMPACT Introduces a novel framework that improves time series forecasting accuracy for non-stationary data, potentially benefiting applications relying on predictive modeling.

  40. OlmoEarth v1.1: A more efficient family of models

    Allen AI has released OlmoEarth v1.1, an updated family of models designed for processing satellite imagery more efficiently. These new models reduce compute costs by up to 3x for inference and require 1.7x fewer GPU hours for training, while maintaining performance on remote sensing tasks. The efficiency gains are achieved by optimizing the tokenization process for transformer-based architectures, specifically by merging resolution-based tokens without significant performance degradation. AI

    IMPACT Offers significant cost reductions for satellite imagery analysis, potentially enabling wider adoption of AI for environmental monitoring and mapping.

  41. Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

    Researchers have developed a new method called Predicate Action Skills (PACTS) that allows robots to learn and compose skills without retraining. PACTS models both the physical actions and the symbolic outcomes of these actions, enabling better generalization. This approach facilitates zero-shot skill composition through planning by using predicted outcomes to sequence and monitor task execution. AI

    IMPACT Enables robots to learn and combine skills more flexibly, potentially accelerating the development of more adaptable robotic systems.

  42. RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

    Researchers have introduced RankE, a novel end-to-end post-training framework designed to improve discrete text-to-image generation models. Unlike previous methods that kept the VQ decoder frozen, RankE co-evolves both the policy and the decoder through alternating optimization. This approach addresses latent covariate shift, where policy improvements lead to degraded image quality. Experiments on LlamaGen-XL and Janus-Pro models demonstrate that RankE simultaneously enhances both alignment (CLIP score) and image fidelity (FID score), breaking the trade-off seen in earlier techniques. AI

    IMPACT Introduces a new method to improve image fidelity and alignment in discrete text-to-image models, potentially enhancing generative AI capabilities.

  43. Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

    Researchers have developed a new method to improve text-to-image diffusion models for generating human portraits, addressing the common trade-off between text alignment, realism, and aesthetics. Their approach uses a feature supervision paradigm with a lightweight cross-modal alignment mechanism that extracts vision-aligned text representations from SigLIP 2. This method injects guidance into the image generation process without degrading the model's original capabilities or requiring extra inference time, while also optimizing for human-perceived aesthetics. AI

    IMPACT Introduces a novel technique to improve the quality and coherence of AI-generated portraits, potentially impacting creative tools and applications.

  44. HRM-Text: Efficient Pretraining Beyond Scaling

    Researchers have developed HRM-Text, a novel Hierarchical Recurrent Model that significantly reduces the computational resources and training data required for pretraining large language models. By decoupling computation into strategic and execution layers and training exclusively on instruction-response pairs, a 1B-parameter model achieved competitive performance on several benchmarks with a fraction of the tokens and compute used by standard models. This approach makes foundational LLM research more accessible by lowering the barrier to entry for pretraining from scratch. AI

    IMPACT Enables more researchers to train foundational models from scratch, potentially accelerating innovation.

  45. Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

    Researchers have developed new methods to understand the internal workings of Mixture-of-Experts (MoE) models in computer vision. By analyzing how different visual categories are routed to specific experts and examining the tuning of these experts to various inputs, they found that an animate-inanimate distinction is a dominant factor in expert partitioning. The study reveals that experts tune to broader, continuous visual and semantic dimensions beyond simple category boundaries, highlighting the benefits of moving beyond basic routing analyses for a deeper understanding of MoE specialization. AI

    IMPACT Provides novel methods for interpreting the specialized functions within complex vision models, advancing AI research.

  46. Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

    A new research paper proposes the Structural Depth Hypothesis (SDH) to explain how self-training restructures language models. The study found that while surface-level linguistic features like discourse markers increase, deeper syntactic structures such as questions and passives decline. This effect was observed across multiple models and architectures, suggesting it's a specific outcome of self-training rather than a general language model behavior. AI

    IMPACT Suggests that self-trained LLMs may produce text rich in surface markers but poor in deeper syntactic structures, with implications for data curation and text detection.

  47. CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

    Researchers have developed two novel self-distillation techniques for language models to improve performance on complex reasoning tasks. AVSD (Adaptive-View Self-Distillation) balances consensus and view-specific signals from multiple teacher models to provide more reliable supervision. CEPO (Contrastive Evidence Policy Optimization) sharpens the reward signal by distinguishing decisive reasoning steps from filler tokens, using contrastive learning against incorrect answers. Both methods show significant improvements on mathematical and code-generation benchmarks, outperforming existing self-distillation baselines. AI

    IMPACT These new self-distillation techniques offer improved methods for training LLMs, potentially leading to more capable models for complex reasoning tasks.

  48. Reinforcing Human Behavior Simulation via Verbal Feedback

    Researchers have developed DITTO, a model that learns to simulate human behavior by using verbal feedback as a primary signal in reinforcement learning. The approach treats subjective, multi-faceted guidance as a first-class input and optimizes for rollouts that improve under that feedback. DITTO demonstrated a 36% improvement over its base model and outperformed GPT-5.4 on six benchmarks within the newly introduced SOUL suite, which comprises ten tasks across various human-like behavior simulations. AI

    IMPACT This research introduces a novel method for training LLMs to better simulate human behavior, potentially improving their utility in roles requiring nuanced social understanding.

  49. Training Language Agents to Learn from Experience

    Researchers have developed a new framework called In-context Training (ICT) to evaluate how language agents can improve their performance on future tasks by learning from past experiences. This approach trains a 'reflector' model to generate system prompts that guide an 'actor' model, enabling cross-task self-improvement without human examples. Experiments in ALFWorld and MiniHack demonstrated that agents trained with ICT outperformed baselines and even generalized to new environments, suggesting that the ability to learn from experience can itself be learned. AI

    IMPACT Enables language agents to generalize learning across tasks, potentially accelerating development of more adaptable AI systems.

  50. When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

    Researchers have developed a new dataset containing over 260,000 long-form stories, each annotated with creativity scores and review comments based on the Torrance Test of Creative Writing (TTCW). They fine-tuned Qwen3 models on this data to generate literary reviews, finding that models trained without explicit reasoning supervision performed better. The study suggests that for structured, rubric-based review generation, reasoning supervision may not be beneficial and can even lead to irrelevant or repetitive outputs. AI

    IMPACT Introduces a novel dataset and methodology for AI-driven literary review generation, potentially improving automated evaluation of creative writing.