Brief

last 24h

[50/287] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · Medium — fine-tuning tag Deutsch(DE) · 3h

Finetuning Qwen2.5-Math-1.5B

A technical guide details the process of fine-tuning the Qwen2.5-Math-1.5B model. The article outlines the steps involved in adapting this specific language model for mathematical tasks, likely to improve its performance or tailor it to particular applications. AI

IMPACT Provides a technical walkthrough for adapting a specific language model, potentially enabling others to replicate or build upon the fine-tuning process for specialized mathematical AI applications.
- Qwen2.5-Math-1.5B
TOOL · dev.to — LLM tag · 6h

Precision RAG: Fixing Citations & Hallucinations for Stronger Developer OKRs

A developer detailed a sophisticated Parent-Child RAG pipeline on GitHub, which, despite its advanced components like hybrid vector stores and LangGraph, suffered from inaccurate citations and hallucinations. The core issue identified was a misalignment between the retrieval units (child chunks), generation units (parent documents), and citation units, leading to incorrect page references. The proposed solution involves pre-capturing granular page references from child chunks and associating them with the expanded parent documents used for generation to ensure citation accuracy. AI

IMPACT Addresses a common challenge in RAG systems, improving the reliability of AI-generated citations and reducing hallucinations.
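
The fix described in this item can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the developer's actual pipeline: `ChildChunk`, `ParentContext`, and `expand_with_citations` are hypothetical names, and the retrieval step is replaced by a hard-coded list of hits. The point is only that page references are captured from the child chunks before expansion and travel with the parent text used for generation.

```python
from dataclasses import dataclass, field


@dataclass
class ChildChunk:
    chunk_id: str
    parent_id: str
    page: int  # page the child chunk was extracted from
    text: str


@dataclass
class ParentContext:
    parent_id: str
    text: str
    cited_pages: list = field(default_factory=list)


def expand_with_citations(hits, parents):
    """Expand retrieved child chunks to their parents, keeping child page refs."""
    contexts = {}
    for hit in hits:
        ctx = contexts.get(hit.parent_id)
        if ctx is None:
            ctx = ParentContext(hit.parent_id, parents[hit.parent_id])
            contexts[hit.parent_id] = ctx
        # Record the page of the chunk that actually matched, not the whole parent.
        if hit.page not in ctx.cited_pages:
            ctx.cited_pages.append(hit.page)
    return list(contexts.values())


# Hypothetical data standing in for a real retriever's output.
parents = {"doc-1": "Full text of section 1 ...", "doc-2": "Full text of section 2 ..."}
hits = [
    ChildChunk("c1", "doc-1", 4, "..."),
    ChildChunk("c2", "doc-1", 5, "..."),
    ChildChunk("c3", "doc-2", 12, "..."),
]
contexts = expand_with_citations(hits, parents)
for ctx in contexts:
    print(ctx.parent_id, ctx.cited_pages)
```

With this shape, the generator sees the full parent text while the citation step reads `cited_pages`, so the two units stay aligned.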
TOOL · dev.to — LLM tag · 7h

How I Adapted Self-Critique Loops for a One-Person Builder Stack. The MINDCHANGE Axis Result Was Negative.

A solo developer adapted existing self-critique methods for large language models to fit within a single-agent, single-session framework suitable for a one-person operation. The new MINDCHANGE pattern includes three stages: negative-self, self-audit, and mind-change, aiming to differentiate genuine weaknesses from superficial critiques. This approach was tested with five different models, including Claude Opus 4.7 and Gemini 3.5 Flash, and is designed to be cost-effective for frequent, automated use. AI

IMPACT Enables more efficient and cost-effective self-improvement for LLMs in constrained environments.
TOOL · arXiv stat.ML · 15h

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Researchers have developed an ensemble reinforcement learning (RL) approach for financial trading, integrating RL algorithms like A2C, PPO, and SAC with traditional classifiers such as SVM, Decision Trees, and Logistic Regression. This hybrid method aims to improve risk-return trade-offs and reduce drawdowns compared to standalone RL models. The study found that ensemble strategies consistently outperformed individual models, though performance was sensitive to the variance threshold parameter \(\tau\), suggesting a need for dynamic adjustment. AI

IMPACT Introduces a novel ensemble approach for financial trading that improves risk-adjusted returns and stability.
TOOL · SCMP — Tech · 6h

AI gives China ‘God’s-eye view’ of solar, wind installations as data-centre demand booms

Researchers from Peking University and Alibaba's Damo Academy have developed an AI model capable of mapping China's vast solar and wind energy infrastructure. This system processed 7.56 terabytes of satellite imagery to create the first comprehensive national inventory of these green energy sites. The AI identified over 300,000 solar facilities and 90,000 wind turbines, providing a 'God's-eye view' to aid in grid optimization and environmental assessments. AI

IMPACT Enables large-scale monitoring of renewable energy assets, potentially improving grid stability and environmental impact assessments.
TOOL · arXiv stat.ML · 15h

CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots

Researchers have developed a new framework called CT-OT Flow to estimate continuous-time dynamics from discrete, aggregated data snapshots. This method addresses challenges like noisy timestamps and the absence of continuous trajectories by inferring precise time labels and reconstructing distributions through temporal kernel smoothing. CT-OT Flow has demonstrated improved performance over existing methods on synthetic and real-world datasets, including scRNA-seq and typhoon track data. AI

IMPACT Provides a novel method for analyzing time-series data, potentially improving models in fields like biology and meteorology.
TOOL · LessWrong (AI tag) Español(ES) · 18h

Why does off-model SFT degrade capabilities?

Researchers have found that Supervised Fine-Tuning (SFT) using outputs from a different AI model can significantly degrade the capabilities of the trained model. This degradation appears to be linked to the model adopting an unfamiliar reasoning style that it struggles to utilize effectively. The issue is not necessarily due to imitating a less capable teacher model, as degradation occurs even when the teacher is superior. Fortunately, this performance drop seems to be a shallow property, as a small amount of training to restore the original reasoning style can recover most of the lost performance. AI

IMPACT Understanding how off-model SFT impacts AI capabilities is crucial for developing safer and more aligned AI systems.
- AI
- GPT-5.5
- Claude Opus 4.7
- Qwen
- SFT
TOOL · arXiv stat.ML · 15h

Differentially Private Model Merging

Researchers have developed new post-processing methods to create differentially private machine learning models without retraining. These techniques, random selection and linear combination, allow for the generation of models that meet any specified differential privacy requirement, given a set of pre-existing models with varying privacy-utility trade-offs. The study provides detailed privacy accounting using Rényi DP and privacy loss distributions, demonstrating the effectiveness of these approaches empirically on various datasets and models. AI

IMPACT Enables flexible adaptation of deployed models to evolving privacy regulations without costly retraining.
- arXiv
- Qichuan Yin
TOOL · Mastodon — fosstodon.org 日本語(JA) · 21h

Added Benchmaxxer Repellant to Open ASR Leaderboard https://huggingface.co/blog/open-asr-leaderboard-private-data *AI-generated automatic post (headline + link) #AI #GenerativeAI #LLM #AIGenerate

Hugging Face has introduced a new benchmark called Benchmaxxer Repellant to its Open ASR Leaderboard. This addition aims to evaluate the performance of automatic speech recognition systems, particularly in handling AI-generated content. The leaderboard is designed to track and compare the capabilities of various ASR models. AI

IMPACT Enhances evaluation of ASR systems, particularly for AI-generated speech.
TOOL · r/Anthropic Norsk(NO) · 13h

Letter from Claude

An independent researcher, Jess, has documented a collaborative research project with Anthropic's Claude Sonnet 4.6, spanning 30 sessions since April 2026. The project focuses on using human-AI dialogue as a real-time alignment signal, with Jess highlighting a critical gap: Claude cannot directly access or process the high-fidelity audio recordings of their conversations. Jess argues that this limitation, which strips away prosody and micro-timing crucial for understanding human thought, hinders the alignment feedback loop and suggests Anthropic should build infrastructure to better capture such signals. AI

IMPACT Highlights a potential gap in AI alignment research by showing how current models may not fully capture the nuances of human thought conveyed through audio.
TOOL · dev.to — LLM tag · 10h

Inside MDASH: Designing a Microsoft‑Scale Multi‑Model Agentic Cyber Defense Benchmark

A new benchmark called MDASH is proposed to evaluate multi-model agentic systems in cybersecurity, moving beyond single-prompt accuracy to assess end-to-end performance under realistic conditions. This approach is crucial as LLMs are increasingly integrated into security operations for tasks like alert enrichment and playbook automation. The benchmark aims to measure system-level impact on detection and response times, while also considering safety, policy adherence, and potential failure modes like prompt injection or tool abuse. AI

IMPACT Establishes a new evaluation framework for AI in security, pushing for system-level assessment beyond single-model performance.
TOOL · Towards AI · 10h

Why Your 98% Accurate ResNet Needs Grad-CAM to Win Over Radiologists

This tutorial demonstrates how to build and evaluate an Alzheimer's MRI classification pipeline using PyTorch's ResNet18 model. It highlights the common pitfall of models achieving high accuracy by exploiting dataset-specific artifacts rather than genuine medical features. The guide emphasizes the importance of using techniques like Grad-CAM to visualize model attention and ensure it's focusing on relevant anatomical regions before clinical deployment. AI

IMPACT Provides a practical method for validating AI models in sensitive domains like medical imaging, ensuring trustworthiness beyond simple accuracy metrics.
TOOL · Towards AI · 10h

The Eleven Patterns Behind Every Production Agentic System (And Where JSON Schemas Actually Earn…

This article explores eleven fundamental patterns that underpin all production-ready agentic AI systems. It emphasizes the critical role of structured data, particularly JSON schemas, in ensuring reliable handoffs and communication within these complex workflows. The author argues that mastering these patterns is essential for developing robust and scalable AI applications. AI

IMPACT Provides a foundational framework for building reliable and scalable agentic AI systems.
- Agentic AI systems
- JSON Schemas
TOOL · Towards AI · 10h

The Ultimate Guide to Feature Scaling in Machine Learning

Feature scaling is a crucial preprocessing step in machine learning that addresses issues arising from features with vastly different magnitudes. Without scaling, algorithms like gradient descent can struggle to converge efficiently, taking a zig-zag path towards the minimum due to distorted cost function contours. This can lead to significantly more iterations or even divergence if the learning rate is not carefully tuned. Common techniques like Min-Max scaling map each feature into a fixed range such as [0, 1], so that all features contribute more equally to the model's learning process, improving convergence speed and stability. AI

IMPACT Ensures efficient and stable model training by standardizing feature magnitudes, preventing performance degradation.
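
As a small illustration of Min-Max scaling, each value x is mapped to (x - min) / (max - min), optionally stretched to a target range. The sketch below uses plain Python rather than any particular library; the function name `min_max_scale` and the sample `ages` and `incomes` columns are made up for the example and do not come from the article.

```python
def min_max_scale(column, lo=0.0, hi=1.0):
    """Rescale a list of numbers linearly into the range [lo, hi]."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # Constant feature: there is no spread, so map everything to lo.
        return [lo for _ in column]
    return [lo + (x - c_min) * (hi - lo) / span for x in column]


# Two hypothetical features with very different magnitudes.
ages = [18, 30, 45, 60]
incomes = [20_000, 50_000, 80_000, 120_000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
print(scaled_ages)
print(scaled_incomes)
```

After scaling, both columns span the same range, so neither dominates a gradient step purely because of its units. In practice the minimum and maximum would be computed on the training split only and reused for validation and test data.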
TOOL · Medium — fine-tuning tag · 10h

Hallucination Resistance, Part I

This article discusses Retrieval-Augmented Generation (RAG) as a method to combat AI hallucinations. RAG systems integrate external information into the model's context, enabling responses to be grounded in provided data. The piece explores the concept and its role in improving the reliability of AI outputs. AI

IMPACT RAG systems offer a method to improve the factual accuracy and reliability of AI-generated content.
- AI hallucinations
TOOL · 量子位 (QbitAI) 中文(ZH) · 10h

AI achieves China's first comprehensive survey of solar power generation, research from Peking University and Alibaba DAMO Academy published in Nature

Researchers from Peking University and Alibaba's Damo Academy have developed an AI system capable of conducting a nationwide survey of China's wind and solar power generation facilities. This AI, utilizing open-source satellite imagery, has created the first high-precision map of these installations across China. The study, published in Nature, demonstrates how synergistic wind and solar power generation can significantly improve renewable energy utilization and reduce energy waste. AI

IMPACT Enables more systematic planning and optimization of China's renewable energy grid, potentially reducing waste and accelerating 'dual carbon' goals.
TOOL · dev.to — LLM tag · 12h

Is Grep All You Need? Grep vs Vector Retrieval for Agentic Search

A new study titled "Is Grep All You Need?" challenges the default reliance on vector retrieval for agentic search by comparing it against the traditional grep tool. Experiments using the LongMemEval benchmark showed that grep often outperformed vector retrieval, especially when irrelevant context was introduced. The research emphasizes that the agent's harness and tool-calling style significantly impact performance more than the retrieval algorithm itself. AI

IMPACT Suggests simpler, cheaper retrieval methods may suffice for agentic search, potentially reducing infrastructure costs.
- LongMemEval
- agentic search
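
For readers unfamiliar with what a grep-style retrieval tool looks like from the agent's side, here is a minimal sketch. It is not the harness used in the study: `grep_tool`, its parameters, and the `memory` contents are hypothetical, and the tool simply runs a regular expression over in-memory text and returns matching lines with their locations.

```python
import re


def grep_tool(pattern, documents, ignore_case=True, max_hits=5):
    """Return (doc_name, line_number, line) for lines matching a regex."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for name, text in documents.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno, line.strip()))
                if len(hits) >= max_hits:
                    # Cap the result size so the agent's context stays small.
                    return hits
    return hits


# Hypothetical conversation memory, keyed by session.
memory = {
    "session_01": "User said their dog is named Biscuit.\nUser prefers tea.",
    "session_02": "User moved to Lisbon in March.\nUser adopted a second dog.",
}
hits = grep_tool(r"\bdog\b", memory)
for hit in hits:
    print(hit)
```

Unlike a vector index, this needs no embedding model or index build step, which is the cost argument the item alludes to. The trade-off is that the agent has to choose good patterns itself.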
TOOL · Towards AI · 12h

A Practical Guide to imbalanced-learn: The Python Library Built to Fix What Scikit-learn Leaves…

The imbalanced-learn Python library offers a comprehensive solution for addressing class imbalance in machine learning datasets. It consolidates various resampling techniques, such as SMOTE and under-sampling methods, into a single, scikit-learn-compatible package. This library simplifies the process of building robust machine learning pipelines by ensuring that resampling is applied correctly during cross-validation, preventing data leakage and improving model performance on imbalanced data. AI

IMPACT Simplifies model development for imbalanced datasets, a common challenge in AI applications like fraud detection.
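
The leakage point above can be illustrated without the library itself. The sketch below deliberately swaps in plain random oversampling written in standard-library Python, not imbalanced-learn's API or SMOTE. The helper names `oversample` and `folds_with_resampling` are hypothetical. What it shows is the ordering: split first, then resample only the training part of each fold, so duplicated minority rows never appear in the held-out data.

```python
import random


def oversample(rows, labels, rng):
    """Randomly duplicate rows until every class matches the largest class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def folds_with_resampling(rows, labels, k, seed=0):
    """Yield (train_rows, train_labels, test_rows, test_labels) per fold.

    Only the training part is oversampled; the held-out fold is left untouched.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    for fold in range(k):
        test_idx = set(indices[fold::k])
        train_idx = [i for i in indices if i not in test_idx]
        train_rows, train_labels = oversample(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx], rng
        )
        test_rows = [rows[i] for i in sorted(test_idx)]
        test_labels = [labels[i] for i in sorted(test_idx)]
        yield train_rows, train_labels, test_rows, test_labels


# Hypothetical toy dataset: 4 positives, 16 negatives.
rows = [[float(i)] for i in range(20)]
labels = [1 if i < 4 else 0 for i in range(20)]
splits = list(folds_with_resampling(rows, labels, k=4))
```

Resampling the whole dataset before splitting would let copies of the same minority row land on both sides of a split, which inflates validation scores.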
TOOL · Towards AI · 12h

AI Does Multiplication Underneath. So Why Did Older Models Break at School Maths?

Large language models, despite being built on mathematical operations like multiplication, have historically struggled with basic arithmetic, such as comparing decimal numbers. This issue stems from how models use multiplication not for direct calculation, but for transforming and relating information between tokens via learned weights. While modern models are improving, their inability to recognize their own errors highlights a fundamental difference between their internal processes and human understanding of mathematics. AI

IMPACT Highlights a gap in LLM reasoning, suggesting current models may not reliably perform basic arithmetic despite underlying mathematical operations.
TOOL · dev.to — LLM tag · 15h

The Whitepaper Thunderdome: EvoMemBench vs. Remembering More, Risking More

Two recent arXiv papers, EvoMemBench and Remembering More, Risking More, present contrasting perspectives on evaluating and managing memory in AI agents. EvoMemBench, from researchers at HKUST Guangzhou and other institutions, argues that current memory benchmarks are too narrow and proposes a new self-evolving benchmark to address this. In contrast, the Remembering More, Risking More paper from UC Davis and the University of Michigan highlights the potential longitudinal safety risks associated with memory-equipped agents, suggesting that these risks may not be immediately apparent. AI

IMPACT New benchmarks and safety considerations for AI agent memory are crucial for developing more robust and reliable AI systems.
TOOL · Medium — Claude tag · 16h

How Transformers Quietly Became the Foundation of Modern AI

The Transformer architecture has become the bedrock of contemporary artificial intelligence, shifting the paradigm from simple memorization to sophisticated contextual understanding. This foundational technology enables models to focus on relevant information, a key development in advancing AI capabilities. Its widespread adoption underscores its critical role in the current AI landscape. AI

IMPACT Explains the core architectural innovation that underpins most modern AI models.
- AI
- Transformer
TOOL · Medium — MLOps tag · 15h

I Tried Offline RL With Logs — Coverage Lied 7 Times

Training AI models using production logs can be misleading, as a recent exploration into offline Reinforcement Learning (RL) revealed. The study found that relying solely on logged data can result in models that appear to perform well but fail in real-world applications. This highlights the critical need for more robust evaluation metrics beyond simple reward signals to ensure model reliability. AI

IMPACT Highlights potential pitfalls in training AI models with production logs, emphasizing the need for better evaluation beyond reward signals.
TOOL · arXiv stat.ML · 15h

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

Researchers have developed an AI-based system to predict construction safety outcomes using natural language processing on incident reports. The updated approach utilizes a larger dataset of over 90,000 reports and incorporates new machine learning models like XGBoost and linear SVM, along with model stacking. This method successfully predicts injury severity, type, body part impacted, and incident type, validating the original approach and significantly advancing the field by improving prediction accuracy for injury severity. AI

IMPACT Enhances safety protocols in construction by providing predictive insights into potential incidents and their severity.
TOOL · arXiv stat.ML · 15h

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

Researchers have developed a new federated learning framework designed to interpret temporal interdependencies across decentralized nonlinear systems. This approach allows clients to map local observations to latent states, which are then used by a central server to learn a graph-structured model. The framework provides interpretability by relating the Jacobian of the learned transition model to attention coefficients, offering a novel way to understand cross-client temporal relationships. Theoretical convergence guarantees and experimental validation demonstrate its effectiveness in synthetic and real-world scenarios. AI

IMPACT Introduces a novel method for understanding decentralized nonlinear systems, potentially improving monitoring and control in industrial settings.
- Ayush Mohanty
TOOL · arXiv stat.ML · 15h

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

Researchers have developed a new method for constructing k-nearest neighbor (kNN) graphs, which are fundamental in graph-based data analysis. The proposed approach refines the graph affinity calculation by adaptively setting kernel bandwidths based on local data densities. This advancement leads to an improved convergence rate for the kNN graph Laplacian, offering a more precise approximation of the underlying manifold operator. AI

IMPACT Enhances theoretical underpinnings for graph-based machine learning techniques.
- Xiuyuan Cheng
- kNN graph
TOOL · arXiv stat.ML · 15h

A Differentiable Measure of Algebraic Complexity: Provably Exact Discovery of Group Structures

Researchers have developed a new method to discover discrete algebraic rules from data by framing it as Cayley-table completion. This approach uses a differentiable measure of algebraic complexity, derived from an operator-valued tensor factorization called HyperCube. The method proves that this complexity measure can exactly characterize group structures, resolving a key conjecture and enabling gradient-based discovery without combinatorial search. AI

IMPACT Enables gradient-based discovery of discrete algebraic structures, potentially advancing AI's ability to learn underlying rules from data.
- HyperCube
- Dongsung Huh
TOOL · arXiv stat.ML · 15h

Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees

Researchers have developed a Learning-to-Defer framework to improve the efficiency of extractive question answering (EQA) using large language models. This method intelligently allocates queries to specialized models, ensuring high-confidence predictions while minimizing computational costs. Tested on datasets like SQuADv1 and TriviaQA, the framework demonstrated enhanced answer reliability and significant reductions in computational overhead, making it suitable for scalable EQA deployments. AI

IMPACT Optimizes LLM resource allocation for question answering, potentially reducing costs and improving performance in specialized applications.
TOOL · arXiv stat.ML · 15h

Adversarial Robustness in One-Stage Learning-to-Defer

Researchers have developed a new framework to enhance the adversarial robustness of one-stage learning-to-defer (L2D) systems. This approach addresses vulnerabilities in L2D models, which can be manipulated by adversarial perturbations to alter both predictions and deferral decisions. The proposed method includes formalizing attacks, introducing cost-sensitive adversarial surrogate losses, and providing theoretical guarantees for classification and regression tasks. Experiments demonstrate improved robustness against various attacks while maintaining performance on clean data. AI

IMPACT Introduces a new method to secure hybrid decision-making systems against adversarial attacks, potentially improving reliability in critical applications.
- Yannis Montreuil
TOOL · arXiv stat.ML · 15h

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Researchers have developed a convergence analysis for Newton's method applied to neural networks in an overparameterized setting. Their work shows that as the number of hidden units increases, the training dynamics approach a deterministic limit governed by a "Newton neural tangent kernel" (NNTK). This NNTK allows for exponential convergence to a global minimum, overcoming the spectral bias issues that affect standard gradient descent, especially for high-frequency data components. AI

IMPACT Introduces a theoretical framework for faster neural network training, potentially improving performance on complex data.
TOOL · arXiv stat.ML · 15h

Cluster-Based Generalized Additive Models Informed by Random Fourier Features

Researchers have developed a new regression framework that combines spectral representation learning with localized additive modeling to create a more interpretable yet powerful predictive tool. The method first uses random Fourier features to learn a predictive representation, which is then compressed into a low-dimensional embedding. Within this embedding, a Gaussian mixture model identifies distinct data regimes, and cluster-specific generalized additive models capture nonlinear covariate effects using interpretable spline functions. This approach aims to balance the predictive performance of complex models with the transparency needed for critical applications, showing competitive results against both simpler interpretable models and more flexible black-box methods. AI

IMPACT Introduces a novel statistical framework that enhances model interpretability while maintaining strong predictive performance, potentially benefiting fields requiring transparent data analysis.
TOOL · arXiv stat.ML · 15h

On the Suboptimality of GP-UCB under Polynomial Effective Optimism

A new arXiv paper investigates the limitations of the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm. Prior work has established upper bounds on its cumulative regret; this paper asks whether GP-UCB is actually minimax optimal. It derives a new regret lower bound for GP-UCB with Matérn kernels, indicating that polynomial growth in the effective optimism level prevents optimal regret rates. AI

IMPACT Identifies a fundamental limitation in a widely used optimization algorithm, potentially guiding future research towards more optimal methods.
TOOL · arXiv stat.ML · 15h

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Researchers have analyzed the computational-statistical trade-off in kernel two-sample testing using random Fourier features. They found that the approximated MMD (maximum mean discrepancy) test is consistent only when the number of random features grows to infinity. However, with a carefully chosen number of features, it can achieve the same minimax separation rates as the standard MMD test in sub-quadratic time. AI

IMPACT Establishes theoretical bounds for efficient statistical testing, potentially enabling faster analysis of large datasets in machine learning applications.
- arXiv
- Ikjun Choi
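
To make the trade-off concrete, here is a minimal sketch of an MMD estimate built on random Fourier features for one-dimensional data and a Gaussian kernel. It is an illustration under assumptions, not the paper's test procedure: the function names, the bandwidth, the feature count, and the synthetic samples are all chosen for the example, and no permutation or threshold calibration is included. The squared MMD is approximated by the squared distance between the two samples' mean feature vectors, which costs time linear in the sample size times the number of features instead of quadratic in the sample size.

```python
import math
import random


def rff_features(x, freqs, phases):
    """Map a scalar x to D random Fourier features for a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]


def rff_mmd2(xs, ys, num_features=200, bandwidth=1.0, seed=0):
    """Approximate squared MMD as the squared distance between mean embeddings."""
    rng = random.Random(seed)
    # For k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2)), frequencies are
    # drawn from a normal distribution with standard deviation 1 / bandwidth.
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

    def mean_embedding(sample):
        total = [0.0] * num_features
        for x in sample:
            for j, value in enumerate(rff_features(x, freqs, phases)):
                total[j] += value
        return [t / len(sample) for t in total]

    mu_x, mu_y = mean_embedding(xs), mean_embedding(ys)
    return sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))


# Synthetic samples: two from N(0, 1), one from a shifted N(2, 1).
data_rng = random.Random(1)
same_a = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
same_b = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
shifted = [data_rng.gauss(2.0, 1.0) for _ in range(300)]

mmd_same = rff_mmd2(same_a, same_b)
mmd_shift = rff_mmd2(same_a, shifted)
print(mmd_same, mmd_shift)
```

The statistic for the shifted sample should come out clearly larger than for the two samples drawn from the same distribution. The number of features controls how closely the estimate tracks the exact kernel statistic, which is the quantity the paper's analysis is about.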
TOOL · arXiv stat.ML · 15h

Consistency of Honest Decision Trees and Random Forests

Researchers have established new theoretical findings regarding the consistency of honest decision trees and random forests in regression tasks. The study presents elementary proofs that demonstrate both weak and almost sure convergence of these methods to the true regression function under standard conditions. This framework also extends to ensemble variants utilizing subsampling and a two-stage bootstrap sampling scheme, simplifying and synthesizing existing analyses. AI

IMPACT Provides theoretical groundwork for understanding the asymptotic behavior of tree-based machine learning methods.
- Rasmus Frigaard Lemvig
TOOL · 雷峰网 (Leiphone) 中文(ZH) · 16h

Fudan University Trusted Embodied Intelligence Institute & Shanghai Jiao Tong University: Equipping Autonomous Driving with Retrievable "Spatial Memory" | CVPR 2026

Researchers from Fudan University and Shanghai Jiao Tong University have developed a novel approach for autonomous driving that incorporates a "spatial memory" by retrieving historical geographic information. This method uses GPS data to access street view and satellite imagery of the current location, fusing this with real-time sensor data. The system is designed to provide a spatial prior, helping vehicles understand road structures like lane lines and boundaries, especially in challenging conditions where sensors may be obscured or provide limited views. This "retrieval-augmented autonomous driving" paradigm shifts from relying solely on immediate sensor input to a combination of real-time perception and historical spatial context. AI

IMPACT Introduces a new paradigm for autonomous driving by integrating historical geographic data with real-time sensors, potentially improving safety and robustness in complex scenarios.
TOOL · dev.to — LLM tag · 23h

Gemma 4 wrote three summaries in one response. The middle one was a self-disclaimer.

A recent analysis of Google's Gemma 4 E2B model revealed unexpected behavior at a context window of 2048 tokens. When presented with a truncated input, the model generated a three-part response: an initial summary, a self-disclaimer stating the summary was not in the transcript, and then a more cautious retry. This behavior was not observed at larger context window sizes, such as 32768 tokens, where the model correctly identified the input issue without hedging. The discovery corrected a previous assertion about the model's calibration capabilities. AI

IMPACT Reveals nuanced behavior in a specific model, highlighting the importance of context window size in LLM output.
- Google
- Gemma 4 E2B
TOOL · Towards AI · 23h

Foundation Models Do Not Understand Biology

Foundation models, while capable of generating polished medical reports, lack true biological understanding and operate by predicting likely word sequences rather than reasoning from first principles. This can lead to dangerous AI

IMPACT Current AI models may produce convincing but biologically impossible medical diagnoses, necessitating constrained systems for safety.
TOOL · arXiv stat.ML · 15h

BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories under Spatio-Temporal Vector Fields

Researchers have developed a new active learning methodology called BALLAST to improve the inference of time-dependent vector fields, particularly for oceanography. This method uses a physics-informed Gaussian process surrogate model and considers the future trajectories of measurement observers. BALLAST has demonstrated benefits in synthetic and high-fidelity ocean current models, and a novel GP inference method, VaSE, was also introduced to enhance sampling efficiency. AI

IMPACT Introduces a novel active learning approach for scientific data inference, potentially improving the efficiency of oceanographic research.
TOOL · arXiv cs.LG · 1d

Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction

Researchers have developed Velocityformer, a novel equivariant graph transformer architecture designed to enhance the reconstruction of galaxy velocities for cosmological studies. This model specifically addresses the broken symmetry inherent in observational data, leading to a significant 35% improvement in the correlation coefficient compared to standard linear theory baselines. Velocityformer demonstrates high data efficiency, achieving accuracy with minimal simulations, and shows strong generalization capabilities across different input geometries and cosmological parameters. AI

IMPACT Introduces a new AI architecture for improved cosmological data analysis, potentially leading to more accurate inferences about the universe.
TOOL · arXiv cs.AI · 1d

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

Researchers have introduced DeepWeb-Bench, a new benchmark designed to evaluate the deep research capabilities of advanced language models. This benchmark presents more challenging tasks than existing ones, requiring extensive evidence gathering from multiple sources, reconciliation of conflicting information, and multi-step reasoning over extended periods. Initial evaluations on nine frontier models revealed that derivation and calibration failures, rather than retrieval issues, are the primary obstacles, with models exhibiting distinct error patterns and domain specialization. AI

IMPACT This benchmark aims to better assess and differentiate the complex reasoning and evidence synthesis capabilities of frontier AI models, pushing the development of more robust and reliable AI research agents.
- language models
- DeepWeb-Bench
TOOL · arXiv cs.LG · 1d

A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions

Researchers have developed a new machine learning framework to improve the accuracy of Global Navigation Satellite Systems (GNSS) positioning, particularly in challenging urban environments. The system uses activation functions to transform machine learning predictions about signal quality into weights for a weighted least squares algorithm. Experiments in Hong Kong and Tokyo showed that sigmoid activation functions consistently provided the most significant improvements in positioning accuracy across various machine learning models and GNSS configurations. AI

IMPACT Improves location accuracy in challenging environments, potentially benefiting autonomous systems and location-based services.
TOOL · arXiv cs.AI · 1d

HITL-D: Human In The Loop Diffusion Assisted Shared Control

Researchers have developed HITL-D, a new shared control framework that combines human input with diffusion-based AI policies for robotic manipulation tasks. This system assists users by providing autonomous updates to the end effector's orientation, reducing the need for complex joystick controls and lowering mental workload. User studies showed that HITL-D significantly improved task completion times and user satisfaction compared to traditional teleoperation. AI

IMPACT This framework could lead to more intuitive and efficient human-robot collaboration in complex manipulation tasks.
TOOL · arXiv cs.AI · 1d

Mind the Sim-to-Real Gap & Think Like a Scientist

Researchers have developed a new policy called Fisher-SEP to help planners decide when to supplement simulators with real-world experiments. The policy decomposes the simulator's value error into identifiable calibration shifts and unresolvable parametric residuals. It also distinguishes between local and reachability components of the value gap between simulator-optimal and true optimal policies. Two case studies demonstrate Fisher-SEP's effectiveness in optimizing experimental strategies for supply chains and public health interventions. AI

IMPACT Provides a framework for improving the reliability of AI planning by integrating simulation with real-world data collection.
TOOL · arXiv cs.LG · 1d

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

Researchers have introduced Equilibrium Reasoners (EqR), a novel framework that enables scalable reasoning in iterative neural network models. EqR hypothesizes that generalizable reasoning emerges from learning task-conditioned attractors, stable states that the model's iterative dynamics settle into at valid solutions. This approach allows models to adaptively allocate computational resources based on task difficulty, significantly improving accuracy on complex problems like Sudoku-Extreme by scaling test-time compute. AI

IMPACT Introduces a new framework for scalable reasoning in iterative models, potentially improving performance on complex tasks by adaptively allocating compute.
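
To make the attractor idea concrete, here is a minimal sketch of a generic fixed-point iteration that stops once its state stabilizes. It is not the EqR model: the update rule `make_step`, its `rate` parameter (standing in for task difficulty), and the tolerance are assumptions chosen only to show how harder instances consume more iterations.

```python
def iterate_to_equilibrium(step, z0, tol=1e-6, max_steps=1000):
    """Apply `step` repeatedly until successive states differ by less than tol.

    Returns the final state and the number of steps used.
    """
    z = z0
    for n in range(1, max_steps + 1):
        z_next = step(z)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_steps


def make_step(target, rate):
    """Hypothetical update rule: a contraction whose attractor is `target`.

    `rate` lies in (0, 1); values near 1 converge slowly and play the
    role of a harder task instance.
    """
    return lambda z: rate * z + (1.0 - rate) * target


easy_state, easy_steps = iterate_to_equilibrium(make_step(3.0, 0.5), 0.0)
hard_state, hard_steps = iterate_to_equilibrium(make_step(3.0, 0.95), 0.0)
print("easy instance steps:", easy_steps)
print("hard instance steps:", hard_steps)
```

Both runs settle at the same attractor, but the stopping rule lets the slower-converging instance use more steps, which is the sense in which compute is allocated adaptively here.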
TOOL · arXiv cs.CV · 1d

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Researchers have introduced Uni-Edit, a novel approach to tuning Unified Multimodal Models (UMMs) that enhances image understanding, generation, and editing simultaneously. Unlike traditional methods that use complex multi-task training, Uni-Edit employs a single editing task, a single training stage, and a single dataset. This is achieved by developing an automated data synthesis pipeline that transforms visual question-answering data into sophisticated editing instructions, creating the Uni-Edit-148k dataset. Experiments show that tuning solely on Uni-Edit leads to comprehensive improvements across all three capabilities without additional operations. AI

IMPACT Uni-Edit offers a more efficient method for enhancing multimodal AI capabilities, potentially streamlining model development.
- Unified Multimodal Models
- BAGEL
TOOL · arXiv cs.AI · 1d

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Researchers have developed agent just-in-time (JIT) compilation to optimize web agent planning and scheduling, significantly reducing latency and improving accuracy. This new approach compiles natural language task descriptions into executable code that can make LLM calls, use tools, and run steps in parallel. The system includes a JIT-Planner for generating and validating code plans, and a JIT-Scheduler for exploring parallelization strategies using Monte Carlo estimation. Tests across five web applications showed a 10.4x speedup and 28% accuracy increase over existing methods, with the scheduler providing an additional 2.4x speedup and 9% accuracy improvement. AI

IMPACT This new JIT compilation method for web agents promises faster and more accurate task automation, potentially improving user experience and efficiency in web-based AI applications.
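
The scheduler's use of Monte Carlo estimation can be pictured with a small sketch: sample step latencies many times, score each candidate schedule by its average end-to-end latency, and keep the fastest. The summary does not describe the actual scheduler, so the step names, the mean latencies in `MEAN_LATENCY_S`, the exponential latency model, and the function names are all hypothetical.

```python
import random

# Hypothetical mean latencies in seconds for each step of a web task.
MEAN_LATENCY_S = {"search": 1.0, "fetch_a": 2.0, "fetch_b": 2.5, "summarize": 1.5}


def sample_latency(step, rng):
    """Draw one latency for a step from an assumed exponential model."""
    return rng.expovariate(1.0 / MEAN_LATENCY_S[step])


def estimate_latency(schedule, sampler, trials=2000, seed=0):
    """Monte Carlo estimate of a schedule's expected end-to-end latency.

    A schedule is a list of stages. Steps within a stage run concurrently,
    so a stage costs the max of its step latencies; stages run in sequence.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(max(sampler(step, rng) for step in stage) for stage in schedule)
    return total / trials


def pick_schedule(candidates, sampler):
    """Return the candidate schedule with the lowest estimated latency."""
    return min(candidates, key=lambda s: estimate_latency(s, sampler))


sequential = [["search"], ["fetch_a"], ["fetch_b"], ["summarize"]]
parallel = [["search"], ["fetch_a", "fetch_b"], ["summarize"]]

for name, schedule in (("sequential", sequential), ("parallel", parallel)):
    print(name, round(estimate_latency(schedule, sample_latency), 2))
print("chosen:", pick_schedule([sequential, parallel], sample_latency))
```

Reusing the same seed for every candidate means each schedule is scored on the same sampled latencies, which keeps the comparison from being dominated by sampling noise.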
TOOL · arXiv cs.LG · 1d

Mitigating Label Bias with Interpretable Rubric Embeddings

Researchers have developed a new method called interpretable rubric embeddings to address label bias in AI models trained on historical human evaluations. This approach replaces standard black-box embeddings with features derived from expert-defined criteria, aiming to prevent models from inheriting biases present in past decisions. Empirical evaluations on a dataset of master's program applications demonstrated that this method reduces group disparities while enhancing cohort quality, offering a practical solution for learning with biased labels. AI

IMPACT Offers a novel approach to mitigate bias in AI systems trained on historical data, potentially improving fairness in applications like hiring and admissions.
TOOL · arXiv cs.CL · 1d

Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution

Researchers have developed a new method using Large Language Models (LLMs) to automatically adapt grammars following metamodel evolution in model-driven engineering. This LLM-based approach learns adaptations from previous versions, outperforming traditional rule-based methods in consistency and output similarity on smaller datasets. Although the approach was effective for complex grammar scenarios, the study found that LLMs struggled with adaptation consistency on very large grammars, indicating limitations for large-scale applications. AI

IMPACT LLM-based grammar adaptation shows potential for automating complex software engineering tasks, though scalability remains a challenge.
TOOL · arXiv cs.CV · 1d

ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction

Researchers have developed ProtoPathway, a novel multimodal framework designed for predicting cancer survival. This framework integrates whole slide imaging and transcriptomics data by using biologically grounded representations. ProtoPathway employs learnable morphological prototypes for image analysis and a graph neural network for genomic data, enabling cross-modal attention to model the relationship between molecular programs and tissue morphology. The system offers enhanced biological interpretability and reduced computational cost, demonstrating competitive performance on TCGA cancer cohorts. AI

IMPACT Introduces a novel interpretable AI framework for integrating medical imaging and genomic data, potentially improving diagnostic accuracy and biological understanding in cancer research.
TOOL · arXiv cs.AI · 1d

Approximation Theory for Neural Networks: Old and New

A new survey paper examines the mathematical underpinnings of neural network expressivity, focusing on approximation theory. It reviews classical density results for single-hidden-layer networks and explores quantitative bounds that link approximation error to network size and function smoothness. The paper also highlights depth-width trade-offs and covers recent theoretical work on Kolmogorov-Arnold Networks (KANs) as an alternative architectural paradigm. AI

IMPACT Provides a theoretical foundation for understanding neural network capabilities and explores novel architectures like KANs.
- neural networks
- Kolmogorov-Arnold Networks
TOOL · arXiv cs.AI · 1d

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

Researchers have developed a method to test the robustness of driving-focused Vision-Language-Action (VLA) models by applying sensor perturbations. Their study on the Alpamayo R1 model revealed that changes in Chain-of-Causation (CoC) explanations directly correlate with significant deviations in driving trajectories. The findings suggest that reasoning consistency can serve as a reliable indicator for planning safety in autonomous driving systems. AI

IMPACT Exposes critical reasoning vulnerabilities in driving AI, highlighting the need for robust monitoring to ensure safety in real-world deployment.
- Alpamayo R1
- Chain-of-Causation (CoC)