Brief

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 2d

SpAArSIST: Sparsified AASIST for Efficient and Reliable Anti-Spoofing

Researchers have developed SpAArSIST, an optimized version of the AASIST model for anti-spoofing in audio. This new configuration reduces computational requirements by over 20% and model size by 4%, while significantly improving out-of-domain robustness. The system also introduces a composite score to aid in selecting models for deployment based on accuracy, calibration, and compute efficiency. AI

IMPACT Optimizes audio anti-spoofing models, potentially leading to more efficient and reliable security systems.
- SpAArSIST
- AASIST
TOOL · arXiv cs.AI English(EN) · 2d

Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Researchers have introduced OMAD, a novel framework for online multi-agent reinforcement learning (MARL) that utilizes diffusion policies to enhance agent coordination. This approach addresses the challenge of intractable likelihoods in diffusion models, which typically hinder exploration in online MARL settings. OMAD employs a relaxed policy objective that maximizes scaled joint entropy and a joint distributional value function for decentralized policy optimization, leading to significant improvements in sample efficiency. AI

IMPACT Introduces a novel approach to multi-agent reinforcement learning, potentially improving coordination and sample efficiency in complex AI systems.
- Zhuoran Li
TOOL · arXiv cs.AI English(EN) · 2d

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

Researchers have introduced Afrispeech Semantics, a new benchmark designed to evaluate the audio semantic reasoning capabilities of spoken language models. The benchmark focuses on five distinct tasks: entailment, consistency, plausibility, accent drift, and accent restraint. This evaluation aims to uncover critical limitations in current audio reasoning assessments and guide the development of more robust and equitable audio language models, particularly concerning accent variation and domain shifts. AI

IMPACT This benchmark could lead to more nuanced evaluations of audio language models, improving their ability to understand and reason about spoken language across diverse accents and contexts.
- arXiv
- Afrispeech Semantics
TOOL · arXiv cs.AI English(EN) · 2d

The Environmental Cost of LLMs in AIED: Reporting and Practices

A new paper from the AIED community highlights the significant environmental costs associated with large language models (LLMs). Researchers found that while many AIED projects utilize LLMs, very few report the computational resources consumed or discuss the environmental impacts as an ethical concern. To address this, the paper proposes an open-source method and software tools for systematically measuring and reporting the carbon footprint of LLMs in AIED systems, aiming to encourage more transparent reporting of these hidden costs. AI

IMPACT Promotes transparency in LLM development and usage, potentially influencing future research and deployment practices to consider environmental sustainability.
- arXiv
- LLMs
TOOL · arXiv cs.AI English(EN) · 2d

Physics-Distilled Neural Network enabled by Large Language Models for Manufacturing Process-Property Predictive Modeling

Researchers have developed a new knowledge distillation framework that uses Large Language Models (LLMs) to extract physics principles from scientific literature. This framework creates a 'teacher' model that imbues a 'student' model with predictive capabilities for manufacturing processes, even with limited data. The resulting student model is lightweight, capable of high-frequency inference for real-time deployment, and shows robustness even when the LLM-derived physics knowledge is imperfect. AI

IMPACT This framework could enable more accurate and efficient AI-driven predictive modeling in manufacturing, especially in data-scarce environments.
- Physics-Distilled Neural Network
- Large Language Models
TOOL · arXiv cs.AI English(EN) · 2d

From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference

Researchers have introduced SemantiClean, a novel framework designed to extract structured semantic signals from e-commerce session data. This system prioritizes auditability and reproducibility over marginal predictive gains, organizing behavioral elements into a four-layer architecture. The framework utilizes an LLM-Integrated Semantic Inference Engine to ensure deterministic and reproducible outputs, with a focus on transparency and defensible decision trails. AI

IMPACT Introduces a new approach to AI inference that prioritizes transparency and auditability in e-commerce applications.
- LLM-Integrated Semantic Inference Engine
- SemantiClean
TOOL · arXiv cs.CL English(EN) · 2d

LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization

Researchers have developed LibriConvo, a new synthetic conversational speech corpus designed to improve automatic speech recognition (ASR) and speaker diarization systems. The corpus was created by adapting the Speaker-Aware Simulated Conversation framework, processing existing English CallHome data for conversational timing and using LibriTTS utterances grouped by book for semantic continuity. LibriConvo contains over 240 hours of audio featuring 830 speakers, and baseline results show that models like Sortformer and a fine-tuned Fast Conformer-CTC XLarge outperform existing systems on this benchmark. AI

IMPACT Provides a new benchmark for evaluating and improving multi-speaker speech processing systems.
TOOL · arXiv cs.CL English(EN) · 2d

Language Shapes Mental Health Evaluations in Large Language Models

A new study published on arXiv reveals that multilingual large language models exhibit biases in mental health evaluations based on prompt language. Researchers found that prompts in Chinese elicited higher stigma scores and more conservative depression severity judgments compared to equivalent prompts in English when using models like GPT-4o and Qwen3-32B. This suggests that LLMs do not apply consistent evaluative standards across languages in sensitive domains, potentially leading to under-estimation errors in mental health assessments. AI

IMPACT Highlights the need for careful evaluation of multilingual LLMs in sensitive applications like mental health to ensure consistent and unbiased performance across languages.
- Qwen3-32B
- GPT-4o
- arXiv
- Xiyang Hu
TOOL · arXiv cs.AI English(EN) · 2d

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Researchers have introduced MobilityBench, a new benchmark designed to evaluate the performance of large language model (LLM) based route-planning agents in real-world mobility scenarios. The benchmark utilizes a large dataset of anonymized user queries from Amap, covering diverse routing needs across multiple cities. To ensure reproducibility, MobilityBench includes a deterministic API-replay sandbox and a multi-dimensional evaluation protocol that assesses outcome validity, instruction understanding, planning, tool use, and efficiency. Initial evaluations show current LLM agents are competent in basic information retrieval and route planning but struggle with preference-constrained planning, indicating a need for improvement in personalized mobility applications. AI

IMPACT Provides a standardized method to assess and improve LLM-based mobility agents, potentially leading to more personalized and efficient navigation tools.
TOOL · arXiv cs.AI English(EN) · 2d

MARIC: Multi-Agent Reasoning for Image Classification

Researchers have developed MARIC, a novel multi-agent framework for image classification that enhances performance by treating the task as a collaborative reasoning process. This system employs an Outliner Agent to grasp the image's theme and generate prompts, followed by three Aspect Agents that extract detailed descriptions from different visual perspectives. A final Reasoning Agent then synthesizes these insights with a reflection step to produce a unified classification, outperforming traditional methods and monolithic vision-language models on diverse benchmarks. AI

IMPACT Introduces a novel multi-agent approach that could improve the interpretability and robustness of AI systems in visual recognition tasks.
TOOL · arXiv cs.CL English(EN) · 2d

Scenario-based Probing and Steering Cultural Values in Large Language Models--Extended Version

Researchers have developed a new method to probe and influence the cultural values embedded within large language models. This approach uses scenario-based dilemmas, translating survey questions into behavioral choices to reveal implicit model preferences rather than relying on direct, often safety-aligned, responses. The study found that interventions to steer cultural values can lead to shifts along multiple dimensions simultaneously, similar to human behavior, and that this entanglement persists across different steering techniques without significantly degrading general task performance. AI

IMPACT This research offers a novel way to understand and potentially align LLM behavior with diverse cultural norms, crucial for global deployment.
TOOL · arXiv cs.CL English(EN) · 2d

Neuron-based Personality Trait Induction in Large Language Models

Researchers have developed a novel method to imbue large language models with specific personality traits without requiring model retraining. This approach involves identifying key neurons within the LLM that correlate with personality dimensions, based on the Big Five personality traits framework. By manipulating these identified neurons, the system can induce desired personality characteristics in the model's output, demonstrating comparable effectiveness to fine-tuned models but with greater efficiency and flexibility. AI

IMPACT Enables more nuanced and controllable AI interactions by allowing specific personality traits to be induced in LLMs without extensive retraining.
TOOL · arXiv cs.AI English(EN) · 2d

Grounding Computer Use Agents on Human Demonstrations

Researchers have introduced GroundCUA, a large-scale dataset designed to improve computer-use agents by accurately connecting natural language instructions to on-screen elements in desktop environments. The dataset comprises 56,000 screenshots with over 3.56 million human-verified annotations across 87 applications. Utilizing this dataset, the GroundNext models, at 3B and 7B parameter scales, achieved state-of-the-art performance on five benchmarks with significantly less training data than previous methods. AI

IMPACT Enhances AI agent capabilities for desktop environments, potentially leading to more sophisticated automation tools.
TOOL · arXiv cs.AI English(EN) · 2d

Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs

Researchers have developed a new technique called Neural FOXP2 to improve the performance of large language models in non-English languages. This method works by identifying and steering "language neurons" within the model, which determine the language the model defaults to. The process involves localizing these neurons, defining steering directions, and then applying targeted activation shifts so that a language such as Hindi or Spanish becomes the default, thereby reducing English dominance. AI

IMPACT Enables more equitable performance across languages in LLMs, reducing English bias.
- LLMs
- Vinija Jain
TOOL · arXiv cs.AI English(EN) · 2d

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

Researchers have introduced HERO, a novel framework for reinforcement learning agents designed to improve multi-turn decision-making. Unlike traditional methods that rely on terminal outcomes, HERO uses hindsight-enhanced self-distillation with next environment observations as localized feedback. This approach converts each observation into a compact turn-level diagnosis, providing actionable insights into the agent's actions. HERO has demonstrated improved task success and reduced unnecessary turns on benchmarks like TauBench and WebShop, particularly under limited training budgets where successful rollouts are infrequent. AI

IMPACT Enhances AI agent learning by providing more granular, context-aware feedback, potentially improving efficiency and success rates in complex tasks.
- TauBench
- HERO
- WebShop
TOOL · arXiv cs.CL English(EN) · 2d

LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

Researchers have developed LatticeBridge, a novel method for structured sequence generation that addresses the challenge of satisfying multiple input-derived constraints within a single output. This approach frames the problem as a rare-event sequential inference task, combining a prefix language model with instance-compiled surface automata and a specialized Monte Carlo decoder. LatticeBridge aims to improve the faithfulness of generated sequences by ensuring all required anchors are jointly realized, outperforming baseline methods on benchmarks like CommonGen and WikiBio. AI

IMPACT Enhances faithfulness in structured sequence generation, potentially improving applications requiring precise output constraints.
TOOL · arXiv cs.CL English(EN) · 2d

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Researchers have developed Fanar-Sadiq, a multi-agent system designed for accurate and grounded Islamic question answering. This bilingual Arabic-English platform addresses the limitations of standard LLMs in religious contexts by incorporating specialized modules for diverse query types. It supports retrieval-augmented generation for jurisprudential answers, exact scripture lookup, and precise calculations for zakat and inheritance, with a focus on verification and canonical grounding. AI

IMPACT This system could set a precedent for specialized, grounded AI applications in sensitive domains like religious scholarship.
- arXiv
- Fanar AI
- Firoj Alam
- Fanar-Sadiq
TOOL · arXiv cs.AI English(EN) · 2d

Automated Mediator for Human Negotiation: Pre-Mediation via a Structured LLM Pipeline

Researchers have developed an automated mediator for human negotiation using a structured pipeline of Large Language Model (LLM) modules. This system aims to support the pre-mediation phase, which is often skipped due to cost and time constraints. Experiments showed that the AI mediator achieved preparation outcomes comparable to human mediators in terms of trust and confidence, while also demonstrating superior accuracy in preference inference. AI

IMPACT This research suggests LLM pipelines can offer scalable, low-cost pre-mediation support, potentially improving negotiation outcomes.
- human negotiation
- LLM
TOOL · arXiv cs.AI English(EN) · 2d

BioDivergence: A Benchmark and Evaluation Framework for Hidden Contextual Contradictions in Biomedical Abstracts

Researchers have introduced BioDivergence, a new framework designed to evaluate how well AI models can distinguish between contextual contradictions and genuine disagreements in biomedical research abstracts. This framework moves beyond simple entailment or contradiction classifications to capture the nuanced reasons behind conflicting findings, such as differences in study populations or methodologies. BioDivergence includes a six-class conflict taxonomy and a 13-axis divergence ontology, along with a silver benchmark dataset of over 11,000 claim pairs to test model performance. AI

IMPACT Provides a more nuanced evaluation for AI models in scientific literature, potentially improving their ability to synthesize complex biomedical information.
- Mistral-7B-Instruct-v0.3
- BioDivergence
TOOL · arXiv cs.AI English(EN) · 2d

Skill-Augmented AI Agents for Medical Research Analysis: An Exploratory Multi-Model Human Evaluation in an NSCLC Transcriptomic Biomarker Task

A new research paper explores the use of AI agents with specialized medical research skills to improve the analysis of complex biological data. The study evaluated whether these skill-augmented agents produced higher quality outputs compared to native AI models in a task related to non-small cell lung cancer biomarkers. While the skill-augmented agents showed a directional improvement in quality, the effect was small and not statistically significant, suggesting a need for larger-scale evaluations with more robust controls. AI

IMPACT Suggests potential for AI agents to improve complex scientific analysis, but highlights need for more rigorous validation.
TOOL · arXiv cs.CV English(EN) · 2d

Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

Researchers have developed a new unsupervised framework to adapt vision-language models (VLMs) for more comprehensive multi-label image recognition. The method addresses the tendency of VLMs to focus on a single iconic object, thereby missing other relevant labels in an image. By employing "cutting" and "sewing" stages, the framework enhances the model's ability to identify multiple objects and adjust label distributions without requiring manual annotations. Experiments show this approach significantly outperforms existing unsupervised methods and even some weakly supervised baselines. AI

IMPACT Enables more comprehensive image understanding without manual labeling, potentially improving applications in image search and content moderation.
TOOL · arXiv cs.LG English(EN) · 2d

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

Researchers have developed new methods to visualize the internal geometric structures of large language models (LLMs) by employing dimensionality reduction techniques like PCA and UMAP. Their analysis of GPT-2 and LLaMa models revealed distinct patterns, including a separation between attention and MLP component outputs in intermediate layers. The study also characterized high-norm latent states at initial sequence positions and visualized the evolution of these states across layers, uncovering a helical structure in GPT-2's positional embeddings. AI

IMPACT Provides new tools for understanding LLM behavior, potentially guiding future model development and interpretability efforts.
- Alex Ning
- LLMs
- PCA
- UMAP
- GPT-2
- LLaMa
TOOL · arXiv cs.CV English(EN) · 2d

VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection

Researchers have developed VL-DINO, a new object detection model that effectively integrates knowledge from CLIP, a vision-language model. The model uses novel modules to construct better training samples and fuse visual and textual information. In zero-shot tests on the LVIS benchmark, VL-DINO achieved state-of-the-art results, outperforming previous methods. AI

IMPACT Sets new SOTA on zero-shot object detection benchmarks, potentially improving image analysis capabilities.
- VL-DINO
- LVIS benchmark
TOOL · arXiv cs.LG English(EN) · 2d

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

Researchers have developed MobileFineTuner, an open-source framework enabling large language models to be fine-tuned directly on mobile phones. This C++ based system integrates resource-aware runtime features like memory-efficient attention and gradient accumulation to overcome the limitations of commodity mobile devices. Evaluations using models such as GPT-2 and Gemma 3 demonstrate its effectiveness in reducing memory pressure and improving executability, paving the way for personalized on-device AI applications. AI

IMPACT Enables personalized AI experiences by allowing LLMs to adapt to user-specific data directly on mobile devices without cloud reliance.
- LLM
- Gemma 3
- GPT-2
- MobileFineTuner
- Jiaxiang Geng
- Qwen2.5
TOOL · arXiv cs.AI English(EN) · 2d

An XAI View on Explainable ASP: Methods, Systems, and Perspectives

A new survey paper examines Explainable AI (XAI) methods within Answer Set Programming (ASP), a symbolic AI approach. The paper categorizes different types of ASP explanations and maps them to user queries, evaluating the coverage provided by existing theories and tools. It also identifies current limitations and suggests future research directions in this area. AI

IMPACT Provides a structured overview of explainability techniques in symbolic AI, potentially guiding future research and development in interpretable AI systems.
TOOL · arXiv cs.CV English(EN) · 2d

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

Researchers have introduced WildCross, a new benchmark designed to evaluate robotic perception systems in natural environments. The benchmark includes over 476,000 RGB frames with depth and surface normal annotations, along with pose and lidar data. This work expands on previous results, focusing on metric depth estimation to highlight the limitations of current vision models trained primarily on urban data. AI

IMPACT Highlights limitations in current AI models for real-world robotic applications, potentially driving development of more robust perception systems.
TOOL · arXiv cs.CV English(EN) · 2d

3D-CBM: A Framework for Concept-Based Interpretability in Generative 3D Modeling

Researchers have developed a framework called 3D-CBM to enhance interpretability in 3D generative models by integrating Concept Bottleneck Models. This approach aims to bridge the semantic gap in deep geometric learning by aligning latent representations with human-defined concepts. The framework has demonstrated effectiveness in a proof-of-concept experiment, achieving high accuracy in concept prediction and enabling precise interventions for error correction in 3D models. AI

IMPACT Introduces a method to make 3D generative models more understandable and controllable, potentially improving their use in sensitive applications.
TOOL · arXiv cs.LG English(EN) · 2d

Scaling Laws of Global Weather Models

A new research paper explores the scaling laws of data-driven global weather models, analyzing how performance relates to model size, dataset size, and compute budget. The study found that weather models favor wider architectures over deeper ones and that increasing training data yields greater performance gains than increasing model size under fixed compute budgets. Specifically, the Aurora model showed strong data-scaling behavior, with a 10x increase in training data leading to a 3.2x reduction in validation loss. AI

IMPACT Provides insights into optimizing AI model development for weather forecasting, suggesting wider architectures and larger datasets are key.
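
The data-scaling figure above can be turned into a rough exponent. The sketch below assumes validation loss follows a simple power law in training-set size, loss ∝ D^(-alpha); the summary does not state this functional form, and the function names are hypothetical. Under that assumption, a 10x increase in data with a 3.2x loss reduction implies alpha = log(3.2) / log(10).

```python
import math


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Exponent alpha such that scaling data by data_factor cuts loss by loss_factor.

    Assumes loss is proportional to data ** (-alpha); this form is an
    assumption for illustration, not something the summary states.
    """
    return math.log(loss_factor) / math.log(data_factor)


def predicted_loss_reduction(data_factor: float, alpha: float) -> float:
    """Loss reduction factor predicted by the assumed power law."""
    return data_factor ** alpha


# Figures quoted in the summary: 10x more data, 3.2x lower validation loss.
alpha = implied_exponent(10.0, 3.2)
print(f"implied alpha: {alpha:.3f}")
print(f"extrapolated reduction at 100x data: {predicted_loss_reduction(100.0, alpha):.2f}x")
```

The 100x extrapolation only holds if the same power law continues beyond the reported range, which the summary does not claim.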
TOOL · arXiv cs.AI English(EN) · 2d

A New Perspective on Precision and Recall for Generative Models

Researchers have introduced a novel framework for estimating precision and recall curves in generative models, moving beyond single scalar metrics. This approach frames the estimation as a binary classification problem, offering a more detailed analysis of model performance. The framework also provides a minimax upper bound on estimation risk and unifies several existing precision-recall metrics. AI

IMPACT Provides a more nuanced evaluation method for generative models, potentially leading to better model development and comparison.
- Benjamin Sykes
TOOL · arXiv cs.AI English(EN) · 2d

T2MM: An LLM Supported Architecture For Inquiry-Based Modeling

Researchers have developed T2MM, a novel architecture that integrates Large Language Models with multimodal capabilities to assist in science learning and model construction. Unlike static image generation, T2MM creates interactive models that respond to user adjustments within the Virtual Experimental Research Assistant (VERA) software. Technical feasibility was demonstrated using a custom dataset, where T2MM outperformed a baseline code-generation approach across all metrics. AI

IMPACT Enables more dynamic and responsive educational tools by integrating LLMs into interactive modeling environments.
TOOL · arXiv cs.CV English(EN) · 2d

From Simulation to Real-World: An In-Field 6D Pose Dataset and Baseline for Robotic Strawberry Harvesting

Researchers have developed a new dataset and baseline for 6D pose estimation specifically for robotic strawberry harvesting. This dataset, comprising 12,040 real-world images collected from agricultural fields, addresses the limitations of purely synthetic data used in previous studies. Experiments revealed a persistent sim-to-real gap, highlighting the critical need for real-world data to accurately evaluate and deploy such robotic systems. AI

IMPACT Provides crucial real-world data for advancing robotic perception in agriculture, potentially improving efficiency and yield.
- NVIDIA Isaac Sim
TOOL · arXiv cs.CV English(EN) · 2d

What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

Researchers have identified a significant semantic bottleneck in video editing models that rely on Vision-Language Models (VLMs) to interpret instructions. Their study, using a newly created diagnostic dataset called TRACE-Edit, reveals that fine-grained structural information can be lost during the alignment process between the VLM and the Diffusion Transformer (DiT) models. This finding challenges the assumption of lossless semantic transfer and highlights the VLM-to-DiT alignment as a critical area for improvement in future multi-modal architectures. AI

IMPACT Identifies a critical alignment bottleneck in VLM-based video editing, potentially guiding future research towards more semantically faithful generative models.
TOOL · arXiv cs.LG English(EN) · 2d

Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

Researchers have developed a Lie-algebraic framework to analyze the expressivity and error bounds of parallelizable sequence models like Transformers. Their theory establishes a direct link between a model's depth and its expressivity, showing that increasing depth exponentially reduces approximation error. This theoretical insight was validated through experiments on symbolic and continuous-valued state-tracking tasks, confirming the empirical performance of deep sequence models. AI

IMPACT Provides a theoretical foundation for understanding and improving the performance of deep sequence models.
- Transformer
- Gyuryang Heo
TOOL · arXiv cs.CV English(EN) · 2d

Lighting-aware Unified Model for Instance Segmentation

Researchers have developed a new adapter module called Lighting Convolutional-Attention (LCA) to improve the robustness of foundation models like SAM for instance segmentation under varied lighting conditions. LCA processes RGB features alongside contrast maps to distinguish structural changes from illumination artifacts, enhancing segmentation accuracy without needing to fine-tune the entire model. The module is trained using a pairwise strategy with a specific loss term to penalize discrepancies between clean and illuminated images, and its effectiveness is validated on existing benchmarks and a new synthetic dataset designed for complex lighting. AI

IMPACT Enhances robustness of foundation models for instance segmentation, potentially improving real-world AI applications in computer vision.
TOOL · arXiv cs.AI English(EN) · 2d

The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales

Researchers have developed a new method to analyze the temporal dynamics of semantic content in both human and AI-generated language. This pipeline uses WordNet depth and SBERT embeddings to create semantic time-series, which are then analyzed using autocorrelation-window measures. The study found that longer autocorrelation windows in semantic time-series correlate with more generic vocabulary, while shorter windows are associated with specific words, indicating a non-trivial temporal organization in language. AI

IMPACT Provides a new quantitative method for comparing the temporal structure of AI-generated language to human speech.
- WordNet
- SBERT
TOOL · arXiv cs.LG English(EN) · 2d

Open Materials Generation with Inference-Time Reinforcement Learning

Researchers have developed a new reinforcement learning framework called OMatG-IRL for generating crystalline materials. This method incorporates target properties into the generative process without computing the score, which previous approaches required. OMatG-IRL operates directly on learned velocity fields, enabling efficient exploration and policy-gradient estimation at inference time. The framework has demonstrated competitive performance in crystal structure prediction, achieving significant improvements in sampling efficiency and generation time. AI

IMPACT Introduces a novel RL approach for materials design, potentially accelerating discovery and improving efficiency in crystal structure prediction.
TOOL · arXiv cs.AI English(EN) · 2d

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

Researchers have developed MoCA-Agent, a novel code agent designed for robust financial and numerical reasoning. This system breaks down questions into atomic claims, uses specialist agents to trade these claims, and synthesizes an executable Python program from verified evidence. MoCA-Agent demonstrates strong performance on various benchmarks, including financial, tabular, and multimodal chart reasoning, by aggregating evidence at the claim level for improved accuracy. AI

IMPACT Enhances AI's ability to perform accurate financial and numerical reasoning by verifying claims at an atomic level.
TOOL · arXiv cs.AI English(EN) · 2d

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

A research paper details NightFeats, a multi-agent retrieval-augmented generation (RAG) system that won Best Dynamic Evaluation in the text-to-text track of the MMU-RAGent competition at NeurIPS 2025. The system employs a three-phase pipeline for knowledge synthesis (retrieval, curation, and composition) that uses temporal-semantic reranking and contradiction reconciliation. Evaluations indicated that NightFeats outperformed proprietary systems such as Claude-SonnetV2 and Nova-Pro, suggesting that systems built around architectural transparency and verifiable evidence grounding align better with human preferences than systems focused solely on automatic metrics. AI

IMPACT Demonstrates that transparent, evidence-grounded RAG systems can outperform proprietary models in human evaluations.
TOOL · arXiv cs.LG English(EN) · 2d

Understanding Sample Efficiency in Predictive Coding

Researchers have developed a new metric called "target alignment" to theoretically understand why predictive coding (PC) is more sample-efficient than backpropagation (BP) in neural networks. Their analysis, particularly in deep linear networks, shows that PC learning is more efficient, especially in deep, narrow, and pre-trained models. The study provides analytical expressions and experimental validation, offering insights into optimizing PC for effective learning. AI

IMPACT Provides theoretical understanding for optimizing sample efficiency in neural network training.
TOOL · arXiv cs.LG English(EN) · 2d

Momentum LMS Theory beyond Stationarity: Stability, Tracking, and Regret

Researchers have developed a new theoretical framework for the Momentum Least Mean Squares (MLMS) algorithm, designed to handle nonstationary data streams common in large-scale processing. The paper derives tracking performance and regret bounds for MLMS in time-varying stochastic linear systems, addressing the complexities introduced by momentum in stability analysis. Experimental results on both synthetic and real-world data confirm MLMS's ability to adapt rapidly and track effectively in nonstationary environments, suggesting its utility for modern online learning applications. AI

IMPACT Provides theoretical grounding for adaptive algorithms crucial in real-time data processing for AI systems.
- Yifei Jin
- Momentum Least Mean Squares (MLMS)
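
For readers unfamiliar with the algorithm, here is a minimal sketch of LMS with a momentum term tracking a slowly drifting linear system. It assumes the textbook heavy-ball form of the update, w[t+1] = w[t] + step * e[t] * x[t] + momentum * (w[t] - w[t-1]); the paper's exact formulation, step sizes, and data are not given in this summary, so the function names, parameter values, and synthetic stream below are all illustrative assumptions.

```python
import random


def momentum_lms(samples, dim, step=0.05, momentum=0.5):
    """LMS with a heavy-ball momentum term (textbook form, assumed here).

    samples: iterable of (x, d) pairs, x a list of floats, d the desired output.
    Returns the final weight vector and the list of a-priori errors.
    """
    w = [0.0] * dim
    w_prev = [0.0] * dim
    errors = []
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # prediction before the update
        e = d - y                                 # a-priori error
        errors.append(e)
        # Gradient step plus momentum on the previous weight change.
        w_next = [
            wi + step * e * xi + momentum * (wi - wpi)
            for wi, xi, wpi in zip(w, x, w_prev)
        ]
        w_prev, w = w, w_next
    return w, errors


def drifting_stream(n, seed=0):
    """Hypothetical nonstationary stream: the true weights follow a slow random walk."""
    rng = random.Random(seed)
    target = [1.0, -2.0]
    for _ in range(n):
        target = [t + rng.gauss(0, 0.001) for t in target]
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        d = sum(t * xi for t, xi in zip(target, x)) + rng.gauss(0, 0.01)
        yield x, d


if __name__ == "__main__":
    w, errors = momentum_lms(drifting_stream(2000), dim=2)
    tail_mse = sum(e * e for e in errors[-200:]) / 200
    print("final weights:", w)
    print("mean squared error over last 200 samples:", tail_mse)
```

With momentum set to 0 the loop reduces to plain LMS, which makes it easy to compare tracking behavior with and without the momentum term on the same stream.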
TOOL · arXiv cs.AI English(EN) · 2d

Making Models Unmergeable via Scaling-Sensitive Loss Landscape

Researchers have developed Trap$^2$, a new framework designed to prevent unauthorized model merging in AI. This architecture-agnostic system encodes protection directly into fine-tuned weights, degrading them when they are recomposed into unauthorized mixtures. Trap$^2$ aims to address a governance gap created by model hubs, ensuring that released weights remain effective for standalone use while undermining attempts to bypass safety alignments or licensing terms through merging. AI

IMPACT Provides a technical solution to prevent misuse of released AI models through unauthorized merging.
- Trap$^2$
- Minwoo Jang
TOOL · arXiv cs.AI English(EN) · 2d

MLaGA: Multimodal Large Language and Graph Assistant

Researchers have developed MLaGA, a novel model designed to enhance Large Language Models' (LLMs) ability to process and reason over multimodal graphs. This system addresses the challenge of graphs containing diverse attribute types, such as text and images, which have been underexplored by existing LLM-based graph methods. MLaGA employs a structure-aware multimodal encoder and a multimodal instruction-tuning approach to integrate these varied attributes and graph structures into LLMs. AI

IMPACT Enables LLMs to analyze complex graphs with mixed text and image data, potentially improving applications in areas like knowledge discovery and recommendation systems.
TOOL · arXiv cs.CV English(EN) · 2d

Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception

Researchers have developed a new framework to improve the robustness of 3D perception systems that fuse data from radar and cameras. The method addresses performance degradation caused by variations in driving scenes, sensor setups, and environmental conditions. By modeling these variations in the frequency domain and synthesizing diverse views, the framework regularizes the detector to maintain stable fused representations during training, without requiring target-domain samples for inference. AI

IMPACT Enhances the reliability of autonomous driving perception systems by improving cross-dataset generalization.
- View-of-Delft
- TJ4DRadSet
TOOL · arXiv cs.LG English(EN) · 2d

MPK: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

Researchers have developed MPK, a compiler and runtime system that optimizes multi-GPU model inference by transforming a model's operations into a single, high-performance mega-kernel. The system uses an SM-level graph representation to enable optimizations such as cross-operator software pipelining and fine-grained overlap of computation and communication. Evaluations show that MPK reduces end-to-end inference latency by up to 1.7x, pushing LLM inference performance closer to hardware limits. AI

IMPACT Optimizes LLM inference performance, potentially reducing latency and improving hardware utilization for AI operators.
- Zhihao Jia
TOOL · arXiv cs.AI English(EN) · 2d

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

Researchers have developed a new benchmark called the Moral Trolley Arena to evaluate how large language models compose moral judgments. The benchmark assesses how models combine multiple moral signals within a single scenario, moving beyond simple preference rankings of isolated acts. Across ten frontier models, the study found that composite moral judgments are largely predictable from the strength of the individual acts, but are consistently compressed rather than simply additive: the combined judgment falls short of the sum of its parts. AI

IMPACT This research highlights the need for more sophisticated methods to audit LLM moral reasoning, potentially influencing future safety evaluations and model development.
TOOL · arXiv cs.AI English(EN) · 2d

MPC-Patch-Bench: Security-Aware LLM Code Patch for Multi-Party Computation

Researchers have developed MPC-Patch-Bench, a new benchmark designed to evaluate the code repair capabilities of Large Language Models (LLMs) specifically for Secure Multi-Party Computation (MPC) software. Existing general-purpose benchmarks are insufficient for MPC due to its unique cryptographic logic, lack of standardized tests, and the critical need for cryptographic safety. MPC-Patch-Bench includes a data curation framework and a specialized MPC Verifier to ensure both functional correctness and security, addressing the limitations of current evaluation methods. AI

IMPACT Establishes a specialized benchmark for evaluating LLM code repair in the critical domain of secure multi-party computation.
TOOL · arXiv cs.LG English(EN) · 2d

Evaluating and Combating the Impact of Concept Drift on the Performance of Machine Learning-Based Phishing Detection Systems

A new research paper explores how concept drift affects machine learning models used for detecting phishing emails. The study aims to evaluate the performance degradation of these systems as phishing tactics evolve and to propose mitigation strategies. The paper highlights the increasing sophistication of phishing attacks and the critical role of email spam filters in protecting users. AI

IMPACT Addresses the challenge of maintaining effective AI-based security systems against evolving threats.
TOOL · arXiv cs.AI English(EN) · 2d

Planning under Distribution Shifts with Causal POMDPs

Researchers have introduced a new theoretical framework for planning in environments that experience distribution shifts. This approach utilizes Causal Partially Observable Markov Decision Processes (POMDPs) to model and adapt to changes in state or environment dynamics. By treating shifts as interventions on the causal POMDP, the system can evaluate plans under hypothetical changes and identify which environmental components have been altered, maintaining planning tractability. AI

IMPACT Provides a theoretical foundation for more robust AI planning agents capable of adapting to changing environments.
- Matteo Ceriscioli
- Causal POMDPs
TOOL · arXiv cs.AI English(EN) · 2d

Blind Dexterous Grasping via Real2Sim2Real Tactile Policy Learning

Researchers have developed a novel framework for tactile-only blind grasping using a dexterous robotic hand. Their approach utilizes a Real2Sim tactile calibration pipeline to create a digital-twin simulator that accurately reproduces real-world tactile signals. This is combined with a layout-aware tactile encoder that incorporates sensor-geometry priors and a Diffusion Policy trained on object-specific reinforcement learning experts in the simulator. The deployed policy achieved a 27% success rate on a physical robotic hand across 20 objects, without visual input. AI

IMPACT This research advances robotic manipulation capabilities, potentially enabling more sophisticated automation in unstructured environments.
- LEAP Hand
- Diffusion Policy
TOOL · arXiv cs.AI English(EN) · 2d

The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content

A new research paper introduces the concept of a "structural attention tax" in retrieval-augmented generation (RAG) systems. The study found that the format of retrieved information, particularly knowledge graph triples, can disproportionately capture the model's attention compared to semantically equivalent natural language text. This phenomenon can reduce the effectiveness of in-context learning by up to 42%, regardless of the content's relevance. The research proposes a framework to decouple semantic and structural components of attention, suggesting strategies to mitigate this tax by optimizing retrieval quality and reducing format-driven attention capture. AI

IMPACT Identifies a format-based bias in RAG systems that can degrade performance, suggesting new avenues for optimizing retrieval and model training.