Brief

last 24h

[50/9096] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.LG English(EN) · 5d · [3 sources]

Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Researchers have adapted tensor parallelism and fully sharded data parallelism (FSDP), techniques typically used for training large models, to improve the scalability of neural network verification. These methods address the GPU memory limitations that have previously constrained formal verification algorithms. The study reports significant memory reductions, with FSDP cutting memory use by up to 90% relative to the baseline while producing bounds that are bitwise identical to those of a single-GPU run. AI

IMPACT Enables verification of larger and more complex neural networks, crucial for safety-critical AI applications.
RESEARCH · Mastodon — sigmoid.social English(EN) · 3d · [3 sources]

The Agentic AI book: From Language Models to Multi-Agent Systems by Dr. Ryan Rad is the featured book 📖 on Leanpub! It's never been easier to build an AI agent—

Dr. Ryan Rad's new book, "The Agentic AI book: From Language Models to Multi-Agent Systems," has been launched on Leanpub. The book aims to guide readers from foundational language models to building functional multi-agent systems. It covers the complexities of creating AI agents that perform effectively. AI

IMPACT Provides guidance on building functional AI agents from foundational models.
RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation

Researchers have developed IMPACT, a new framework for robotic manipulation that improves performance in tasks requiring forceful interactions. This system decouples task planning from internal-model predictive control, allowing robots to better handle objects of varying weights and perform contact-rich tasks. Experiments show IMPACT achieves higher success rates, better generalization, and improved safety and energy efficiency compared to previous methods. AI

IMPACT Enhances robotic capabilities in real-world manipulation tasks, potentially leading to more versatile and efficient automation.
- IMPACT
- Robotic Manipulation
RESEARCH · arXiv cs.LG English(EN) · 4d · [2 sources]

Correcting Variable Importance Scored by Random Forests

Researchers have developed a new method to correct variable importance scores generated by Random Forests. Standard importance scores often mask the importance of correlated variables. The proposed approach groups variables based on their conditional correlations with the response variable, leading to more accurate importance assessments. Experiments indicate that the correction yields sensible variable importance results. AI

IMPACT Improves interpretability of machine learning models by refining variable importance metrics.
- arXiv
- Random Forests
RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

DD-INR: Dynamics-Driven Implicit Neural Representation for Accelerated Whole-Brain Functional MRI Reconstruction

Researchers have developed DD-INR, a novel framework for reconstructing functional MRI (fMRI) data that has been acquired with accelerated sampling. This method specifically addresses the challenge of recovering subtle task-evoked brain activity signals, which are often missed by traditional reconstruction techniques that prioritize spatial accuracy over temporal fidelity. By separating static background information from dynamic changes and using an Implicit Neural Representation (INR) for the latter, DD-INR focuses computational resources on relevant activations, potentially enhancing the sensitivity and robustness of fMRI studies. AI

IMPACT This new framework could improve the sensitivity and robustness of fMRI studies by enabling more accurate reconstruction of brain activity from accelerated scans.
- DD-INR
- Qiaoxin LI
- fMRI
RESEARCH · Hugging Face Daily Papers English(EN) · 4d · [2 sources]

Content-Induced Spatial-Spectral Aggregation Network for Change Detection in Remote Sensing Images

Researchers have developed a new network called CSI-Net for change detection in remote sensing images. This network effectively integrates spatial and spectral information to improve accuracy. CSI-Net addresses the challenge of distinguishing actual changes from variations in unchanged areas by employing a spatial reasoning module, a spectral difference module, and a content-guided integration module. Experiments on multiple datasets show that CSI-Net outperforms existing state-of-the-art methods. AI

IMPACT Introduces a novel network architecture that enhances change detection accuracy in remote sensing, potentially improving applications in environmental monitoring and urban planning.
- CSI-Net
- LEVIR-CD
- WHU-CD
RESEARCH · Hugging Face Daily Papers English(EN) · 4d · [4 sources]

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

A new approach called Dexterity-BEV is being introduced to address the data challenges in embodied intelligence by adapting the Bird's-Eye View (BEV) methodology from autonomous driving. This method aims to unify heterogeneous robot data, including visual inputs, sensor readings, and action commands, into a common spatial reference frame. This unified representation is intended to enable more scalable and transferable training for robots, moving beyond simple data aggregation to establishing a foundational data infrastructure for embodied AI. AI

IMPACT New frameworks like Dexterity-BEV and Embodied-R1.5 aim to standardize robot data and improve generalization, potentially accelerating the development of more capable and adaptable embodied AI systems.
RESEARCH · Hugging Face Daily Papers English(EN) · 4d · [3 sources]

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

A new benchmark called Workflow-GYM has been introduced to evaluate AI agents on complex, long-horizon tasks within professional software environments. Current AI agents demonstrate significant limitations in handling these real-world workflows, with even the most advanced models achieving success rates just above 30%. The research highlights issues such as inconsistent workflow execution, error propagation, and a lack of understanding of specialized professional software, indicating a need for substantial advancements in agent capabilities. AI

IMPACT Highlights significant limitations in current AI agents for professional tasks, guiding future research in agentic AI.
- AI agents
- Workflow-GYM
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

Researchers have developed AutoMegaKernel (AMK), a system that compiles HuggingFace Llama-family models into a single, persistent CUDA kernel for efficient forward passes. AMK's static validator ensures schedule safety, preventing deadlocks and race conditions. The system supports multiple NVIDIA GPU architectures from a single codebase and has demonstrated self-improvement capabilities. AI

IMPACT This system could improve inference efficiency by consolidating model execution into single CUDA kernels.
- SmolLM2-135M
- AutoMegaKernel
- Llama
- CUDA
- NVIDIA
- TinyLlama-1.1B
- L40S
- RTX 5090
- cuBLAS
- A100
- A10G
- L4
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

(Auto)formalization is supposed to be easy: Trellis process semantics for spelling out rigorous proofs

Researchers have developed Trellis, an autoformalization system designed to assist in creating rigorous mathematical proofs. The system utilizes LLM agents within a structured workflow to refine natural language proofs incrementally. Trellis aims for reliable formalization with generalist agents by enforcing a process semantics inspired by the notion of mathematical rigor. AI

IMPACT Introduces a novel method for leveraging LLMs in formal mathematical reasoning, potentially accelerating theorem proving and verification.
- Trellis
- LLM agents
- Lean
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Frequency-based Constrained Sampling for Interval Patterns

Researchers have developed a new sampling approach called CFips for exploring large pattern spaces, specifically focusing on interval patterns with user-defined constraints. This method integrates constraints directly into the sampling procedure, decomposing them into elementary predicates on interval bounds to ensure exact sampling guarantees. Experimental results indicate that CFips can successfully complete mining tasks that might otherwise fail due to time constraints. AI

IMPACT Introduces a novel constrained sampling technique for pattern mining, potentially improving efficiency in AI-driven data analysis tasks.
- CFips
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

From 0-to-1 to 1-to-N: Reproducible Engineering Evidence for MetaAI Recursive Self-Design

A new research paper introduces the concept of "MetaAI Recursive Self-Design," defining it as an AI-assisted development pattern where the AI itself modifies its building and improvement mechanisms. The paper proposes a framework to evaluate such systems and highlights the Darwin Goedel Machine (DGM) as a prime example, showing significant performance gains on coding benchmarks after 80 iterations. To facilitate further research, the authors also release MetaAI-Mini, a reproducible protocol and codebase based on HumanEval. AI

IMPACT Introduces a framework for AI self-improvement, potentially accelerating development cycles and pushing the boundaries of AI capabilities.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Beyond Accuracy: Community Perspectives on Machine Translation

A new paper analyzes social media discussions about machine translation (MT) to bridge the gap between AI development and user needs. Researchers examined over 79,000 posts from 2019 to 2025 across platforms like Reddit and Facebook, focusing on the perspectives of AI developers, professional translators, language learners, and language service providers. The study found significant disagreements and polarized sentiments among these communities regarding translation quality, efficiency, and reliability, highlighting differing priorities between technical benchmarks and real-world user concerns. AI

IMPACT Highlights a disconnect between AI development and user needs, suggesting a need for research to prioritize real-world concerns over technical benchmarks.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

A Unifying Framework for Concept-Based Representational Similarity

Researchers have introduced a new framework to unify and clarify concept-based representational similarity in machine learning models. The framework decomposes alignment into representation vs. concept and instance-wise vs. distributional levels, identifying four key properties. They also developed an intervention-based benchmark called InterVenchA to measure these properties and proposed the Coupled Sparse Autoencoder (CoSAE) method, which demonstrates that strong alignment emerges when multiple objectives are jointly enforced, even with minimal paired data. AI

IMPACT Clarifies concept alignment in ML, potentially leading to more robust and interpretable models.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

A new research paper investigates whether video foundation models possess an understanding of intuitive physics. The study probes frozen representations of models like V-JEPA, VideoMAE, and LTX-Video using benchmarks such as IntPhys2 and Minimal Video Pairs. Results indicate that V-JEPA performs best, particularly with temporal dynamics probes, while VideoMAE is competitive, and LTX-Video shows weaker but present signals. The research also found that physics knowledge is more accessible in intermediate to late layers of these models. AI

IMPACT Reveals emergent physics understanding in video models, potentially improving their real-world interaction capabilities.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

Researchers have developed a new benchmark to evaluate how well multimodal large language models (MLLMs) identify the correct visual evidence for their answers, particularly in autonomous driving scenarios. The benchmark uses synchronized multi-view driving data from NuScenes, presenting models with questions and requiring them to pinpoint the supporting camera view before answering. This approach aims to expose grounding failures that traditional answer-only evaluations might miss, by explicitly separating evidence identification from response accuracy. AI

IMPACT This benchmark will help developers create more reliable AI systems for autonomous driving by ensuring models ground their decisions in correct visual data.
RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation

Researchers have introduced CineDance-1M, a large-scale dataset for open-source text-to-audio-video generation, aiming to improve cinematic narrative capabilities. The dataset features long-form videos with an average of 92.8 seconds and 24.2 shots, supported by structured audio-video annotations derived from a three-stage curation process. To evaluate performance, they also propose CineBench, a new metric system for complex audio-video narratives, and demonstrate an adapted LTX-2.3 model that shows strong alignment and consistency. AI

IMPACT Provides a foundational dataset and evaluation tools to accelerate open-source research in long-form cinematic audio-video generation.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Data-driven discovery of governing differential equations across physical systems

A new review paper proposes a problem-oriented perspective on data-driven differential equation discovery, a field that uses AI to infer governing laws from data. The paper introduces a phase diagram to organize discovery problems by complexity and a Representation-Evaluation-Optimization (REO) framework to abstract the discovery process. This approach aims to shift focus from individual algorithms to fundamental principles of discoverability, with applications across various scientific domains. AI

IMPACT Provides a structured framework for advancing AI-driven scientific discovery in differential equations.
RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

DexPIE: Stable Dexterous Policy Improvement from Real-World Experience

Researchers have developed DexPIE, a new framework designed to improve the performance of dexterous manipulation policies trained through imitation learning. This post-training system leverages real-world deployment experience to overcome the limitations of relying solely on expert demonstrations. DexPIE incorporates an intervention system for better exploration and a DAgger-style data collection method, alongside asynchronous inference and an optimality indicator to refine policy learning. In tests across three complex tasks, DexPIE demonstrated a 37% increase in success rate compared to baseline methods. AI

IMPACT Enhances AI's ability to perform complex physical tasks, potentially accelerating robotics adoption in manufacturing and logistics.
- DexPIE
- arXiv
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Shape Formation for the Cooperative Transportation of Arbitrary Objects Using Multi-Agent Reinforcement Learning

Researchers have developed a new multi-agent reinforcement learning approach for cooperative object transportation. This method allows multiple robots to autonomously position themselves to support objects of arbitrary shape and mass distribution. The system is designed to handle formation control, navigation, and collision avoidance, demonstrating reliable performance in cluttered environments and with complex object geometries. AI

IMPACT Enables more adaptable robotic systems for complex logistics and industrial tasks.
- Tanja Katharina Kaiser
- Multi-Agent Reinforcement Learning
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

Researchers have developed a new method for discovering circuits within large language models by clustering attention head co-activation statistics. This approach, termed "closure-validated circuit discovery," uses causal ablation to confirm whether these identified groups of components actually function as circuits. The method was tested on models like Pythia 1B and OLMo 1B, demonstrating its effectiveness in identifying statistically significant circuits, while also showing limitations in Mixture-of-Experts models. AI

IMPACT This research offers a more rigorous method for understanding internal LLM mechanisms, potentially improving safety and reliability.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

Researchers have developed Hypnos, a new foundation model for sleep physiology that utilizes next-token prediction for representation learning. Trained on eight different sensing modalities from over 20,000 polysomnography recordings, Hypnos tokenizes physiological signals and uses an auto-regressive RQ-Transformer to predict future data points. This approach significantly outperforms existing models on various benchmarks, including sleep stage classification and atrial fibrillation detection, while requiring substantially less labeled data. AI

IMPACT Demonstrates a novel self-supervised learning approach for multi-modal physiological data, potentially improving healthcare diagnostics with less labeled data.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Automated IEP Generation from Traditional Chinese Parent-Teacher Interviews via Corpus-Grounded Feature Diffusion

Researchers have developed a novel method for automatically generating Individualized Education Programs (IEPs) in Traditional Chinese, addressing a significant gap in special-education NLP. The proposed Corpus-Grounded Feature Diffusion (CGFD) pipeline utilizes a low-resource fine-tuning approach with a modified Breeze-7B model. This system achieves state-of-the-art results on a held-out test set, outperforming several leading LLMs in zero-shot performance while ensuring privacy-preserving, local inference. AI

IMPACT Addresses a gap in special-education NLP for Traditional Chinese, offering a privacy-preserving local inference solution.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Assessing Sample Quality in Conditional Generation under Compositional Shift

Researchers have developed a new method to evaluate the quality of generated samples from conditional models, particularly when exploring novel or unobserved conditions. This approach uses a post-hoc trust score that combines global realism and attribute faithfulness, requiring only the original training distribution for assessment. The score can be used to filter or rank generations, or to abstain from using them, and it improves downstream predictive performance on biological imaging and vision benchmarks. AI

IMPACT Enables more reliable evaluation of AI-generated content, especially in scientific domains where real-world data is scarce.
RESEARCH · arXiv stat.ML English(EN) · 5d · [2 sources]

On Choosing the $μ$ Parameter in Gaussian Differential Privacy

Researchers have published a paper detailing methods for converting privacy parameters between pure differential privacy ($\varepsilon$) and Gaussian differential privacy (GDP, $\mu$). The study proposes principled mappings by aligning worst-case membership inference attack success rates across three metrics. The authors recommend a general-purpose conversion of $\mu \approx \varepsilon/5$ for conservative privacy reporting in machine learning. AI

IMPACT Provides a standardized method for reporting privacy guarantees in machine learning models, potentially improving transparency and comparability.
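
The rule of thumb quoted above, $\mu \approx \varepsilon/5$, can be sketched in a few lines. The function names are illustrative and not taken from the paper. The `gdp_delta` helper is added only for context: it uses the standard conversion from $\mu$-GDP to $(\varepsilon, \delta)$-DP, $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\,\Phi(-\varepsilon/\mu - \mu/2)$, which the summary itself does not describe.

```python
import math


def mu_from_epsilon(epsilon: float) -> float:
    """Rule-of-thumb conversion quoted in the summary: mu ~= epsilon / 5."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return epsilon / 5.0


def _phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gdp_delta(epsilon: float, mu: float) -> float:
    """delta such that a mu-GDP mechanism is (epsilon, delta)-DP.

    Standard GDP duality formula; included for context, not from the summary.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return (_phi(-epsilon / mu + mu / 2.0)
            - math.exp(epsilon) * _phi(-epsilon / mu - mu / 2.0))


if __name__ == "__main__":
    mu = mu_from_epsilon(5.0)
    print(mu, gdp_delta(1.0, mu))
```

Reporting the implied $\delta$ next to the converted $\mu$ is one way to sanity-check a conversion; whether the paper recommends doing so is not stated in the summary.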
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Code Is More Than Text: Uncertainty Estimation for Code Generation

Researchers have developed a new method for estimating uncertainty in code generated by large language models, addressing the risks associated with silently incorrect code. The approach, detailed in a new paper, recognizes that code has unique properties like token fragility, an intent-code gap, and executability, which differ from natural language. By introducing three specific uncertainty axes—lexical, algorithmic, and functional—the method significantly improves the accuracy of uncertainty estimation compared to existing natural language-derived techniques. AI

IMPACT Enhances reliability of LLM-generated code by providing better uncertainty estimates, crucial for safety-critical applications.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Self-Explainability in Self-Adaptive and Self-Organising Systems: Status and Research Directions

A new paper reviews the status and future research directions for self-explainability (SX) in complex AI systems. The authors define SX as a system's ability to explain its own decision-making, going beyond traditional Explainable AI. Their systematic literature review reveals that most SX approaches are still conceptual, with limited practical implementations and no standardized evaluation methods, indicating a significant research gap. AI

IMPACT Highlights the need for standardized evaluation and practical implementation of self-explaining AI systems, crucial for trust and understanding in complex AI applications.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Integrating gene regulatory priors into Transformer attention with scTransformer for interpretable scRNA-seq analysis

Researchers have developed scTransformer, a novel approach that integrates gene regulatory information into Transformer models for analyzing single-cell RNA sequencing data. This method enhances interpretability and robustness by incorporating prior biological knowledge into the model's attention mechanisms. Evaluations show scTransformer improves cell-type classification accuracy and produces more biologically meaningful representations compared to standard Transformers. AI

IMPACT Enhances interpretability of AI models in genomics, potentially leading to new biological discoveries.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Model Poisoning Against Federated Model Adaptation with Chain of Bit-Flips

Researchers have developed a new type of backdoor attack against federated learning systems by inducing hardware faults, specifically bit-flips, in model parameters during training. This novel approach, termed "Chain of Bit-Flips," is task-agnostic and can be implanted by a single malicious client. The attack reached a 94% success rate with a limited number of faults on a ResNet-18 model. The authors also discuss practical implications and potential defenses. AI

IMPACT Highlights a new vulnerability in federated learning, potentially requiring new hardware and software defenses to secure distributed AI training.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

From Genes to Tokens: a GWAS-inspired Approach for Interpretable Stylometric Analysis

Researchers have developed a new method for stylometric analysis inspired by genome-wide association studies (GWAS). This approach tests individual word tokens for their association with authorship, similar to how genes are linked to traits. Applied to corpora in English, German, and Russian, the technique successfully identifies statistically significant lexical markers that are characteristic of specific authors. AI

IMPACT Introduces a novel interpretability technique for authorship attribution, potentially enhancing AI's ability to understand stylistic nuances in text.
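
A per-token association test of the kind described above might look like the following sketch. It is not the paper's pipeline: the 2x2 chi-square test, the Bonferroni correction (a common choice in GWAS), and all function names are assumptions made for illustration.

```python
import math


def chi2_p_value(a: int, b: int, c: int, d: int) -> float:
    """p-value of a 2x2 chi-square test (1 degree of freedom).

    a: target token count for the author, b: all other tokens for the author,
    c: target token count for everyone else, d: all other tokens for everyone else.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def significant_tokens(author_counts, other_counts, alpha=0.05):
    """Return {token: p} for tokens passing a Bonferroni-corrected threshold."""
    vocab = sorted(set(author_counts) | set(other_counts))
    n_author = sum(author_counts.values())
    n_other = sum(other_counts.values())
    threshold = alpha / len(vocab)  # assumed correction, one test per token
    hits = {}
    for token in vocab:
        a = author_counts.get(token, 0)
        c = other_counts.get(token, 0)
        p = chi2_p_value(a, n_author - a, c, n_other - c)
        if p < threshold:
            hits[token] = p
    return hits


if __name__ == "__main__":
    author = {"whilst": 60, "the": 500, "and": 440}
    others = {"whilst": 5, "the": 500, "and": 495}
    print(significant_tokens(author, others))
```

Testing each token separately keeps the result interpretable, since every reported marker comes with its own p-value.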
RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

A VideoMAE-v2 Approach to Zero-Shot Traffic Accident Anticipation

Researchers have developed a new zero-shot approach for anticipating traffic accidents using dashcam footage. Their method, which couples a VideoMAE-v2 backbone with a per-frame prediction head, can predict imminent collisions without needing in-domain training data. This framework achieved second place in the 2026 CVPR@AUTOPILOT Zero-Shot Accident Anticipation competition. AI

IMPACT This zero-shot approach could reduce the need for extensive data collection in safety-critical applications like autonomous driving.
- VideoMAE-v2
- CVPR@AUTOPILOT Zero-Shot Accident Anticipation competition
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Automating the Expert Eye: A System-Agnostic Deep Learning Framework for Rare Event Discovery in Imbalanced Force Spectroscopy

Researchers have developed a novel deep learning framework to automate the identification of rare molecular unbinding events in Single-Molecule Force Spectroscopy (SMFS). This system-agnostic tool uses a modified ResNet18 architecture and an asymmetric Focal Loss objective to handle extreme class imbalance, achieving a 92.31% true positive rate on a dataset where rare events constituted only 1.34%. The framework successfully reduced manual curation workload by over 90% while maintaining high data preservation, and its interpretability via Grad-CAM addresses 'black-box' concerns. AI

IMPACT Automates complex data analysis in biophysics, potentially accelerating discovery in molecular mechanics.
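
The summary does not give the exact form of the asymmetric Focal Loss, so the following is only a minimal sketch of one common variant: binary focal loss with separate focusing exponents for positives and negatives. The parameter names and default values are assumptions, not the paper's settings.

```python
import math


def asymmetric_focal_loss(p, y, gamma_pos=0.0, gamma_neg=2.0, alpha=0.75, eps=1e-7):
    """Per-sample binary focal loss with class-dependent focusing.

    p: predicted probability of the rare (positive) class
    y: ground-truth label, 1 for the rare event and 0 otherwise
    gamma_pos / gamma_neg: focusing exponents (assumed values)
    alpha: weight on the positive class (assumed value)
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    if y == 1:
        return -alpha * (1.0 - p) ** gamma_pos * math.log(p)
    return -(1.0 - alpha) * p ** gamma_neg * math.log(1.0 - p)


if __name__ == "__main__":
    # A missed rare event is penalized more than a false alarm of equal confidence.
    print(asymmetric_focal_loss(0.1, 1), asymmetric_focal_loss(0.9, 0))
```

With `gamma_neg` larger than `gamma_pos`, easy negatives contribute little to the total loss, which is the usual motivation for this kind of objective when positives make up only about 1% of the data.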
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Efficient Traffic Prediction at Scale: A Systematic Study of STGCN Architectural Depth

A new study on arXiv investigates the architectural depth of Spatio-Temporal Graph Convolutional Networks (STGCNs) for traffic prediction. Researchers found that a single-block STGCN architecture often performs optimally for short-term predictions, with only minor performance degradation at longer horizons. The standard two-block variant incurs significant increases in latency and decreases in throughput, suggesting it may be over-parameterized for many applications in intelligent transportation systems. AI

IMPACT Suggests simpler, more efficient models can be used for traffic prediction, reducing computational overhead in intelligent transportation systems.
- arXiv
- STGCN
RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation

Researchers have developed a new framework called HadamardNet to improve the robustness of object detection and semantic segmentation models against adversarial attacks. This framework utilizes Hadamard-coded output representations, which offer better calibration and allow for more effective detection of disturbances compared to traditional one-hot encodings. The novel approach includes an optimized decoding procedure and a method to exploit prediction inconsistencies for enhanced security. Evaluations show HadamardNet achieves state-of-the-art performance in detecting perturbations while maintaining competitive accuracy on clean data. AI

IMPACT Enhances AI model security by providing better detection of adversarial attacks and disturbances.
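
To illustrate the general idea of Hadamard-coded outputs, the sketch below assigns each class a row of a Hadamard matrix as its codeword, decodes by picking the best-correlated codeword, and flags an output as disturbed when the best correlation is low. This is not HadamardNet's decoding procedure: the threshold `min_score` and the function names are assumptions.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def decode(output, codebook, min_score=0.9):
    """Return (class index, normalized correlation, disturbance flag).

    output: network output vector with entries in [-1, 1]
    codebook: one codeword per class (rows of a Hadamard matrix)
    min_score: assumed threshold below which the output is flagged
    """
    n = len(output)
    scores = [sum(o * c for o, c in zip(output, code)) / n for code in codebook]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best], scores[best] < min_score


if __name__ == "__main__":
    codebook = hadamard(8)
    noisy = list(codebook[3])
    noisy[0] = -noisy[0]  # simulate a perturbation of one output bit
    print(decode(codebook[3], codebook), decode(noisy, codebook))
```

Because distinct codewords are orthogonal, a clean output correlates fully with one row and not at all with the others, so a reduced best score is a simple signal of inconsistency. A one-hot output offers no comparable redundancy.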
RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

Securing Self-supervised Data Curation for Foundation Models Robustness

Researchers have developed a Poisoned Data Detector (PDD) to ensure the integrity of datasets curated using self-supervised learning for foundation models. This defense mechanism combines the ImageBind model with traditional classifiers like SVM to identify and mitigate data poisoning risks. Evaluations showed SVM-PDD performed effectively across various datasets and adversarial attacks, demonstrating scalability and ensemble integration capabilities. AI

IMPACT Enhances the security and reliability of training data for large AI models, potentially improving their robustness against adversarial attacks.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Can Data Work be Reparative?

A new paper explores an alternative approach to data work, focusing on building datasets for AI safety systems through collaboration with individuals impacted by online harms. This method aims to reorient data work as a means of repair and redress, addressing issues of fair compensation and collective governance of AI datasets. The research highlights the importance of accountability in data production and advocates for centering those most affected by current dataset creation practices. AI

IMPACT Proposes a new framework for AI dataset creation that centers affected communities, potentially improving AI safety and ethical development.
- Srravya Chandhiramowuli
- arXiv
RESEARCH · arXiv cs.AI English(EN) · 5d · [3 sources]

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Researchers have developed "Reasoning Arena," a new framework designed to enhance the reasoning capabilities of large language models. This system addresses a limitation in reinforcement learning with verifiable rewards where identical rewards across different reasoning traces lead to a lack of gradient signal. Reasoning Arena converts these uninformative reward groups into valuable training data by using trace tournaments for head-to-head comparisons, thereby generating richer relative reward signals. The method improves training efficiency and performance on benchmarks, outperforming standard RLVR by 7.6% on average. AI

IMPACT Enhances LLM reasoning by converting uninformative reward signals into useful training data, potentially accelerating development.
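
The problem and the remedy described above can be sketched as follows. The mean-centered advantage is a common group-relative choice assumed here for illustration, and `judge` is a hypothetical callable standing in for whatever comparison the paper uses. None of this is the authors' implementation.

```python
from itertools import combinations


def group_advantages(rewards):
    """Mean-centered advantages within one group of sampled traces."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def tournament_rewards(traces, judge):
    """Round-robin tournament: each trace's reward is its win rate.

    judge(a, b) is a hypothetical comparator returning True when trace a
    is preferred over trace b.
    """
    if len(traces) < 2:
        return [0.0] * len(traces)
    wins = [0] * len(traces)
    for i, j in combinations(range(len(traces)), 2):
        winner = i if judge(traces[i], traces[j]) else j
        wins[winner] += 1
    games_per_trace = len(traces) - 1
    return [w / games_per_trace for w in wins]


if __name__ == "__main__":
    traces = ["long detour then answer", "answer", "short path, answer"]
    # All traces are verified correct, so the verifiable reward is identical.
    print(group_advantages([1.0, 1.0, 1.0]))  # no learning signal
    # Toy judge that prefers shorter traces, for demonstration only.
    relative = tournament_rewards(traces, lambda a, b: len(a) < len(b))
    print(group_advantages(relative))
```

When every trace in a group receives the same verifiable reward, the centered advantages are all zero. The pairwise win rates restore a ranking within the group.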
RESEARCH · arXiv cs.AI English(EN) · 5d · [3 sources]

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Researchers have introduced PhysScene and PhysGraph, two contributions aimed at improving visual reasoning in physics experiments and robotics. PhysScene is a dataset designed to evaluate scene graph generation for physics experiments, focusing on functional relations and logical dependencies beyond simple spatial arrangements. PhysGraph is a framework that unifies symbolic reasoning with 3D geometry to create physics-aware scene graphs, enabling robots to better understand kinematic and physical properties for tasks like mass estimation and articulation prediction. AI

IMPACT These advancements could lead to more sophisticated AI systems capable of understanding and interacting with complex physical environments.
RESEARCH · arXiv cs.LG English(EN) · 5d · [3 sources]

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

Researchers have developed Conan-embedding-v3, a new framework designed to create a unified embedding space for multiple data modalities including text, images, video, documents, and audio. The approach involves training modality-specific models independently, then fusing their task vectors into a single backbone. A key challenge addressed is "Projector Drift," which occurs when fusing models with external encoders, leading to performance degradation in specific modalities like audio. Conan-embedding-v3 employs "Projector Recovery" and multi-modal rehearsal to mitigate this issue, achieving strong performance on benchmarks like MMEB and MAEB. AI

IMPACT Introduces a novel framework for unifying diverse data types into a single embedding space, potentially improving cross-modal retrieval and understanding.
- Conan-embedding-v3
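
The fusion step described above, adding each modality-specific model's task vector to a shared backbone, can be sketched generically. This is plain task-vector arithmetic, not the Conan-embedding-v3 procedure: the parameter layout, the `weights` argument, and the function name are assumptions, and the "Projector Recovery" step is not modeled.

```python
def fuse_task_vectors(base, finetuned_models, weights=None):
    """Merge models as base + sum_i w_i * (finetuned_i - base).

    base: {parameter name: list of floats} for the shared backbone
    finetuned_models: list of dicts with the same layout, one per modality
    weights: optional per-model scaling factors (assumed, default 1.0)
    """
    if weights is None:
        weights = [1.0] * len(finetuned_models)
    merged = {name: list(values) for name, values in base.items()}
    for model, w in zip(finetuned_models, weights):
        for name, values in model.items():
            for i, v in enumerate(values):
                # Task vector entry: how far fine-tuning moved this parameter.
                merged[name][i] += w * (v - base[name][i])
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 1.0]}
    text_model = {"w": [1.5, 1.0]}   # toy stand-in for a text-tuned model
    audio_model = {"w": [1.0, 0.0]}  # toy stand-in for an audio-tuned model
    print(fuse_task_vectors(base, [text_model, audio_model]))
```

In this toy case the two models changed different parameters, so the merge keeps both updates. When updates overlap or depend on an external encoder, interference becomes possible, which is the kind of degradation the summary attributes to "Projector Drift".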
RESEARCH · arXiv cs.AI English(EN) · 5d · [7 sources]

Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges

Researchers are exploring privacy risks associated with large language models (LLMs) and their adaptations. One study focuses on detecting sensitive personal information in Japanese pre-training corpora, developing a classifier for special care-required personal information (SCPI) as defined under Japan's Act on the Protection of Personal Information (APPI). Another paper investigates privacy vulnerabilities in multi-modal LLMs, highlighting how they can leak sensitive data from images and memory, and introduces a dataset for evaluation. A third study benchmarks the effectiveness of differential privacy (DP) in adapting LLMs, finding that data distribution shifts significantly impact privacy risks, with parameter-efficient fine-tuning methods like LoRA offering better protection for out-of-distribution data. AI

IMPACT These studies highlight critical privacy challenges in LLMs, informing developers on data handling, multi-modal risks, and effective privacy protection techniques during model adaptation.
RESEARCH · arXiv cs.AI English(EN) · 4d · [2 sources]

Hybridizing Equilibrium Propagation with Ising Machines for Efficient Energy-Based Learning

Researchers have developed a new method for training energy-based neural networks by hybridizing Equilibrium Propagation with Ising Machines. This approach aims to overcome the energy demands of traditional GPU-based training and improve convergence by modifying the physical dynamics of neural states. The new framework demonstrates comparable performance to backpropagation on various datasets and suggests a path toward more energy-efficient AI hardware. AI

IMPACT This research offers a potential pathway for more energy-efficient AI hardware by leveraging physical computing principles.
RESEARCH · arXiv cs.LG English(EN) · 4d · [2 sources]

Heterophily-Aware Adaptive Knowledge Distillation for Hypergraph Neural Networks

Two new research papers introduce advancements in hypergraph neural networks (HNNs). One paper proposes HADES, a method for knowledge distillation that adapts to node heterophily, improving student model performance and inference speed. The other paper introduces Hypergraph U-Nets, a novel architecture that addresses the challenge of pooling and unpooling operations in HNNs, demonstrating superior performance in reconstruction, classification, and anomaly detection tasks. AI

IMPACT These advancements in hypergraph neural networks could lead to more efficient and accurate models for complex relational data.
RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

SMI: Efficient Self-Supervised Learning via Mutual-Information-Inspired Dependency Optimization

Two new research papers explore novel approaches to self-supervised learning (SSL) in computer vision, aiming to improve efficiency and performance. The first paper introduces Semantic Mutual Information (SMI), a method that optimizes a sample-level dependency matrix to achieve competitive results with reduced computational cost. The second paper proposes a multi-task formulation for Siamese SSL, assigning a dedicated predictor to each spatial transformation to stabilize optimization and enhance performance across different frameworks. AI

IMPACT These papers introduce novel techniques that could lead to more efficient and effective computer vision models, potentially reducing training costs and improving performance on various downstream tasks.
RESEARCH · arXiv cs.AI English(EN) · 4d · [2 sources]

The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence

Researchers have developed new datasets to help detect AI-generated evidence in legal contexts. One, the CIFAR Synthetic Evidence Corpus, focuses on synthetic documents like receipts and administrative records, while another, SLED-1400, contains authentic and AI-generated photographs relevant to civil disputes. Studies show that both AI models and humans struggle to detect sophisticated synthetic images, indicating a need for combined detection methods. AI

IMPACT Highlights the growing challenge of AI-generated content in legal evidence and the need for robust detection methods.
RESEARCH · arXiv cs.LG English(EN) · 4d · [3 sources]

In-Context Learning of Stochastic Differential Equations with Foundation Inference Models

Researchers have developed a suite of Foundation Inference Models (FIMs) designed to rapidly infer the dynamics of various systems from time-series data. These models, including FIM-SDE for stochastic differential equations, FIM-PP for temporal point processes, and FIM-ODE for ordinary differential equations, are pretrained on broad distributions of synthetic data. This pretraining allows them to perform in-context (zero-shot) inference or be quickly fine-tuned to specific datasets, often outperforming traditional methods and specialized models that require extensive training. AI

IMPACT These foundation models could significantly speed up scientific discovery by enabling faster and more accurate parameter estimation for complex dynamical systems.
- Ramses Sanchez
- arXiv
- FIM-SDE
- FIM-PP
- FIM-ODE
RESEARCH · arXiv cs.CV English(EN) · 4d · [2 sources]

CamoSAM2: SAM2-oriented Prompt Auto-Refinement for Video Camouflaged Object Detection

Researchers have developed new frameworks for camouflaged object detection (COD) that address the issue of over-detection. One approach, CFCamo, uses a counterfactual benchmark to train agents to both detect camouflaged objects and abstain when no object is present, improving performance on existing datasets and achieving high pair accuracy on the new CF-COD benchmark. Another method, CamoSAM2, refines prompts for the Segment Anything Model 2 (SAM2) by integrating motion and appearance cues to enhance automatic detection and segmentation of camouflaged objects in videos, outperforming current state-of-the-art methods in mean intersection over union (mIoU) and inference speed. AI

IMPACT These advancements in camouflaged object detection could improve AI's ability to accurately identify and segment objects in complex visual environments, impacting fields like surveillance, medical imaging, and autonomous systems.
- SAM2
- Xin Zhang
- CamoSAM2
- Qwen3-VL-4B-Instruct
- CFCamo
- arXiv
RESEARCH · Hugging Face Daily Papers English(EN) · 4d · [2 sources]

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

A new research paper challenges the common assumption that supervised fine-tuning with synthetic rationale data improves language model performance on clinical prediction tasks. Experiments on Alzheimer's disease prediction found that this method consistently degraded performance compared to label-only fine-tuning, even when the rationales were medically accurate. The study suggests a conflict between narrative plausibility and discriminative optimization is the likely cause, urging caution in developing language models for high-stakes medical applications. AI

IMPACT Challenges the efficacy of rationale-based fine-tuning for high-stakes clinical prediction, suggesting a need for more robust training methodologies.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 5d · [2 sources]

TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs

Researchers have introduced TABVERSE, a new benchmark designed to evaluate how well Large Language Models (LLMs) and Vision-Language Models (VLMs) understand tables across different formats. The benchmark holds table content fixed while varying its representation, such as HTML, Markdown, LaTeX, and rendered images. Initial findings indicate that model performance depends significantly on the table's format: models generally perform better on structured text than on rendered images, though specific tasks and formats present unique challenges. AI

IMPACT Highlights the impact of data representation on LLM/VLM performance, suggesting a need for robust cross-format handling in future model development.
- LLMs
- TABVERSE
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

UXBench: Benchmarking User Experience in AI Assistants

Researchers have introduced UXBench, a novel benchmark designed to evaluate the user experience of AI assistants. This benchmark is the first to use real user feedback signals and includes three tasks: UX Judge, UX Eval, and UX Recovery. It is built upon a dataset of 7,400 instances derived from over 70,000 interaction logs of a Chinese AI assistant, covering diverse scenarios and failure patterns. Experiments with 26 language models demonstrate that user feedback prediction is a learnable capability and highlight biases in current LLM-as-a-judge evaluation methods. AI

IMPACT Establishes a new evaluation framework for AI assistants, pushing for user-centric optimization beyond raw capability.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

AI Scientists Are Only as Good as Their Evidence: A Stratified Ablation of Proprietary Data and Reasoning Skills in Drug-Asset Valuation

A new research paper explores the impact of data access on the performance of AI scientists in drug-asset valuation. The study found that while reasoning skills and tools improve calibration, proprietary data significantly increases the AI's ability to recover relevant information and make informed decisions. Without access to curated, proprietary datasets, AI performance is fundamentally capped, regardless of advanced reasoning capabilities. AI

IMPACT Highlights the critical role of proprietary data in unlocking advanced AI capabilities for specialized decision-making tasks.
- arXiv
- AI Scientists