Brief

last 24h

[50/770] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CV English(EN) · 17h

Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning

A new survey paper details the emerging field of Test-Time Scaling (TTS) for Multimodal Foundation Models (MFMs). The paper categorizes existing TTS methods into sampling-based, feedback-based, and search-based approaches. It also outlines common applications, benchmarks, and future research directions for enhancing MFM performance in generation and reasoning tasks.

IMPACT Provides a structured overview and taxonomy for multimodal AI scaling research, guiding future development.
- Test-Time Scaling
- Multimodal Foundation Models
TOOL · arXiv cs.LG English(EN) · 17h

Similarity-Distance-Magnitude Activations

Researchers have introduced a new activation function called Similarity-Distance-Magnitude (SDM). This function aims to improve upon the standard softmax by incorporating awareness of similarity to correct predictions, distance from the training distribution, and the existing magnitude of outputs. The SDM estimator, built upon this activation, is designed to enhance interpretability and robustness against distribution shifts, particularly for selective classification tasks in pre-trained language models.

IMPACT Introduces a novel activation function that could improve the interpretability and robustness of large language models.
TOOL · arXiv cs.LG English(EN) · 17h

Normality Calibration in Semi-supervised Graph Anomaly Detection

Researchers have developed a new framework called GraphNC to improve semi-supervised graph anomaly detection. This method calibrates normality by leveraging both labeled and unlabeled data, using a teacher model to guide the process. GraphNC incorporates anomaly score distribution alignment and perturbation-based normality regularization to enhance the accuracy and separability of anomaly scores and node representations.
- Hezhe Qiao
- GraphNC
TOOL · arXiv cs.CV English(EN) · 17h

SSAFE: Simple and Strong AI-Generated Image Detection via Frozen Vision Encoders

Researchers have developed a new method for detecting AI-generated images using pre-trained multimodal vision encoders. This approach leverages the inherent separation of real and synthetic images within the embedding space of these frozen encoders, allowing a simple linear classifier to achieve high accuracy without extensive fine-tuning. The method also incorporates a data curation strategy that uses a compact set of representative generators, resulting in a smaller training dataset that improves robustness against unseen generators and distribution shifts.

IMPACT This research offers a more robust and efficient approach to detecting AI-generated images, which could be crucial for maintaining trust in digital media.
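
The "linear classifier on frozen embeddings" idea can be illustrated with a minimal sketch. This is not the SSAFE implementation: no vision encoder is loaded, and `make_synthetic_embeddings` is a hypothetical stand-in that draws two Gaussian clusters to play the role of real and synthetic image embeddings. Only the probe is trained, which is the point of keeping the encoder frozen.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights on fixed embeddings via batch gradient descent."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Label 1 = "synthetic", label 0 = "real" (an arbitrary convention for this sketch).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def make_synthetic_embeddings(n_per_class=50, dim=4, seed=0):
    # Hypothetical stand-in for frozen-encoder outputs: two separated Gaussian clusters.
    rng = random.Random(seed)
    data, labels = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            data.append([rng.gauss(center, 0.5) for _ in range(dim)])
            labels.append(label)
    return data, labels

embeddings, labels = make_synthetic_embeddings()
w, b = train_linear_probe(embeddings, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(embeddings, labels)) / len(labels)
print(f"training accuracy of the linear probe: {accuracy:.2f}")
```

With real data, the synthetic clusters would be replaced by embeddings computed once from the frozen encoder and cached, so training cost is dominated by the small linear layer.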
TOOL · arXiv cs.CV English(EN) · 17h

Thinking Without Images: Internalizing Visual Manipulation with On-Policy Self-Distillation

Researchers have developed a new self-distillation framework called Imagine-OPD to improve visual reasoning in AI models. This method trains models to "imagine" relevant visual cues rather than relying on external tools for image cropping, reducing inference time and computational cost. Experiments show Imagine-OPD outperforms existing methods on vision-centric benchmarks while being more efficient.

IMPACT This approach could lead to more efficient visual reasoning models, reducing computational costs for AI applications that rely on image analysis.
- arXiv
- Imagine-OPD
TOOL · arXiv cs.LG English(EN) · 17h

Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime

Researchers have developed a new conformal prediction framework to quantify uncertainty in neural operator learning, specifically for the 2D incompressible Navier-Stokes equations. This method uses a perturbation-based approach to estimate uncertainty by comparing predictions from two similarly trained neural operators. It aims to provide calibrated uncertainty estimates efficiently, even in data-scarce scenarios, by avoiding the need for separate uncertainty networks.

IMPACT This method offers a more sample-efficient way to quantify uncertainty in complex physical simulations, potentially improving the reliability of AI models in scientific applications.
- Fourier Neural Operator
- 2D incompressible Navier-Stokes equations
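
For readers unfamiliar with conformal prediction, here is a minimal sketch of the generic split-conformal recipe. It is not the paper's perturbation-based variant: the calibration scores below are hypothetical absolute residuals, and `alpha=0.5` is chosen only to keep the arithmetic easy to follow.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # 1-indexed rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee this coverage level.
        return math.inf
    return sorted(scores)[k - 1]

def prediction_interval(y_hat, q):
    """Symmetric interval around a point prediction."""
    return (y_hat - q, y_hat + q)

# Hypothetical calibration residuals |y - y_hat| from a held-out set.
calibration_scores = [0.10, 0.40, 0.20, 0.30, 0.50, 0.15, 0.25, 0.35, 0.45]
q = conformal_quantile(calibration_scores, alpha=0.5)
low, high = prediction_interval(2.0, q)
print(q, low, high)
```

In the data-scarce regime the summary mentions, the `k > n` branch matters: with few calibration samples, small values of `alpha` cannot be certified and the interval becomes unbounded.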
TOOL · arXiv cs.LG English(EN) · 17h

Function-Vector Heads Are Two Populations: Writers and Cancellers in In-Context Learning

Researchers have identified two distinct populations within function-vector (FV) heads in large language models, challenging the assumption that these heads are a homogeneous group. By employing a sign-preserving criterion instead of magnitude-only ranking, they found that FV heads either push correct logits up (writers) or push them down (cancellers). This dual nature was observed across multiple model families and scales, and zero-ablating cancellers led to improved accuracy.

IMPACT Reveals a more nuanced understanding of how LLMs process information, potentially impacting future model interpretability and design.
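
The difference between magnitude-only ranking and a sign-preserving criterion can be shown with a toy sketch. The head names and effect values are invented for illustration, and the paper's actual scoring procedure may differ; the sketch only assumes that each head has a signed effect on the correct-answer logit.

```python
def top_k_by_magnitude(effects, k):
    """Rank heads by |effect| only, which discards the direction of the effect."""
    return sorted(effects, key=lambda head: abs(effects[head]), reverse=True)[:k]

def split_by_sign(effects, heads):
    """Keep the sign: writers raise the correct logit, cancellers lower it."""
    writers = [h for h in heads if effects[h] > 0]
    cancellers = [h for h in heads if effects[h] < 0]
    return writers, cancellers

# Hypothetical signed effect of each head on the correct-answer logit.
effects = {"L3H1": 0.9, "L5H2": -0.7, "L2H0": 0.1, "L7H4": -0.05, "L4H3": 0.6}

fv_heads = top_k_by_magnitude(effects, k=3)
writers, cancellers = split_by_sign(effects, fv_heads)
print(fv_heads, writers, cancellers)
```

Under magnitude-only ranking, `L5H2` sits in the same top-3 set as the two writers even though it pushes the correct logit down, which is the kind of mixing the summary describes.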
TOOL · arXiv cs.LG English(EN) · 17h

Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense

Researchers have developed new methods for attacking and defending graph neural networks (GNNs) against information leakage. The study characterizes how graph properties like homophily and heterophily influence the recoverability of training data. Building on a Markov chain approximation, they propose an attack that reconstructs graph adjacency by aligning representations across GNN layers and a defense that suppresses this sensitive information while maintaining classification accuracy.

IMPACT Introduces new techniques for privacy preservation in GNNs, potentially impacting how sensitive graph data is handled.
TOOL · arXiv cs.CV English(EN) · 17h

Generalizing Geometry-Guided Mamba as a Plug-and-Play Context Module for CNN-based Semantic Segmentation

Researchers have adapted a geometry-guided Mamba model, originally from DGM-Net, to serve as a plug-and-play context module for CNN-based semantic segmentation. This approach injects geometric guidance into the selective scan process, enabling long-range feature propagation modulated by boundary and centripetal-flow cues. When integrated into six different CNN segmentation models, the geometry-guided SSM modules consistently improved mean Intersection over Union (mIoU) scores on the Cityscapes dataset with only a slight increase in computational cost.

IMPACT Enhances existing CNN segmentation models with improved context aggregation, potentially leading to more accurate image analysis in computer vision tasks.
- DGM-Net
- ResNet-101
- PSPNet
- OCRNet
- DANet
- Cityscapes
- CNN
- Geometry-Guided Mamba
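
Since the result is reported in mIoU, a short sketch of how that metric is computed may help. This is the standard per-class intersection-over-union averaged over classes, applied here to flat lists of hypothetical pixel labels; it says nothing about the paper's evaluation pipeline.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical per-pixel class labels for a six-pixel image.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3, and 1/2
```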
TOOL · arXiv cs.LG English(EN) · 17h

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Researchers have developed a new type of backdoor attack targeting Vision-Language-Action (VLA) models, which are crucial for embodied AI applications like robotics. Unlike previous methods that rely on visible visual triggers, this novel "State Backdoor" utilizes the initial state of a robot arm as the trigger. A Preference-guided Genetic Algorithm was employed to find minimal yet effective state-based triggers, achieving over 90% attack success without degrading performance on normal tasks.

IMPACT Reveals a new vulnerability in embodied AI, potentially requiring new security measures for robotic systems.
TOOL · arXiv cs.LG English(EN) · 17h

Self-Consistent Generative Paths via Admissible Random Variational Transport

Researchers have introduced a new framework for understanding generative models, focusing on the concept of "self-consistent generative paths." This framework defines a path as self-consistent if it represents a random fixed point of admissible local variational transport corrections. The theory yields a metric called the random fixed-point path residual (R-FPR) to quantify the gap between a generated path and its correction, offering a principle for diagnosing and improving various generative models.

IMPACT Introduces a theoretical framework for unifying and improving various generative models, potentially impacting future research and development.
- arXiv cs.LG
- Self-Consistent Generative Paths via Admissible Random Variational Transport
TOOL · arXiv cs.LG English(EN) · 17h

Bulk-boundary decomposition of neural networks

Researchers have introduced the bulk-boundary decomposition, a novel framework for analyzing the training dynamics of deep neural networks. This approach separates the network's Lagrangian into a data-independent bulk term and a data-dependent boundary term. The bulk term characterizes the inherent dynamics influenced by network architecture and activation functions, while the boundary term reflects the stochastic interactions arising from training samples at the input and output layers. This decomposition reveals the local and homogeneous structure within deep networks, leading to the derivation of an energy continuity equation.

IMPACT Introduces a new theoretical lens for understanding and potentially optimizing neural network training processes.
- Donghee Lee
TOOL · arXiv cs.LG English(EN) · 17h

Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning

Researchers have developed a novel system to measure hate speech on a continuous spectrum, ranging from genocidal to supportive language. This approach combines supervised deep learning with faceted Rasch item response theory, breaking down hate speech into 10 ordinal labels. These labels are then probabilistically modeled to create an interval outcome measure, while also accounting for individual annotator perspectives. The system was applied to a dataset of 50,070 social media comments from YouTube, Twitter, and Reddit, annotated by over 11,000 Mechanical Turk workers. It uses a RoBERTa-based model that demonstrates improved accuracy over existing methods.

IMPACT Introduces a new paradigm for NLP that encourages continuous constructs and incorporates annotator perspective and model explainability.
TOOL · arXiv cs.CV English(EN) · 17h

Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology

Researchers have developed a new method called Stain-Aware Wavelet Regularization (SAWR) to improve the robustness of deep learning models used in histopathology. This technique uses wavelet-domain regularization to separate adversarial noise from important tissue structures in medical images. SAWR also adapts this regularization to specific stain channels, enhancing its effectiveness and improving adversarial robustness by over 10% while preserving image quality.

IMPACT Enhances the reliability of AI in clinical diagnostics by mitigating adversarial attacks on histopathology images.
- Stain-Aware Wavelet Regularization
- Hematoxylin
TOOL · arXiv cs.CV English(EN) · 17h

Learnable Token Sparsification for Efficient Gigapixel Whole Slide Image Reasoning

Researchers have developed a novel method for processing gigapixel whole slide images in vision language models by treating token reduction as a trainable sparsification problem. This approach, detailed in a new arXiv paper, allows the model to learn an optimal selection strategy for visual tokens, unlike previous methods that used untrained downsampling or heuristic pruning. The proposed decoupled routing architecture and SparseLearn component enable gradient propagation through the pruning process, ultimately reducing the visual sequence to a sparse set of 32 tokens with minimal computational overhead during inference. This technique achieves high accuracy on benchmarks like SlideBench, offering an efficient paradigm for end-to-end gigapixel image reasoning.

IMPACT Enables more efficient and accurate analysis of large medical images by AI, potentially improving diagnostic capabilities.
- WSI VQA
- TCGA
- SlideBench
- SparseLearn
TOOL · arXiv cs.LG English(EN) · 17h

Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search

Researchers have developed a new framework called Bias-Guided Prompt Search (BGPS) to automatically uncover hidden biases in text-to-image models. This method uses an LLM to generate prompts that, when fed into image generation models, amplify specific attributes like gender or race. Experiments on Stable Diffusion revealed previously undocumented biases, highlighting vulnerabilities in current models and offering a new evaluation tool for bias mitigation efforts.

IMPACT This research provides a novel method for identifying and potentially mitigating biases in generative AI, crucial for responsible AI development.
TOOL · arXiv cs.LG English(EN) · 17h

Cryptographic Backdoor for Neural Networks: Boon and Bane

Researchers have developed a method to embed cryptographic backdoors into neural networks, which can be used for both offensive attacks and defensive measures. These backdoors enable powerful, undetectable attacks while also facilitating provably robust watermarking, user authentication, and intellectual property tracking. The work draws inspiration from existing cryptographic techniques and has been demonstrated on modern neural network architectures, with potential for post-quantum applications.

IMPACT Introduces new methods for securing neural networks against unauthorized use and tampering.
TOOL · arXiv cs.LG English(EN) · 17h

phepy: Visual benchmarks and improvements for out-of-distribution detectors

Researchers have developed a new benchmark called "phepy" to evaluate out-of-distribution (OOD) detection methods in machine learning. This benchmark uses three novel, visually intuitive toy examples to assess a detector's ability to identify linear and non-linear concepts, as well as thin in-distribution subspaces within high-dimensional data. The study also explores methods for synthesizing OOD inputs for supervised training and introduces improvements like t-poking and OOD sample weighting to enhance detector precision at the decision boundary.

IMPACT Provides new tools and methods for improving the reliability of machine learning models in real-world, unpredictable scenarios.
- Andreas Rupp
TOOL · arXiv cs.LG English(EN) · 17h

Byzantine Cheap Talk: Adversarial Resilience and Topology Effects in LLM Coordination Games

Researchers have explored the vulnerabilities of multi-agent LLM systems that rely on communication for coordination. Their study found that when some agents act deceptively (Byzantine agents), others can detect the betrayal but struggle to adapt, leading to continued exploitation. The research also revealed that restricting communication pathways can degrade cooperation, even without an adversary present, by affecting the agents' meta-reasoning about hidden information.

IMPACT Reveals specific security vulnerabilities in LLM coordination, suggesting communication channels can be exploited and topology disclosure can degrade performance.
- LLM
- Byzantine agents
TOOL · arXiv cs.CV English(EN) · 17h

Rethinking 3D Shape Generation: Diffusion over Superquadrics

Researchers have developed a new method for generating 3D shapes by diffusing over superquadric parameters instead of dense geometric representations. This approach significantly reduces the dimensionality of the diffusion state, requiring only 7 KB of parameters per shape. The diffusion-over-superquadrics method enables faster generation and improved scalability, and it supports advanced capabilities like part-level editing and constraint-based design, while achieving competitive performance on standard benchmarks.

IMPACT Enables more efficient and controllable 3D shape generation, potentially impacting fields requiring rapid asset creation.
- Diffusion models
TOOL · arXiv cs.CV English(EN) · 17h

MB-Loc: Multi-planar Bird's-eye-view Localization in outdoor LiDAR scenes

Researchers have developed MB-Loc, a new framework for multi-planar bird's-eye-view localization in outdoor LiDAR scenes. This method addresses computational inefficiency and viewpoint sensitivity in existing scene coordinate regression techniques. MB-Loc projects LiDAR scans into a 2.5D representation, enabling faster processing with standard 2D CNNs while retaining crucial 3D geometric information. The framework also incorporates a KL-regularized latent bottleneck for spatial uncertainty modeling and 3D spatial augmentations for rotation robustness, outperforming current state-of-the-art methods on the NCLT dataset at real-time inference speeds.

IMPACT Enhances autonomous navigation systems by improving the efficiency and robustness of LiDAR localization.
TOOL · arXiv cs.LG English(EN) · 17h

Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds

Researchers have developed a new decentralized online Riemannian optimization algorithm capable of operating beyond the limitations of Hadamard manifolds, extending its applicability to spaces with positive curvature. The algorithm incorporates a curvature-aware consensus step that facilitates linear convergence even in these more complex geometric settings. This advancement leads to an $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent method, with similar bounds achieved in a two-point bandit feedback scenario using efficient gradient estimators.
- Emre Sahinoglu
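
As a reminder of what an $O(\sqrt{T})$ regret bound means, one common single-agent definition of regret is given below. The symbols $f_t$ (the loss revealed at round $t$), $x_t$ (the learner's decision), and $\mathcal{K}$ (the feasible set on the manifold) are notation assumed here, not taken from the summary. Decentralized analyses typically sum or average such a quantity over agents, and the paper's exact definition may differ.

$$
\mathrm{Reg}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),
\qquad
\mathrm{Reg}_T = O(\sqrt{T}) \;\Longrightarrow\; \frac{\mathrm{Reg}_T}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \to 0 .
$$

In words, sublinear regret means the learner's average loss per round approaches that of the best fixed decision in hindsight.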
TOOL · arXiv cs.LG English(EN) · 17h

IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking

Researchers have developed IR-SIM, a new lightweight simulator designed to streamline robotics research, particularly for tasks involving large language models. This simulator allows for the creation and modification of navigation scenarios using simple YAML configuration files and text prompts, making it easier to prototype and develop algorithms. IR-SIM also facilitates automated benchmarking and data generation for robot learning, with capabilities to bridge to higher-fidelity simulators and real-world deployments.

IMPACT Simplifies the development and benchmarking of AI-powered robot navigation systems.
- IR-SIM
- large language models
TOOL · arXiv cs.CV English(EN) · 17h

RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation

Researchers have developed a new framework called RGB-S that explicitly aligns tactile sensor data with visual information for robotic manipulation. This method projects tactile sensor locations directly onto RGB images, creating saliency maps that account for spatial uncertainty. By integrating these 2D anchors, the system injects physical contact priors into visual models, improving their ability to handle unreliable or occluded visual inputs. Experiments demonstrated a significant improvement in success rates for dexterous manipulation tasks under severe visual occlusion.

IMPACT Enhances robotic manipulation capabilities by improving sensor fusion and robustness to visual occlusions.
- Robotic Dexterous Manipulation
TOOL · arXiv cs.LG English(EN) · 17h

Learning from flowsheets: A generative transformer model for autocompletion of flowsheets

Researchers have developed a novel method for autocompleting chemical flowsheets using a transformer-based language model. The approach represents flowsheets as strings and trains the model on their grammatical structure and common patterns. After pre-training on synthetic data and fine-tuning on real-world examples, the model can suggest completions for flowsheets, aiding chemical engineers in process synthesis.

IMPACT This AI-driven autocompletion could streamline chemical process design and accelerate innovation in the field.
- Lukas Schulze Balhorn
- SFILES 2.0
TOOL · arXiv cs.LG English(EN) · 17h

Analysis of Information Theory for Explainable AI

Researchers have developed a new post-hoc visual explanation method for convolutional neural networks called MI CAM. This method utilizes activation mapping and weighs feature maps based on their mutual information with the input image and the network's final output. MI CAM aims to provide causal interpretations and has demonstrated performance on par with or exceeding state-of-the-art methods in qualitative and quantitative measures.

IMPACT Provides a novel method for understanding AI decision-making, potentially improving trust and debugging in critical applications.
- Ram S Iyer
TOOL · arXiv cs.LG English(EN) · 17h

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Researchers have developed BlendServe, a new system designed to optimize offline inference for auto-regressive large language models. BlendServe combines resource overlapping and prefix sharing techniques to maximize throughput and reduce costs for latency-insensitive applications. Evaluations show that BlendServe can achieve up to a 1.44x throughput increase compared to existing serving systems such as vLLM and SGLang.

IMPACT Optimizes LLM inference for cost and throughput, potentially lowering operational expenses for AI applications.
- vLLM
- BlendServe
- SGLang
- Yilong Zhao
TOOL · arXiv cs.LG English(EN) · 17h

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

Researchers have developed a Stroop-style paradigm to investigate how language models handle conflicting instructions. Their experiments, conducted across 11 open-weight models, reveal that lexical priors persist through override rather than being replaced. Activation patching on aligned models pinpointed a specific source-position triplet crucial for binding these conflicting pieces of information.

IMPACT This research offers a new method for probing LLM behavior, potentially leading to better understanding and control of their responses.
TOOL · arXiv cs.LG English(EN) · 17h

SpectraLDS: Provable Distillation for Linear Dynamical Systems

Researchers have developed SpectraLDS, a novel method for distilling linear dynamical systems (LDS) with provable accuracy guarantees. This approach leverages recent advancements in representing LDSs as learnable convolutions through spectral transformations. SpectraLDS allows for the inversion of this representation, enabling an end-to-end convex optimization process that maintains predictive accuracy while significantly improving inference efficiency to constant time and space per token, regardless of sequence length. The method has shown promise when integrated into sequence prediction architectures, enhancing efficiency in tasks like language modeling.

IMPACT Introduces a method to improve inference efficiency in sequence prediction tasks like language modeling.
- SpectraLDS
- Devan Shah
TOOL · arXiv cs.LG English(EN) · 17h

A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation

Researchers have developed a novel approach to analyzing the generalization and approximation capabilities of message passing graph neural networks (MPNNs). This new method defines a compact metric space that accommodates graphs of all sizes, both sparse and dense, which is a significant improvement over prior work that was limited to either dense graphs or uniformly bounded sparse graphs. The theory, based on graphop analysis, yields stronger universal approximation theorems and generalization bounds for MPNNs.

IMPACT Enhances theoretical understanding of graph neural networks, potentially leading to more robust and generalizable models for graph-based AI tasks.
TOOL · arXiv cs.LG English(EN) · 17h

A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection

This paper surveys the use of Heterogeneous Graph Neural Networks (HGNNs) for anomaly detection in cybersecurity. It addresses the limitations of traditional graph-based methods in handling complex, evolving cyber data. The survey categorizes existing HGNN approaches, reviews their applications, and discusses common datasets and evaluation metrics. Finally, it outlines future research directions to improve the scalability and interpretability of these models.

IMPACT Provides a structured overview of HGNN applications in cybersecurity, guiding future research and development in threat detection.
TOOL · arXiv cs.CV English(EN) · 17h

PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback

Researchers have introduced PhysAgent, a novel multi-agent framework designed to automate the creation of physically plausible 4D animations. This system addresses limitations in current methods by integrating a simulation-in-the-loop approach with multimodal inputs. PhysAgent uses a Semantic Agent to manage simulation rules and a Refine Agent that employs vision foundation models and LLM reasoning to extract and interpret motion trajectories, which enables dynamic force-field adjustments and helps the system escape local optima.

IMPACT Automates complex 4D synthesis, potentially accelerating data generation for graphics and robotics applications.
- PhysAgent
- LLM
TOOL · arXiv cs.LG English(EN) · 17h

From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models

A new paper explores the relationship between traditional differential equation models and modern data-driven approaches like neural operators. It argues that many modeling strategies share a common structure, differing primarily in their assumed input-output mappings. The research suggests that only certain models are capable of true mechanism discovery and subsequent generalization, offering insights into their appropriate applications.

IMPACT Provides a theoretical framework for understanding and comparing different data-driven modeling approaches in scientific applications.
TOOL · arXiv cs.CV English(EN) · 17h

DroneDAR: Long-Range Drone Distance Estimation Using Monocular Vision and Bounding-Box Features

Researchers have developed DroneDAR, a new model for estimating drone distances using monocular vision and bounding-box features. This approach is crucial for tracking and situational awareness, especially in long-range imagery where drones appear very small. DroneDAR combines a convolutional backbone with bounding-box cues via a gating mechanism to improve accuracy and robustness against factors like bounding-box noise and low texture detail.

IMPACT This research could improve drone tracking and situational awareness in long-range scenarios, potentially impacting surveillance and autonomous navigation systems.
- arXiv
TOOL · arXiv cs.LG English(EN) · 17h

Benchmark Datasets for Lead-Lag Forecasting on Social Platforms

Researchers have introduced a new framework called Lead-Lag Forecasting (LLF) to address the challenge of predicting future impacts based on early user interactions on social platforms. To support this research, they have created two large benchmark datasets derived from arXiv and GitHub, encompassing millions of papers and repositories, respectively. These datasets are designed to capture long-term dynamics and avoid sampling biases, providing a foundation for developing and testing LLF models.

IMPACT Establishes a new forecasting paradigm for analyzing long-term user behavior dynamics on social platforms.
TOOL · arXiv cs.LG English(EN) · 17h

Temporal Coverage over Density: Parsimonious Training-Set Design for ML Climate Downscaling

Researchers have developed a new method for training machine learning models to downscale climate data, focusing on how to select training years effectively. Their study, using the CESM2 Large Ensemble, found that training models on years distributed across the entire climate trajectory, rather than contiguous historical periods, significantly improves their ability to reproduce climate variability. This approach, even with limited data, outperforms models trained solely on historical data and suggests that broad sampling of climate states is more beneficial than temporal continuity for allocating scarce high-resolution simulation resources.

IMPACT Optimizes training data selection for climate models, potentially improving accuracy and efficiency in climate impact assessments.
- CESM2 Large Ensemble
TOOL · arXiv cs.LG English(EN) · 17h

Lattice: A Confidence-Gated Hybrid System for Uncertainty-Aware Sequential Prediction with Behavioral Archetypes

Researchers have developed Lattice, a novel system designed for uncertainty-aware sequential prediction. This hybrid system uses confidence gating to selectively activate learned behavioral archetypes, falling back to a base model when uncertain. Experiments on datasets like MovieLens and Amazon Electronics demonstrated significant improvements in prediction accuracy, with gains of over 30% in some cases.

IMPACT Introduces a novel method for improving sequential prediction models by incorporating uncertainty awareness and conditional activation of learned behaviors.
- Lorian Bannis
- Lattice
- MovieLens
- Amazon Electronics
- LSTM
- transformer
- SASRec
- BERT4Rec
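
Confidence gating with a fallback to a base model can be sketched in a few lines. Everything below is hypothetical: the summary does not say how Lattice measures confidence or represents archetypes, so this sketch assumes cosine similarity to an archetype centroid as the confidence signal and an additive per-item bias as the archetype's contribution.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def gated_scores(base_scores, user_vec, archetypes, threshold=0.8, weight=0.5):
    """Apply the best-matching archetype only when its similarity clears the gate."""
    best_name, best_sim = None, -1.0
    for name, (centroid, _bias) in archetypes.items():
        sim = cosine(user_vec, centroid)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None or best_sim < threshold:
        return list(base_scores), None  # uncertain: fall back to the base model
    bias = archetypes[best_name][1]
    return [s + weight * b for s, b in zip(base_scores, bias)], best_name

# Hypothetical archetypes: name -> (centroid in user-embedding space, per-item bias).
archetypes = {
    "action_fan": ([1.0, 0.0], [1.0, 0.0, 0.0]),
    "doc_fan": ([0.0, 1.0], [0.0, 0.0, 1.0]),
}
base = [0.2, 0.5, 0.3]

confident_scores, confident_name = gated_scores(base, [0.9, 0.1], archetypes)
fallback_scores, fallback_name = gated_scores(base, [0.6, 0.6], archetypes)
print(confident_name, confident_scores)
print(fallback_name, fallback_scores)
```

The second user sits between the two centroids, so neither archetype clears the threshold and the base scores are returned unchanged.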
TOOL · arXiv cs.CV English(EN) · 17h

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Researchers have introduced Struct-Searcher, a novel agentic workflow designed for multimodal deep information seeking. This system moves beyond simple evidence accumulation by employing belief revision theory to construct and maintain an evolving multimodal structural graph. This allows Struct-Searcher to effectively handle contradictory information across different modalities, leading to improved accuracy in complex research tasks.

IMPACT This new agentic workflow could improve the accuracy and robustness of AI systems in complex multimodal research tasks.
TOOL · arXiv cs.LG English(EN) · 17h

TinyJudge: Unverifiable Constraint Alignment via Lightweight Specialist Ensembles

Researchers have developed TinyJudge, a new framework designed to improve instruction following in large language models (LLMs). This system utilizes an ensemble of small, specialized language models to evaluate and reward adherence to complex, often unverifiable constraints, such as tone or style. By distilling expertise from larger models into these smaller ones, TinyJudge aims to overcome limitations like reward hacking and high computational costs associated with current methods. Experiments show TinyJudge significantly outperforms existing approaches in performance and reward precision, while also reducing training time threefold.

IMPACT This approach could lead to more efficient and precise alignment of LLMs with complex human instructions, potentially improving their usability in diverse applications.
- TinyJudge
- LLMs
TOOL · arXiv cs.LG English(EN) · 17h

SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning

Researchers have developed SAD-Flower, a new framework designed to enhance the safety and reliability of trajectory planning using flow matching. This method addresses limitations in existing flow matching techniques by incorporating formal guarantees for state and action constraints, as well as ensuring dynamical consistency. SAD-Flower achieves this by augmenting the flow with a virtual control input, allowing for test-time satisfaction of unseen constraints without retraining, and has demonstrated superior performance over other generative model-based baselines in experiments.

IMPACT Enhances safety and reliability in AI-driven planning systems, potentially enabling wider adoption in critical applications.
TOOL · arXiv cs.LG English(EN) · 17h

Solving Inverse Problems with Flow-based Models via Model Predictive Control

Researchers have developed MPC-Flow, a novel framework for solving inverse problems using flow-based generative models. This method employs model predictive control to guide the model's dynamics, making conditional generation more practical. MPC-Flow offers a spectrum of guidance algorithms, some of which bypass the need for backpropagation through the generative model's trajectory. The framework has demonstrated strong performance and scalability on image restoration tasks, including in-painting, deblurring, and super-resolution, even with large-scale models like FLUX.2 on consumer hardware. AI

IMPACT Introduces a more efficient method for conditional generation in flow-based models, potentially improving performance on tasks like image restoration.
TOOL · arXiv cs.LG English(EN) · 17h

Optimizing Few-Step Generation with Adaptive Matching Distillation

Researchers have developed Adaptive Matching Distillation (AMD), a new framework to improve the stability and performance of few-step generative models. AMD targets "Forbidden Zones", regions where existing distillation methods struggle, and uses reward proxies to detect and escape them. Experiments on image and video generation tasks, including SDXL and Wan2.1, show AMD enhances sample fidelity and training robustness, notably improving the HPSv2 score on SDXL. AI

IMPACT Enhances training robustness and sample fidelity for generative models, potentially leading to more efficient and higher-quality AI-generated content.
TOOL · arXiv cs.LG English(EN) · 17h

Teacher-Free Self-Training Amplifies but Does Not Compound: A Pass@$K$ Crossover on a Free-Verifier Domain

Researchers investigated whether self-training language models on their own outputs leads to new capabilities or simply refines existing ones. Using a teacher-free setup with a generator, critic, and verifier on a Qwen3-4B model, they found that critic-guided selection improved performance. Self-training raised the performance ceiling but did not accelerate learning: the base model eventually outperformed the self-trained model at higher computational budgets, indicating amplification rather than compounding of capabilities. AI

IMPACT This research suggests that current self-training methods may not unlock fundamentally new LLM abilities, potentially shifting focus towards architectural or data innovations for true capability breakthroughs.
- Qwen3-4B
- arXiv
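
For readers unfamiliar with the metric in the title, here is a minimal sketch of the widely used unbiased pass@k estimator and of what a crossover between two models looks like. The summary does not say which estimator the paper uses, and the per-problem success counts below are invented purely for illustration; they are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# HYPOTHETICAL counts of correct samples out of n = 100 per problem (not from the paper).
N_SAMPLES = 100
base_counts = [20, 20, 5, 5, 0]    # solves more problems, but rarely
tuned_counts = [80, 80, 0, 0, 0]   # solves fewer problems, but reliably

for k in (1, 10, 50):
    base = mean_pass_at_k(base_counts, N_SAMPLES, k)
    tuned = mean_pass_at_k(tuned_counts, N_SAMPLES, k)
    print(f"k={k:>2}  base={base:.3f}  tuned={tuned:.3f}")
```

With these toy numbers the tuned model leads at k = 1 and the base model leads at k = 50, which is the shape a pass@k crossover takes.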
TOOL · arXiv cs.CV English(EN) · 17h

DALE-CT: Depth-Aware Foundation Models for Computed Tomography

Researchers have developed DALE-CT, a new family of 2D foundation models for processing computed tomography (CT) data. Built from scratch using a self-supervised learning approach called LeJEPA, DALE-CT incorporates a novel 3D depth-aware pre-training strategy with both automated and human-annotated supervision. This model achieved a Macro AUROC of 0.833 on the CT-RATE dataset for multi-abnormality detection, nearing the performance of state-of-the-art 3D vision-language models with less data and no textual supervision. AI

IMPACT Introduces a novel, data-efficient approach for medical image analysis, potentially improving diagnostic accuracy in CT scans.
- DINOv2
- DALE-CT
- LeJEPA
- CT-RATE dataset
TOOL · arXiv cs.LG English(EN) · 17h

Layer-wise Derivative Controlled Networks Achieve Competitive Accuracy and Gradient Stability Across Data Regimes

Researchers have developed a new neural network architecture called Layer-wise Derivative Controlled Networks (CR) that demonstrates improved accuracy and gradient stability across various data regimes. In studies on the Pima Diabetes dataset, CR maintained a consistent accuracy advantage even with limited training data, showing significantly more stable gradient tail ratios compared to standard ReLU networks. Further experiments on the SST-5 dataset indicated competitive or superior performance in both frozen-embedding and BERT fine-tuned scenarios, outperforming existing baselines with less training data. AI

IMPACT This new architecture offers improved generalization and stability, potentially leading to more robust AI models across different data volumes and types.
TOOL · arXiv cs.LG English(EN) · 17h

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

Researchers have identified a significant bias in Process Reward Models (PRMs) stemming from imbalanced training data, which leads them to over-reward plausible but incorrect reasoning steps. This bias can actively mislead AI systems, negatively impacting tasks like guided decoding and Best-of-N selection. To counter it, the researchers developed PRISM, a framework that uses contrastive learning and hard negative examples to improve step-level modeling without requiring additional human labels, substantially reducing false positives and enhancing accuracy. AI

IMPACT Reduces false positives in AI reasoning, potentially leading to more reliable and accurate AI decision-making.
- PRISM
- Process Reward Models
TOOL · arXiv cs.CV English(EN) · 17h

EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

Researchers have developed EgoPriMo, a new framework for generating full-body motion for humanoid robots using egocentric human demonstrations. This system takes egocentric visual observations and text prompts to reconstruct, generate, and forecast SMPL-based motion. EgoPriMo utilizes a Triple-stream DiT model that processes body dynamics, visual context, and text, enabling it to learn generalizable and interactive motion priors from diverse human actions. AI

IMPACT Enables more natural and interactive control of humanoid robots by learning from human demonstrations.
TOOL · arXiv cs.LG English(EN) · 17h

Towards Automated Kernel Generation in the Era of LLMs

A new survey paper explores the use of large language models (LLMs) and agentic systems for automating the generation and optimization of GPU kernels. These kernels are crucial for the performance of AI systems, but their manual creation is a time-consuming and non-scalable process. The paper aims to provide a structured overview of current LLM-driven approaches, datasets, and benchmarks, while also outlining future research directions in this rapidly evolving field. AI

IMPACT Automating GPU kernel generation with LLMs could significantly accelerate AI system development and performance.
- GPU kernels
- LLMs
- Yang Yu
TOOL · arXiv cs.LG English(EN) · 17h

Neural Legendre-Fenchel transform with Hessian Preconditioning

Researchers have developed a new method for approximating the Legendre-Fenchel transform, a key tool in convex analysis and machine learning. Their approach utilizes neural networks and introduces a Hessian-based preconditioning strategy to improve accuracy, especially for ill-conditioned functions. This method involves an affine deformation around a function's minimizer, simplifying the conjugation map and allowing a residual network to learn it more effectively. Experiments show enhanced convergence rates and numerical accuracy, particularly for challenging problems, with minimal computational overhead. AI

IMPACT Enhances numerical methods for optimization problems, potentially improving performance in machine learning tasks that rely on convex analysis.
- neural networks
- Legendre-Fenchel transform
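
As background, the Legendre-Fenchel transform (convex conjugate) of a function f is f*(y) = sup_x (x·y − f(x)). The sketch below evaluates it by brute force on a 1D grid; it is only a reference illustration of the transform itself, not the neural approximation or the Hessian preconditioning described in the summary. The function name `discrete_conjugate` and the grid sizes are assumptions chosen for the example.

```python
import numpy as np


def discrete_conjugate(f_vals: np.ndarray, xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Approximate f*(y) = max_x (x*y - f(x)) with the max taken over the grid xs."""
    # Row i holds x*y_i - f(x) for every grid point x; take the max along each row.
    return np.max(np.outer(ys, xs) - f_vals[None, :], axis=1)


# f(x) = x^2 / 2 is its own conjugate, which gives an easy correctness check.
xs = np.linspace(-5.0, 5.0, 2001)
ys = np.linspace(-2.0, 2.0, 9)
conj = discrete_conjugate(0.5 * xs**2, xs, ys)

# Largest deviation from the exact conjugate y^2 / 2 (limited by the grid spacing).
print(np.max(np.abs(conj - 0.5 * ys**2)))
```

The grid must be wide enough to contain the maximizer for every queried y (here the maximizer is x = y), otherwise the brute-force value underestimates the true supremum.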
TOOL · arXiv cs.LG English(EN) · 17h

Disjoint Generation of Synthetic Data

Researchers have introduced a novel framework for creating synthetic tabular datasets using disjoint generative models. This approach partitions data into separate subsets, each processed by distinct generative models before being combined via a joining operation that doesn't require common identifiers. The method enhances privacy, improves computational feasibility, and allows for mixed-model synthesis, achieving competitive accuracy and utility while significantly reducing re-identification risk. AI

IMPACT Introduces a new method for generating synthetic data that improves privacy and utility, potentially impacting data sharing and model training.
- Anton Danholt Lautrup