Brief

last 24h

[50/1235] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CV English(EN) · 17h

DroneDAR: Long-Range Drone Distance Estimation Using Monocular Vision and Bounding-Box Features

Researchers have developed DroneDAR, a new model for estimating drone distances using monocular vision and bounding-box features. This approach is crucial for tracking and situational awareness, especially in long-range imagery where drones appear very small. DroneDAR combines a convolutional backbone with bounding-box cues via a gating mechanism to improve accuracy and robustness against factors like bounding-box noise and low texture detail. AI

IMPACT This research could improve drone tracking and situational awareness in long-range scenarios, potentially impacting surveillance and autonomous navigation systems.
- arXiv
TOOL · arXiv cs.LG English(EN) · 17h

Benchmark Datasets for Lead-Lag Forecasting on Social Platforms

Researchers have introduced a new framework called Lead-Lag Forecasting (LLF) to address the challenge of predicting future impacts based on early user interactions on social platforms. To support this research, they have created two large benchmark datasets derived from arXiv and GitHub, encompassing millions of papers and repositories respectively. These datasets are designed to capture long-term dynamics and avoid sampling biases, providing a foundation for developing and testing LLF models. AI

IMPACT Establishes a new forecasting paradigm for analyzing long-term user behavior dynamics on social platforms.
TOOL · arXiv cs.LG English(EN) · 17h

From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models

A new paper explores the relationship between traditional differential equation models and modern data-driven approaches like neural operators. It argues that many modeling strategies share a common structure, differing primarily in their assumed input-output mappings. The research suggests that only certain models are capable of true mechanism discovery and subsequent generalization, offering insights into their appropriate applications. AI

IMPACT Provides a theoretical framework for understanding and comparing different data-driven modeling approaches in scientific applications.
TOOL · arXiv cs.LG English(EN) · 17h

Lattice: A Confidence-Gated Hybrid System for Uncertainty-Aware Sequential Prediction with Behavioral Archetypes

Researchers have developed Lattice, a novel system designed for uncertainty-aware sequential prediction. This hybrid system uses confidence gating to selectively activate learned behavioral archetypes, falling back to a base model when uncertain. Experiments on datasets like MovieLens and Amazon Electronics demonstrated significant improvements in prediction accuracy, with gains of over 30% in some cases. AI

IMPACT Introduces a novel method for improving sequential prediction models by incorporating uncertainty awareness and conditional activation of learned behaviors.
- SASRec
- Lattice
- MovieLens
- Amazon Electronics
- LSTM
- transformer
- BERT4Rec
- Lorian Bannis
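
The confidence-gating idea in the Lattice summary can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `gated_predict`, the scalar `confidence` input, and the 0.7 threshold are all assumptions made for this example.

```python
from typing import List, Sequence


def gated_predict(
    base_scores: Sequence[float],
    archetype_scores: Sequence[float],
    confidence: float,
    threshold: float = 0.7,  # hypothetical value, not taken from the paper
) -> List[float]:
    """Use the archetype scores only when the gate is confident.

    Below the threshold, fall back to the base model's scores unchanged.
    """
    if confidence >= threshold:
        return list(archetype_scores)
    return list(base_scores)


base = [0.1, 0.9]       # hypothetical base-model item scores
archetype = [0.8, 0.2]  # hypothetical archetype-adjusted scores
confident = gated_predict(base, archetype, confidence=0.9)
uncertain = gated_predict(base, archetype, confidence=0.3)
```

Here `confident` takes the archetype scores and `uncertain` falls back to the base scores, which is the fallback behavior the summary describes.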
TOOL · arXiv cs.LG English(EN) · 17h

A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation

Researchers have developed a novel approach to analyzing the generalization and approximation capabilities of message passing graph neural networks (MPNNs). This new method defines a compact metric space that accommodates graphs of all sizes, both sparse and dense, which is a significant improvement over prior work that was limited to either dense graphs or uniformly bounded sparse graphs. The theory, based on graphop analysis, yields stronger universal approximation theorems and generalization bounds for MPNNs. AI

IMPACT Enhances theoretical understanding of graph neural networks, potentially leading to more robust and generalizable models for graph-based AI tasks.
TOOL · arXiv cs.LG English(EN) · 17h

Temporal Coverage over Density: Parsimonious Training-Set Design for ML Climate Downscaling

Researchers have developed a new method for training machine learning models to downscale climate data, focusing on how to select training years effectively. Their study, using the CESM2 Large Ensemble, found that training models on years distributed across the entire climate trajectory, rather than contiguous historical periods, significantly improves their ability to reproduce climate variability. This approach, even with limited data, outperforms models trained solely on historical data and suggests that broad sampling of climate states is more beneficial than temporal continuity for allocating scarce high-resolution simulation resources. AI

IMPACT Optimizes training data selection for climate models, potentially improving accuracy and efficiency in climate impact assessments.
- CESM2 Large Ensemble
TOOL · arXiv cs.LG English(EN) · 17h

TinyJudge: Unverifiable Constraint Alignment via Lightweight Specialist Ensembles

Researchers have developed TinyJudge, a new framework designed to improve instruction following in large language models (LLMs). This system utilizes an ensemble of small, specialized language models to evaluate and reward adherence to complex, often unverifiable constraints, such as tone or style. By distilling expertise from larger models into these smaller ones, TinyJudge aims to overcome limitations like reward hacking and high computational costs associated with current methods. Experiments show TinyJudge significantly outperforms existing approaches in performance and reward precision, while also cutting training time threefold. AI

IMPACT This approach could lead to more efficient and precise alignment of LLMs with complex human instructions, potentially improving their usability in diverse applications.
- TinyJudge
- LLMs
TOOL · arXiv cs.CV English(EN) · 17h

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Researchers have introduced Struct-Searcher, a novel agentic workflow designed for multimodal deep information seeking. This system moves beyond simple evidence accumulation by employing belief revision theory to construct and maintain an evolving multimodal structural graph. This allows Struct-Searcher to effectively handle contradictory information across different modalities, leading to improved accuracy in complex research tasks. AI

IMPACT This new agentic workflow could improve the accuracy and robustness of AI systems in complex multimodal research tasks.
TOOL · arXiv cs.LG English(EN) · 17h

SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning

Researchers have developed SAD-Flower, a new framework designed to enhance the safety and reliability of trajectory planning using flow matching. This method addresses limitations in existing flow matching techniques by incorporating formal guarantees for state and action constraints, as well as ensuring dynamical consistency. SAD-Flower achieves this by augmenting the flow with a virtual control input, allowing for test-time satisfaction of unseen constraints without retraining, and has demonstrated superior performance over other generative model-based baselines in experiments. AI

IMPACT Enhances safety and reliability in AI-driven planning systems, potentially enabling wider adoption in critical applications.
TOOL · arXiv cs.LG English(EN) · 17h

Solving Inverse Problems with Flow-based Models via Model Predictive Control

Researchers have developed MPC-Flow, a novel framework for solving inverse problems using flow-based generative models. This method employs model predictive control to guide the model's dynamics, making conditional generation more practical. MPC-Flow offers a spectrum of guidance algorithms, some of which bypass the need for backpropagation through the generative model's trajectory. The framework has demonstrated strong performance and scalability on image restoration tasks, including in-painting, deblurring, and super-resolution, even with large-scale models like FLUX.2 on consumer hardware. AI

IMPACT Introduces a more efficient method for conditional generation in flow-based models, potentially improving performance on tasks like image restoration.
TOOL · arXiv cs.LG English(EN) · 17h

Teacher-Free Self-Training Amplifies but Does Not Compound: A Pass@$K$ Crossover on a Free-Verifier Domain

Researchers investigated whether self-training language models on their own outputs leads to new capabilities or simply refines existing ones. Using a teacher-free setup with a generator, critic, and verifier on a Qwen3-4B model, they found that critic-guided selection improved performance. Self-training raised the performance ceiling but did not accelerate learning, with the base model eventually outperforming the self-trained model at higher computational budgets, indicating amplification rather than compounding of capabilities. AI

IMPACT This research suggests that current self-training methods may not unlock fundamentally new LLM abilities, potentially shifting focus towards architectural or data innovations for true capability breakthroughs.
- Qwen3-4B
- arXiv
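
The pass@K crossover in the title can be made concrete with the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The summary does not say which estimator the paper uses, so treat this as a sketch under that assumption; the per-problem counts below are invented purely to show how a crossover can arise.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn without
    replacement from n attempts (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts of correct samples out of n = 10 on two problems.
self_trained = [8, 0]  # sharper on problem 1, never solves problem 2
base_model = [3, 3]    # weaker per sample, but covers both problems

low_budget = (mean_pass_at_k(self_trained, 10, 1), mean_pass_at_k(base_model, 10, 1))
high_budget = (mean_pass_at_k(self_trained, 10, 10), mean_pass_at_k(base_model, 10, 10))
```

With these made-up counts the self-trained model leads at k = 1 and the base model leads at k = 10, the same qualitative pattern the summary reports.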
TOOL · arXiv cs.LG English(EN) · 17h

Optimizing Few-Step Generation with Adaptive Matching Distillation

Researchers have developed Adaptive Matching Distillation (AMD), a new framework to improve the stability and performance of few-step generative models. AMD addresses issues in "Forbidden Zones" where existing distillation methods struggle by using reward proxies to detect and escape these problematic areas. Experiments on image and video generation tasks, including SDXL and Wan2.1, show AMD enhances sample fidelity and training robustness, notably improving the HPSv2 score on SDXL. AI

IMPACT Enhances training robustness and sample fidelity for generative models, potentially leading to more efficient and higher-quality AI-generated content.
TOOL · arXiv cs.CV English(EN) · 17h

DALE-CT: Depth-Aware Foundation Models for Computed Tomography

Researchers have developed DALE-CT, a new family of 2D foundation models for processing computed tomography (CT) data. Built from scratch using a self-supervised learning approach called LeJEPA, DALE-CT incorporates a novel 3D depth-aware pre-training strategy with both automated and human-annotated supervision. This model achieved a Macro AUROC of 0.833 on the CT-RATE dataset for multi-abnormality detection, nearing the performance of state-of-the-art 3D vision-language models with less data and no textual supervision. AI

IMPACT Introduces a novel, data-efficient approach for medical image analysis, potentially improving diagnostic accuracy in CT scans.
- CT-RATE dataset
- DALE-CT
- LeJEPA
- DINOv2
TOOL · arXiv cs.LG English(EN) · 17h

Layer-wise Derivative Controlled Networks Achieve Competitive Accuracy and Gradient Stability Across Data Regimes

Researchers have developed a new neural network architecture called Layer-wise Derivative Controlled Networks (CR) that demonstrates improved accuracy and gradient stability across various data regimes. In studies on the Pima Diabetes dataset, CR maintained a consistent accuracy advantage even with limited training data, showing significantly more stable gradient tail ratios compared to standard ReLU networks. Further experiments on the SST-5 dataset indicated competitive or superior performance in both frozen-embedding and BERT fine-tuned scenarios, outperforming existing baselines with less training data. AI

IMPACT This new architecture offers improved generalization and stability, potentially leading to more robust AI models across different data volumes and types.
TOOL · arXiv cs.LG English(EN) · 17h

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

Researchers have identified a significant bias in Process Reward Models (PRMs) stemming from imbalanced training data, which leads to an overemphasis on plausible but incorrect reasoning steps. This bias can actively mislead AI systems, negatively impacting tasks like guided decoding and Best-of-N selection. To combat this, a new framework called PRISM has been developed, which uses contrastive learning and hard negative examples to improve step-level modeling without requiring additional human labels, substantially reducing false positives and enhancing accuracy. AI

IMPACT Reduces false positives in AI reasoning, potentially leading to more reliable and accurate AI decision-making.
- PRISM
- Process Reward Models
TOOL · arXiv cs.CV English(EN) · 17h

EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

Researchers have developed EgoPriMo, a new framework for generating full-body motion for humanoid robots using egocentric human demonstrations. This system takes egocentric visual observations and text prompts to reconstruct, generate, and forecast SMPL-based motion. EgoPriMo utilizes a Triple-stream DiT model that processes body dynamics, visual context, and text, enabling it to learn generalizable and interactive motion priors from diverse human actions. AI

IMPACT Enables more natural and interactive control of humanoid robots by learning from human demonstrations.
TOOL · arXiv cs.CV English(EN) · 17h

Shift-Dependent Asymmetry: Orthogonal Inverse Low-Rank Adaptation for Federated Medical Segmentation

Researchers have developed a new method called Inverse Asymmetric Tuning (IAT) to improve federated fine-tuning of medical segmentation models. Existing federated LoRA methods struggle with the inherent asymmetry between a model's encoder and decoder, leading to issues with generalization. IAT addresses this by personalizing module-specific components to handle appearance shifts in the encoder and supervision variations in the decoder, while maintaining a shared pathway for common knowledge. The method also incorporates a Subspace Orthogonality Regularizer to prevent site-specific updates from interfering with shared parameters, showing consistent improvements in experiments. AI

IMPACT Enhances federated learning techniques for medical imaging, potentially improving model generalization across different healthcare institutions.
- Inverse Asymmetric Tuning (IAT)
- Low-Rank Adaptation (LoRA)
TOOL · arXiv cs.LG English(EN) · 17h

TAMUNA: Doubly Accelerated Distributed Optimization under Partial Participation

Researchers have developed a new algorithm called TAMUNA designed to improve the efficiency of distributed optimization and federated learning. TAMUNA addresses the communication bottleneck by combining local training and data compression techniques, while also uniquely supporting partial client participation. This approach allows for doubly-accelerated convergence rates, outperforming previous methods that required all clients to be active. AI

IMPACT Introduces a novel algorithm that could enhance the efficiency of distributed AI training by allowing for partial client participation.
- Laurent Condat
TOOL · arXiv cs.LG English(EN) · 17h

Neural Legendre-Fenchel transform with Hessian Preconditioning

Researchers have developed a new method for approximating the Legendre-Fenchel transform, a key tool in convex analysis and machine learning. Their approach utilizes neural networks and introduces a Hessian-based preconditioning strategy to improve accuracy, especially for ill-conditioned functions. This method involves an affine deformation around a function's minimizer, simplifying the conjugation map and allowing a residual network to learn it more effectively. Experiments show enhanced convergence rates and numerical accuracy, particularly for challenging problems, with minimal computational overhead. AI

IMPACT Enhances numerical methods for optimization problems, potentially improving performance in machine learning tasks that rely on convex analysis.
- Legendre-Fenchel transform
- neural networks
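
For readers unfamiliar with the transform itself, the Legendre-Fenchel conjugate of f is f*(y) = sup_x (x*y - f(x)). The sketch below approximates it by brute force on a one-dimensional grid. It is a reference illustration only, not the neural or Hessian-preconditioned method from the paper; the grid bounds and the quadratic test functions are assumptions chosen because their conjugates are known in closed form.

```python
def conjugate_on_grid(f, y, lo=-5.0, hi=5.0, steps=1000):
    """Approximate f*(y) = sup_x (x*y - f(x)) by maximizing over a uniform grid."""
    best = float("-inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        best = max(best, x * y - f(x))
    return best


def quadratic(a):
    """f(x) = a * x^2 / 2, whose exact conjugate is y^2 / (2 * a)."""
    return lambda x: 0.5 * a * x * x


well_conditioned = conjugate_on_grid(quadratic(1.0), 1.0)  # exact value: 0.5
stiffer = conjugate_on_grid(quadratic(4.0), 2.0)           # exact value: 0.5
```

A grid search like this scales poorly with dimension, which is one motivation for learned approximations such as the one the summary describes.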
TOOL · arXiv cs.LG English(EN) · 17h

Disjoint Generation of Synthetic Data

Researchers have introduced a novel framework for creating synthetic tabular datasets using disjoint generative models. This approach partitions data into separate subsets, each processed by distinct generative models before being combined via a joining operation that doesn't require common identifiers. The method enhances privacy, improves computational feasibility, and allows for mixed-model synthesis, achieving competitive accuracy and utility while significantly reducing re-identification risk. AI

IMPACT Introduces a new method for generating synthetic data that improves privacy and utility, potentially impacting data sharing and model training.
- Anton Danholt Lautrup
TOOL · arXiv cs.LG English(EN) · 17h

Towards Automated Kernel Generation in the Era of LLMs

A new survey paper explores the use of large language models (LLMs) and agentic systems for automating the generation and optimization of GPU kernels. These kernels are crucial for the performance of AI systems, but their manual creation is a time-consuming and non-scalable process. The paper aims to provide a structured overview of current LLM-driven approaches, datasets, and benchmarks, while also outlining future research directions in this rapidly evolving field. AI

IMPACT Automating GPU kernel generation with LLMs could significantly accelerate AI system development and performance.
- GPU kernels
- Yang Yu
- LLMs
TOOL · arXiv cs.CV English(EN) · 17h

TIDE: Task-Isolated Diffusion for Unified Video Editing and Generation

Researchers have developed TIDE, a novel framework designed to unify video editing and generation tasks within a single model. TIDE utilizes per-token task embeddings to differentiate between various conditioning inputs, such as target, source, and reference tokens. The framework also employs a dual-path conditioning scheme and a progressive multi-task training strategy to enhance its ability to handle diverse video manipulation objectives and achieve state-of-the-art results across multiple benchmarks. AI

IMPACT Introduces a unified framework for video editing and generation, potentially simplifying workflows and improving performance across diverse tasks.
TOOL · arXiv cs.CV English(EN) · 17h

Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing

Researchers have developed Crayotter, an open-source system designed to streamline long-form video editing through a multi-agent approach. This system organizes the editing process into distinct phases, ensuring narrative intent is maintained and providing detailed artifacts for traceability and failure diagnosis. Evaluations show Crayotter outperforms existing tools in theme alignment, narrative coherence, and editing smoothness. AI

IMPACT Introduces a novel multi-agent system for video editing, potentially improving efficiency and quality in content creation workflows.
TOOL · arXiv cs.CV English(EN) · 17h

Harnessing Streaming Video in the Wild

Researchers have developed a new framework called Streaming Harness to enable Vision-Language Models (VLMs) to process unbounded video streams in real-time. This system enhances VLMs with proactive interaction, long-term memory retention up to 12 hours, and sub-second processing latency. To support this advancement, they also introduced a new streaming dataset, Streaming-Train-248K, and a benchmark, Streaming-Eval, to drive further progress in deployable streaming intelligence. AI

IMPACT Enables real-time analysis of live video feeds for applications like assistants and robotics, moving beyond offline video understanding.
TOOL · arXiv cs.LG English(EN) · 17h

CAAL: Contextual Bandits based Online Hand-Craft Active Learning Strategy Selection

Researchers have developed a new active learning strategy called CAAL, which uses contextual bandits to dynamically select the best hand-crafted strategy for labeling data. This approach addresses the challenge of uncertain data distributions by predicting rewards based on external context information. CAAL has demonstrated superior performance compared to existing adaptive strategies on public datasets, with results remaining consistent across different batch sizes. AI

IMPACT Introduces a novel method for improving data labeling efficiency in machine learning.
- CAAL
- Contextual Adaptive Active Learning
TOOL · arXiv cs.LG English(EN) · 17h

Fourier Neural Operators with rank-1 lattice points and hyperbolic cross

Researchers have developed a new approach to Fourier Neural Operators (FNOs) that improves their efficiency and accuracy. By replacing standard tensor product grids with rank-1 lattice points and using a hyperbolic cross frequency index set, the method requires fewer parameters and training samples. This lattice-based hyperbolic-cross FNO architecture simplifies the high-dimensional Fourier transform into a single one-dimensional fast Fourier transform, demonstrating benefits for solving partial differential equations. AI

IMPACT This research could lead to more efficient and accurate AI models for scientific simulations and complex problem-solving.
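
The reduction to a one-dimensional transform follows from the definition of a rank-1 lattice: the points are x_i = frac(i * z / n) for a generating vector z, so the phase of any integer frequency k at x_i depends only on (k . z) mod n. The sketch below checks this numerically. The generating vector and frequencies are arbitrary illustrative choices, not values from the paper, and the function names are hypothetical.

```python
import cmath


def rank1_lattice(n, z):
    """Return the n lattice points x_i = frac(i * z / n), i = 0..n-1."""
    return [tuple((i * zj % n) / n for zj in z) for i in range(n)]


def frequency_to_1d_index(k, z, n):
    """Map a multi-dimensional integer frequency k to the index (k . z) mod n."""
    return sum(kj * zj for kj, zj in zip(k, z)) % n


def phases_match(k, z, n, tol=1e-9):
    """Check exp(2*pi*i * k . x_i) == exp(2*pi*i * i * m / n) at every point."""
    m = frequency_to_1d_index(k, z, n)
    for i, x in enumerate(rank1_lattice(n, z)):
        full = cmath.exp(2j * cmath.pi * sum(kj * xj for kj, xj in zip(k, x)))
        one_d = cmath.exp(2j * cmath.pi * i * m / n)
        if abs(full - one_d) > tol:
            return False
    return True


points = rank1_lattice(5, (1, 2))
ok = phases_match((1, 1), (1, 2), 5)
```

Because every frequency collapses to a single index in 0..n-1, evaluating a trigonometric sum at all lattice points has the form of a length-n one-dimensional discrete Fourier transform.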
TOOL · arXiv cs.CV English(EN) · 17h

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Researchers have developed a new framework called Z-Reward for improving text-to-image generation models. This system uses a teacher-student approach where a large vision-language model (VLM) acts as the teacher, inferring score distributions based on reasoning. A smaller student VLM is then trained to mimic these distributions, enabling efficient reward deployment without requiring explicit reasoning during inference. The Z-Reward framework demonstrated significant improvements in human preference accuracy compared to existing methods and enhanced text-to-image optimization. AI

IMPACT Introduces a novel reward modeling technique that could enhance the quality and controllability of text-to-image generation models.
TOOL · arXiv cs.LG English(EN) · 17h

Learning to Solve Generative ODEs Beyond the Linear Span

Researchers have developed SpanLift, a new neural solver designed to improve the efficiency of generative models. Current models integrate learned Ordinary Differential Equations (ODEs), but this process is slow due to the need for many sequential evaluations. SpanLift addresses this by augmenting standard updates with a spatial residual operator, allowing it to capture components beyond the linear span of buffered velocity evaluations. This method has demonstrated state-of-the-art few-step sampling across various applications, significantly improving metrics like FID scores on datasets such as CIFAR-10 and ImageNet with minimal model evaluations. AI

IMPACT Improves sampling efficiency for generative models, potentially reducing computational costs and enabling faster generation of high-quality outputs.
TOOL · arXiv cs.LG English(EN) · 17h

Pointwise Complexity for Gaussian Fields: Upper Envelopes, Algorithmic Lower Bounds, and Separation

Researchers have developed a new theorem for understanding Gaussian processes, offering a more precise high-probability envelope for the entire field rather than just a scalar quantity. This theorem refines existing generic chaining methods and provides a Gaussian process equivalent to pointwise empirical-process bounds used in deep neural networks. Additionally, the study introduces a Bayesian algorithmic lower envelope derived from the interactive Fano/data-processing principle, which offers local-geometric certificates of pointwise complexity for estimators in overparameterized classes. AI

IMPACT Provides theoretical underpinnings for understanding complexity in AI models, potentially improving estimator design.
TOOL · arXiv cs.LG English(EN) · 17h

OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents

Researchers have developed OTora, a novel framework designed to test the resilience of large language model (LLM) agents against a specific type of attack known as Reasoning-Level Denial-of-Service (R-DoS). This attack method aims to degrade an agent's performance by artificially increasing its reasoning depth or tool usage, rather than by causing outright task failure. OTora employs a two-stage process, utilizing adversarial triggers and genetic search to amplify overthinking while maintaining task accuracy, demonstrating significant latency increases on various agent benchmarks. AI

IMPACT This research highlights a new vulnerability in LLM agents, potentially impacting the reliability and efficiency of deployed AI systems.
TOOL · arXiv cs.LG English(EN) · 17h

Overcoming the Limits of Finite Difference Method; Physics-Informed Neural Network for Noisy High-Dimensional Heat Diffusion

Researchers have developed a Physics-Informed Neural Network (PINN) framework to address the limitations of traditional numerical methods like the Finite Difference Method (FDM) when dealing with noisy, high-dimensional heat diffusion problems. In simulations with 20% boundary noise in 3D, the PINN maintained approximately 91% accuracy, while FDM accuracy dropped to 36%. The PINN also demonstrated superior performance in a physical copper thermal system, reducing boundary reconstruction error by a factor of 3.3 under realistic noise conditions, and proved more efficient than FDM in 3D scenarios. AI

IMPACT PINN framework offers a more accurate and efficient solution for complex thermal simulations, potentially impacting engineering and scientific modeling.
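
As background for the FDM baseline, here is a minimal explicit finite-difference (forward-time, centered-space) solver for the one-dimensional heat equation u_t = alpha * u_xx with fixed boundary values. It is a textbook sketch, not the 3D setup from the paper; the grid size, the ratio r = alpha * dt / dx^2, and the boundary values are assumptions for illustration.

```python
def heat_ftcs(u0, r, steps):
    """Advance the 1D heat equation with fixed (Dirichlet) end values.

    r = alpha * dt / dx**2 must satisfy 0 < r <= 0.5 for the explicit
    scheme to remain stable.
    """
    if not 0 < r <= 0.5:
        raise ValueError("explicit scheme requires 0 < r <= 0.5")
    u = list(u0)
    for _ in range(steps):
        interior = [
            u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
            for i in range(1, len(u) - 1)
        ]
        u = [u[0]] + interior + [u[-1]]  # boundary values stay fixed
    return u


# Rod held at 0 on the left and 1 on the right, starting cold.
profile = heat_ftcs([0.0] * 10 + [1.0], r=0.25, steps=5000)
```

After enough steps the profile approaches the straight line between the two boundary values. Since the boundary values enter every update directly, noise on them propagates into the interior, which is the sensitivity the summary's comparison is concerned with.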
TOOL · arXiv cs.CV English(EN) · 17h

Vision-Language Work Zone Intelligence for Safety-Critical Speed Regulation of Mixed-Autonomy Vehicles in Dynamic Environments

Researchers have developed a new system to improve safety in work zones for both human drivers and autonomous vehicles. The system uses onboard perception to detect active work zones and recognize temporary speed limits, even when signage is inconsistent or missing from digital maps. It fuses object detection with semantic verification and temporal smoothing to ensure reliable operation in dynamic environments, running on low-cost embedded hardware. AI

IMPACT This system could significantly improve safety in dynamic work zones by providing real-time speed limit awareness to both human and autonomous drivers.
- ROADWork dataset
- Angel Martinez-Sanchez
TOOL · arXiv cs.LG English(EN) · 17h

DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation

Researchers have introduced DHAuDS, a new benchmark suite designed to evaluate the robustness of test-time adaptation (TTA) in audio classification. Unlike existing benchmarks that use static and homogeneous corruption protocols, DHAuDS models realistic heterogeneous acoustic degradation under dynamic corruption severity. The goal is to provide a more accurate assessment of TTA algorithms' real-world performance by exposing limitations that are masked by conventional evaluation methods. AI

IMPACT Provides a more realistic evaluation framework for audio AI models, potentially leading to more robust real-world applications.
- Weichuang Shao
- DHAuDS
TOOL · arXiv cs.LG English(EN) · 17h

Causal Representation Learning from Network Data

Researchers have developed GraCE-VAE, a novel graph-aware causal discrepancy variational autoencoder designed to improve causal disentanglement from soft interventions. This method leverages known interaction networks, such as biological pathways, as an auxiliary view to enhance inference. Experiments on CRISPR perturbation datasets show that incorporating structured biological context leads to better predictions of interventional outcomes, even for novel perturbation combinations. AI

IMPACT Enhances causal inference capabilities by integrating network structures, potentially improving predictive accuracy in complex systems.
- GraCE-VAE
- Jifan Zhang
TOOL · arXiv cs.LG English(EN) · 17h

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Researchers have developed ForcingDAS, a new framework for data assimilation that unifies filtering and smoothing approaches. This method uses Diffusion Forcing to learn a joint-trajectory prior, which helps in capturing long-horizon temporal dependencies and reducing error accumulation, unlike traditional frame-to-frame transition models. ForcingDAS has demonstrated competitive or superior performance compared to specialized baselines across various applications, including weather forecasting and atmospheric state estimation, by using a single trained model for the entire spectrum of inference tasks. AI
- Yixuan Jia
- ForcingDAS
TOOL · arXiv cs.CV English(EN) · 17h

IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval

Researchers have developed IMAGINE, a novel network designed for Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR). This system addresses the limitation of existing methods by incorporating implicit semantic information, which is often conveyed through visually related cues rather than explicit representations. IMAGINE utilizes dynamic multimodal prototypes to capture these shared latent concepts, adaptively modulating visual features to guide the retrieval process more effectively. The approach has demonstrated state-of-the-art performance on three major benchmarks for both CVR and CIR tasks. AI

IMPACT Enhances video and image retrieval by incorporating implicit semantic understanding, potentially improving search accuracy in multimodal AI systems.
TOOL · arXiv cs.CV English(EN) · 17h

Less Is More: Training-Free Acceleration Framework of 3D Diffusion Models for Low-Count PET Denoising via Global-Local Trajectory Reduction

Researchers have developed a novel framework to accelerate 3D diffusion models for low-count PET image denoising. This training-free approach, called the Global-Local Skipping Strategy, significantly reduces inference latency without requiring model retraining. The method employs a global denoising step skipping strategy and a local feature reuse shortcut to achieve over an order of magnitude acceleration while maintaining or improving reconstruction quality. Blinded reader studies confirmed enhanced clinical confidence and diagnostic quality. AI

IMPACT Accelerates AI model inference for medical imaging, potentially enabling faster and more accurate diagnoses from lower-radiation PET scans.
- 3D Diffusion Models
- Global-Local Skipping Strategy
RESEARCH · Mastodon — fosstodon.org English(EN) · 5h · [2 sources]

🔥 TRENDING 📢 34. Mucosal Trained Immunity-based Vaccines as Immunotherapy Against Respiratory Infections - springerprofessional.de 🔗 https://news.google.com/rs

A research paper explores a conceptual framework for integrating generative AI into organizations, moving beyond simple adoption strategies. The paper, published by Springer Professional, delves into the nuances of how businesses can effectively implement and leverage generative AI technologies. It aims to provide a structured approach for organizations navigating the complexities of AI integration. AI

IMPACT Provides a structured approach for organizations to effectively implement and leverage generative AI technologies.
- Generative AI
- Springer Professional
TOOL · Mastodon — sigmoid.social English(EN) · 6h

Part 6 of my #ReinforcementLearning math series is live! Dynamic Programming iteratively solves the Bellman optimality equations, but requires knowing the envi

This article is the sixth installment in a series on the mathematics of reinforcement learning. It focuses on dynamic programming, a method for solving the Bellman optimality equations. The author notes that dynamic programming requires prior knowledge of the environment's dynamics. AI

IMPACT Explains a core mathematical technique used in reinforcement learning.
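
Value iteration is one standard dynamic-programming method for the Bellman optimality equation V(s) = max_a sum_{s'} P(s'|s,a) * (R + gamma * V(s')). The sketch below is not taken from the linked article; the two-state MDP, its rewards, and the discount factor are a hypothetical example. Note that the full transition table is passed in explicitly, which is the "known environment dynamics" requirement the post mentions.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Iterate the Bellman optimality backup until values stop changing.

    transitions[state][action] is a list of (probability, next_state, reward).
    """
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            return values


# Hypothetical MDP: staying in state 1 pays reward 1; everything else pays 0.
mdp = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}
optimal_values = value_iteration(mdp)
```

For this toy problem the fixed point can be checked by hand: V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 * V(1) = 9.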
TOOL · arXiv cs.CV English(EN) · 17h

Hyperspectral Smoke Segmentation via Mixture of Prototypes

Researchers have developed a new method for hyperspectral smoke segmentation, crucial for wildfire management and industrial safety. Existing visible-light methods struggle with semi-transparent smoke and cloud interference. The proposed Mixture of Prototypes (MoP) network addresses spectral contamination, limited pattern modeling, and complex weighting issues by employing band splitting, prototype-based spectral representation, and a dual-stage router for adaptive band weighting. This approach demonstrates superior performance on both hyperspectral and multispectral data, establishing a new standard for spectral-based smoke segmentation. AI

IMPACT This research could lead to more accurate wildfire detection and industrial safety monitoring systems.
TOOL · arXiv cs.CV English(EN) · 17h

Hummus: A Dataset of Humorous Multimodal Metaphor Use

Researchers have introduced the Hummus Dataset, a new collection of 1,000 image-caption pairs designed to evaluate multimodal large language models (MLLMs) on their understanding of humorous multimodal metaphors. The dataset, inspired by theories of humor and metaphor, was created using an expert-developed annotation scheme. Initial experiments using the Hummus Dataset revealed that current MLLMs struggle to effectively integrate visual and textual information to comprehend humorous multimodal metaphors. AI

IMPACT Highlights current limitations in AI's ability to understand nuanced humor and metaphor, indicating areas for future model development.
TOOL · arXiv cs.CV English(EN) · 17h

Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

Researchers have developed an embedded graph convolutional network (EFGCN) designed for real-time event data processing on System-on-Chip (SoC) FPGAs. The approach reduces model size by up to 100-fold compared with previous methods while maintaining competitive accuracy on classification tasks. The EFGCN achieves high throughput and low latency, making it suitable for embedded systems, particularly in the automotive sector. AI

IMPACT Enables more efficient real-time AI processing on edge devices with limited resources.
- ZCU104
- TinyML
- EFGCN
- SoC FPGAs
- PointNetConv
- AEGNN
- N-Caltech101
TOOL · arXiv cs.CV English(EN) · 17h

Polaffini: A feature-based approach for robust affine and polyaffine image registration

Researchers have introduced Polaffini, a new framework for robust medical image registration that leverages deep learning advancements. This approach uses centroids of segmented anatomical regions to establish feature points, enabling efficient affine and polyaffine transformations. Polaffini demonstrates superior structural alignment and provides improved initialization for subsequent non-linear registration, outperforming traditional intensity-based methods in speed and accuracy. AI

IMPACT Enhances medical image processing pipelines with more accurate and efficient registration techniques.
- Antoine Legouhy
- Polaffini
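
The centroid-based idea in this item can be sketched as a least-squares affine fit between matched points. The snippet below is a minimal illustration under stated assumptions, not the Polaffini implementation: the helper names `label_centroids` and `fit_affine` are hypothetical, the centroids are synthetic, and only the global affine step is shown (the polyaffine part is omitted).

```python
import numpy as np


def label_centroids(seg, labels):
    """Centroid, in voxel coordinates, of each label in an integer segmentation."""
    return np.array([np.argwhere(seg == lab).mean(axis=0) for lab in labels])


def fit_affine(src, dst):
    """Least-squares affine (A, t) such that dst is close to src @ A.T + t."""
    n, d = src.shape
    X = np.hstack([src, np.ones((n, 1))])        # homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (d + 1, d)
    return M[:d].T, M[d]


# Centroid of a 2x2 block labelled 1 in a tiny 2D segmentation
seg = np.zeros((4, 4), dtype=int)
seg[0:2, 0:2] = 1
c = label_centroids(seg, [1])

# Synthetic matched centroids in 3D, related by a known affine transform
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(8, 3))
A_true = np.array([[1.1, 0.1, 0.0],
                   [0.0, 0.9, 0.2],
                   [0.0, 0.0, 1.0]])
t_true = np.array([5.0, -3.0, 2.0])
dst = src @ A_true.T + t_true

A_est, t_est = fit_affine(src, dst)
print(np.round(A_est, 3), np.round(t_est, 3))
```

Because each anatomical label yields one matched point pair, the fit reduces to a small linear system, which is consistent with the speed advantage over intensity-based optimization that the summary reports.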
TOOL · arXiv cs.CV English(EN) · 17h

Region-Wise Correspondence Prediction between Manga Line Art Images

Researchers have developed a novel Transformer-based framework to predict region-wise correspondences between manga line art images. This method addresses the challenge of aligning sparse black-and-white strokes, which lack the rich visual cues found in natural images. The system achieves high accuracy in patch-level feature alignment and robust region-level correspondence, demonstrating potential for applications in manga colorization and animation. AI

IMPACT This method could improve efficiency and quality in digital manga and animation production pipelines.
- Transformer
- Yingxuan Li
TOOL · arXiv cs.CV English(EN) · 17h

Muses: Designing, Composing, Generating Nonexistent Fantasy 3D Creatures without Training

Researchers have developed Muses, a training-free method for generating 3D fantasy creatures. The approach uses a 3D skeleton to guide the composition and generation of diverse elements, ensuring a coherent structure and appearance. Muses integrates design, composition, and generation into a unified pipeline. It starts with a graph-constrained reasoning process that creates a well-structured skeleton, continues with voxel-based assembly in a latent space, and concludes with appearance modeling for style-consistent texturing. The method demonstrates state-of-the-art performance in visual fidelity and alignment with textual descriptions. AI

IMPACT Introduces a training-free method for 3D asset generation, potentially simplifying content creation pipelines.
- Hexiao Lu
- Muses
TOOL · arXiv cs.CV English(EN) · 17h

CardioMorphNet: Cardiac Motion Prediction Using a Shape-Guided Bayesian Recurrent Deep Network

Researchers have developed CardioMorphNet, a novel Bayesian recurrent deep learning framework for predicting cardiac motion from short-axis cardiac MRI images. This method utilizes a recurrent variational autoencoder and posterior models for segmentation and motion estimation, guiding the network to focus on anatomical regions without relying on intensity-based registration. CardioMorphNet has demonstrated superior performance in motion estimation and clinical index accuracy compared to existing state-of-the-art methods, while also providing uncertainty maps for its predictions. AI

IMPACT This new framework offers improved accuracy and uncertainty assessment for cardiac motion estimation, potentially aiding in earlier and more precise diagnosis of cardiac abnormalities.
TOOL · arXiv cs.CV English(EN) · 17h

COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing

Researchers have developed COMPASS, a novel framework designed to enhance multimodal sensing by addressing the challenge of missing data modalities. This system ensures a consistent fusion interface by using proxy tokens to fill in absent modalities with estimated representations derived from the observed ones. COMPASS demonstrates improved robustness across various datasets and missing modality scenarios, outperforming traditional imputation and translation-based methods. AI

IMPACT Enhances robustness in multimodal AI systems by providing a consistent method for handling missing data during fusion.
- Hao Wang
TOOL · arXiv cs.CV English(EN) · 17h

STGBD-Net: Spatio-temporal Gradient Basis Decomposition Network for Infrared Small Target Detection

Researchers have developed a novel framework for infrared small target detection (IRSTD) called STGBD-Net, which utilizes Basis Decomposition Theory to improve feature fusion. This approach reformulates the process into an adaptive decomposition-and-reconstruction paradigm, employing Gradient Decomposition Modules (GDMs) to treat normalized gradient features as basis vectors. The resulting networks, including spatial and spatio-temporal variants, demonstrate state-of-the-art performance on multiple benchmarks with enhanced accuracy and computational efficiency. AI

IMPACT Introduces a novel approach to feature fusion for improved accuracy and efficiency in infrared small target detection.
- STGBD-Net
TOOL · arXiv cs.CV English(EN) · 17h

Chain of Flow: ECG-Conditioned 4D Cardiac Cine Generation from Patient-Specific Anatomical Anchor

Researchers have developed a new framework called Chain of Flow (COF) that generates 4D cardiac cine images using electrocardiography (ECG) and patient-specific MRI data. This method aims to provide functional cardiac assessment even when a complete cine sequence is not readily available. COF has demonstrated strong performance on the UK Biobank dataset, showing stable image quality and reliable downstream functional analysis, with potential applications in serial patient monitoring. AI

IMPACT Enables more accessible and comprehensive cardiac functional assessment through AI-driven image synthesis.
- UK Biobank
- Haofan Wu
TOOL · arXiv cs.CV English(EN) · 17h

HiMat: DiT-based Ultra-High Resolution SVBRDF Generation

Researchers have developed HiMat, a new framework for generating ultra-high-resolution (4K) spatially varying bidirectional reflectance distribution functions (SVBRDFs). The method addresses the computational and memory challenges of creating detailed 3D content by operating in a compressed latent space and using a diffusion transformer with linear attention for efficiency. HiMat also incorporates a novel convolutional module called CrossStitch to ensure consistency across different reflectance maps without the overhead of global attention, outperforming prior methods in fidelity, efficiency, and diversity. AI

IMPACT Enables more efficient and detailed 3D content creation, potentially impacting real-time rendering and virtual environments.
- Zixiong Wang
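
To show why linear attention helps at high resolution, the sketch below implements one common kernelized form, using the feature map elu(x) + 1. This is an assumption for illustration only: the summary does not say which linear attention variant HiMat uses, and the function names here are hypothetical. The key point is that the keys and values are summarized once in a small matrix, so the cost grows linearly with the number of tokens instead of quadratically.

```python
import numpy as np


def feature_map(x):
    """elu(x) + 1, which keeps every feature strictly positive."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(Q, K, V):
    """Kernelized attention that never forms the n-by-n weight matrix."""
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V            # (d, d_v) summary of all keys and values
    Z = Qf @ Kf.sum(axis=0)  # (n,) per-query normalizer
    return (Qf @ KV) / Z[:, None]


def quadratic_reference(Q, K, V):
    """Same kernel, computed the quadratic way, for comparison only."""
    Qf, Kf = feature_map(Q), feature_map(K)
    W = Qf @ Kf.T            # (n, n) attention weights
    W = W / W.sum(axis=1, keepdims=True)
    return W @ V


rng = np.random.default_rng(0)
n, d, d_v = 64, 8, 4
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d_v))

out_linear = linear_attention(Q, K, V)
out_quadratic = quadratic_reference(Q, K, V)
print(np.abs(out_linear - out_quadratic).max())
```

Both functions compute the same result up to floating-point error; only the order of the matrix products differs, which is what removes the quadratic term.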