Brief

last 24h

[50/286] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG · 1d

Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers

Researchers have developed a new method for triangular inversion, a crucial operation in the delta-rule linear attention used by models such as Qwen3.5/3.6 and Kimi Linear. The technique significantly improves the speed and numerical stability of this subroutine, which is often a performance bottleneck. Experiments show up to a 4.3x speed-up on NPUs over existing implementations, yielding faster attention layers overall without sacrificing accuracy.

IMPACT Improves efficiency of linear attention mechanisms, potentially enabling faster and more accurate long-context models.
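
For context, chunked delta-rule implementations commonly need a per-chunk inverse of the form (I + A)^-1, where A is strictly lower triangular. The sketch below is the plain sequential forward-substitution baseline, not the paper's method; the function name and the list-of-lists matrix representation are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def invert_unit_lower(a: Matrix) -> Matrix:
    """Return T = (I + A)^-1 for a strictly lower-triangular n x n matrix A.

    Solves (I + A) T = I row by row (forward substitution). T is unit
    lower-triangular, and row i needs every earlier row of T.
    """
    n = len(a)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        t[i][i] = 1.0
        for j in range(i):
            # Row i of (I + A) T = I gives T[i][j] = -sum_{k=j}^{i-1} A[i][k] * T[k][j]
            t[i][j] = -sum(a[i][k] * t[k][j] for k in range(j, i))
    return t


a = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.25, 0.5, 0.0]]
print(invert_unit_lower(a))
```

Each row of T depends on all earlier rows, which is why this step serializes on accelerators and why faster, numerically stable formulations matter.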
TOOL · arXiv cs.AI · 1d

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Researchers have developed SAVER, a novel framework designed to improve multimodal information extraction from social media posts. The system uses visual evidence selectively, only when necessary, avoiding wasted computation and the amplification of misleading visual cues. SAVER employs a Conformal Groundability Gate to decide whether images are relevant and a submodular selector to choose the most pertinent subset for analysis, improving accuracy while reducing processing load and latency.

IMPACT This research introduces a more efficient approach to multimodal information extraction, potentially improving the accuracy and speed of AI systems analyzing social media content.
- Conformal Groundability Gate
- Set Transformer
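
One generic way to build a conformal gate of this kind is split conformal calibration: score how poorly a text-only model fits each held-out post, take a finite-sample quantile of those scores, and consult images only for posts whose score exceeds it. This is a hypothetical sketch, not SAVER's Conformal Groundability Gate; the function names, the nonconformity score, and the direction of the gate are assumptions.

```python
import math
from typing import Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def needs_vision(text_only_score: float, threshold: float) -> bool:
    """Hypothetical gate: look at the image only when the text-only
    nonconformity score is higher than the calibrated threshold."""
    return text_only_score > threshold


# Nonconformity of a text-only extractor on held-out posts (illustrative numbers).
calibration = [0.7, 0.1, 0.5, 0.3, 0.2, 0.6, 0.4]
tau = conformal_threshold(calibration, alpha=0.25)
print(tau, needs_vision(0.65, tau), needs_vision(0.3, tau))  # 0.6 True False
```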
TOOL · arXiv cs.AI · 1d

Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms

Researchers have developed a new cryptographic protocol called Heartbeat-Bound Hierarchical Credentials (HBHC) to address the safety gap in autonomous AI agent swarms. This protocol binds credential validity to periodic liveness proofs from parent agents, enabling rapid revocation without requiring network connectivity to a central authority. Experiments with GPT-4o-mini agent swarms demonstrated a significant reduction in the 'zombie agent' window, with zero post-revocation tool calls observed even under prompt injection attacks.

IMPACT Enhances AI agent safety by enabling rapid revocation of credentials, preventing unauthorized actions from 'zombie agents'.
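
The core idea, a child credential that stays valid only while a recent parent heartbeat exists, can be sketched with a keyed MAC and a time-to-live. This is a simplified illustration, not the HBHC protocol: it uses a shared-key HMAC from Python's standard library (so the verifier must hold the parent's key) where a real deployment would more likely use public-key signatures, and the field names, TTL, and message encoding are assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    parent_id: str
    child_id: str
    issued_at: float  # seconds since epoch
    tag: bytes


def _message(parent_id: str, child_id: str, issued_at: float) -> bytes:
    return f"{parent_id}|{child_id}|{issued_at!r}".encode()


def issue_heartbeat(parent_key: bytes, parent_id: str, child_id: str, now: float) -> Heartbeat:
    """Parent vouches that child_id is still authorized at time `now`."""
    tag = hmac.new(parent_key, _message(parent_id, child_id, now), hashlib.sha256).digest()
    return Heartbeat(parent_id, child_id, now, tag)


def credential_valid(hb: Heartbeat, parent_key: bytes, child_id: str, now: float, ttl: float) -> bool:
    """Accept a tool call only if a fresh, authentic heartbeat names this child.

    Revocation needs no central server: the parent simply stops issuing
    heartbeats, and the child's credential lapses within `ttl` seconds.
    """
    msg = _message(hb.parent_id, hb.child_id, hb.issued_at)
    expected = hmac.new(parent_key, msg, hashlib.sha256).digest()
    return (
        hmac.compare_digest(expected, hb.tag)
        and hb.child_id == child_id
        and 0.0 <= now - hb.issued_at <= ttl
    )


key = b"parent-secret"
hb = issue_heartbeat(key, "planner", "worker-1", now=100.0)
print(credential_valid(hb, key, "worker-1", now=105.0, ttl=10.0))  # True
print(credential_valid(hb, key, "worker-1", now=111.0, ttl=10.0))  # False: heartbeat expired
```

In a hierarchy, the same check would be applied to every link from the acting agent up to the root, so revoking any ancestor starves its whole subtree of fresh heartbeats.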
TOOL · arXiv cs.LG · 1d

Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search

Researchers have developed FedKDNAS, a novel federated learning framework that optimizes model selection and knowledge distillation for heterogeneous client devices. This approach allows each client to autonomously choose a lightweight model tailored to its specific accuracy and resource constraints. The framework then uses a hybrid objective for training, incorporating both supervised learning and knowledge distillation, and shares only predictions on a public reference set. Evaluations show FedKDNAS significantly improves accuracy under non-IID conditions, reduces CPU usage, and drastically cuts communication overhead compared to existing baselines.

IMPACT Enhances federated learning efficiency and accuracy on heterogeneous devices, potentially accelerating collaborative AI development.
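
A hybrid objective of this shape is often written as a weighted sum of cross-entropy on a client's private labeled data and a temperature-scaled KL term toward peers' soft predictions on the shared public reference set. The sketch below uses standard Hinton-style distillation; the weighting `alpha`, the temperature, and averaging peer predictions into a teacher distribution are assumptions, not details confirmed for FedKDNAS.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_predictions(peer_probs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical aggregation: mean of the peers' soft predictions for one public sample."""
    return [sum(col) / len(peer_probs) for col in zip(*peer_probs)]


def hybrid_loss(private_logits: Sequence[float], private_label: int,
                public_logits: Sequence[float], teacher_probs: Sequence[float],
                alpha: float = 0.5, temperature: float = 2.0) -> float:
    """alpha * CE(private sample) + (1 - alpha) * T^2 * KL(teacher || student) on a public sample."""
    ce = -math.log(softmax(private_logits)[private_label])
    student = softmax(public_logits, temperature)
    kl = sum(q * math.log(q / s) for q, s in zip(teacher_probs, student) if q > 0)
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

Only per-sample probability vectors on the public set would leave each client, which keeps communication small compared with exchanging model weights and lets every client keep its own architecture.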
TOOL · arXiv cs.AI · 1d

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

Researchers have developed TextReg, a new regularization framework designed to address prompt distributional overfitting in large language models. The method aims to improve how well optimized prompts generalize to new data by regularizing text-space optimization. TextReg combines several techniques, including dual-evidence gradient purification and semantic edit regularization, to achieve better out-of-distribution performance.

IMPACT Improves out-of-distribution generalization for LLMs, potentially leading to more robust AI applications.
- LLMs
- TextGrad
- TextReg
TOOL · arXiv cs.LG · 1d

A New Framework to Analyse the Distributional Robustness of Deep Neural Networks

Researchers have developed a new framework to analyze the distributional robustness of deep neural networks, a key challenge for real-world AI deployment. The framework models interactions between layer weights and activations using Bernoulli distributions, with class separation serving as a proxy for robustness. Experiments on CIFAR-10 and ImageNet demonstrate that the proposed metrics can differentiate between networks that have memorized training data and those that have not, and show that distributional shifts reduce separation.

IMPACT Provides new diagnostic tools for understanding and improving the reliability of AI models when faced with changing data distributions.
TOOL · arXiv cs.AI · 1d

Deformba: Vision State Space Model with Adaptive State Fusion

Researchers have introduced Deformba, a novel vision state space model designed to overcome limitations in applying SSMs to visual tasks. Deformba addresses the challenges of fixed scanning methods and the difficulty in fusing distinct information streams by employing adaptive state fusion. This approach dynamically enhances spatial structural information while preserving the linear complexity of SSMs and enabling multi-modal fusion.

IMPACT Introduces a new architecture for vision tasks that may improve efficiency and fusion capabilities.
TOOL · arXiv cs.CV · 1d

Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation

Researchers have developed Hyper-V2X, a novel framework utilizing hypernetworks to estimate both epistemic and aleatoric uncertainties in cooperative semantic segmentation for autonomous driving. This approach conditions a Bayesian hypernetwork on fused multi-agent features from V2X communication to generate weight distributions for stochastic Bird's-Eye-View segmentation. The method is architecture-agnostic and demonstrated on the OPV2V benchmark to provide accurate uncertainty estimates with minimal computational overhead, enhancing overall perception reliability.

IMPACT Enhances reliability of autonomous driving perception systems by providing accurate uncertainty estimates.
- autonomous driving
- V2X
- OPV2V
- CoBEVT
- Hyper-V2X
TOOL · arXiv cs.AI · 1d

Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

Researchers have developed Declarative Data Services (DDS), a new architecture designed to improve how AI agents discover and compose data systems. Traditional agentic discovery methods struggle with the complexity and heterogeneity of data backends. DDS addresses this by using a layered contract system that breaks down the search into smaller, manageable sub-searches, enabling more consistent convergence on functional data stacks.

IMPACT Introduces a structured approach to agentic discovery for data systems, potentially improving AI's ability to compose complex data backends.
TOOL · arXiv cs.AI · 1d

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

Researchers have developed a formal framework for cumulative mechanistic science in neural networks, treating circuit interpretation as inductive theory construction. This approach uses Causal Functional Signatures (CFS) and architectural signatures learned via inductive logic programming (ILP) to make mechanistic claims explicit and comparable. The system demonstrates improved structural separation compared to baseline methods and supports transferability across different model scales and architectures.

IMPACT Provides a formal infrastructure for cumulative mechanistic science, enabling more systematic and comparable analysis of neural network circuits.
TOOL · arXiv cs.AI · 1d

DIVE: Embedding Compression via Self-Limiting Gradient Updates

Researchers have developed DIVE, a new method for compressing high-dimensional embeddings from large language models to reduce storage and computational costs in vector search systems. Unlike previous methods, which tend to overfit when labeled data is scarce, DIVE uses a self-limiting triplet loss to bound perturbations and a contrastive loss to provide dense self-supervised gradients. The approach reportedly outperforms existing compression adapters across multiple datasets and compression ratios, and an open-source implementation is available.

IMPACT This new embedding compression technique could significantly reduce the resource requirements for deploying and scaling vector search systems, making LLM-powered applications more efficient.
TOOL · arXiv cs.LG · 1d

Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

Researchers have developed a new method called Deep UCSL for identifying distinct subgroups within patient populations by contrasting them with healthy controls. This approach uses a deep feature extractor to learn a representation space that isolates disease-specific factors, ignoring common variations shared with healthy individuals. The method optimizes a novel loss function through an Expectation-Maximization strategy and has shown quantitative improvements in subgroup quality on both synthetic and real medical imaging datasets.

IMPACT Introduces a novel contrastive learning approach for more precise disease subgroup identification in medical imaging.
- arXiv
- Deep UCSL
TOOL · arXiv cs.AI · 1d

TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health

Researchers have developed TimeSRL, a novel two-stage framework that leverages Large Language Models (LLMs) for generalizable time-series behavioral modeling. This approach first abstracts raw data into natural language semantic concepts, then predicts outcomes solely from these abstractions, aiming for better cross-dataset generalization. Optimized using Reinforcement Learning from Verifiable Rewards, TimeSRL demonstrates state-of-the-art performance in mental health prediction, significantly outperforming existing methods in cross-cohort generalization and transfer learning.

IMPACT Introduces a novel method for improving generalization in time-series analysis, potentially impacting fields requiring robust behavioral modeling.
TOOL · arXiv cs.CL · 1d

Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting

Researchers have developed a novel two-phase retrieval system for corporate credit underwriting that addresses the limitations of standard RAG pipelines. The workflow separates candidate retrieval from utility ranking, using an adaptive controller and an LLM-as-a-Judge to prioritize passages by analytical usefulness rather than semantic similarity alone. Deployed on-premise for data governance and designed to preserve structural fidelity across document types, the system has been shown to cut analysts' document review time from hours to minutes.

IMPACT This new retrieval workflow could significantly accelerate decision-making in document-intensive fields like corporate credit underwriting.
- LLM-as-a-Judge
- arXiv
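
The separation of recall from ranking can be sketched in a few lines: phase one keeps the top-k passages by cosine similarity, and phase two reorders only those candidates by a usefulness score. Here `judge` is a hypothetical stand-in for the LLM-as-a-Judge call, and the toy embeddings and keyword scorer exist only to make the example runnable; the paper's adaptive controller is not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Passage = Tuple[str, Sequence[float]]  # (text, embedding)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_phase_retrieve(query_vec: Sequence[float], passages: List[Passage],
                       judge: Callable[[str], float], k: int = 10, top_n: int = 3) -> List[str]:
    # Phase 1: candidate recall by semantic similarity only.
    candidates = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)[:k]
    # Phase 2: utility ranking of the candidates by judged analytical usefulness.
    reranked = sorted(candidates, key=lambda p: judge(p[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


def keyword_judge(text: str) -> float:
    """Toy usefulness scorer standing in for an LLM judge."""
    return float(sum(w in text.lower() for w in ("covenant", "leverage")))


passages = [
    ("Company history and founding story", [1.0, 0.0]),
    ("Debt covenant terms and leverage ratio", [0.8, 0.6]),
    ("Marketing brochure", [0.0, 1.0]),
]
print(two_phase_retrieve([1.0, 0.0], passages, keyword_judge, k=2, top_n=1))
# ['Debt covenant terms and leverage ratio']
```

The most similar passage (the company history) is recalled but demoted, which is the behavior that similarity-only RAG pipelines cannot express.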
TOOL · arXiv cs.CV · 1d

DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions

Researchers have introduced DriveMA, a new approach for driving vision-language-action models that replaces complex natural language reasoning with simpler, one-step meta-actions. This method addresses bottlenecks in annotation, model complexity, and inference latency associated with traditional reasoning-centric interfaces. DriveMA achieves new state-of-the-art results on the Waymo End-to-End Driving Challenge, demonstrating the effectiveness of its action-centric supervised training and reinforcement learning framework.

IMPACT Simplifies driving AI interfaces, potentially improving efficiency and scalability for autonomous vehicle development.
TOOL · arXiv cs.CV (CA) · 1d

Let EEG Models Learn EEG

Researchers have developed a new framework called Just EEG Transformer (JET) for generating high-fidelity electroencephalogram (EEG) data. Unlike previous methods that use discrete denoising objectives, JET models EEG as continuous temporal sequences, better capturing the inherent dynamics and spectral structure of neural activity. This approach allows JET to preserve long-range temporal dependencies and generate more realistic signals, achieving over 40% reduction in TS-FID compared to existing baselines across multiple benchmarks.

IMPACT Enables more realistic EEG data generation, potentially accelerating research in neural modeling and brain-computer interfaces.
- arXiv
- Just EEG Transformer (JET)
TOOL · arXiv cs.AI · 1d

MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

Researchers have introduced MONET, a new open dataset designed to facilitate text-to-image model training. The dataset comprises approximately 104.9 million image-text pairs, meticulously curated through stages of filtering, deduplication, and re-captioning. MONET aims to lower the barriers for large-scale, reproducible research in text-to-image generation by providing a high-quality, enriched corpus.

IMPACT Provides a large, open dataset to accelerate research and development in text-to-image generation models.
- Clément Chadebec
TOOL · arXiv cs.CV · 1d

Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification

A new research paper compares the effectiveness of Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) for land use scene classification using remote sensing imagery. The study evaluated AlexNet and ViT on the UC Merced Land Use and EuroSAT datasets, analyzing metrics like accuracy, precision, recall, and F1-score. Results indicate that CNNs are more robust with limited data and strong local textures, while ViTs excel at capturing global spatial relationships with sufficient training data, though they require more computational resources.

IMPACT Provides insights for selecting appropriate deep learning models for remote sensing land use classification tasks.
TOOL · arXiv cs.AI · 1d

How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR

Researchers have developed G2D, a novel three-stage pipeline that combines a short online reinforcement learning (RL) warm-up with offline fine-tuning for language models. The approach aims to avoid the computational expense of the continuous online rollouts required by methods like GRPO. After a brief GRPO phase, G2D constructs a static preference dataset and then trains offline with DPO; it has been shown to match or exceed GRPO's performance at a significantly reduced compute cost.

IMPACT Reduces computational costs for training language models using RLVR, making advanced techniques more accessible.
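
The offline half of such a pipeline can be sketched as follows: group the warm-up rollouts by prompt, pair every verified-correct completion with every incorrect one, and train on those pairs with the standard DPO loss, -log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l)))). The pairing rule and function names are assumptions for illustration; G2D's actual selection of informative rollouts may differ.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Rollout = Tuple[str, float]  # (completion, verifiable reward)


def build_pairs(rollouts: Dict[str, Sequence[Rollout]]) -> List[Tuple[str, str, str]]:
    """Pair each rewarded completion with each unrewarded one for the same prompt.

    Prompts whose rollouts are all correct or all wrong yield no pairs,
    so only prompts with mixed outcomes contribute training signal.
    """
    pairs = []
    for prompt, samples in rollouts.items():
        chosen = [c for c, r in samples if r > 0]
        rejected = [c for c, r in samples if r <= 0]
        pairs.extend((prompt, w, l) for w, l in product(chosen, rejected))
    return pairs


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))


def dpo_loss(logp_w: float, logp_l: float, ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return neg_log_sigmoid(margin)


rollouts = {"2+2?": [("4", 1.0), ("5", 0.0), ("22", 0.0)], "3*3?": [("9", 1.0), ("9", 1.0)]}
print(build_pairs(rollouts))  # [('2+2?', '4', '5'), ('2+2?', '4', '22')]
```

Because the pairs are built once, the offline DPO stage needs only log-probabilities from the policy and a frozen reference model, with no further sampling.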
TOOL · arXiv cs.LG · 1d

FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs

Researchers have introduced FedCoE, a novel framework for Federated Learning that aims to balance global generalization with local personalization. Unlike traditional methods that struggle with non-IID data or overfit to local information, FedCoE utilizes a dual-level Mixture-of-Experts approach. This system maintains independent global expert models and uses a shared gating network to manage client-expert correlations, preventing expert drift. FedCoE also includes an adaptive mechanism to help new clients quickly utilize global experts without extensive local training, showing significant accuracy improvements in both general and cold-start scenarios.

IMPACT Introduces a new method to improve federated learning performance, potentially enabling more robust and personalized AI models in distributed environments.
TOOL · arXiv cs.CL · 1d

Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

Researchers have developed a hybrid framework for identifying potential HIV cases in Spanish clinical notes, addressing the limitations of standard NLP benchmarks that can overstate accuracy on ambiguous data. This new approach uses a dual-verification method, combining conformal prediction for aleatoric uncertainty and a Mahalanobis distance veto for epistemic uncertainty. The framework aims to establish a reliable operational domain for medical triage by ensuring clinical narratives meet both probabilistic and geometric safety standards, outperforming traditional uncertainty metrics and classifiers.

IMPACT Introduces a novel risk-aware NLP framework for safer medical triage, potentially improving diagnostic accuracy in sensitive clinical applications.
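
The dual-verification idea can be sketched as a two-gate decision: a conformal prediction set captures ambiguity in the label, and a Mahalanobis distance to the training embeddings flags notes unlike anything the model was trained on. This is an illustrative sketch under assumptions, not the paper's framework: it uses a diagonal covariance for brevity, the thresholds and label names are placeholders, and `qhat` is assumed to come from a prior split-conformal calibration step with score 1 - p(label).

```python
import math
from typing import Dict, Sequence, Set


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """Conformal set: every label whose score 1 - p(label) is within the calibrated qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def mahalanobis_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Mahalanobis distance under a diagonal covariance (a simplification)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def triage(probs: Dict[str, float], qhat: float, embedding: Sequence[float],
           mean: Sequence[float], var: Sequence[float], max_dist: float) -> str:
    labels = prediction_set(probs, qhat)
    ambiguous = len(labels) != 1  # aleatoric check: zero or several plausible labels
    out_of_domain = mahalanobis_diag(embedding, mean, var) > max_dist  # epistemic veto
    if ambiguous or out_of_domain:
        return "defer_to_clinician"
    return next(iter(labels))


probs = {"hiv_suspected": 0.9, "not_suspected": 0.1}
print(triage(probs, 0.2, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], max_dist=3.0))
print(triage(probs, 0.2, [10.0, 0.0], [0.0, 0.0], [1.0, 1.0], max_dist=3.0))
```

A note is auto-triaged only when both gates pass; everything else is routed to a human, which is the "reliable operational domain" the summary describes.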
TOOL · arXiv cs.LG · 1d

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Researchers have developed a new learning-theoretic framework for understanding Chain of Thought (CoT) reasoning in AI models. It models CoT as an interaction between an answer map and a chain rule that generates intermediate questions, and it decomposes the reasoning risk into two components: the benefit of CoT (oracle-trajectory risk) and the cost of CoT (trajectory-mismatch risk) caused by error accumulation.

IMPACT Provides a theoretical understanding of Chain of Thought, potentially guiding future model development for more reliable reasoning.
- arXiv
TOOL · arXiv cs.CV · 1d

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Researchers have introduced IndusAgent, a novel framework designed to enhance open-vocabulary industrial anomaly detection using agentic tools. This system addresses limitations in multimodal large language models by integrating domain-specific reasoning and external tools for clearer visual interpretation. IndusAgent utilizes a structured dataset, Indus-CoT, and a reinforcement learning objective to optimize anomaly classification, localization, and efficient tool usage, achieving state-of-the-art zero-shot performance across multiple benchmarks.

IMPACT Enhances zero-shot anomaly detection capabilities in industrial settings, potentially improving quality control and reducing manual inspection needs.
TOOL · arXiv cs.CV · 1d

DarkShake-DVS: Event-based Human Action Recognition under Low-light and Shaking Camera Conditions

Researchers have introduced DarkShake-DVS, a new benchmark dataset designed for human action recognition in challenging low-light and high-motion scenarios. The dataset includes over 18,000 real-world clips captured with synchronized IMU data to address limitations in existing event-based vision research. They also propose EIS-HAR, a novel method that combines motion compensation with a hybrid architecture for improved spatiotemporal feature extraction and action recognition.

IMPACT Introduces a new benchmark and method to improve AI's ability to recognize actions in challenging real-world conditions.
TOOL · arXiv cs.CV · 1d

Local-sensitive connectivity filter (ls-cf): A post-processing unsupervised improvement of the Frangi, Hessian and vesselness filters for multimodal vessel segmentation

Researchers have developed a new unsupervised method called the local-sensitive connectivity filter (LS-CF) to improve the segmentation of retinal blood vessels. This technique enhances existing filters like the Frangi filter by addressing discontinuities and ensuring pixel-level continuity. The LS-CF demonstrated superior performance on several multimodal datasets, outperforming state-of-the-art approaches in accuracy on the OSIRIX and IOSTAR datasets, and showing competitive results on DRIVE, STARE, and CHASE-DB.

IMPACT Introduces a novel unsupervised method for medical image analysis, potentially improving diagnostic accuracy in ophthalmology.
TOOL · arXiv cs.LG · 1d

Graph Navier Stokes Networks

Researchers have introduced Graph Navier Stokes Networks (GNSN), a new architecture designed to address the oversmoothing problem in Graph Neural Networks. Unlike traditional diffusion-based methods, GNSN incorporates convection to create a dynamic velocity field for more efficient message propagation. This approach allows GNSN to better handle datasets with varying homophily and has demonstrated superior performance on multiple real-world classification tasks.

IMPACT Introduces a novel architecture to improve GNN performance and address oversmoothing, potentially enhancing graph-based machine learning tasks.
TOOL · arXiv cs.AI · 1d

RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

Researchers have developed a novel method called RePCM for synthesizing cardiac motion from a single end-diastolic frame. It targets a limitation of traditional methods, which tend to oversmooth motion because their models are optimized for global patterns. RePCM uses a two-stage process: first, a reconstruction network and clustering identify region-specific motion descriptors; second, a specialized module enforces synchronized region exchange within a conditional VAE to preserve localized dynamics. The system also incorporates a phenotype-adaptive prior to model inter-disease variability and shows improved geometric and functional metrics across multiple datasets.

IMPACT This new method could improve the analysis of regional cardiac function and disease-specific dynamics by enabling more accurate motion synthesis from limited data.
TOOL · arXiv cs.AI · 1d

Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting

Researchers have introduced Dynamic TMoE, a novel framework designed to improve time series forecasting for non-stationary data. This approach addresses limitations in existing Mixture-of-Experts models by dynamically creating and removing experts based on detected distribution shifts. A temporal memory router further enhances stability by using recurrent states and an anomaly repository for context-aware expert selection, leading to significant performance gains.

IMPACT Introduces a novel framework that improves time series forecasting accuracy for non-stationary data, potentially benefiting applications relying on predictive modeling.
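
Drift-triggered expert creation can be illustrated with a classic change detector. The sketch below swaps in the Page-Hinkley test on the active expert's absolute forecast error and spawns a new expert when it fires; the detector, the toy running-mean experts, and all thresholds are assumptions, and Dynamic TMoE's expert removal and temporal memory router are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream (e.g. forecast error)."""
    delta: float = 0.005
    threshold: float = 5.0
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0
    min_cum: float = 0.0

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.cum - self.min_cum > self.threshold:
            self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0  # reset after an alarm
            return True
        return False


@dataclass
class LevelExpert:
    """Toy forecaster that predicts the running mean of the values it has seen."""
    level: float = 0.0
    n: int = 0

    def predict(self) -> float:
        return self.level

    def update(self, y: float) -> None:
        self.n += 1
        self.level += (y - self.level) / self.n


class DriftAwarePool:
    def __init__(self, detector: PageHinkley) -> None:
        self.detector = detector
        self.experts: List[LevelExpert] = [LevelExpert()]

    def step(self, y: float) -> float:
        pred = self.experts[-1].predict()
        if self.detector.update(abs(y - pred)):
            self.experts.append(LevelExpert())  # new regime detected: spawn a fresh expert
        self.experts[-1].update(y)
        return pred


pool = DriftAwarePool(PageHinkley())
for y in [0.0] * 50 + [10.0] * 50:  # a level shift halfway through the series
    pool.step(y)
print(len(pool.experts))  # 2
```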
TOOL · arXiv cs.CV · 1d

LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection

Researchers have developed LER-YOLO, a novel framework designed to improve the detection of small unmanned aerial vehicles using misaligned RGB and infrared imagery. The system incorporates an Uncertainty-Aware Target Alignment module to estimate spatial reliability and guide expert selection. This reliability-guided approach adaptively chooses experts for cross-modal fusion, effectively suppressing unreliable data and enhancing detection accuracy.

IMPACT Enhances drone detection capabilities by improving the fusion of multi-modal sensor data.
- MBU benchmark
- LER-YOLO
TOOL · arXiv cs.CV · 1d

SR-Ground: Image Quality Grounding for Super-Resolved Content

Researchers have introduced SR-Ground, a new dataset designed to improve image quality assessment for super-resolved images. This dataset features pixel-level annotations for various artifact types introduced by modern super-resolution models. By training models on SR-Ground, researchers have shown improved performance in identifying and even reducing these artifacts, demonstrating practical applications for the dataset.

IMPACT This dataset could lead to more reliable and interpretable image quality assessment for AI-generated images, improving user trust and downstream applications.
- arXiv
- SR-Ground
TOOL · arXiv cs.LG · 1d

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Researchers have developed a new unsupervised framework called Divide and Contrast (Di-COT) for learning robust temporal features from time-series data without relying on data augmentation. Di-COT works by contrasting informative substructures within data windows, rather than individual timesteps, which allows for efficient and meaningful contrast while avoiding false positives. This method has demonstrated state-of-the-art performance across various tasks including classification and clustering on multiple large-scale datasets and benchmarks, while also significantly reducing training time.

IMPACT Introduces a novel unsupervised learning method for time-series data that improves efficiency and performance on downstream tasks.
TOOL · arXiv cs.CV · 1d

GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection

Researchers have developed GSA-YOLO, a new lightweight framework designed for real-time X-ray security inspection. This model, based on YOLOv8n, incorporates structured sparsity and adaptive knowledge distillation to improve detection accuracy and inference speed. GSA-YOLO integrates Group Lasso, Sparse Structure Selection, and an Adaptive Knowledge Distillation mechanism to enhance feature representation and reduce model size. Evaluations on the HiXray and PIDray datasets show GSA-YOLO achieves a leading inference speed of 189.62 FPS with reduced computational cost, alongside improved mAP50:95 scores compared to the baseline.

IMPACT This new framework offers improved speed and accuracy for X-ray security inspections, potentially enhancing threat detection capabilities.
- YOLOv8n
- PIDray
- GSA-YOLO
- HiXray
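
Group Lasso is the standard way to push whole structures, rather than individual weights, to zero: it adds lam * sum_g sqrt(|g|) * ||w_g||_2 to the training loss, and groups whose norm collapses can be pruned. In the sketch below each group is assumed to be one convolution filter's flattened weights; how GSA-YOLO actually defines its groups and combines the penalty with Sparse Structure Selection is not covered here.

```python
import math
from typing import List, Sequence


def l2_norm(w: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in w))


def group_lasso_penalty(groups: Sequence[Sequence[float]], lam: float) -> float:
    """lam * sum_g sqrt(|g|) * ||w_g||_2, with each group being one filter's weights."""
    return lam * sum(math.sqrt(len(g)) * l2_norm(g) for g in groups)


def prunable_filters(groups: Sequence[Sequence[float]], tol: float = 1e-3) -> List[int]:
    """Indices of groups driven (near) to zero, whose filters could be removed."""
    return [i for i, g in enumerate(groups) if l2_norm(g) < tol]


filters = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(group_lasso_penalty(filters, lam=1.0), prunable_filters(filters))
```

Because the L2 norm is not squared, its gradient does not vanish as a group shrinks, which is what lets entire filters reach zero and makes the sparsity structured enough to speed up inference.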
TOOL · arXiv cs.AI · 1d

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

A new study evaluated AI reviewers on Nature-family papers and found that, although they can outperform top human reviewers at identifying correct, significant, and well-evidenced criticisms, they also have distinct weaknesses. In the study, 45 scientists annotated over 2,900 criticisms drawn from human and AI reviews. AI reviewers such as GPT-5.2, Gemini 3.0 Pro, and Claude Opus 4.5 were strong on accuracy and on surfacing unique issues, but they showed gaps in specialized knowledge, struggled with papers spanning multiple files, and were overly critical of minor points, suggesting they are best used as complements to human reviewers.

IMPACT AI reviewers show promise in scientific critique but require human oversight, potentially speeding up peer review.
TOOL · arXiv cs.AI · 1d

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

Researchers have developed AMAR, a novel framework for recognizing multiple simultaneous human activities using Wi-Fi channel state information (CSI). This attention-based system treats activity recognition as a set prediction problem, employing learnable query embeddings to detect concurrent actions from complex CSI data. AMAR utilizes an edge-cloud split architecture, with edge devices performing initial feature extraction and the cloud component handling final prediction, significantly outperforming existing methods in multi-user environments.

IMPACT This research could enable more sophisticated contactless sensing applications by improving the ability to track multiple individuals simultaneously using existing Wi-Fi infrastructure.
TOOL · arXiv cs.AI · 1d

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

Researchers have developed a new framework called Reflector to enhance the safety of large language models (LLMs) against complex, multi-step jailbreak attacks. This two-stage approach first uses teacher-guided generation for supervised fine-tuning to establish reflection patterns, then employs reinforcement learning for autonomous self-reflection. Reflector demonstrates over 90% defense success against indirect attacks and improves performance on benchmarks like GSM8K by 5.85%, without adding significant computational overhead.

IMPACT Enhances LLM safety against sophisticated jailbreaks, potentially improving reliability for critical applications.
TOOL · arXiv cs.AI · 1d

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Researchers have developed PREFINE, a novel method for fine-tuning reinforcement learning policies to incorporate safety constraints without full retraining. This approach adapts Direct Preference Optimization (DPO), commonly used for language models, to continuous control environments. PREFINE leverages trajectory-level preferences to balance reward retention with safety alignment, demonstrating a significant reduction in constraint violations and failures while maintaining original reward performance.

IMPACT Introduces a more efficient method for aligning AI behavior with safety constraints in continuous control tasks.
TOOL · arXiv cs.AI · 1d

SURGE: An Event-Centric Social Media Sentiment Time Series Benchmark with Interaction Structure

Researchers have introduced SURGE, a new benchmark dataset designed to analyze social media sentiment dynamics around public events. SURGE organizes over 800,000 posts from 67 events across five categories into time-series data, preserving the interaction structure between posts. This benchmark aims to improve opinion forecasting and crisis response by enabling the study of how post interactions influence collective dynamics and event evolution.

IMPACT Provides a new dataset for training and evaluating models in social media sentiment analysis and event forecasting.
TOOL · arXiv cs.LG · 1d

Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

Researchers have developed a new reinforcement learning (RL) approach called Y-wise Affine Neural Network (YANN-RL) for controlling chemical processes. This method aims to overcome the typical challenges of trust and lengthy training times associated with RL in this domain. By providing interpretable starting points, YANN-RL significantly reduces training time and data requirements compared to other RL algorithms and approaches the performance of nonlinear model predictive control without needing a full nonlinear model.

IMPACT This new RL method could significantly reduce training time and data needs for controlling complex chemical processes.
TOOL · arXiv cs.AI · 1d

SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

Researchers have developed a new explainable AI (XAI) framework called SAM-Sode to improve the interpretability of tiny bacteria detection in medical diagnostics. Traditional methods struggle with the fine details and complex backgrounds inherent in this task, leading to unclear explanations. SAM-Sode addresses this by converting feature attribution maps into geometry-aware prompts, using the SAM3 foundation model for spatial refinement and morphological reconstruction. It also incorporates a dual-constraint mechanism to denoise explanations and align them with expert intuition, enhancing transparency in tiny object detection.

IMPACT Enhances transparency in medical diagnostics by providing more intuitive explanations for tiny object detection models.
- SAM3
- SAM-Sode
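
Segment Anything models accept point and box prompts, so one simple way to turn an attribution heatmap into a prompt is to take its peak as a point and the bounding box of strongly attributed pixels as a box. This is a hypothetical sketch of that conversion only; SAM-Sode's geometry-aware prompts, its SAM3-based refinement, and its dual-constraint denoising are not reproduced, and the 0.5 relative threshold is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, int]            # (x, y)
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max)


def heatmap_to_prompts(heat: List[List[float]], rel_threshold: float = 0.5) -> Tuple[Point, Box]:
    """Return the peak pixel as a point prompt and the bounding box of pixels
    with attribution >= rel_threshold * peak as a box prompt.

    Assumes a non-negative attribution map (e.g. after ReLU or abs).
    """
    peak_val = max(max(row) for row in heat)
    peak = next((x, y) for y, row in enumerate(heat) for x, v in enumerate(row) if v == peak_val)
    cutoff = rel_threshold * peak_val
    hot = [(x, y) for y, row in enumerate(heat) for x, v in enumerate(row) if v >= cutoff]
    xs = [x for x, _ in hot]
    ys = [y for _, y in hot]
    return peak, (min(xs), min(ys), max(xs), max(ys))


heat = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 1.0, 0.1],
    [0.1, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
print(heatmap_to_prompts(heat))  # ((2, 1), (1, 1, 2, 2))
```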
TOOL · arXiv cs.AI · 1d

Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

Researchers have developed a new method called Predicate Action Skills (PACTS) that allows robots to learn and compose skills without retraining. PACTS models both the physical actions and the symbolic outcomes of these actions, enabling better generalization. This approach facilitates zero-shot skill composition through planning by using predicted outcomes to sequence and monitor task execution.

IMPACT Enables robots to learn and combine skills more flexibly, potentially accelerating the development of more adaptable robotic systems.
- Benedict Quartey
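
Composition through planning becomes concrete once each skill carries predicted symbolic outcomes: a planner can chain skills whose preconditions hold and whose predicted effects reach the goal. The sketch below is a classic STRIPS-style breadth-first planner over predicate sets; the `Skill` fields and the example predicates are hypothetical, and in PACTS the effects would come from learned predicate predictions rather than hand-written sets.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set


@dataclass(frozen=True)
class Skill:
    name: str
    pre: FrozenSet[str]     # predicates that must hold before execution
    add: FrozenSet[str]     # predicates the skill is predicted to make true
    delete: FrozenSet[str]  # predicates the skill is predicted to make false


def plan(start: Set[str], goal: Set[str], skills: Sequence[Skill]) -> Optional[List[str]]:
    """Breadth-first search over predicted symbolic states; returns skill names or None."""
    init = frozenset(start)
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for s in skills:
            if s.pre <= state:
                nxt = (state - s.delete) | s.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [s.name]))
    return None


skills = [
    Skill("pick_cup", frozenset({"hand_empty", "cup_on_table"}),
          frozenset({"holding_cup"}), frozenset({"hand_empty", "cup_on_table"})),
    Skill("place_on_shelf", frozenset({"holding_cup"}),
          frozenset({"cup_on_shelf", "hand_empty"}), frozenset({"holding_cup"})),
]
print(plan({"hand_empty", "cup_on_table"}, {"cup_on_shelf"}, skills))  # ['pick_cup', 'place_on_shelf']
```

During execution, the same predicted effects can double as a monitor: if the predicates observed after a skill differ from its predicted outcome, the planner can be rerun from the observed state.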
TOOL · arXiv cs.CV · 1d

PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

Researchers have developed a new framework called Peak-Guided Calibration (PGC) to improve the detection of AI-generated images. This method focuses on aggregating salient, local features using a peak-sensitive mechanism to overcome the limitations of detectors that rely solely on global image representations. PGC effectively calibrates global decisions by accentuating subtle, discriminative clues that might otherwise be lost. The framework demonstrates state-of-the-art performance, significantly improving accuracy on a new benchmark dataset, CommGen15, and setting new records on existing benchmarks. AI
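
The summary does not spell out PGC's peak-sensitive mechanism. A minimal sketch of the general idea, assuming the detector produces one fake-vs-real logit per local patch plus one global logit, is to pool the top-k patch logits (so a few strongly suspicious patches are not averaged away) and blend the result with the global logit. `peak_guided_score`, `k` and `weight` are hypothetical choices, not PGC's published formulation.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def peak_guided_score(global_logit: float, patch_logits: List[float], k: int = 2, weight: float = 0.5) -> float:
    """Hypothetical calibration: blend a global logit with the mean of the top-k patch logits.

    Top-k pooling keeps the strongest local evidence, which plain averaging
    over all patches would dilute. Returns a probability that the image is fake.
    """
    top_k = sorted(patch_logits, reverse=True)[:k]
    peak_logit = sum(top_k) / len(top_k)
    return sigmoid((1 - weight) * global_logit + weight * peak_logit)
```

With a global logit of -1.0 (leaning real) and patch logits `[-2.0, -2.0, -1.5, 3.0, 2.6]`, top-2 pooling gives a blended logit of 0.9 and a score above 0.5, whereas averaging all five patches gives -0.49 and a score below 0.5: the two peaked patches flip the decision only when they are not diluted.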

IMPACT Improves the ability to distinguish real images from AI-generated ones, crucial for combating misinformation.
TOOL · arXiv cs.AI · 1d

Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines

Researchers have developed a new reinforcement learning framework called FPRO to optimize pipe routing in aeroengines, integrating manufacturing knowledge directly into the design process. This approach represents pipe paths using curvature and torsion profiles, with manufacturing constraints applied to these parameters. The framework uses proximal policy optimization to generate paths that are then translated into fabrication instructions for a six-axis bending machine, demonstrating improved manufacturability and design accuracy compared to existing methods. AI
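
FPRO represents a pipe path by its curvature κ and torsion τ along the arc length. A centerline can be recovered from such profiles by integrating the Frenet-Serret equations T' = κN, N' = -κT + τB, B' = -τN together with p' = T. The sketch below assumes piecewise-constant (length, κ, τ) segments, uses explicit Euler steps for brevity (so it is only first-order accurate), and models the manufacturing constraint as a hypothetical minimum bend radius `r_min` that caps curvature at 1/r_min. FPRO's actual constraint set and its PPO policy are not reproduced; `integrate_path` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def _add(a: Vec, b: Vec, s: float = 1.0) -> Vec:
    """Return a + s * b."""
    return (a[0] + s * b[0], a[1] + s * b[1], a[2] + s * b[2])


def integrate_path(segments: List[Tuple[float, float, float]], r_min: float, h: float = 1e-3) -> List[Vec]:
    """Integrate the Frenet-Serret equations over (length, kappa, tau) segments.

    Curvature is clamped to 1 / r_min, a stand-in for a minimum bend radius.
    Explicit Euler is used for brevity, so accuracy is O(h).
    """
    kappa_max = 1.0 / r_min
    p: Vec = (0.0, 0.0, 0.0)
    t: Vec = (1.0, 0.0, 0.0)  # tangent
    n: Vec = (0.0, 1.0, 0.0)  # normal
    b: Vec = (0.0, 0.0, 1.0)  # binormal
    points = [p]
    for length, kappa, tau in segments:
        kappa = min(kappa, kappa_max)  # hypothetical manufacturing constraint
        for _ in range(round(length / h)):
            p = _add(p, t, h)
            # T' = kN, N' = -kT + tB, B' = -tN, all evaluated on the old frame.
            t, n, b = (
                _add(t, n, h * kappa),
                _add(_add(n, t, -h * kappa), b, h * tau),
                _add(b, n, -h * tau),
            )
            points.append(p)
    return points
```

A single segment of length 2π with κ = 1 and τ = 0 traces a unit circle and returns to within 0.01 of the origin. Requesting κ = 5 with r_min = 1 is clamped to κ = 1, so a length-π segment ends near (0, 2, 0), the far side of a unit-radius half circle.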

IMPACT This framework could streamline the design and manufacturing of complex aeroengine components by integrating AI-driven optimization with domain-specific knowledge.
TOOL · arXiv cs.CV · 1d

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

Researchers have introduced RankE, a novel end-to-end post-training framework designed to improve discrete text-to-image generation models. Unlike previous methods that kept the VQ decoder frozen, RankE co-evolves both the policy and the decoder through alternating optimization. This approach addresses latent covariate shift, where policy improvements lead to degraded image quality. Experiments on LlamaGen-XL and Janus-Pro models demonstrate that RankE simultaneously enhances both alignment (CLIP score) and image fidelity (FID score), breaking the trade-off seen in earlier techniques. AI

IMPACT Introduces a new method to improve image fidelity and alignment in discrete text-to-image models, potentially enhancing generative AI capabilities.
TOOL · arXiv cs.CV · 1d

Semantic Granularity Navigation in Image Editing

Researchers have developed NaviEdit, a new method to improve image editing by decoupling the editing process from the scale of the diffusion or flow model used. This approach aims to resolve the trade-off between semantic editability and structural fidelity by reallocating computational steps towards semantically relevant scales. NaviEdit operates at inference time without altering the pretrained model, showing improved results across various compatible editors and flow backbones. AI

IMPACT Enhances image editing capabilities by improving semantic control and structural fidelity in generative models.
- diffusion models
- NaviEdit
TOOL · arXiv cs.CL · 1d

Metaphors in Literary Post-Editing: Opening Pandora's Box?

A new paper explores how human post-editors handle metaphors in literary texts translated by neural machine translation (NMT) systems and large language models (LLMs). The study found that post-editors frequently altered the metaphors, rated the machine-translated output as poor, and judged post-editing to be more demanding than translating from scratch. These findings suggest that current NMT and LLM approaches struggle with figurative language in literary contexts and may limit translators' creativity and sense of ownership. AI

IMPACT Reveals significant challenges for LLMs and NMT in translating nuanced figurative language, potentially impacting literary translation workflows.
TOOL · arXiv cs.AI · 1d

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Researchers have identified a new security vulnerability in large language models (LLMs) that exploits inference optimization techniques, particularly compilation. This vulnerability allows attackers to implant hidden backdoors into LLMs, causing them to misbehave on specific inputs only when compiled. These attacks achieve high success rates while maintaining near-perfect accuracy on normal inputs, bypassing standard safety checks. AI

IMPACT Reveals a new attack surface in LLM deployment, potentially requiring new security measures for optimized models.
- LLMs
TOOL · arXiv cs.LG · 1d

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Researchers have developed Q-SYNTH, a hybrid quantum-classical framework that addresses class imbalance in credit card fraud detection. It uses a parameterized quantum circuit as the generator and a classical neural network as the discriminator to synthesize minority-class (fraud) samples. Evaluations show that Q-SYNTH strikes a promising balance between statistical fidelity to real fraud data and improved downstream fraud detection performance, outperforming some classical baselines on specific metrics. AI
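
Q-SYNTH's generator is a parameterized quantum circuit (PQC). To show what that means without a quantum SDK, the following minimal statevector sketch assumes a hypothetical two-qubit circuit: an RY rotation on each qubit followed by a CNOT. The squared amplitudes of the four basis states form the generator's output distribution, and sampling bitstrings from it stands in for generating synthetic records. The circuit layout, function names and angles are assumptions; in Q-SYNTH the angles would be trained against a classical discriminator, which is omitted here.

```python
import math
import random
from typing import List


def ry(theta: float) -> List[List[float]]:
    """Single-qubit RY rotation matrix (real-valued)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def pqc_probabilities(theta0: float, theta1: float) -> List[float]:
    """Probabilities of |00>, |01>, |10>, |11> after RY(theta0) x RY(theta1), then CNOT(q0 -> q1)."""
    q0 = [row[0] for row in ry(theta0)]  # RY applied to |0> is the first column
    q1 = [row[0] for row in ry(theta1)]
    # Tensor product; basis index = 2 * q0_bit + q1_bit.
    state = [q0[i] * q1[j] for i in (0, 1) for j in (0, 1)]
    # CNOT with q0 as control flips q1 when q0 = 1: swap amplitudes of |10> and |11>.
    state[2], state[3] = state[3], state[2]
    return [a * a for a in state]


def sample(theta0: float, theta1: float, n: int, rng: random.Random) -> List[str]:
    """Draw n bitstrings from the circuit's output distribution (the 'generated' samples)."""
    probs = pqc_probabilities(theta0, theta1)
    return rng.choices(["00", "01", "10", "11"], weights=probs, k=n)
```

With angles (π/2, 0) the circuit prepares a Bell state, so only `00` and `11` are ever sampled, each with probability 0.5. The CNOT is what correlates the two bits; independent single-qubit rotations alone could not produce that distribution.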

IMPACT Introduces a novel hybrid quantum-classical approach to improve AI model performance on imbalanced datasets, potentially enhancing fraud detection systems.
TOOL · arXiv cs.AI · 1d

Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

Researchers have developed a new method to improve text-to-image diffusion models for generating human portraits, addressing the common trade-off between text alignment, realism, and aesthetics. Their approach uses a feature supervision paradigm with a lightweight cross-modal alignment mechanism that extracts vision-aligned text representations from SigLIP 2. This method injects guidance into the image generation process without degrading the model's original capabilities or requiring extra inference time, while also optimizing for human-perceived aesthetics. AI

IMPACT Introduces a novel technique to improve the quality and coherence of AI-generated portraits, potentially impacting creative tools and applications.
- MM-DiT
- SigLIP 2
TOOL · arXiv cs.CL · 1d

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Researchers have developed ChunkFT, a novel framework designed to significantly reduce the memory required for full-parameter fine-tuning of large language models. This method dynamically activates a working set of parameters, enabling gradient computation on sub-tensors without altering the model architecture. Experiments show ChunkFT can fine-tune models like Llama 3-8B on a single consumer GPU, achieving performance comparable to traditional full fine-tuning while using substantially less memory. AI
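
The summary does not describe ChunkFT's byte-streaming machinery, so the sketch below swaps in the simplest related technique: block-coordinate gradient descent, where each step computes and applies gradients for one chunk (the working set) of a flat parameter vector while the rest stays frozen. It fits a toy least-squares model; the chunk size, round-robin schedule and learning rate are hypothetical, and nothing here reflects how ChunkFT actually streams sub-tensors.

```python
from typing import List


def chunked_gd(xs: List[List[float]], ys: List[float], chunk: int, lr: float, steps: int) -> List[float]:
    """Fit y ~ w . x by updating one chunk of w per step (round-robin working set).

    Only the gradient entries for the active chunk are computed; the other
    parameters are read in the forward pass but left untouched.
    """
    dim = len(xs[0])
    w = [0.0] * dim
    starts = list(range(0, dim, chunk))
    for step in range(steps):
        lo = starts[step % len(starts)]
        hi = min(lo + chunk, dim)
        # Residuals use the full current w (forward pass over all parameters).
        residuals = [sum(wi * xi for wi, xi in zip(w, x)) - y for x, y in zip(xs, ys)]
        for j in range(lo, hi):
            # Gradient of the mean squared error w.r.t. w[j] only.
            grad_j = 2.0 * sum(r * x[j] for r, x in zip(residuals, xs)) / len(xs)
            w[j] -= lr * grad_j
    return w
```

On five toy examples (the four unit vectors plus an all-ones row) with true weights `[1.0, -2.0, 3.0, 0.5]`, `chunked_gd(xs, ys, chunk=2, lr=0.5, steps=400)` recovers the weights to within 1e-6 even though each step updates only half of them. In a real system, the memory saving would presumably come from holding gradients and optimizer state only for the active chunk.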

IMPACT Enables fine-tuning of large language models on consumer hardware, potentially democratizing advanced model customization.
TOOL · arXiv cs.CV · 1d

FTerViT: Fully Ternary Vision Transformer

Researchers have developed FTerViT, a fully ternary Vision Transformer in which all weight matrices and normalization parameters are quantized to ternary values. This sharply reduces the model's memory footprint, making deployment on resource-constrained devices such as microcontrollers more feasible. FTerViT achieves competitive accuracy on ImageNet while offering substantial compression compared with standard floating-point models. AI
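
The summary does not say which ternary quantizer FTerViT uses. A common reference point is the Ternary Weight Networks (TWN) rule: set a threshold Δ = 0.7·mean|w|, map weights above Δ to +1, below -Δ to -1 and the rest to 0, then rescale by α, the mean magnitude of the weights that survived. The following is a minimal sketch of that rule, not necessarily FTerViT's; `ternarize` is a hypothetical name.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """TWN-style ternary quantization: w is approximated by alpha * t with t in {-1, 0, +1}."""
    delta = delta_factor * sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    # Scale = mean magnitude of the weights that were not zeroed out.
    kept = [abs(w) for w, t in zip(weights, codes) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha
```

For `[0.9, -0.05, 0.1, -1.1, 0.4, 0.0]` this yields codes `[1, 0, 0, -1, 1, 0]` and α = 0.8, so each weight is approximated as α·t. Packed at 2 bits per code, such a tensor takes 16x less storage than float32, plus one scale per tensor.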

IMPACT Enables more efficient deployment of advanced vision models on low-power edge devices.