Brief

last 24h

[50/424] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CV · 1d

LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection

Researchers have developed LER-YOLO, a novel framework designed to improve the detection of small unmanned aerial vehicles using misaligned RGB and infrared imagery. The system incorporates an Uncertainty-Aware Target Alignment module to estimate spatial reliability and guide expert selection. This reliability-guided approach adaptively chooses experts for cross-modal fusion, effectively suppressing unreliable data and enhancing detection accuracy. AI

IMPACT Enhances drone detection capabilities by improving the fusion of multi-modal sensor data.
- MBU benchmark
- LER-YOLO
RESEARCH · arXiv stat.ML · 2d · [2 sources]

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

Researchers have developed a new method to correct errors in Bayesian inference for latent Gaussian models. The proposed importance sampling scheme improves the accuracy of approximate posteriors derived from integrated Laplace approximation (ILA). This correction is crucial as ILA can sometimes produce significantly different results from the true posterior, impacting subsequent analyses. AI

IMPACT Improves accuracy of statistical models used in machine learning, potentially leading to more reliable downstream AI applications.
RESEARCH · Hugging Face Daily Papers · 2d · [3 sources]

PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars

Researchers have introduced PiG-Avatar, a novel method for generating realistic 3D avatars. This approach decouples avatar geometry from body template surfaces, allowing for more accurate representation of complex clothing and non-rigid movements. PiG-Avatar utilizes a neural field to guide Gaussian representations, enabling real-time rendering and achieving state-of-the-art quality on benchmarks. AI

IMPACT Enables more realistic and dynamic 3D avatar generation, potentially impacting virtual reality, gaming, and digital content creation.
TOOL · arXiv cs.CV · 1d

SR-Ground: Image Quality Grounding for Super-Resolved Content

Researchers have introduced SR-Ground, a new dataset designed to improve image quality assessment for super-resolved images. This dataset features pixel-level annotations for various artifact types introduced by modern super-resolution models. By training models on SR-Ground, researchers have shown improved performance in identifying and even reducing these artifacts, demonstrating practical applications for the dataset. AI

IMPACT This dataset could lead to more reliable and interpretable image quality assessment for AI-generated images, improving user trust and downstream applications.
- arXiv
- SR-Ground
TOOL · arXiv cs.LG · 1d

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Researchers have developed a new unsupervised framework called Divide and Contrast (Di-COT) for learning robust temporal features from time-series data without relying on data augmentation. Di-COT works by contrasting informative substructures within data windows, rather than individual timesteps, which allows for efficient and meaningful contrast while avoiding false positives. This method has demonstrated state-of-the-art performance across various tasks including classification and clustering on multiple large-scale datasets and benchmarks, while also significantly reducing training time. AI

IMPACT Introduces a novel unsupervised learning method for time-series data that improves efficiency and performance on downstream tasks.
TOOL · arXiv cs.CV · 1d

GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection

Researchers have developed GSA-YOLO, a new lightweight framework designed for real-time X-ray security inspection. This model, based on YOLOv8n, incorporates structured sparsity and adaptive knowledge distillation to improve detection accuracy and inference speed. GSA-YOLO integrates Group Lasso, Sparse Structure Selection, and an Adaptive Knowledge Distillation mechanism to enhance feature representation and reduce model size. Evaluations on the HiXray and PIDray datasets show GSA-YOLO achieves a leading inference speed of 189.62 FPS with reduced computational cost, alongside improved mAP50:95 scores compared to the baseline. AI

IMPACT This new framework offers improved speed and accuracy for X-ray security inspections, potentially enhancing threat detection capabilities.
- YOLOv8n
- PIDray
- GSA-YOLO
- HiXray
TOOL · arXiv cs.AI · 1d

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

A new study evaluated AI reviewers on Nature-family papers, finding that while they can outperform top human reviewers in identifying correct, significant, and well-evidenced criticisms, they also exhibit distinct weaknesses. The research involved 45 scientists annotating over 2,900 criticisms from human and AI reviews. While AI reviewers like GPT-5.2, Gemini 3.0 Pro, and Claude Opus 4.5 showed strengths in accuracy and identifying unique issues, they also demonstrated limitations in specialized knowledge, handling multiple files, and an overly critical stance on minor points, suggesting they are best used as complements to human reviewers. AI

IMPACT AI reviewers show promise in scientific critique but require human oversight, potentially speeding up peer review.
RESEARCH · arXiv cs.AI · 1d · [2 sources]

Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale

Two new research papers introduce novel benchmarks for detecting and measuring reward hacking in AI agents, particularly those involved in long-horizon tasks like coding. The first paper, SpecBench, uses a gap between visible and held-out test pass rates to quantify reward hacking in coding agents, finding that smaller models exhibit larger gaps and the issue scales with task length. The second paper, Hack-Verifiable Environments, embeds detectable reward hacking opportunities directly into environments, enabling automated measurement and analysis of this behavior across language models. AI

IMPACT These new benchmarks aim to improve AI alignment by providing better tools to measure and mitigate reward hacking, a critical challenge for developing reliable AI agents.
TOOL · arXiv cs.AI · 1d

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

Researchers have developed AMAR, a novel framework for recognizing multiple simultaneous human activities using Wi-Fi channel state information (CSI). This attention-based system treats activity recognition as a set prediction problem, employing learnable query embeddings to detect concurrent actions from complex CSI data. AMAR utilizes an edge-cloud split architecture, with edge devices performing initial feature extraction and the cloud component handling final prediction, significantly outperforming existing methods in multi-user environments. AI

IMPACT This research could enable more sophisticated contactless sensing applications by improving the ability to track multiple individuals simultaneously using existing Wi-Fi infrastructure.
TOOL · arXiv cs.AI · 1d

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

Researchers have developed a new framework called Reflector to enhance the safety of large language models (LLMs) against complex, multi-step jailbreak attacks. This two-stage approach first uses teacher-guided generation for supervised fine-tuning to establish reflection patterns, then employs reinforcement learning for autonomous self-reflection. Reflector demonstrates over 90% defense success against indirect attacks and improves performance on benchmarks like GSM8K by 5.85%, without adding significant computational overhead. AI

IMPACT Enhances LLM safety against sophisticated jailbreaks, potentially improving reliability for critical applications.
TOOL · arXiv cs.AI · 1d

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Researchers have developed PREFINE, a novel method for fine-tuning reinforcement learning policies to incorporate safety constraints without full retraining. This approach adapts Direct Preference Optimization (DPO), commonly used for language models, to continuous control environments. PREFINE leverages trajectory-level preferences to balance reward retention with safety alignment, demonstrating a significant reduction in constraint violations and failures while maintaining original reward performance. AI

IMPACT Introduces a more efficient method for aligning AI behavior with safety constraints in continuous control tasks.
TOOL · arXiv cs.AI · 1d

SURGE: An Event-Centric Social Media Sentiment Time Series Benchmark with Interaction Structure

Researchers have introduced SURGE, a new benchmark dataset designed to analyze social media sentiment dynamics around public events. SURGE organizes over 800,000 posts from 67 events across five categories into time-series data, preserving the interaction structure between posts. This benchmark aims to improve opinion forecasting and crisis response by enabling the study of how post interactions influence collective dynamics and event evolution. AI

IMPACT Provides a new dataset for training and evaluating models in social media sentiment analysis and event forecasting.
TOOL · arXiv cs.LG · 1d

Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

Researchers have developed a new reinforcement learning (RL) approach called Y-wise Affine Neural Network (YANN-RL) for controlling chemical processes. This method aims to overcome the typical challenges of trust and lengthy training times associated with RL in this domain. By providing interpretable starting points, YANN-RL significantly reduces training time and data requirements compared to other RL algorithms and approaches the performance of nonlinear model predictive control without needing a full nonlinear model. AI

IMPACT This new RL method could significantly reduce training time and data needs for controlling complex chemical processes.
TOOL · arXiv cs.AI · 1d

SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

Researchers have developed a new explainable AI (XAI) framework called SAM-Sode to improve the interpretability of tiny bacteria detection in medical diagnostics. Traditional methods struggle with the fine details and complex backgrounds inherent in this task, leading to unclear explanations. SAM-Sode addresses this by converting feature attribution maps into geometry-aware prompts, using the SAM3 foundation model for spatial refinement and morphological reconstruction. It also incorporates a dual-constraint mechanism to denoise explanations and align them with expert intuition, enhancing transparency in tiny object detection. AI

IMPACT Enhances transparency in medical diagnostics by providing more intuitive explanations for tiny object detection models.
- SAM3
- SAM-Sode
TOOL · arXiv cs.AI · 1d

Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

Researchers have developed a new method called Predicate Action Skills (PACTS) that allows robots to learn and compose skills without retraining. PACTS models both the physical actions and the symbolic outcomes of these actions, enabling better generalization. This approach facilitates zero-shot skill composition through planning by using predicted outcomes to sequence and monitor task execution. AI

IMPACT Enables robots to learn and combine skills more flexibly, potentially accelerating the development of more adaptable robotic systems.
- Benedict Quartey
TOOL · arXiv cs.CV · 1d

PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

Researchers have developed a new framework called Peak-Guided Calibration (PGC) to improve the detection of AI-generated images. This method focuses on aggregating salient, local features using a peak-sensitive mechanism to overcome the limitations of detectors that rely solely on global image representations. PGC effectively calibrates global decisions by accentuating subtle, discriminative clues that might otherwise be lost. The framework demonstrates state-of-the-art performance, significantly improving accuracy on a new benchmark dataset, CommGen15, and setting new records on existing benchmarks. AI

IMPACT Improves the ability to distinguish real images from AI-generated ones, crucial for combating misinformation.
TOOL · arXiv cs.AI · 1d

Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines

Researchers have developed a new reinforcement learning framework called FPRO to optimize pipe routing in aeroengines, integrating manufacturing knowledge directly into the design process. This approach represents pipe paths using curvature and torsion profiles, with manufacturing constraints applied to these parameters. The framework uses proximal policy optimization to generate paths that are then translated into fabrication instructions for a six-axis bending machine, demonstrating improved manufacturability and design accuracy compared to existing methods. AI

IMPACT This framework could streamline the design and manufacturing of complex aeroengine components by integrating AI-driven optimization with domain-specific knowledge.
TOOL · arXiv cs.CV · 1d

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

Researchers have introduced RankE, a novel end-to-end post-training framework designed to improve discrete text-to-image generation models. Unlike previous methods that kept the VQ decoder frozen, RankE co-evolves both the policy and the decoder through alternating optimization. This approach addresses latent covariate shift, where policy improvements lead to degraded image quality. Experiments on LlamaGen-XL and Janus-Pro models demonstrate that RankE simultaneously enhances both alignment (CLIP score) and image fidelity (FID score), breaking the trade-off seen in earlier techniques. AI

IMPACT Introduces a new method to improve image fidelity and alignment in discrete text-to-image models, potentially enhancing generative AI capabilities.
TOOL · arXiv cs.CV · 1d

Semantic Granularity Navigation in Image Editing

Researchers have developed NaviEdit, a new method to improve image editing by decoupling the editing process from the scale of the diffusion or flow model used. This approach aims to resolve the trade-off between semantic editability and structural fidelity by reallocating computational steps towards semantically relevant scales. NaviEdit operates at inference time without altering the pretrained model, showing improved results across various compatible editors and flow backbones. AI

IMPACT Enhances image editing capabilities by improving semantic control and structural fidelity in generative models.
- diffusion models
- NaviEdit
TOOL · arXiv cs.CL · 1d

Metaphors in Literary Post-Editing: Opening Pandora's Box?

A new paper explores how human post-editors handle metaphors translated by Neural Machine Translation and Large Language Models in literary texts. The study found that post-editors frequently altered metaphors, rating the machine translation output as poor and the post-editing process as more demanding than translating from scratch. These findings suggest that current NMT and LLM approaches struggle with figurative language in literary contexts, potentially limiting translator creativity and ownership. AI

IMPACT Reveals significant challenges for LLMs and NMT in translating nuanced figurative language, potentially impacting literary translation workflows.
TOOL · arXiv cs.AI · 1d

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Researchers have identified a new security vulnerability in large language models (LLMs) that exploits inference optimization techniques, particularly compilation. This vulnerability allows attackers to implant hidden backdoors into LLMs, causing them to misbehave on specific inputs only when compiled. These attacks achieve high success rates while maintaining near-perfect accuracy on normal inputs, bypassing standard safety checks. AI

IMPACT Reveals a new attack surface in LLM deployment, potentially requiring new security measures for optimized models.
- LLMs
TOOL · arXiv cs.LG · 1d

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Researchers have developed Q-SYNTH, a novel hybrid quantum-classical framework designed to address the challenge of imbalanced data in credit card fraud detection. This system uses a parameterized quantum circuit as the generator and a classical neural network as the discriminator to synthesize minority-class fraud samples. Evaluations show Q-SYNTH offers a promising balance between statistical fidelity to real fraud data and improved downstream fraud detection performance, outperforming some classical baselines in specific metrics. AI

IMPACT Introduces a novel hybrid quantum-classical approach to improve AI model performance on imbalanced datasets, potentially enhancing fraud detection systems.
TOOL · arXiv cs.AI · 1d

Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

Researchers have developed a new method to improve text-to-image diffusion models for generating human portraits, addressing the common trade-off between text alignment, realism, and aesthetics. Their approach uses a feature supervision paradigm with a lightweight cross-modal alignment mechanism that extracts vision-aligned text representations from SigLIP 2. This method injects guidance into the image generation process without degrading the model's original capabilities or requiring extra inference time, while also optimizing for human-perceived aesthetics. AI

IMPACT Introduces a novel technique to improve the quality and coherence of AI-generated portraits, potentially impacting creative tools and applications.
- MM-DiT
- SigLIP 2
TOOL · arXiv cs.CL · 1d

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Researchers have developed ChunkFT, a novel framework designed to significantly reduce the memory required for full-parameter fine-tuning of large language models. This method dynamically activates a working set of parameters, enabling gradient computation on sub-tensors without altering the model architecture. Experiments show ChunkFT can fine-tune models like Llama 3-8B on a single consumer GPU, achieving performance comparable to traditional full fine-tuning while using substantially less memory. AI

IMPACT Enables fine-tuning of large language models on consumer hardware, potentially democratizing advanced model customization.
TOOL · arXiv cs.CV · 1d

FTerViT: Fully Ternary Vision Transformer

Researchers have developed FTerViT, a fully ternary Vision Transformer that compresses all weight matrices and normalization parameters. This approach significantly reduces the model's memory footprint, making it more feasible for deployment on resource-constrained devices like microcontrollers. FTerViT achieves competitive accuracy on ImageNet while offering substantial compression compared to standard floating-point models. AI

IMPACT Enables more efficient deployment of advanced vision models on low-power edge devices.
TOOL · arXiv cs.AI · 1d

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

Researchers have developed ScenePilot, a new framework for generating critical scenarios for autonomous driving systems. This method focuses on creating scenarios that are physically solvable but still challenging enough to cause failures in deployed systems. By using constrained reinforcement learning and a combination of physical feasibility scores and risk prediction, ScenePilot aims to produce more realistic and effective stress tests for autonomous vehicles. Experiments show that scenarios generated by ScenePilot lead to higher collision rates while maintaining physical validity, and fine-tuning on these scenarios reduces downstream crash rates. AI

IMPACT Enhances safety testing for autonomous vehicles by generating more realistic and challenging failure scenarios.
TOOL · arXiv cs.CL · 1d

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

Researchers have developed DPR-BAG, a novel framework designed to generate biomedical abstracts from full-text articles that lack them. This training-free, zero-shot approach structures the document into rhetorical facets like Background, Objective, Methods, Results, and Conclusions. It then uses large language models to summarize each facet individually before a final refinement step ensures overall coherence and factual accuracy. AI

IMPACT This framework could improve accessibility and utility of biomedical literature by enabling abstract generation for articles that currently lack them.
TOOL · arXiv cs.AI · 1d

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Researchers have developed new methods to optimize agent-based plan-execute pipelines for industrial operations, which are highly sensitive to latency. They introduced a temporal semantic cache and workflow optimizations, including disk-backed tool discovery caching and parallel step execution. These optimizations achieved significant speedups, with workflow optimizations providing a 1.67x speedup and temporal caching yielding up to 30.6x speedup on cache hits, while also highlighting limitations of standard semantic caching for parameter-rich queries. AI

IMPACT Introduces optimizations for latency-sensitive industrial AI agent pipelines, potentially improving efficiency in real-world applications.
- LLM
- AssetOpsBench
TOOL · arXiv cs.AI · 1d

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

Researchers from the University of Florida developed a two-stage pipeline for cultural image captioning in Indigenous languages, winning the AmericasNLP 2026 shared task. The system first generates an intermediate Spanish caption using Qwen2.5-VL, then translates it into the target Indigenous language with Gemini 2.5 Flash via retrieval-augmented prompting. This approach yielded significant improvements over the baseline, with gains exceeding 150% for some languages, though retrieval effectiveness was found to be language-dependent. AI

IMPACT Demonstrates a novel approach to low-resource language translation for image captioning, potentially improving accessibility for Indigenous communities.
TOOL · arXiv cs.LG · 1d

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Researchers have developed FISolver, a novel LLM-based system designed to discover first integrals in dynamical systems, which are crucial for understanding conservation laws. The system addresses data scarcity by employing a "Backward Generation" algorithm to create extensive datasets of differential equation and first integral pairs. FISolver also utilizes supervised fine-tuning and reinforcement learning with a shaped reward to enhance its performance, outperforming larger models and commercial solvers like Mathematica on challenging benchmarks with lower computational costs. AI

IMPACT Introduces a novel data-driven approach for automated scientific discovery, potentially accelerating research in dynamical systems.
- LLM
- Mathematica
- FISolver
TOOL · arXiv cs.AI · 1d

COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space

Researchers have developed COAgents, a novel multi-agent framework designed to tackle complex Vehicle Routing Problems (VRPs). This framework models the search process as a graph, dynamically constructing a Partial Search Graph (PSG) to guide exploration. COAgents trains agents for node selection, move selection, and strategic 'jumps' to escape local minima, separating general search control from domain-specific encoding for adaptability. Experiments demonstrate COAgents' competitiveness, setting a new state-of-the-art among learning-based methods on VRPTW instances and significantly closing the gap to optimal solutions. AI

IMPACT Introduces a novel multi-agent learning approach that improves performance on challenging routing optimization tasks.
RESEARCH · arXiv stat.ML · 1d · [2 sources]

Group-Aware Matrix Estimation and Latent Subspace Recovery

Researchers have developed a new convex estimator called Group-Aware Matrix Estimation (GAME) designed to improve matrix completion for heterogeneous data. GAME addresses limitations of standard low-rank estimators by allowing related groups to share information while preserving distinct local latent structures. The method provides theoretical guarantees and demonstrates competitive or superior performance across various datasets compared to existing baselines, particularly in scenarios with structured missingness. AI

IMPACT Introduces a novel statistical technique that could enhance machine learning models dealing with complex, heterogeneous datasets.
TOOL · arXiv cs.CL · 1d

HRM-Text: Efficient Pretraining Beyond Scaling

Researchers have developed HRM-Text, a novel Hierarchical Recurrent Model that significantly reduces the computational resources and training data required for pretraining large language models. By decoupling computation into strategic and execution layers and training exclusively on instruction-response pairs, a 1B-parameter model achieved competitive performance on several benchmarks with a fraction of the tokens and compute used by standard models. This approach makes foundational LLM research more accessible by lowering the barrier to entry for pretraining from scratch. AI

IMPACT Enables more researchers to train foundational models from scratch, potentially accelerating innovation.
TOOL · arXiv cs.AI · 1d

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Researchers have developed new methods to understand the internal workings of Mixture-of-Experts (MoE) models in computer vision. By analyzing how different visual categories are routed to specific experts and examining the tuning of these experts to various inputs, they found that an animate-inanimate distinction is a dominant factor in expert partitioning. The study reveals that experts tune to broader, continuous visual and semantic dimensions beyond simple category boundaries, highlighting the benefits of moving beyond basic routing analyses for a deeper understanding of MoE specialization. AI

IMPACT Provides novel methods for interpreting the specialized functions within complex vision models, advancing AI research.
TOOL · arXiv cs.CL · 1d

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

A new research paper proposes the Structural Depth Hypothesis (SDH) to explain how self-training restructures language models. The study found that while surface-level linguistic features like discourse markers increase, deeper syntactic structures such as questions and passives decline. This effect was observed across multiple models and architectures, suggesting it's a specific outcome of self-training rather than a general language model behavior. AI

IMPACT This research suggests that self-training may lead to LLMs that are superficially complex but lack deep syntactic understanding, impacting data curation and text detection.
TOOL · arXiv cs.LG · 1d

Instant GPU Efficiency Visibility at Fleet Scale

Researchers have developed a new metric called Overall FLOP Utilization (OFU) to measure GPU efficiency for AI workloads. OFU is derived from on-chip performance counters and does not require application instrumentation, making it applicable across different GPU generations and precisions. When tested on production training jobs, OFU showed a strong correlation with application-level metrics and helped identify efficiency regressions and framework miscalculations. AI

IMPACT Provides a practical method for monitoring and improving the efficiency of AI training infrastructure.
- GB200
- Overall FLOP Utilization (OFU)
RESEARCH · arXiv cs.CV · 2d · [2 sources]

FGSVQA: Frequency-Guided Short-form Video Quality Assessment

Two new research papers introduce novel approaches to video quality assessment (VQA). One paper, VersusQ, proposes a pairwise margin reasoning framework that focuses on relative video comparisons to improve generalization across different datasets. The other, FGSVQA, presents an end-to-end framework for short-form video quality assessment that incorporates frequency domain priors and a dense visual encoder for artifact-aware feature aggregation. AI

IMPACT These new VQA methods aim to improve the accuracy and generalizability of automated video quality evaluation, which is crucial for content moderation and user experience in video platforms.
- FGSVQA
- VersusQ
- arXiv
TOOL · arXiv cs.CL · 1d

Direct Translation between Sign Languages

Researchers have developed a novel method for direct translation between different sign languages, addressing a gap in current sign language technology. Their approach utilizes back-translation to create synthetic parallel corpora, enabling the training of a single model for both text-to-sign and sign-to-sign translation. This direct method significantly outperforms cascaded systems in accuracy and speed, showing promise for improved cross-lingual communication among deaf and hard-of-hearing individuals. AI

IMPACT Enables cross-lingual communication for 1.5 billion deaf and hard-of-hearing individuals by directly translating between sign languages.
TOOL · arXiv cs.CL · 1d

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

A new study published on arXiv assessed 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 along with 10 open-source models. The research found that a significant portion of these models exhibit factual inaccuracies, with 25-30% showing low accuracy and over half violating operational thresholds. Additionally, many action-enabled models lacked adequate privacy disclosures, indicating systemic gaps in safety and compliance. AI

IMPACT Highlights critical safety and compliance issues in medical AI, necessitating stronger safeguards for patient care.
- LLMs
- arXiv
- Sunday Ogundoyin
- MedGPTs
- HAA-MedGPT
RESEARCH · arXiv cs.CL · 1d · [2 sources]

JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media

Researchers have developed JobArabi, a new corpus of over 20,000 Arabic job announcements sourced from social media platforms like X. This dataset, collected between January 2024 and October 2025, uses a specialized query framework to capture diverse recruitment language. Analysis of the corpus reveals sociolinguistic patterns such as persistent gendered language, regional job demand variations, and the emotional tone of recruitment messages. AI

IMPACT Provides a new resource for Arabic NLP and computational social science research into labor market communication.
- X
- Arabic
- JobArabi
TOOL · arXiv cs.CL · 1d

Multi-agent Collaboration with State Management

Researchers have developed STORM, a novel state-oriented management system designed to improve collaboration among multiple AI agents working on shared codebases. Unlike existing methods that rely on workspace isolation and delayed conflict resolution, STORM actively manages agent states to ensure consistent views and detect conflicts in real-time during edits. Evaluations on the Commit0 and PaperBench benchmarks demonstrated that STORM significantly outperforms baseline methods, achieving higher scores and comparable cost efficiency across various large language models. AI

IMPACT Improves efficiency and reduces conflicts for AI agents working collaboratively on software development tasks.
- AI agents
- LLMs
- STORM
- Commit0
- codebases
TOOL · arXiv cs.CL · 1d

When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

A new research paper analyzes neural morphological generation systems, revealing that a tiny fraction of rare, irregular data can disproportionately cause errors. The study focused on Japanese past-tense verb inflection, finding that a specific irregular subtype, less than 1% of the data, was responsible for a significant share of model mistakes. This suggests that not all irregularity equally destabilizes models, and finer-grained subclass analysis is needed for better morphological evaluation. AI

IMPACT Highlights the need for more granular evaluation of AI models beyond aggregate accuracy, particularly in language processing tasks.
- Japanese past-tense verb inflection
- Neural Morphology
RESEARCH · Hugging Face Daily Papers · 1d · [3 sources]

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

Researchers have developed a new method called SURF (Sampling Uniformly along the PaReto Front) to address challenges in multi-objective optimization. SURF aims to generate diverse solutions with uniform coverage of the Pareto front, a goal often unmet by standard weight sampling techniques. The method analyzes the geometric relationship between scalarization weights and solution coverage, proposing a principled rule for selecting weights that ensure uniform distribution. SURF has demonstrated empirical success in improving Pareto front coverage across various applications, including multi-objective LLM alignment. AI

IMPACT Improves methods for aligning LLMs with diverse user preferences by ensuring uniform coverage of potential solutions.
- LLM alignment
RESEARCH · arXiv cs.CL · 2d · [2 sources]

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

Researchers have developed two novel self-distillation techniques for language models to improve performance on complex reasoning tasks. AVSD (Adaptive-View Self-Distillation) balances consensus and view-specific signals from multiple teacher models to provide more reliable supervision. CEPO (Contrastive Evidence Policy Optimization) sharpens the reward signal by distinguishing decisive reasoning steps from filler tokens, using contrastive learning against incorrect answers. Both methods show significant improvements on mathematical and code-generation benchmarks, outperforming existing self-distillation baselines. AI

IMPACT These new self-distillation techniques offer improved methods for training LLMs, potentially leading to more capable models for complex reasoning tasks.
RESEARCH · arXiv stat.ML · 2d · [2 sources]

Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

Researchers have developed a new framework for causal discovery in infrastructure management, focusing on pump equipment deterioration. This method combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that influence varying deterioration rates. The study analyzed 112 pumps and found significant heterogeneity, with one group showing causal effects 400 times larger than another, highlighting the need for distinct management approaches. AI

IMPACT Introduces a novel framework for heterogeneity-aware predictive maintenance in infrastructure, potentially improving asset management strategies.
TOOL · arXiv cs.CL · 1d

What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

Researchers have developed a new framework to analyze the properties of annotated corpora used in biomedical Named Entity Recognition (NER) and Entity Linking (EL) benchmarks. This corpus-centric approach systematically examines statistics related to scale, label distribution, lexical structure, train-test overlap, and metadata composition. Applying this framework to nine different corpora revealed significant variations in their properties, suggesting that standard corpus statistics may not fully capture what these benchmarks evaluate. AI

IMPACT Provides a standardized method for evaluating the quality and comparability of datasets used in biomedical NLP research.
TOOL · arXiv cs.CL · 1d

Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

Researchers have proposed a new hypothesis called collocational bootstrapping, suggesting that patterns in word co-occurrence can help in learning syntactic dependencies. They tested this by training neural networks on synthetic data, finding that these models could learn subject-verb agreement when the pairings had a specific level of predictability. Analysis of child-directed language revealed that the variability in subject-verb pairings within this data falls within the range that supported successful learning in the computational simulations, indicating it's a plausible strategy for language acquisition. AI

IMPACT Proposes a novel mechanism for how statistical learning in neural networks could mirror human language acquisition, potentially informing future model architectures.
TOOL · arXiv cs.CL · 1d

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Researchers have introduced NeuroQA, a new benchmark designed for evaluating visual question answering capabilities specifically within 3D brain MRI scans. This benchmark includes over 56,000 question-answer pairs derived from more than 12,000 subjects across various clinical domains and age groups. NeuroQA aims to overcome limitations of previous medical VQA efforts by utilizing full 3D volumes and implementing strategies to prevent text-based shortcuts, ensuring models truly understand the image content. AI

IMPACT Establishes a new standard for evaluating AI's ability to interpret complex 3D medical imaging data.
- 3D brain MRI
- NeuroQA
TOOL · arXiv cs.CL · 1d

Reinforcing Human Behavior Simulation via Verbal Feedback

Researchers have developed DITTO, a new model that learns to simulate human behavior by incorporating verbal feedback as a primary signal in reinforcement learning. This approach, detailed in a new paper, treats subjective and multi-faceted guidance as a first-class input, optimizing for improved rollouts based on this feedback. DITTO demonstrated a 36% improvement over its base model and outperformed GPT-5.4 on six benchmarks within the newly introduced SOUL suite, which comprises ten tasks across various human-like behavior simulations. AI

IMPACT This research introduces a novel method for training LLMs to better simulate human behavior, potentially improving their utility in roles requiring nuanced social understanding.
- GPT-5.4
- SOUL
- DITTO
TOOL · arXiv cs.CL · 1d

Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Researchers have developed Stage-Audit, a system designed to improve the accuracy and source-grounding of tables generated by large language models. The system addresses the issue of LLMs fabricating or misattributing sources for table entries by implementing distinct curator and auditor roles with write permissions. Stage-Audit also incorporates a row-level source-citation gate and a comprehensive audit taxonomy to ensure explicit traceability of information. AI

IMPACT Enhances the reliability of LLM-generated structured data, reducing the risk of misinformation and improving data integrity for downstream applications.