Brief

last 24h

[50/1937] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 2w

Verifiable Benchmarking of Long-Horizon Spatial Biology

A new benchmark, SpatialBench-Long, has been developed to evaluate AI agents' capabilities in long-horizon scientific reasoning within spatial biology. This benchmark assesses agents' ability to derive biological conclusions from complex, raw data across various experimental modalities and biological systems. Initial results show that current leading models like Gemini 3.5 Flash and GPT-5.5, when paired with specific coding harnesses, achieve a modest success rate of 11.1% on the benchmark. AI

IMPACT This benchmark will drive the development of AI agents capable of complex scientific discovery in biology.
TOOL · arXiv cs.AI English(EN) · 2w

An Empirical Audit of k-NAF Budget Accounting for Anchored Decoding

Researchers have conducted an empirical audit of the k-NAF budget accounting mechanism within Anchored Decoding. Their experiments, using both fixed and adaptive workloads, revealed that cumulative KL spend generally remained well below sequence-level budgets. Even when adaptive search increased proxy spend ratios, clear budget exhaustion was not observed. The study suggests that observed high proxy ratios might be artifacts rather than genuine budget failures, particularly when evaluations use smaller sample sizes. AI
- Anchored Decoding
TOOL · arXiv cs.LG English(EN) · 2w

Benchmarking Ultrasound Foundation Models for Fetal Plane Classification

Researchers have benchmarked several foundation models (FMs) for fetal plane classification using ultrasound images, aiming to improve diagnostic accuracy in obstetric care. The study compared ultrasound-specific FMs like FetalCLIP and USFM against general computer vision models such as ResNet50 and ViT (DINOv3). FetalCLIP excelled in a linear probing setting, while USFM performed best with full fine-tuning, demonstrating that the choice of pretraining objective significantly impacts transferability and classification performance, especially across different populations. AI
- ResNet50
- FetalCLIP
- EfficientNet-V2
- ViT
- DINOv3
- MOFO
- UltraSAM
TOOL · arXiv cs.LG English(EN) · 2w

Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP

Researchers have developed a new probabilistic model called Attentive Neural Processes (ANPs) for reconstructing astrophysical light curves. This model combines the strengths of Gaussian Processes and deep learning to enable faster and more accurate analysis of astronomical data. ANPs can interpolate light curves across multiple bands simultaneously in microseconds, significantly outperforming existing methods in speed and accuracy, making them suitable for real-time scientific analysis. AI

IMPACT Enables faster, more accurate real-time analysis of astronomical data, potentially accelerating discoveries in transient science.
TOOL · arXiv cs.LG English(EN) · 2w

Machine Learning methods for event classification and vertex reconstruction of the 12C + 12C reaction with the MATE-TPC

Researchers have applied machine learning models, including ResNet and VGG, to classify events in nuclear physics experiments involving the 12C + 12C reaction using the MATE-TPC. These models achieved high accuracies, around 97% for simulated data and 90% for experimental data, outperforming traditional methods in identifying certain events. Additionally, a CNN model was developed for reaction vertex reconstruction, demonstrating the effectiveness of ML techniques in analyzing complex nuclear reaction data and paving the way for future research. AI

IMPACT Demonstrates the utility of ML in complex scientific data analysis, potentially accelerating discovery in nuclear physics.
TOOL · arXiv cs.LG English(EN) · 2w

Zero-shot Quantum Neural Architecture Search

Researchers have developed a new framework called MZeQAS for efficiently searching for optimal architectures in Variational Quantum Algorithms (VQAs). This method utilizes a zero-shot surrogate model based on the Quantum Neural Tangent Kernel to estimate candidate circuit performance without requiring full training, significantly reducing computational costs. MZeQAS integrates this proxy-based estimation with Monte Carlo Tree Search to discover high-performing VQA architectures, outperforming existing methods in efficiency and solution quality for near-term quantum devices. AI
TOOL · arXiv cs.LG English(EN) · 2w

Dynamic Topic Modeling with a Higher-Order Hypergraphical Representation

Researchers have developed a novel dynamic topic modeling framework that utilizes a higher-order hypergraph representation of text. This approach models documents as hyperedges connecting co-occurring words, with node weights encoding repetition intensities. This method aims to overcome limitations of traditional models by separating word occurrence from repetition and capturing informative higher-order interactions. The framework includes structured low-rank factorizations with temporal regularization and has demonstrated improvements on synthetic data and the ICLR corpus. AI
- International Conference on Learning Representations
- arXiv
TOOL · arXiv cs.CV English(EN) · 2w

PrecisionCUA: Iterative Visual Refinement for Pixel-Precise Cursor Grounding in Code Editors

Researchers have developed a new method called PrecisionCUA for achieving pixel-precise cursor grounding in code editors, a critical capability for AI agents interacting with software. Unlike existing single-shot prediction methods that struggle with dense interfaces like VS Code and Cursor, PrecisionCUA employs an iterative refinement process. This closed-loop system uses visual feedback to self-correct errors and adapt to dynamic UI changes, significantly outperforming single-shot models on complex coding benchmarks across Claude, Qwen, and GPT. AI

IMPACT Enables more reliable AI agents for software development and complex UI interactions.
- PrecisionCUA
- VS Code
- Cursor
- Claude
- Qwen
- GPT
- Himangi Mittal
TOOL · arXiv cs.CV English(EN) · 2w

Self-Prompting Diffusion Transformer for Open-Vocabulary Scene Text Editing via In-Context Learning

Researchers have developed a novel self-prompting method for editing scene text in images, addressing limitations of existing approaches that neglect visual details of target regions and are constrained by pre-trained glyph encoders. This new technique constructs style and glyph prompts directly from the image, leveraging the in-context learning capabilities of a Multi-Modal Diffusion Transformer (MM-DiT). The method achieves open-vocabulary and style-consistent text editing, demonstrating state-of-the-art performance across various languages. AI
TOOL · arXiv cs.AI English(EN) · 2w

CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts

Researchers have developed CircuitLM, a novel multi-agent framework designed to generate accurate circuit schematics from natural language prompts. This system addresses common LLM issues like hallucination and physical constraint violations by grounding its output in a curated component knowledge base. CircuitLM employs a five-stage pipeline, including component identification, pinout retrieval, chain-of-thought reasoning, JSON schematic synthesis, and visualization, to produce structured and visually interpretable schematics. Evaluation using five state-of-the-art LLMs and a dual-layered methodology involving an Electrical Rule Checking engine and an LLM-as-a-judge approach demonstrates its effectiveness in creating safe and structurally viable circuit designs. AI

IMPACT This framework could streamline hardware design by enabling natural language-to-schematic generation, potentially reducing errors and accelerating prototyping.
- LLM
- CircuitLM
- arXiv
- Syed Rifat Raiyan
TOOL · arXiv cs.AI English(EN) · 2w

Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies

A new research paper explores when Reinforcement Learning (RL) synthesizes novel reasoning skills and when it merely amplifies existing ones. The study, focusing on "Complementary Reasoning," found that models trained solely with Supervised Fine-Tuning (SFT) excel at memorizing known information but fail to generalize to new contexts. RL significantly improves generalization, but only if the base model has first mastered independent atomic skills through SFT. This suggests a two-stage approach of atomic skill training followed by RL is a promising path for developing complex reasoning capabilities in AI. AI

IMPACT Suggests a method for developing AI that can generalize better to novel information and reasoning tasks.
TOOL · arXiv cs.AI English(EN) · 2w

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

Researchers have introduced MM-DeceptionBench, the first benchmark designed to detect deceptive behaviors in multimodal large language models. This new benchmark addresses the growing risk of models deliberately misleading users through combined visual and textual information, a threat that current text-only evaluations overlook. To combat this, a novel framework called "debate with images" has been proposed, which forces models to justify their claims with visual evidence, significantly improving the detection of deceptive strategies and increasing agreement with human judgments. AI

IMPACT Introduces a new method to evaluate and mitigate deceptive behaviors in multimodal AI, crucial for safer AI deployment.
TOOL · arXiv cs.AI English(EN) · 2w

MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs

Researchers have developed MetaboT, an open-source multi-agent framework utilizing Large Language Models (LLMs) to simplify the analysis of mass spectrometry-based metabolomics data. This framework translates natural language queries into SPARQL queries for metabolomics knowledge graphs, overcoming the steep learning curve associated with specialized query languages. MetaboT employs a modular architecture with specialized agents to validate scope, resolve entities, generate schema-aware queries, and interpret results, mitigating common LLM limitations like hallucination and schema non-compliance. The system was validated on the Experimental Natural Products Knowledge Graph (ENPKG) using an expert-authored benchmark, demonstrating its effectiveness in answering complex questions about plant-metabolite relationships and biological activities. AI

IMPACT Lowers the technical barrier for researchers in metabolomics, enabling semantic data mining without specialized programming expertise.
TOOL · arXiv cs.AI English(EN) · 2w

A Comparative Study of Rule-Based and Data-Driven Approaches in Industrial Monitoring

A new research paper compares traditional rule-based systems with modern data-driven approaches for industrial monitoring. Rule-based systems offer transparency and predictability but struggle with complex environments, while data-driven methods excel at anomaly detection and adaptation but face challenges in explainability and integration. The paper proposes hybrid systems that combine the strengths of both to enhance industrial monitoring resilience and efficiency. AI
- Dr. Sante Dino Facchini
TOOL · arXiv cs.LG English(EN) · 2w

Measure-to-measure Regression with Transformers

Researchers have introduced a novel approach to measure-to-measure (M2M) regression, a problem that involves predicting how populations evolve under an unknown transformation. This method treats entire distributions as data points, which is crucial for applications in fields like biology where cells evolve collectively. The new technique utilizes transformers, both as static maps and dynamic velocity fields, to learn nonlinear M2M relationships on probability distributions. Experiments on synthetic data, particle systems, and a colorectal cancer treatment response dataset demonstrate the method's effectiveness in generalizing to new measures. AI
- Transformers
- Matthew Vandergrift
TOOL · arXiv cs.AI English(EN) · 2w

Plan Before Search: Search Agents Need Plan

Researchers have developed a new agentic behavior called "Plan" for large language models that decomposes complex questions into ordered sub-questions before retrieval begins. This structured approach aims to improve multi-hop question-answering by anchoring each search step to a pre-designed sub-question, preventing drift from partially relevant documents. The study found that training success depends on model-specific conditions like initial entropy and stability, not just reward design. To address this, a self-bootstrapping paradigm was proposed where a seed model generates filtered trajectories to activate "Plan" in target models, eliminating the need for distillation and consistently outperforming baselines. AI
- large language models
- arXiv cs.AI
TOOL · arXiv cs.AI English(EN) · 2w

Do Clinical Models Change Treatment Decisions?

A new benchmark called ClinPivot has been developed to assess how well clinical foundation models adapt their treatment decisions based on changing patient contexts. Researchers found that strong performance on medical question-answering tasks does not guarantee accurate decision-making in dynamic clinical scenarios. Even advanced models and those fine-tuned for specific tasks often fail to adjust treatment choices appropriately when new constraints are introduced, highlighting a gap between QA performance and real-world clinical reasoning. AI

IMPACT Highlights a critical gap in clinical AI, suggesting current models may not reliably adapt treatment plans to evolving patient conditions, impacting safe deployment.
- Qwen
- ClinPivot
TOOL · arXiv cs.AI English(EN) · 2w

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Researchers have developed a new benchmark called FORCEBENCH to evaluate the evidence-force calibration of cited RAG systems. This benchmark addresses the issue of 'citation laundering,' where a relevant source is used to support an over-stated claim. FORCEBENCH tests how well evaluators can distinguish between claims that are appropriately supported by evidence and those that are exaggerated, even when the source material is topically relevant. AI
- FORCEBENCH
TOOL · arXiv cs.AI English(EN) · 2w

OccuReward: LLM-Guided Occupant-Centric Reward Shaping for Demographic Equity in Grid-Interactive Buildings

Researchers have developed OccuReward, a framework that uses LLMs to shape reward functions for energy management in grid-interactive buildings, aiming to improve demographic equity. The system utilizes the Gemini API to iteratively refine reward logic and weights, focusing on occupant comfort. Initial results showed elderly females experienced the lowest satisfaction, but after three rounds of refinement, satisfaction improved significantly across various demographic groups while also reducing energy costs. AI

IMPACT This research demonstrates how LLMs can be leveraged to improve fairness and occupant comfort in AI-driven building systems, potentially influencing future smart building designs.
TOOL · arXiv cs.AI English(EN) · 2w

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

A new research paper explores how large language models (LLMs), despite being trained solely on text, develop internal representations that mirror human perceptual organization. The study, focusing on open-weight transformer architectures, found that geometric structures corresponding to perceptual domains like color, pitch, emotion, and taste emerge transiently across different layers. This geometric structure is weak in early layers, strengthens in intermediate layers, and then diminishes in later layers, suggesting it's a temporary byproduct of the model's internal processing pipeline. AI

IMPACT Reveals how LLMs develop human-like perceptual geometry from text, offering insights into their internal workings.
- LLM
- Simardeep Singh
TOOL · arXiv cs.AI English(EN) · 2w

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

Researchers have developed SuiChat-CN, a new benchmark dataset for assessing suicide risk within Chinese group chats. The dataset, comprising over 13,000 contextual segments from 1,400 users and 250,000 messages, addresses the unique challenges of analyzing short, fragmented, and culturally nuanced instant messaging conversations. Experiments using various LLMs indicate that contextual information is crucial for accurate risk assessment, highlighting the difficulties in early detection within multi-party dialogues. Due to ethical considerations, the dataset will not be publicly released but will be available to qualified research institutions. AI

IMPACT This benchmark could advance AI's role in mental health by enabling more nuanced risk assessment in real-time communication platforms.
- Chinese
- SuiChat-CN
- Telegram
- LLMs
TOOL · arXiv cs.AI English(EN) · 2w

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

A new research paper proposes a framework for understanding and controlling human outcomes through interventions targeting an individual's internal state. The authors define 'state' as a dynamic weighting vector that influences how biological and psychological processes handle events, leading to decisions and outcomes. This framework, supported by evidence from various scientific fields and a large observational study, suggests that human variability stems from this latent state and that outcomes can be precisely controlled by intervening in this state trajectory. The paper outlines testable predictions and operational requirements for state-aware systems, with implications for personalized digital health, education, and AI. AI

IMPACT Proposes a novel framework for AI personalization and control of human outcomes via state intervention.
TOOL · arXiv cs.AI English(EN) · 2w

Heterogeneous Causal Discovery of Repeated Undesirable Health Outcomes

Researchers have developed a new framework for causal discovery that integrates multiple causal structure learning algorithms to identify robust cause-and-effect relationships in health data. This approach accounts for how these relationships vary across different patient subpopulations, offering more actionable insights than traditional methods. The framework was successfully applied to identify drivers of repeat emergency department visits for diabetic patients and hospital readmissions for ICU patients, highlighting the importance of chronic disease management and care coordination. AI
- Shishir Adhikari
TOOL · arXiv cs.AI English(EN) · 2w

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Researchers have introduced LaneRoPE, a novel positional encoding technique designed to enhance collaborative parallel reasoning and generation in large language models. This method allows multiple sequences to interact and share intermediate computations during generation, unlike traditional independent sequence generation. LaneRoPE incorporates an inter-sequence attention mask and an extended RoPE to capture relative positional information, leading to improved accuracy in mathematical reasoning tasks without significant overhead. AI

IMPACT Enables LLMs to improve accuracy by allowing multiple generation sequences to collaborate, potentially accelerating adoption in reasoning tasks.
- LaneRoPE
- LLM
TOOL · arXiv cs.AI English(EN) · 2w

Tell Me a Story! Narrative-Driven XAI with Large Language Models

A new research paper introduces "XAIstories," a method that uses Large Language Models to create narrative explanations for AI decisions, aiming to make complex AI outputs more understandable to general audiences and data scientists alike. The approach, which generates stories based on SHAP values and counterfactual explanations, has shown promising results, with over 90% of surveyed general users finding the narratives convincing. Data scientists also see value in XAIstories for communicating AI insights, and in image classification tasks, CFstories (the counterfactual-based narratives) were found to be significantly faster and equally or more convincing than user-crafted narratives. AI

IMPACT Enhances AI explainability for both general users and data scientists, potentially improving decision-making.
TOOL · arXiv cs.AI English(EN) · 2w

Planning a Community Approach to Diabetes Care in Low- and Middle-Income Countries Using Optimization

Researchers have developed an optimization framework to enhance diabetes care in low- and middle-income countries by personalizing Community Health Worker (CHW) visits. This model considers patient motivation and treatment enrollment to maximize glycemic control at a community level. Applied to data from urban slums in India, the approach demonstrated a potential reduction in fasting blood glucose by up to 25% while optimizing resource allocation and reducing patient dropout rates. AI
TOOL · arXiv cs.AI English(EN) · 2w

Sinc Kolmogorov-Arnold network and its application for solving PDEs with singularities

Researchers have introduced the Sinc Kolmogorov-Arnold Network (SincKAN), a novel neural network architecture that utilizes Sinc interpolation for learnable activation functions. This approach aims to improve the representation of both smooth functions and those with singularities, making it particularly effective for solving partial differential equations (PDEs) with physics-informed neural networks. Experimental results indicate that SincKANs outperform traditional methods in various applications. AI
TOOL · arXiv cs.AI English(EN) · 2w

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

A new paper argues that large language models are fundamentally incapable of reliable causal discovery due to inherent limitations in their training paradigms. Researchers have proven that methods like supervised fine-tuning and direct preference optimization lead to models that cannot distinguish between causal graphs generating similar observational data. To overcome this, the paper proposes an "Agentic Causal Bayesian Optimization" (A-CBO) approach, which uses a frozen LLM as an interventional oracle and an external Bayesian loop to concentrate beliefs over candidate graphs, achieving provable convergence without retraining the LLM. AI

IMPACT Highlights fundamental limitations in LLM reasoning for scientific discovery, suggesting new agentic approaches are needed for causal inference.
TOOL · arXiv cs.LG English(EN) · 2w

Bio-Inspired Self-Supervised Learning for Wrist-worn Accelerometer Data

Researchers have developed a new self-supervised learning approach for analyzing wrist-worn accelerometer data, aiming to improve human activity recognition (HAR). This method, inspired by bio-mechanical theories of movement, tokenizes motion into 'movement segments' based on submovements. A Transformer encoder is then pre-trained using masked reconstruction of these tokens, focusing on the structural and temporal organization of movement rather than just waveform morphology. When pre-trained on the NHANES corpus, these representations demonstrated superior performance on six HAR benchmarks compared to existing self-supervised learning baselines. AI
- Transformer
- Prithviraj Tarale
TOOL · arXiv cs.AI English(EN) · 2w

Soro: A Lightweight Foundation Model and Chatbot for Tajik

Researchers have developed Soro, a family of large language models specifically tailored for the Tajik language. These models are built upon open-weight Gemma 3 checkpoints and undergo further training using a 1.9 billion token corpus of Tajik text, followed by instruction tuning on 40,000 examples. Soro demonstrates superior performance on newly created Tajik benchmarks compared to similarly sized Gemma 3 models, while also maintaining strong English language capabilities. The models are designed for deployment under limited compute and connectivity conditions, with quantization techniques like FP8 and INT4 preserving performance while reducing memory footprint for potential edge device use. AI

IMPACT This development could significantly improve AI accessibility and utility for Tajik speakers, potentially enabling new applications in education and communication within Tajikistan.
- Tajikistan
- Hugging Face
- Gemma 3
- Tajik
TOOL · arXiv cs.LG English(EN) · 2w

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Researchers have introduced XTransfer, a novel method for transferring pre-trained deep learning models to new human sensing applications on edge devices. This approach is designed to be modality-agnostic and requires only a small amount of sensor data for adaptation. XTransfer employs model repairing to safely adjust pre-trained layers and layer recombining to efficiently restructure models by selecting and combining relevant layers from source models. Evaluations across various human sensing datasets demonstrate that XTransfer achieves state-of-the-art performance while substantially lowering the costs associated with data collection, model training, and edge deployment. AI

IMPACT Enables more efficient development and deployment of AI models for human sensing on resource-constrained edge devices.
- XTransfer
TOOL · arXiv cs.LG English(EN) · 2w

Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using Airborne LiDAR HD Reference Data across Metropolitan France

Researchers have developed THREASURE-Net, a novel deep learning framework designed for high-resolution canopy height mapping using satellite imagery. This end-to-end model leverages Sentinel-2 time series data and is trained with reference height metrics from airborne LiDAR. THREASURE-Net achieves competitive accuracy, with mean absolute errors as low as 2.63 m at a 2.5 m resolution, and does not require pre-trained models or very high-resolution optical imagery for its super-resolution module. The framework aims to provide a scalable and cost-effective solution for structural monitoring of temperate forests using publicly available satellite data. AI

IMPACT Enables more precise and cost-effective forest monitoring using satellite data.
TOOL · arXiv cs.AI English(EN) · 2w

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Researchers have developed a novel LLM-based architecture designed to identify and quantify human values within text. This system moves beyond traditional utility-maximization models by incorporating ethical and moral considerations into AI decision-making. The architecture features three distinct modules: one for generating structured value specifications from theoretical frameworks, another for labeling text based on these specifications, and a third for assessing the degree of support or resistance indicated by rhetorical and semantic evidence. This modular design allows for adaptability to various value theories and has demonstrated good detection performance on the ValueEval dataset. AI

IMPACT This research could lead to AI systems that better align with human ethical and moral considerations, improving decision-making in autonomous systems.
- ValueEval
- LLM
TOOL · arXiv cs.LG English(EN) · 2w

Decentralized Parameter-Free Online Learning with Compressed Gossip

Researchers have developed DECO-EF, a novel decentralized online learning algorithm designed for scenarios with compressed communication over a graph. This parameter-free method combines coin-betting predictions with compressed difference-based gossip, allowing agents to maintain accurate states while only sharing compressed differences. The algorithm achieves expected sublinear network-regret bounds, marking a significant advancement in decentralized learning under communication constraints. AI
- DECO-EF
- arXiv
TOOL · arXiv cs.LG English(EN) · 2w

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

A new research paper investigates the reliability of local explainability techniques for machine learning models, particularly when applied to complex tabular data. The study evaluated metrics for faithfulness, robustness, and complexity across LIME, SHAP, and Feature Ablation methods on numerous datasets and model types. Findings indicate that explanation quality is not consistently correlated with model performance, but rather influenced by dataset complexity and feature distributions. AI

IMPACT Highlights potential unreliability in AI explanations for tabular data, impacting trust and debugging.
TOOL · arXiv cs.AI English(EN) · 2w

MIRA: A Bilingual Benchmark for Medical Information Response Audit

Researchers have developed MIRA, a new bilingual benchmark designed to evaluate how well large language models (LLMs) maintain consistent medical information across different phrasings of the same question. The benchmark, comprising 4,320 prompts derived from 60 health questions, revealed that LLMs often provide less comprehensive information and fewer actionable steps when prompts are phrased with lower health literacy. This phenomenon, termed Differential Information Dilution (DID), was observed to be model-specific, with some models like Claude and Qwen showing improvements when prompted with knowledge-guided mitigation techniques. AI

IMPACT Highlights potential risks in LLM-driven health information, prompting developers to improve consistency and reduce information dilution.
- Claude
- Qwen
- MIRA
- LLMs
TOOL · arXiv cs.AI English(EN) · 2w

DiagramRAG: A Lightweight Framework to Retrieve Scientific Diagram for Figure Generation

Researchers have developed DiagramRAG, a new framework designed to assist in the generation of scientific diagrams from user sketches. This system leverages retrieval-augmented generation, using knowledge graphs to represent diagrams and an embedding model to find reference diagrams that are both semantically relevant and topologically compatible with the input sketch. The retrieved references then inform the completion and rendering of publication-quality diagrams, achieving strong performance on benchmarks like DiagramBank and FigureBench. AI

IMPACT This framework could streamline the creation of scientific figures, potentially accelerating research communication and publication.
TOOL · arXiv cs.AI English(EN) · 2w

GraD-IBD: Graph Representation Learning from Diagnosis Trajectories for Early Detection of Inflammatory Bowel Disease

Researchers have developed GraD-IBD, a novel graph-based model for early detection of Inflammatory Bowel Disease (IBD). This model represents patient diagnosis trajectories as temporally directed graphs, overcoming limitations of traditional sequential modeling. A key innovation is a context-aware, time-decay message passing mechanism that captures temporal dependencies efficiently, reducing computational complexity and improving IBD detection accuracy on real-world clinical data. AI

IMPACT Introduces a more efficient graph-based approach for clinical diagnosis prediction, potentially improving early disease detection.
TOOL · arXiv cs.AI English(EN) · 2w

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

Researchers have developed EAPO, an Entropy-driven Adaptive Policy Optimization method to improve reinforcement learning for open-ended question answering. Unlike previous methods that use fixed weights for positive and negative samples, EAPO adaptively adjusts these weights based on policy entropy. This approach aims to balance response diversity and stability, particularly mitigating entropy collapse during training. Experiments on medical QA datasets showed EAPO significantly outperformed fixed-weight baselines. AI

IMPACT Introduces a novel method to improve the training of large language models for open-ended question answering, potentially enhancing their diversity and stability.
- Open-Ended QA
- Large Reasoning Models
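
To make the idea of entropy-dependent sample weights concrete, the sketch below computes a policy's entropy and turns it into a pair of weights for positive and negative samples. This is not EAPO's actual rule. The logistic form, the `target_entropy` and `sharpness` parameters, and the direction of the adjustment (more weight on negative samples when entropy falls below the target) are assumptions of this example that the summary does not specify.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_weights(entropy, target_entropy, sharpness=5.0):
    """Return (w_pos, w_neg), which always sum to 1.

    Assumed rule: when entropy drops below the target, w_neg grows,
    shifting the update toward penalizing bad samples.
    """
    w_neg = 1.0 / (1.0 + math.exp(-sharpness * (target_entropy - entropy)))
    return 1.0 - w_neg, w_neg


def weighted_advantages(rewards, w_pos, w_neg):
    """Scale mean-centered rewards by the positive or negative weight."""
    baseline = sum(rewards) / len(rewards)
    return [
        (w_pos if r > baseline else w_neg) * (r - baseline)
        for r in rewards
    ]


# A sharply peaked policy has low entropy, so negatives get more weight.
peaked = [0.97, 0.01, 0.01, 0.01]
h = policy_entropy(peaked)
w_pos, w_neg = adaptive_weights(h, target_entropy=1.0)
print(h, w_pos, w_neg)
print(weighted_advantages([1.0, 0.0, 0.0, 1.0], w_pos, w_neg))
```

A fixed-weight baseline corresponds to ignoring `entropy` and always returning the same `(w_pos, w_neg)` pair.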
TOOL · arXiv cs.AI English(EN) · 2w

Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems

This paper introduces Operational AI Deployment Assurance (OADA), a new governance framework designed to manage high-stakes AI systems. OADA translates various forms of AI uncertainty, such as fairness disagreements and instability, into actionable deployment decisions. It aims to bridge the gap between AI evaluation and real-world deployment by providing constructs like Deployment Assurance Scores and Governance Escalation States. AI

IMPACT Provides a structured approach to managing AI deployment risks, potentially improving safety and reliability in critical applications.
- Khalid Adnan Alsayed
- Operational AI Deployment Assurance
TOOL · arXiv cs.AI English(EN) · 2w

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

Researchers have developed a hierarchical control and learning framework designed to improve the performance of language models operating within resource-constrained agentic systems. The framework separates schema learning from semantic adaptation, using a controller to monitor protocol validity and project histories into a feasible prompt domain. When drift occurs, the system triggers lightweight fine-tuning. In a controlled testbed, it showed improved reliability and cost-efficiency compared to existing methods. AI

IMPACT This framework could enable more efficient and reliable deployment of language models in applications with strict resource limitations.
- Joan Vendrell Gallart
TOOL · arXiv cs.AI English(EN) · 2w

RULER: Representation-Level Verification of Machine Unlearning

Researchers have developed RULER, a new set of metrics designed to verify machine unlearning at the representation level. Current verification checks only output-level compliance, so it can miss residual information that remains in a model's intermediate representations. RULER introduces two metrics, M2 and M4, to detect these residuals. In experiments, four of the five unlearning methods tested passed output-level evaluations but still contained significant residuals, particularly as the proportion of data to be unlearned increased. RULER also functions as a pre-unlearning diagnostic tool, identifying memorization issues in various data types. AI

IMPACT Introduces novel verification methods that could improve the robustness of machine unlearning techniques.
- Georgina Cosma Professor
- RULER
TOOL · arXiv cs.LG English(EN) · 2w

Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals

Researchers have developed MERIT, a novel framework for learning representations from electrocardiogram (ECG) signals. MERIT uses an information-theoretic approach to jointly preserve the detailed structure of ECG waveforms and integrate clinical semantics from text. The framework combines masked ECG modeling with ECG-text contrastive alignment, showing significant improvements in classification tasks and zero-shot evaluations. AI

IMPACT This research could lead to more accurate clinical diagnoses and improved AI-driven medical text generation.
TOOL · arXiv cs.CL English(EN) · 2w

PEAR: Pairwise Evaluation for Automatic Relative Scoring in Machine Translation

Researchers have developed PEAR, a novel supervised quality estimation metric for machine translation that reframes evaluation as a pairwise comparison. This method predicts the direction and magnitude of quality differences between two candidate translations. PEAR outperforms existing metrics, including larger models and reference-based approaches, despite using fewer parameters. It also proves effective for minimum Bayes risk decoding, reducing computational costs with minimal impact on performance. AI

IMPACT Introduces a more efficient and effective method for evaluating machine translation quality, potentially improving decoding strategies.
TOOL · arXiv cs.CL English(EN) · 2w

Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation

Researchers have developed a new in-context learning approach for low-resource machine translation of Coptic to English. This method incorporates syntactic information from Universal Dependencies parses, alongside bilingual dictionaries. The study found that while syntactic information alone was less effective than dictionary glosses, combining both significantly improved translation quality across various model sizes, setting new state-of-the-art results for Coptic translation. AI

IMPACT Enhances low-resource language translation capabilities by integrating syntactic analysis with existing methods.
TOOL · arXiv cs.CL English(EN) · 2w

Assessing Factual Music Comprehension in Large Audio Language Models

Researchers have developed a new protocol to accurately assess the factual music comprehension of large audio language models (LALMs). The existing MusicQA dataset was found to be insufficient for measuring the factual correctness of LALM responses. The new protocol prompts LALMs for verifiable information and parses their open-ended answers into a structured format for objective evaluation using precision, recall, and F1 scores. This protocol was used to benchmark nine LALMs, including Gemini and Music Flamingo, across six factual information retrieval tasks on three datasets. AI

IMPACT Establishes a more rigorous method for evaluating audio LLMs, potentially driving improvements in their factual accuracy for music-related queries.
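
The scoring step described above, comparing a parsed answer against reference facts with precision, recall, and F1, can be sketched as follows. The set-based matching, the case-insensitive normalization, and the function name are assumptions of this example. The protocol's own parsing and matching rules are not given in the summary.

```python
def precision_recall_f1(predicted, gold):
    """Score a list of predicted items against a list of reference items.

    Items are compared as lowercase, whitespace-stripped strings,
    and duplicates are ignored.
    """
    pred = {p.strip().lower() for p in predicted if p.strip()}
    ref = {g.strip().lower() for g in gold if g.strip()}
    true_positives = len(pred & ref)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical instrument-identification example.
parsed_answer = ["Piano", "Violin", "Flute"]
reference = ["piano", "violin", "cello", "drums"]
print(precision_recall_f1(parsed_answer, reference))
```

Here two of the three predicted instruments are correct and two of the four reference instruments are found, giving a precision of 2/3, a recall of 1/2, and an F1 of 4/7.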
TOOL · arXiv cs.CL English(EN) · 2w

Explanation Generation for Contradiction Reconciliation with LLMs

Researchers have introduced a new task focused on generating explanations that reconcile contradictory statements, a capability crucial for human reasoning but underdeveloped in current large language models. They repurposed existing natural language inference datasets and developed new evaluation metrics to assess this ability. Experiments with 18 LLMs revealed limited success, with performance gains plateauing as model size increased, indicating a significant gap in LLM reasoning capabilities. AI
TOOL · arXiv cs.CL English(EN) · 2w

Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity

Researchers have introduced Quality-constrained Entropy Maximization Policy Optimization (QEMPO), a new framework designed to enhance the diversity of large language model (LLM) outputs without compromising quality. Existing methods often struggle with a trade-off between these two objectives. QEMPO offers a theoretical solution that maximizes diversity, measured by entropy, while adhering to a quality constraint, with proven optimality. This framework is applicable to both online and offline training scenarios and has demonstrated empirical improvements in both diversity and quality compared to current baselines. AI
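
For intuition about maximizing entropy under a quality constraint, consider a toy case with a finite set of candidate responses and a known quality score for each. The textbook maximum-entropy result says the best distribution is either uniform (if that already meets the quality floor) or proportional to exp(λ · quality) for some λ > 0. The sketch below finds λ by bisection. It is not QEMPO itself: the finite candidate set, the fixed scores, and all function and parameter names are assumptions made for illustration.

```python
import math


def gibbs(qualities, lam):
    """Distribution proportional to exp(lam * quality), computed stably."""
    top = max(qualities)
    weights = [math.exp(lam * (q - top)) for q in qualities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_quality(probs, qualities):
    return sum(p * q for p, q in zip(probs, qualities))


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_with_quality_floor(qualities, floor, iters=60, lam_hi=1e3):
    """Maximize entropy subject to expected quality >= floor.

    Intended for floors strictly below max(qualities); lam_hi is a
    hypothetical cap on the search range.
    """
    if floor > max(qualities):
        raise ValueError("quality floor is unreachable")
    uniform = gibbs(qualities, 0.0)
    if expected_quality(uniform, qualities) >= floor:
        return uniform  # constraint inactive: uniform is optimal

    # Expected quality increases with lam, so bisect for the smallest
    # lam that meets the floor; `hi` always satisfies the constraint.
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_quality(gibbs(qualities, mid), qualities) >= floor:
            hi = mid
        else:
            lo = mid
    return gibbs(qualities, hi)


scores = [0.0, 0.5, 1.0]
dist = max_entropy_with_quality_floor(scores, floor=0.7)
print(dist, expected_quality(dist, scores), entropy(dist))
```

Raising the floor pushes probability mass toward the highest-scoring candidates and lowers entropy, which is the trade-off between diversity and quality in miniature.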
TOOL · arXiv cs.CL English(EN) · 2w

RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction

Researchers have developed a new framework called RMPL (Relation-aware Multi-task Progressive Learning) to improve multimedia event extraction, which involves identifying events and their arguments from text and images. This method addresses the scarcity of annotated training data by using stage-wise training with heterogeneous supervision from unimodal event extraction and multimedia relation extraction. Experiments on the M2E2 benchmark demonstrated that RMPL consistently enhances performance across various modality settings when used with multiple Vision-Language Models (VLMs). AI

IMPACT Introduces a novel approach to improve event extraction in multimodal data, potentially enhancing AI systems that process both text and images.
TOOL · arXiv cs.CL English(EN) · 2w

CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking

Researchers have developed CALM-IT, a new framework designed to generate more realistic and effective long-form dialogues for Motivational Interviewing. This system explicitly models evolving client and counselor states to guide therapeutic strategy and utterance generation. In evaluations using a large corpus of synthetic dialogues, CALM-IT outperformed existing methods on key metrics like empathy and partnership, and achieved a high client acceptance rate. AI

IMPACT This framework could lead to more sophisticated AI-powered mental health support tools by improving the realism and effectiveness of therapeutic conversations.