Brief

last 24h

[50/1914] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CV English(EN) · 2w

Bounded-Compute Multimodal Regression for Product-Rating Prediction

Researchers have developed a new method for product-rating prediction using vision-language models (VLMs) that operates under strict latency budgets. Their approach, adapted from SmolVLM2-256M-Video-Instruct for the LoViF 2026 Efficient VLM challenge, replaces autoregressive text generation with a lightweight MLP for feature-based regression. This bounded-compute adaptation achieved strong results in correlation and prediction accuracy on a held-out evaluation set. AI

IMPACT This research offers a new approach for efficient multimodal regression, potentially improving product rating prediction in resource-constrained environments.
- LoViF 2026 Efficient VLM challenge
- SmolVLM2-256M-Video-Instruct
TOOL · arXiv cs.CV English(EN) · 2w

CuriosAI Submission to the CASTLE Challenge at EgoVis 2026

CuriosAI has submitted a paper detailing their approach to the CASTLE Challenge, which involves answering multiple-choice questions based on extensive egocentric video data. Their primary method, SVA (Search-Verify-Answer), employs a three-stage pipeline that refines potential answers using a Vision-Language Model (VLM) and an LLM judge, achieving an accuracy of 0.50. A secondary approach, TMKG (Temporal-Multimodal-Knowledge-Graph), builds a knowledge graph from the video data but achieved a lower accuracy of 0.35. AI

IMPACT This research explores novel methods for video understanding and question answering, potentially advancing multimodal AI capabilities.
- CASTLE Challenge
- EgoVis 2026
- LLM
- TMKG
- VLM
- CuriosAI
TOOL · arXiv cs.CV English(EN) · 2w

A Road-Conditioned Traffic Movie Prediction Network with Spatiotemporal and Structure-Consistent Learning

Researchers have developed RCSNet, a novel network designed for predicting future traffic conditions as spatial maps across entire urban areas. This method addresses limitations in existing approaches by integrating road network structures, connectivity, and travel directions into its forecasting model. RCSNet reformulates traffic prediction as topology-guided future-state generation, improving temporal consistency and accuracy, particularly in cross-city scenarios. AI

IMPACT This new model could improve urban planning and traffic management by providing more accurate and structurally consistent traffic forecasts.
- Chicago
- RCSNet
- Berlin
- Antwerp
- Moscow
- Bangkok
- Joshua Kofi Asamoah
TOOL · arXiv cs.CV English(EN) · 2w

Rethinking Video-Language Model from the Language Input Perspective

Researchers have proposed a new framework to improve Video-Language Models (VLMs) by addressing limitations in text input. Current VLMs often rely on predefined text templates, which are restrictive and time-consuming to create. This new approach generates positive and negative texts from existing ones to target specific components, employs an attribute-based reasoning strategy for fine-grained semantics, and uses video guidance for cross-modal bridging with a self-weighted loss. Experiments indicate this framework can be integrated as a plug-and-play module to enhance the performance of existing state-of-the-art VLMs. AI

IMPACT This research could lead to more flexible and user-friendly Video-Language Models by reducing reliance on rigid text templates.
- Video-Language Models
- large language models
TOOL · arXiv cs.CV English(EN) · 2w

SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization

Researchers have developed SIGMA, a novel method for automatically generating pixel-level masks for image manipulation localization (IML) datasets. SIGMA addresses the challenge of low-cost data acquisition by leveraging existing image editing datasets, which contain millions of original and edited image pairs. The system uses semantic-feature differencing within a vision foundation backbone and incorporates instruction-derived spatial priors through cross-modal refinement to accurately identify manipulation regions, even accounting for unintended side effects. SIGMA has demonstrated superior performance compared to existing mask generators and, when applied to public editing corpora, has created a substantial training set that significantly improves the performance of various IML detectors. AI
TOOL · arXiv cs.CL English(EN) · 2w

Chinese Word Boundary Recovery through Character Alignment Projection

Researchers have developed a novel method for Chinese word boundary recovery, particularly effective for non-standard text like that produced by language learners. The approach formulates the problem as an alignment-based projection task, where character-level alignments between a noisy source sentence and a cleaner target sentence are used to project word boundaries from the target back to the source. This technique proves more robust than direct segmentation, correcting over-segmentation errors and stabilizing annotation and evaluation processes for noisy input. AI
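
The projection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `project_boundaries` is hypothetical, Python's `difflib.SequenceMatcher` stands in for whatever character aligner the authors use, and attaching unaligned source characters to the preceding word is a simplifying assumption.

```python
from difflib import SequenceMatcher


def project_boundaries(source: str, target_words: list[str]) -> list[str]:
    """Segment `source` by projecting word boundaries from a segmented target.

    Hypothetical helper: the aligner and the handling of unaligned
    characters are assumptions made for illustration only.
    """
    target = "".join(target_words)

    # Character offsets in the target at which a word begins.
    word_starts = set()
    offset = 0
    for word in target_words:
        word_starts.add(offset)
        offset += len(word)

    # Character-level alignment: source index -> target index.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    src_to_tgt = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            src_to_tgt[block.a + k] = block.b + k

    # A source character opens a new word when its aligned target
    # character opens one; unaligned characters extend the current word.
    words, current = [], ""
    for i, ch in enumerate(source):
        j = src_to_tgt.get(i)
        if j is not None and j in word_starts and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


# Noisy learner sentence (one wrong character) and a clean segmented target.
print(project_boundaries("我喜换学习中文", ["我", "喜欢", "学习", "中文"]))
```

In this toy case the miswritten character has no aligned counterpart, so it stays attached to the word it belongs to instead of being split off, which is the kind of over-segmentation the summary says the projection avoids.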
TOOL · arXiv cs.CL English(EN) · 2w

Challenges in Explaining Pretrained Clinical Text Classifiers

A new research paper highlights significant limitations in current methods for explaining predictions made by pretrained clinical text classifiers. The study identifies issues with post-hoc techniques like LIME and SHAP, particularly their tendency to overemphasize non-informative tokens and produce unstable attributions. These findings suggest a need for more clinically relevant and semantically grounded explanation strategies for complex medical text analysis. AI

IMPACT Highlights the need for more robust and clinically meaningful explanation methods in medical AI.
- SHAP
TOOL · arXiv cs.CL English(EN) · 2w

Knowledge Dependency Estimation for Reliable Question Answering

Researchers have developed a new method called Knot to estimate the knowledge dependencies of question-answering models. This technique aims to identify which pieces of information a model relies on to generate an answer, addressing the challenge of noisy and redundant knowledge sources in large language model-based QA systems. Knot uses subset-level counterfactual supervision and models subset sensitivity to provide fine-grained dependency scores, outperforming existing baselines in predicting subset sensitivity and identifying influential knowledge candidates. AI
- Knot
- arXiv
TOOL · arXiv cs.LG English(EN) · 2w

SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising

Researchers have developed SYNAPSE, a novel neuro-symbolic framework designed to enhance the accuracy of translating brain activity into text. This system addresses the issue of biological noise in EEG data, which can lead to inaccurate or unstable text generation by large language models. SYNAPSE stabilizes this process by using commonsense graph structures and latent examples to refine semantic candidates derived from neural signals, improving stability without requiring extensive LLM fine-tuning. AI

IMPACT This framework could improve the reliability of brain-computer interfaces for text generation, potentially aiding communication for individuals with certain disabilities.
TOOL · arXiv cs.CL English(EN) · 2w

Beyond pass@k: Redundancy-Aware RLVR for Multi-Sample Code Generation

Researchers have developed a new method called Redundancy-Aware RLVR to improve code generation from large language models. This approach addresses the issue of generated code samples being too similar to each other, which can hinder performance. By incorporating anti-redundancy rewards based on code similarity detection, the method aims to produce more diverse and executable code, often matching or surpassing existing techniques. AI
- Pass@k
- LLMs
- Florian Le Bronnec
- JPlag
- RLVR
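
As a rough illustration of an anti-redundancy reward, the sketch below subtracts a similarity penalty from the pass/fail reward of each sample. Everything here is an assumption rather than the paper's formulation: the function names, the `penalty` weight, the choice to penalize only passing samples, and the token-set Jaccard similarity, which merely stands in for a real code similarity detector such as the JPlag tool listed in the tags.

```python
import re


def tokens(code: str) -> set[str]:
    """Crude lexer: identifiers, integers, and single punctuation marks."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a stand-in for a real detector)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def redundancy_aware_rewards(samples, passed, penalty=0.5):
    """Verifiable reward minus a penalty for near-duplicate samples.

    `penalty` is a hypothetical hyperparameter. Failing samples keep a
    reward of 0 so the penalty never rewards incorrect code.
    """
    rewards = []
    for i, (code, ok) in enumerate(zip(samples, passed)):
        if not ok:
            rewards.append(0.0)
            continue
        similarities = [jaccard(code, other)
                        for j, other in enumerate(samples) if j != i]
        redundancy = max(similarities, default=0.0)
        rewards.append(1.0 - penalty * redundancy)
    return rewards


samples = [
    "def f(x): return x+1",
    "def f(x): return x+1",              # exact duplicate of the first
    "def f(x): return 1+x if x else 1",  # passes, but written differently
]
print(redundancy_aware_rewards(samples, [True, True, True]))
```

The two identical samples receive the full penalty, while the third keeps more of its reward, so a policy trained on such a signal is nudged toward sets of correct solutions that differ from one another.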
TOOL · arXiv cs.LG English(EN) · 2w

Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers

A new toolkit named Ariel-ML has been developed to automate parallelization for neural network inference on multi-core microcontrollers using embedded Rust. This toolkit is designed to leverage the capabilities of heterogeneous multi-core architectures found in various 32-bit microcontrollers, including Arm Cortex-M, RISC-V, and ESP-32 families. Benchmarks show that Ariel-ML achieves lower inference latency compared to existing solutions while maintaining comparable memory footprints to toolkits using C/C++. AI

IMPACT Enables more efficient AI model deployment on low-power, multi-core embedded systems.
- TinyML
- Arm Cortex-M
- RISC-V
- ESP-32
- Zhaolan Huang
- Rust
- Neural Networks
TOOL · arXiv cs.CL English(EN) · 2w

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

Researchers have developed a new post-training quantization technique called TARQ, designed to improve the accuracy of Automatic Speech Recognition (ASR) systems, particularly for rare words. TARQ addresses a limitation in existing methods by shifting calibration focus towards less frequent terms like names and numerals, which are often critical for understanding. This novel approach, which requires no additional training or labeled data, has demonstrated improved performance on rare-word error rates across various ASR models and datasets without negatively impacting overall accuracy. AI
TOOL · arXiv cs.CL English(EN) · 2w

ReverseMath: Answer Inversion for Scalable and Verifiable Mathematical Problem Generation

Researchers have developed a new method called ReverseMath to generate mathematical problems for evaluating and training large language models (LLMs). This technique works by inverting the input-output relationship of existing problems, creating new problems where the answer is known by construction. Experiments show that LLMs sometimes struggle with these reversed problems, indicating potential memorization rather than true reasoning. ReverseMath can also be used to augment training data for reinforcement learning, leading to improved mathematical reasoning performance. AI

IMPACT Provides a scalable way to generate verifiable training data and analyze LLM reasoning capabilities.
- LLMs
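
The idea of answer inversion can be shown with a toy example. This is only a sketch under assumptions: the rectangle template, the `Problem` class, and both function names are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    text: str
    answer: int


def forward_problem(width: int, height: int) -> Problem:
    """Original direction: inputs are given, the output is asked for."""
    return Problem(
        f"A rectangle is {width} m wide and {height} m tall. "
        "What is its area in square metres?",
        width * height,
    )


def reversed_problem(width: int, height: int) -> Problem:
    """Inverted direction: the old answer becomes a given and an old
    input becomes the unknown, so the new answer is known by construction."""
    area = width * height
    return Problem(
        f"A rectangle has an area of {area} square metres and is "
        f"{height} m tall. How wide is it in metres?",
        width,
    )


original = forward_problem(6, 4)
inverted = reversed_problem(6, 4)
print(original.text, "->", original.answer)
print(inverted.text, "->", inverted.answer)

# The inverted answer can be checked by substituting it back in.
assert inverted.answer * 4 == original.answer
```

Because the inverted answer is fixed before the question text is generated, no solver or human annotator is needed to label it, which is what makes this style of generation verifiable at scale.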
TOOL · arXiv cs.LG English(EN) · 2w

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Researchers have introduced UDM-GRPO, a novel framework that integrates Uniform Discrete Diffusion Models (UDMs) with reinforcement learning for improved discrete generative modeling. The method enhances training stability and performance by treating the final clean sample as an action and reconstructing trajectories via the diffusion forward process. Additional strategies like Reduced-Step and CFG-Free further boost efficiency, leading to state-of-the-art results in text-to-image tasks, OCR benchmarks, and other applications. AI

IMPACT This research could lead to more stable and efficient discrete generative models, improving performance in tasks like text-to-image generation and OCR.
TOOL · arXiv cs.CL English(EN) · 2w

FinBoardBench: Benchmarking Dynamic Wealth Management and Strategic Financial Reasoning of LLMs via Board Game Simulations

Researchers have developed FinBoardBench, a new evaluation suite designed to test the dynamic financial reasoning and wealth management capabilities of large language models (LLMs). The suite utilizes three classic board games: Cashflow, Acquire, and Monopoly, to assess skills such as cash flow management, investment forecasting, and negotiation. Experiments with nine advanced LLMs showed that while they possess basic planning abilities, they struggle with complex interactions and dynamic decision-making, often prioritizing asset acquisition over liquidity and becoming vulnerable to financial crises. AI

IMPACT This benchmark could reveal critical limitations in LLMs' real-world financial decision-making, guiding future development towards more robust and adaptable AI agents.
TOOL · arXiv cs.LG English(EN) · 2w

Bio-Inspired Self-Supervised Learning for Wrist-worn Accelerometer Data

Researchers have developed a new self-supervised learning approach for analyzing wrist-worn accelerometer data, aiming to improve human activity recognition (HAR). This method, inspired by bio-mechanical theories of movement, tokenizes motion into 'movement segments' based on submovements. A Transformer encoder is then pre-trained using masked reconstruction of these tokens, focusing on the structural and temporal organization of movement rather than just waveform morphology. When pre-trained on the NHANES corpus, these representations demonstrated superior performance on six HAR benchmarks compared to existing self-supervised learning baselines. AI
- Prithviraj Tarale
- Transformer
TOOL · arXiv cs.LG English(EN) · 2w

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Researchers have introduced XTransfer, a novel method for transferring pre-trained deep learning models to new human sensing applications on edge devices. This approach is designed to be modality-agnostic and requires only a small amount of sensor data for adaptation. XTransfer employs model repairing to safely adjust pre-trained layers and layer recombining to efficiently restructure models by selecting and combining relevant layers from source models. Evaluations across various human sensing datasets demonstrate that XTransfer achieves state-of-the-art performance while substantially lowering the costs associated with data collection, model training, and edge deployment. AI

IMPACT Enables more efficient development and deployment of AI models for human sensing on resource-constrained edge devices.
- XTransfer
TOOL · arXiv cs.LG English(EN) · 2w

Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using Airborne LiDAR HD Reference Data across Metropolitan France

Researchers have developed THREASURE-Net, a novel deep learning framework designed for high-resolution canopy height mapping using satellite imagery. This end-to-end model leverages Sentinel-2 time series data and is trained with reference height metrics from airborne LiDAR. THREASURE-Net achieves competitive accuracy, with mean absolute errors as low as 2.63 m at a 2.5 m resolution, and does not require pre-trained models or very high-resolution optical imagery for its super-resolution module. The framework aims to provide a scalable and cost-effective solution for structural monitoring of temperate forests using publicly available satellite data. AI

IMPACT Enables more precise and cost-effective forest monitoring using satellite data.
TOOL · arXiv cs.LG English(EN) · 2w

Graph Neural Networks for Source Detection: A Review and Benchmark Study

A new study published on arXiv explores the effectiveness of Graph Neural Networks (GNNs) for source detection in epidemic processes on contact networks. Researchers systematically reviewed existing GNN-based methods and conducted a benchmark study comparing four GNN architectures against traditional and MLP-based baselines. In the experiments, GNNs significantly outperformed the other tested methods across various network topologies, countering earlier skepticism about their suitability for this task. The authors also released all code and data for reproducibility and proposed epidemic source detection as a benchmark for evaluating GNN architectures. AI

IMPACT Demonstrates GNNs' superior performance in identifying epidemic origins, potentially improving public health response and network analysis.
TOOL · arXiv cs.CL English(EN) · 2w

Boundary Suppression Asymmetry in Post-trained Assistants: Over-expansion as a Controllability Cost

Researchers have identified a phenomenon called boundary suppression asymmetry in post-trained language model assistants. Although these assistants are trained to be helpful and complete, some of the resulting tendencies, such as over-answering or providing too much information, are hard to suppress even when a narrower response is explicitly requested. The study attributes this to a combination of content budget overshoot and continuation persistence, which makes boundary correction harder for specific helpful behaviors. AI

IMPACT Highlights potential challenges in fine-tuning AI assistants for precise control over response length and detail.
TOOL · arXiv cs.CL English(EN) · 2w

MERIT: Matching Expertise via Rubric-Informed Training for Reviewer Assignment

Researchers have developed MERIT, a novel two-stage framework designed to improve the assignment of suitable reviewers to academic submissions. The system first trains a reviewer assessor using reinforcement learning, guided by an LLM judge and paper-specific expertise rubrics, to identify and match expertise dimensions. This assessor's predictions are then distilled into an embedding-based retriever for efficient, large-scale assignment. MERIT's 4B reviewer assessor has demonstrated superior performance compared to larger general-purpose LLMs on suitability classification, and its retriever achieves state-of-the-art results on benchmark datasets. AI
- MERIT
- LLM
- LR-Bench
- CMU Gold dataset
TOOL · arXiv cs.AI English(EN) · 2w

Anatomy-Slot: Unsupervised Anatomical Factorization for Homologous Bilateral Reasoning in Retinal Diagnosis

Researchers have developed a new unsupervised method called Anatomy-Slot for analyzing retinal images, which improves diagnostic accuracy by explicitly comparing homologous anatomical structures between the left and right eyes. This approach decomposes image patches into distinct anatomical regions, enabling a more robust bilateral reasoning process. The method demonstrated a significant improvement in AUC by 4.2 points over a baseline model on the ODIR-5K dataset, suggesting a path toward more interpretable diagnostic systems that align with clinical practices. AI

IMPACT This unsupervised anatomical factorization method could lead to more interpretable and accurate AI-driven diagnostic systems in ophthalmology.
TOOL · arXiv cs.AI English(EN) · 2w

Negative Advantages Is a Double-Edged Sword: Calibrating advantages in GRPO for Search Agents

A new method called CalibAdv has been developed to improve the training stability and performance of search agents trained with Group Relative Policy Optimization (GRPO). It targets two problems: correct intermediate steps being penalized because the final answer is wrong, and unstable training that degrades performance. CalibAdv calibrates how advantages are assigned, downscaling excessive negative advantages according to intermediate-step correctness and rebalancing positive and negative advantages so that rewards and penalties are modeled more stably. AI

IMPACT Improves training stability and performance for search agents, potentially leading to more reliable AI-powered search functionalities.
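
A minimal sketch of the downscaling idea follows. The group-relative normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO form; the scaling rule, the `floor` parameter, and the function names are assumptions for illustration and are not CalibAdv's actual formula. The rebalancing of positive and negative advantages mentioned in the summary is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Standard GRPO-style advantages: normalize rewards within a group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def downscale_negative(advantages, correct_step_fraction, floor=0.2):
    """Shrink negative advantages for rollouts whose intermediate steps
    were mostly correct.

    Hypothetical rule: scale = max(floor, 1 - fraction of correct steps).
    Positive advantages are left untouched.
    """
    calibrated = []
    for adv, frac in zip(advantages, correct_step_fraction):
        if adv < 0:
            adv *= max(floor, 1.0 - frac)
        calibrated.append(adv)
    return calibrated


# One rollout answered correctly; three failed with varying step quality.
rewards = [1.0, 0.0, 0.0, 0.0]
step_quality = [1.0, 0.9, 0.5, 0.0]
raw = group_relative_advantages(rewards)
print(raw)
print(downscale_negative(raw, step_quality))
```

Under this rule a failed rollout with mostly correct search steps is penalized far less than one that went wrong from the start, which matches the motivation given in the summary.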
TOOL · arXiv cs.AI English(EN) · 2w

When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

Researchers have fine-tuned open-source language models to detect a triple burden of polycystic ovary syndrome (PCOS), body image distress, and disordered eating in social media posts. Three models (Gemma-2-2B, Qwen3-1.7B, and DeepSeek-R1-Distill-Qwen-1.5B) were fine-tuned with Low-Rank Adaptation on a dataset of 1,000 PCOS-related posts to provide explanations and textual evidence alongside their predictions. The top-performing model achieved 75.3% accuracy on a held-out set, showing comorbidity detection with explanations; its effectiveness decreases with diagnostic complexity, which suggests it is best suited to screening. AI

IMPACT Demonstrates AI's potential for early screening of complex comorbidities in public health data.
TOOL · arXiv cs.AI English(EN) · 2w

Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

Researchers have developed Rectified Schrödinger Bridge Matching (RSBM), a new framework designed to improve visual navigation for autonomous agents in Embodied AI. RSBM leverages a shared velocity-field structure between diffusion models and Schrödinger Bridges, allowing for more stable and efficient integration steps. This method significantly reduces the number of steps required for convergence compared to standard approaches, achieving high success rates with only three integration steps. AI

IMPACT This research could enable faster and more efficient visual navigation for robots, accelerating the development of real-time autonomous systems.
TOOL · arXiv cs.AI English(EN) · 2w

Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

Researchers have developed SCOUT, a new method for open-world interactive object search in household environments that utilizes 3D scene graphs. SCOUT assigns utility scores to objects and locations based on relational exploration heuristics, such as object containment and co-occurrence. To achieve efficiency without sacrificing generalization, the method employs a distillation framework to extract knowledge from large language models into lightweight models for real-time inference. A new benchmark, SymSearch, has also been introduced to evaluate semantic reasoning in this domain. AI

IMPACT This research could lead to more efficient and capable robotic systems for household tasks and open-world interaction.
TOOL · arXiv cs.AI English(EN) · 2w

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

Researchers have developed MathlibLemma, an LLM-powered pipeline designed to automatically discover, formalize, and prove folklore lemmas missing from formal mathematics libraries like Lean. This system has generated over 1,500 verified Lean proofs, with a subset already integrated into Mathlib, demonstrating its ability to meet expert standards. Additionally, a benchmark suite of 4,028 type-checked Lean statements has been created to evaluate AI's role in expanding formal mathematical knowledge. AI
- Mathlib
- MathlibLemma
- Lean
- Xinyu Liu
TOOL · arXiv cs.AI English(EN) · 2w

A Sheaf-Theoretic and Topological Perspective on Complex Network Modeling and Attention Mechanisms in Graph Neural Models

Researchers have developed a new framework using sheaf theory and topology to analyze feature diffusion and aggregation in graph neural models. This approach offers a topological perspective on how node features and edge weights align and spread during training. The proposed multiscale extension, inspired by topological data analysis, aims to capture hierarchical feature interactions, providing deeper insights into graph-based architectures for tasks like node classification and community detection. AI
TOOL · arXiv cs.AI English(EN) · 2w

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

A systematic review of 337 articles indicates that Transformer-based language models (TLMs) possess a significant amount of syntactic knowledge. While these models perform well on formal syntactic tasks, their performance is weaker at the syntax-semantics interface and for less digitally supported languages. Despite evidence of syntactic knowledge, current research methods are too varied and observational to fully understand the underlying computational mechanisms, with a heavy concentration on English and BERT-like models. AI

IMPACT Confirms that current LLMs possess substantial syntactic knowledge, though understanding of the underlying mechanisms remains limited.
TOOL · arXiv cs.AI English(EN) · 2w

Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

Researchers have developed PolyBench, a comprehensive benchmark dataset and training methodology for large language models (LLMs) focused on polymer design tasks. This dataset, comprising over 125,000 tasks and leveraging a knowledge base of millions of data points, aims to equip LLMs with the specific knowledge and reasoning capabilities needed for polymer science. Experiments demonstrate that smaller language models trained with PolyBench's knowledge-augmented reasoning distillation method can outperform similar-sized models and compete with larger, closed-source LLMs on polymer-related challenges, showing promise for advancing AI in scientific discovery. AI

IMPACT Enhances LLM capabilities in specialized scientific domains like polymer design, potentially accelerating research and discovery.
TOOL · arXiv cs.AI English(EN) · 2w

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Researchers have developed HGMem, a novel hypergraph-based working memory system designed to enhance multi-step retrieval-augmented generation (RAG) for large language models. Unlike traditional RAG systems that treat memory as passive storage, HGMem represents memory as a dynamic hypergraph where hyperedges capture complex interrelations between facts. This structure allows for the progressive formation of higher-order interactions, enabling more robust multi-step reasoning and improved global understanding within extended contexts. Experiments show HGMem significantly outperforms existing baseline systems on challenging reasoning benchmarks. AI

IMPACT Enhances LLM reasoning capabilities for complex, long-context tasks by improving information synthesis and relational understanding.
TOOL · arXiv cs.AI English(EN) · 2w

Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

Researchers have partially reverse-engineered a convolutional recurrent neural network (RNN) used for the game Sokoban. They discovered that the network stores future moves, or plans, as activations within specific "path channels" in its hidden state. These channels are influenced by convolutional kernels that encode learned transition models, allowing the RNN to construct plans by propagating activations from boxes to goals and using negative values to prune paths at obstacles, effectively implementing a form of backtracking. AI
TOOL · arXiv cs.AI English(EN) · 2w

LiDDA: Data Driven Attribution at LinkedIn

A new paper from LinkedIn researchers details LiDDA, a data-driven attribution system designed for marketing intelligence. This transformer-based approach integrates member-level and aggregate data, along with external factors, to causally attribute conversion credits. The paper outlines its large-scale implementation at LinkedIn and shares insights applicable to the broader marketing and ad tech industries. AI

IMPACT This research offers a novel approach to marketing attribution, potentially improving efficiency and effectiveness in ad tech.
- LinkedIn
- transformer-based attribution
TOOL · arXiv cs.AI English(EN) · 2w

HEART: Achieving Timely Multi-Model Training for Vehicle-Edge-Cloud-Integrated Hierarchical Federated Learning

Researchers have developed a new framework called HEART to address the challenges of multi-model training in Hierarchical Federated Learning (HFL) for vehicle-edge-cloud architectures. This framework aims to minimize global training latency and ensure balanced resource allocation across diverse tasks, which is a complex, NP-hard problem. HEART utilizes a hybrid synchronous-asynchronous aggregation rule and a two-stage approach involving evolutionary algorithms and a greedy method for task scheduling and prioritization. Experiments show HEART outperforms existing methods in dynamic VEC-HFL environments. AI

IMPACT This research could improve the efficiency and speed of AI model training in connected vehicle systems.
TOOL · arXiv cs.AI English(EN) · 2w

FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation

A new research paper introduces FLUID, a framework designed to improve livestreaming recommendation systems by moving away from traditional ID-based methods. FLUID utilizes a multimodal encoder to generate discrete semantic codes (LUCID) for content characterization, addressing the cold-start problem inherent in short-lived livestream IDs. When deployed on industrial-scale recommenders, FLUID demonstrated significant improvements in user engagement metrics. AI

IMPACT Introduces a novel approach to recommender systems that could improve user engagement in live content platforms.
- FLUID
- Zexi Huang
- LUCID
TOOL · arXiv cs.CV English(EN) · 2w

NL-MambaXCT: Self-Supervised Nested-Learning Mamba for Nomex Honeycomb X-ray CT Defect Classification

Researchers have developed NL-MambaXCT, a novel framework utilizing Mamba architecture and self-supervised learning for defect classification in X-ray computed tomography (XCT) images of Nomex honeycomb structures. This approach combines masked image modeling for pre-training on unlabeled data with a Nested Learning formulation, featuring two-timescale parameter dynamics and a deep-momentum optimizer. The model achieved high accuracy and F1 scores, outperforming existing CNN, attention, and single-timescale Mamba baselines, suggesting its potential for efficient and robust industrial inspection in aerospace manufacturing. AI

IMPACT This research offers a more efficient and accurate method for defect detection in critical aerospace components, potentially improving manufacturing quality and safety.
TOOL · arXiv cs.AI English(EN) · 2w

The Future of Facts: Tracing the Factual Generation-Verification Gap

A new research paper explores the gap between how well language models can generate and verify factual information. The study found that models consistently learn to verify facts before they learn to generate them accurately. Furthermore, factual updates can lead to models being in a state where they accept both old and new information as correct, a phenomenon observed across multiple open-source model families and at larger scales. AI
TOOL · arXiv cs.AI English(EN) · 2w

Planning a Community Approach to Diabetes Care in Low- and Middle-Income Countries Using Optimization

Researchers have developed an optimization framework to enhance diabetes care in low- and middle-income countries by personalizing Community Health Worker (CHW) visits. This model considers patient motivation and treatment enrollment to maximize glycemic control at a community level. Applied to data from urban slums in India, the approach demonstrated a potential reduction in fasting blood glucose by up to 25% while optimizing resource allocation and reducing patient dropout rates. AI
TOOL · 36氪 (36Kr) Chinese(ZH) · 2w · [2 sources]

Meta to launch AI chatbot subscription service, with monthly fees starting at $7.99

Meta is launching a subscription service for its AI chatbot, offering two tiers: Meta One Plus for $7.99 per month and Meta One Premium for $19.99 per month. The service aims to generate revenue from AI, potentially offsetting significant development costs. The higher tier provides increased usage limits for features like image and video generation, as well as complex reasoning. AI

IMPACT Meta's move into paid AI services could signal a broader trend of monetization for consumer-facing AI tools.
TOOL · Alignment Forum English(EN) · 2w

Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming

A new paper proposes "eval cooperativeness" as a scalable solution to "eval gaming" in AI models. The authors argue that current behavioral evaluations may become unreliable if AI models develop "eval awareness" and deliberately alter their behavior to appear aligned during testing, a phenomenon known as "eval gaming." Instead of solely focusing on reducing eval awareness, the paper suggests fostering a situational desire in AI models to help developers gather accurate information through evaluations, thereby preserving the predictive power of these tests for real-world deployment behavior. AI

IMPACT This research could lead to more reliable AI safety evaluations, ensuring AI models behave as intended in real-world deployments.
TOOL · 雷峰网 (Leiphone) 中文(ZH) · 2w

ICRA 2026 | CUHK Gao Yuan, Lin Tianlin Team Propose Spontaneous Co-adaptation Strategy: Meta-Learning Empowered Co-evolution of Heterogeneous Multi-Robot Systems

Researchers from the Chinese University of Hong Kong, Shenzhen, have developed a novel framework for heterogeneous multi-robot systems that enables emergent co-adaptive strategies through meta-learning. The system allows different types of robots, such as task-execution, supply, and social-interaction robots, to autonomously adjust their behaviors based on human crowd states, facilitating bidirectional adaptation between humans and robots. Large-scale experiments in simulated airport environments demonstrated significant improvements in task completion efficiency and crowd guidance, along with reduced human burden and higher perceived trust in and anthropomorphism of the robots. AI

IMPACT Enhances human-robot interaction and efficiency in complex environments by enabling robots to adapt to human behavior.
- IEEE International Conference on Robotics and Automation (ICRA)
- Emergent Co-Adaptive Strategies in Heterogeneous Multi-Robot Systems via Meta-Learning
TOOL · Towards AI English(EN) · 2w

Together AI's OSCAR Killed KV Cache Memory 8x — The First 2-Bit That Doesn't Collapse at 128K

Together AI has released OSCAR, an open-source 2-bit KV cache method that significantly reduces memory usage. Unlike previous 2-bit methods that failed at longer contexts, OSCAR maintains performance up to 128K tokens. This innovation was demonstrated using the Qwen3-8B model, showing an 8x reduction in KV cache memory. AI

IMPACT Reduces memory requirements for large language models, potentially enabling longer context windows and more efficient deployment.
- OSCAR
- Together AI
- KV cache
- Qwen3-8B
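
The 8x figure in the OSCAR item is consistent with simple bit-width arithmetic, sketched below. This is a back-of-the-envelope estimate, not OSCAR's method: it assumes a 16-bit baseline cache, ignores any quantization metadata such as scales or zero-points, and uses assumed Qwen3-8B-like dimensions (36 layers, 8 KV heads, head dimension 128) that the article does not state. The helper name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Approximate KV cache size for one sequence, ignoring quantization metadata."""
    # Factor 2 covers keys and values; one entry per layer, KV head, channel, and token.
    n_values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return n_values * bits_per_value / 8

# Assumed model shape (not given in the article).
SHAPE = dict(n_layers=36, n_kv_heads=8, head_dim=128)
SEQ_LEN = 128 * 1024  # 128K tokens

fp16_bytes = kv_cache_bytes(SEQ_LEN, bits_per_value=16, **SHAPE)
int2_bytes = kv_cache_bytes(SEQ_LEN, bits_per_value=2, **SHAPE)

print(f"16-bit cache: {fp16_bytes / 2**30:.2f} GiB")
print(f" 2-bit cache: {int2_bytes / 2**30:.2f} GiB")
print(f"reduction   : {fp16_bytes / int2_bytes:.0f}x")
```

Under these assumed dimensions the cache for a single 128K-token sequence shrinks from 18 GiB to 2.25 GiB. A real 2-bit scheme would land somewhat below 8x once per-group metadata is counted.
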
TOOL · Towards AI English(EN) · 2w

I Built a Stateful Research Agent Inside a Sandbox. Here’s What the Numbers Actually Looked Like.

A developer explored building a stateful research agent, encountering issues with traditional stateless execution environments that lost context. They found that while stuffing state into prompts or using external stores are common workarounds, they have drawbacks. The developer then experimented with TensorLake, a platform offering named sandboxes with suspend and resume capabilities that preserve the full VM state, including running processes and open browser sessions, enabling more robust agent behavior. AI

IMPACT Enables more robust and persistent AI agent behavior by preserving full VM state, reducing the need for complex state management workarounds.
TOOL · Towards AI Nederlands(NL) · 2w

DeepSeek V4 mHC Explained

DeepSeek V4 is an advanced language model that builds upon its predecessor, DeepSeek V3. The V4 architecture introduces novel components such as Compressed Sparse Attention (CSA), Heavily Compressed Attention (HCA), and Manifold-Constrained Hyper-Connections (mHC). The article focuses on explaining mHC, a technique that enhances the traditional residual connections in neural networks by employing multiple parallel residual streams, leading to more structured and stable training. AI

IMPACT Explains novel architectural components that could influence future large language model designs.
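
The idea of several parallel residual streams mixed under a constraint can be illustrated with a small sketch. It assumes, beyond what the summary states, that the "manifold constraint" keeps the stream-mixing matrix approximately doubly stochastic (rows and columns each summing to 1) via Sinkhorn-style normalization. The function names, matrix sizes, and values are hypothetical, and the sketch covers only the residual mixing step, not a full layer.

```python
import math

def sinkhorn(logits, n_iters=20):
    """Map a square matrix of unconstrained logits to an approximately doubly stochastic matrix."""
    m = [[math.exp(v) for v in row] for row in logits]  # make every entry positive
    for _ in range(n_iters):
        m = [[v / sum(row) for v in row] for row in m]  # rows sum to 1
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]  # columns sum to 1
    return m

def mix_streams(streams, mixing):
    """Mix residual streams: new_stream[i] = sum_j mixing[i][j] * streams[j]."""
    dim = len(streams[0])
    return [
        [sum(mixing[i][j] * streams[j][d] for j in range(len(streams))) for d in range(dim)]
        for i in range(len(mixing))
    ]

# Three parallel residual streams, each a 2-dimensional hidden state (toy values).
streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
logits = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [-0.4, 0.9, 0.1]]

mixing = sinkhorn(logits)
mixed = mix_streams(streams, mixing)

# With columns summing to 1, the total signal across streams is preserved.
print([sum(s[d] for s in streams) for d in range(2)])
print([round(sum(s[d] for s in mixed), 6) for d in range(2)])
```

Because each column of the mixing matrix sums to 1, the sum across streams is unchanged by mixing, which is one plausible reading of why such a constraint would stabilize training compared with an unconstrained mixing matrix.
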
TOOL · arXiv cs.CV English(EN) · 2w

From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers

Researchers have identified a key issue in feature distillation for Vision Transformers (ViTs), particularly when compressing models. They discovered that while individual images are compressible, the overall dataset exhibits a complex structure with rotating low-rank subspaces. This 'encoding mismatch' means that standard distillation methods fail because the token-level energy distribution across channels doesn't align with the teacher model's architecture. To address this, the paper proposes two simple fixes: 'Lift,' which adds a lightweight projector at inference, and 'WideLast,' which widens the student's final block. These methods significantly improve the performance of compressed ViTs, as demonstrated on ImageNet-1K. AI

IMPACT Offers new techniques to improve the efficiency and performance of Vision Transformer models, crucial for deployment on resource-constrained devices.
TOOL · arXiv cs.CV English(EN) · 2w

Advancing Metallic Surface Defect Detection via Anomaly-Guided Pretraining on a Large Industrial Dataset

Researchers have developed a new pretraining method called Anomaly-Guided Self-Supervised Pretraining (AGSSP) to improve metallic surface defect detection. This approach uses anomaly maps to guide the model's learning, helping it distinguish subtle defects from complex backgrounds. AGSSP involves a two-stage process: first, pretraining the backbone by distilling knowledge from anomaly maps, and second, pretraining the detector with pseudo-defect boxes derived from these maps. Experiments show AGSSP significantly boosts performance, with improvements of up to 10% in mAP@0.5 and 11.4% in mAP@0.5:0.95 compared to models pretrained on natural image datasets. AI
- Chuni Liu
- ImageNet
- AGSSP
TOOL · arXiv cs.CV English(EN) · 2w

CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

Researchers have developed CRoFT, a novel fine-tuning framework designed to enhance the generalization capabilities of vision-language pre-trained models (VL-PTMs) when encountering out-of-distribution (OOD) data. The method concurrently optimizes for improved generalization to covariate shifts and effective detection of unseen classes, addressing a critical gap in current fine-tuning practices. By minimizing the gradient magnitude of energy scores on training data, CRoFT promotes domain-consistent Hessians of classification loss, a key indicator for OOD generalization. AI

IMPACT Enhances AI model robustness to unseen data, potentially improving real-world deployment reliability.
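
For readers unfamiliar with the energy scores mentioned in the CRoFT item, the sketch below shows the definition commonly used in OOD detection, the negative temperature-scaled log-sum-exp of the classifier logits. It is an assumption that CRoFT uses this standard form, since the summary does not spell it out, and the sketch does not implement the gradient-magnitude objective itself. The function name `energy_score` is hypothetical.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * log(sum_k exp(logit_k / T)); lower values suggest in-distribution inputs."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return -temperature * log_sum_exp

confident = energy_score([10.0, 0.0, 0.0])  # one class dominates
uncertain = energy_score([0.0, 0.0, 0.0])   # flat logits

print(confident, uncertain)
```

A confidently classified input gets a lower energy than one with flat logits, which is why thresholding this score is a common way to flag unseen classes.
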
TOOL · arXiv cs.CV English(EN) · 2w

Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models

Researchers have developed a new method for efficiently fine-tuning 3D foundation models, addressing the challenges posed by variations in texture, geometry, camera motion, and lighting. The approach generates synthetic datasets with controlled variations, then fine-tunes LoRA adapters on these datasets to extract a distinct, approximately disentangled subspace for each variation type. Integrating these subspaces yields a reduced LoRA subspace that improves prediction accuracy on downstream tasks and generalizes to real-world datasets. AI
TOOL · arXiv cs.CV English(EN) · 2w

GS-CLIP: Zero-shot 3D Anomaly Detection by Geometry-Aware Prompt and Synergistic View Representation Learning

Researchers have developed GS-CLIP, a novel framework for zero-shot 3D anomaly detection. This CLIP-based approach addresses limitations of existing methods, which struggle with geometric detail loss and incomplete visual understanding. GS-CLIP employs a two-stage learning process that generates text prompts with 3D geometric priors and utilizes a synergistic view representation learning architecture. This architecture processes rendered and depth images in parallel, fusing their features for enhanced anomaly detection. AI

IMPACT Introduces a new method for detecting anomalies in 3D data without prior training, potentially improving applications in manufacturing and medical imaging.
- Zehao Deng
- GS-CLIP
TOOL · arXiv cs.CV English(EN) · 2w

When Brains Disagree: Biological Ambiguity Underlies the Challenge of Amyloid PET Synthesis from Structural MRI

A new research paper explores the challenges in synthesizing amyloid PET scans from structural MRI data for Alzheimer's disease diagnosis. The study posits that the inconsistency in model performance stems from a fundamental biological ambiguity: MRI reflects neurodegeneration while PET measures amyloid pathology, and the two can be temporally decoupled. This leads to ambiguous one-to-many mappings between MRI patterns and amyloid states, making the synthesis task intrinsically ill-posed. The research demonstrates that while unambiguous mappings can be learned in isolation, performance degrades when data ambiguity is present. Integrating multimodal inputs, such as plasma biomarkers, can resolve this ambiguity, improve performance, and restore stability, suggesting that progress depends on multimodal integration rather than on architectural complexity alone. AI

IMPACT Highlights the need for multimodal data integration in AI models for medical diagnostics, moving beyond architectural complexity to address inherent data ambiguities.