Brief

last 24h

[50/8352] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
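
The feed does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. The weights, the 48-hour half-life, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical weights and half-life; the feed's real formula is not published.
WEIGHTS = {"authority": 0.35, "cluster": 0.30, "headline": 0.20, "recency": 0.15}
HALF_LIFE_HOURS = 48.0


def recency(age_hours: float) -> float:
    """Exponential time decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def score(authority: float, cluster: float, headline: float, age_hours: float) -> float:
    """Combine component signals (each expected in [0, 1]) into a 0-100 score."""
    parts = {
        "authority": authority,
        "cluster": cluster,
        "headline": headline,
        "recency": recency(age_hours),
    }
    for name, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Weighted sum of the components, rescaled to the 0-100 range.
    return 100.0 * sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


if __name__ == "__main__":
    # A three-day-old item from a strong source with a mid-sized cluster.
    print(round(score(0.9, 0.5, 0.6, age_hours=72.0), 1))
```

A weighted sum keeps each signal's contribution easy to inspect; a real ranker might combine them multiplicatively or learn the weights instead.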

RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA

Researchers have developed UniReason-Med, a novel framework designed to enhance 3D medical visual question answering (VQA) by leveraging supervision from 2D medical images. This system utilizes a shared reasoning interface that can process both 2D images and serialized 3D volumes, generating interleaved textual reasoning and localized visual evidence. The framework was trained on UniMed-CoT, a 220K sample instruction-tuning dataset, and demonstrated that joint 2D and 3D grounded supervision significantly improves 3D reasoning capabilities compared to 3D-only training.

IMPACT This research could lead to more accurate diagnostic tools by improving the ability of AI to reason about 3D medical data.
- UniMed-CoT
- UniReason-Med
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

Researchers have developed a new diagnostic framework to analyze user-side memory in large language models, revealing that personalization capabilities are not a single metric but rather factor into distinct axes: behavioral consistency, factual presence, and factual absence. Their findings indicate that different memory substrates excel at different axes, with parametric memory (gamma-LoRA) favoring style and retrieval-based methods (RAG) excelling at factual absence. The study also identified an "alignment tax" on parametric user-memory in heavily RLHF-tuned models and proposed that substrate selection is a question-classification task rather than calibration.

IMPACT This research could lead to more nuanced evaluation of LLM personalization and improved memory systems by highlighting specific failure modes.
FRONTIER RELEASE · Google DeepMind English(EN) · 5d · [20 sources]

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google DeepMind has released Gemma 4 12B, a new multimodal model designed for local execution on laptops with 16GB of VRAM. This model features a novel unified architecture that integrates audio and vision inputs directly into the LLM backbone without separate encoders, reducing latency and memory usage. Gemma 4 12B aims to bring advanced agentic multimodal capabilities to everyday hardware, with performance nearing its larger 26B MoE counterpart and broad developer support through open licensing and integration with popular tools.

IMPACT This release brings advanced multimodal capabilities to consumer hardware, potentially accelerating local AI agent development and use.
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

Researchers have developed UR-BERT, a novel text encoder designed to significantly expand the capabilities of massively multilingual text-to-speech (TTS) systems. Unlike traditional methods limited by grapheme-to-phoneme resources, UR-BERT unifies diverse writing systems into a common Romanization format, enabling support for 495 languages. The system also incorporates a speech token prediction objective to improve phonetic accuracy and text-speech alignment, demonstrating superior performance over existing baselines and strong generalization to new languages.

IMPACT Expands the reach of TTS technology to hundreds of new languages, potentially democratizing voice synthesis.
SIGNIFICANT · Medium — Claude tag English(EN) · 3d · [2 sources]

Anthropic Dropped the Most Powerful AI Ever Made.

Anthropic has released its latest AI model, Claude. The announcement was made across multiple sources, with some describing it as the company's most powerful AI yet. Early impressions suggest it represents a significant advancement in AI capabilities.

IMPACT Anthropic's latest Claude model release sets a new benchmark for AI capabilities.
FRONTIER RELEASE · Mastodon — fosstodon.org English(EN) · 4d · [58 sources]

https://www.anthropic.com/news/claude-fable-5-mythos-5 New Anthropic models! #AI #Claude #Anthropic

Anthropic has released Claude Fable 5, a version of its advanced Mythos-class AI model, to general users. This release incorporates enhanced safety features and routing mechanisms to manage responses, particularly in sensitive areas like cybersecurity and biology. While Fable 5 offers powerful AI capabilities, its pricing is positioned higher than previous models, with a limited-time free access window for subscribers.

IMPACT This release brings advanced AI capabilities with integrated safety measures to a wider user base, potentially influencing how AI is deployed in sensitive domains.
SIGNIFICANT · Medium — Claude tag English(EN) · 2d

I’ve Been Following Claude Since Day One. Yesterday Was Different.

Anthropic has released its most advanced model to date, Claude 4. The author, who has followed Claude since its inception, found the new model's capabilities surprising. This release marks a significant advancement for Anthropic's AI offerings.

IMPACT Sets a new benchmark for advanced AI models, potentially influencing future development and competition in the LLM space.
- Anthropic
- Claude 4
RESEARCH · Hugging Face Daily Papers English(EN) · 3d · [3 sources]

TextHOI-3D: Text-to-3D Hand-Object Interaction via Discrete Multi-View Generation and Joint Mesh Optimization

Researchers have developed TextHOI-3D, a novel framework for generating 3D hand-object interactions from text descriptions. This staged approach uses generated multi-view observations as an intermediate representation, bridging text-conditioned visual generation with geometry-aware recovery. The system significantly improves accuracy in object contact and reduces penetration volume compared to single-view methods, demonstrating the effectiveness of discrete multi-view tokens for this complex 3D generation task.

IMPACT Advances text-to-3D generation for complex interactions, potentially impacting virtual reality and content creation.
- TextHOI-3D
- HO3D
SIGNIFICANT · 36氪 (36Kr) 中文(ZH) · 3d · [2 sources]

A-share major indices collectively fall at midday break, over 4,300 stocks in the market turn red

Anthropic has released its most powerful model to date, Claude Fable 5, which is reportedly capable of completing tasks in a fraction of the time previously required. This new model is positioned as a significant advancement in AI capabilities, with its pricing described as exceptionally high. The release is part of a broader initiative by AI companies to push the boundaries of what is possible with artificial intelligence.

IMPACT Sets a new benchmark for AI model performance and pricing, potentially influencing future development and market strategies.
SIGNIFICANT · 36氪 (36Kr) 中文(ZH) · 3d · [2 sources]

Spot gold intraday decline expanded to 2%

Anthropic has released its latest AI model, Claude Fable 5, which is being touted as its most powerful to date. The company has also issued an apology, though the specific reason for the apology is not detailed in the provided snippets. Separately, Meta and Reliance Industries are collaborating to build an AI data center in India.

IMPACT Anthropic's release of Claude Fable 5 sets a new benchmark for model capabilities, potentially intensifying competition in the frontier AI space.
- Meta
- Claude Fable 5
- Anthropic
- 36Kr
SIGNIFICANT · One Useful Thing (Ethan Mollick) English(EN) · 3d · [2 sources]

What it feels like to work with Mythos

Ethan Mollick, an AI researcher, has tested Anthropic's new Claude 5 Fable model, describing it as a significant leap beyond previous AI capabilities. He found Fable to be exceptionally proficient across a wide range of tasks, from generating complex academic papers to creating intricate games and detailed maps, often with minimal prompting. Mollick highlights a shift in the user-AI relationship, noting that the model's advanced performance is both delightful and unnerving due to its autonomous execution of complex requests.

IMPACT Sets a new benchmark for complex task execution and suggests a fundamental shift in human-AI interaction.
TOOL · arXiv cs.LG English(EN) · 2d

Time-multiplexed layer reuse for physical neural networks

Researchers have developed a novel architecture called TIDAL-Net for physical neural networks (PNNs) to address their limited scale compared to digital neural networks. This new design reuses layers over time, effectively increasing the network's depth without a proportional increase in hardware cost. Experiments demonstrate that TIDAL-Net enhances performance on image classification and natural language processing tasks with minimal changes to existing PNN prototypes.

IMPACT Enhances physical neural network capabilities, potentially enabling more complex tasks on specialized hardware.
- Kohei Tsuchiyama
TOOL · arXiv cs.LG English(EN) · 2d

OmniLoc: A Geometry-Aware Foundation Model for Anchor-Free UE Localization Across Diverse Indoor Environments

Researchers have introduced OmniLoc, a novel foundation model designed for anchor-free localization within diverse indoor environments using wireless measurements. This model addresses the challenges posed by varying building geometries and signal heterogeneities that hinder existing methods. OmniLoc employs a unified input tokenization, a geometry-aware Transformer for feature extraction, and a location estimation module that leverages geometric embeddings for consistent predictions. Evaluations on in-house and public datasets demonstrate OmniLoc's superior performance and generalization capabilities compared to current approaches.

IMPACT This model could improve the accuracy and robustness of indoor positioning systems by leveraging existing wireless infrastructure.
- arXiv
- OmniLoc
TOOL · arXiv stat.ML English(EN) · 2d

Projected random forests and conformal prediction of circular data

Researchers have developed a new method for predicting outcomes in regression problems involving circular data, such as time of day or direction. This approach utilizes conformal prediction techniques to generate prediction sets with guaranteed coverage and adaptive arc lengths. By projecting existing linear-response regression models onto a circular space, the method can leverage high-performance models designed for linear data.

IMPACT Introduces a novel statistical technique for handling circular data in machine learning predictions.
- Paulo C. Marques F.
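
To make the idea of a conformal prediction arc concrete, here is a minimal sketch of plain split conformal prediction on the circle. It is not the paper's method: it produces fixed-width arcs rather than adaptive ones, and the function names and the choice of circular distance as the nonconformity score are assumptions for illustration. Angles are in radians.

```python
import math

TWO_PI = 2.0 * math.pi


def circular_distance(a: float, b: float) -> float:
    """Shortest angular distance between two angles, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def conformal_half_width(cal_preds, cal_targets, alpha: float) -> float:
    """Half-width of the arc from a held-out calibration set.

    Uses the standard split-conformal rank ceil((n + 1) * (1 - alpha)).
    If that rank exceeds n, the only valid set is the whole circle.
    """
    scores = sorted(circular_distance(p, y) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.pi  # arc covers the full circle
    return scores[k - 1]


def prediction_arc(pred: float, half_width: float):
    """Arc endpoints (start, end), each wrapped into [0, 2*pi)."""
    return ((pred - half_width) % TWO_PI, (pred + half_width) % TWO_PI)


def arc_contains(pred: float, half_width: float, y: float) -> bool:
    """True if angle y falls inside the arc centred on pred."""
    return circular_distance(pred, y) <= half_width
```

Because the score is a distance on the circle, an arc centred near 0 correctly covers angles just below 2π, which a linear interval would miss.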
TOOL · arXiv cs.AI English(EN) · 2d

Forecasting Future Behavior as a Learning Task

Researchers have developed a new method for predicting the behavior of large reasoning models (LRMs) by training specialized "Behavior Forecasters." These forecasters learn directly from a model's reasoning trajectory, bypassing the need for traditional explanations. The approach proved more accurate than existing models like GPT-5.4 and Claude Opus-4.6 in predicting answer repetition and the impact of input changes, while also being more cost-efficient.

IMPACT This approach could lead to more reliable AI systems by enabling better prediction of their behavior without complex, potentially inaccurate, explanations.
TOOL · arXiv cs.AI English(EN) · 2d

PRInTS: Reward Modeling for Long-Horizon Information Seeking

Researchers have developed PRInTS, a new generative reward model designed to improve AI agents' ability to seek information over long periods. Unlike previous models that offered binary judgments on short tasks, PRInTS provides dense, multi-dimensional scoring for each step, considering factors like tool interpretation and output informativeness. It also compresses long contexts into summaries while retaining essential information for evaluation. Experiments on benchmarks like FRAMES and GAIA show that PRInTS significantly enhances information-seeking capabilities in various agents, even outperforming larger, frontier models.

IMPACT Enhances AI agent capabilities in complex, multi-step information gathering, potentially improving performance in tasks requiring extensive tool use and reasoning.
- FRAMES
- AI agents
- Jaewoo Lee
- PRInTS
- WebWalkerQA
TOOL · arXiv cs.AI English(EN) · 2d

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Researchers have developed a hybrid neural network architecture, KAN-MLP-Mixer, that combines the precision of Kolmogorov-Arnold Networks (KANs) with the noise robustness and efficiency of Multi-Layer Perceptrons (MLPs). This approach strategically integrates KAN modules for input embedding and classification, while utilizing MLPs for intermediate feature mixing. Tested across eight public datasets, KAN-MLP-Mixer demonstrated a 5.33% average improvement in macro F1 score over pure-MLP models, significantly outperforming standalone KAN and MLP baselines.

IMPACT This hybrid architecture offers improved accuracy and robustness for human activity recognition tasks using wearable sensors.
TOOL · arXiv cs.AI English(EN) · 2d

Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning

Researchers have developed a new weighted loss objective for neural networks to improve the detection of rare nodes in hierarchical multi-label learning. This approach combines node-wise imbalance weighting with focal weighting components, which leverage ensemble uncertainties. The method aims to address the challenge of fine-grained classifications by emphasizing rare nodes and focusing on uncertain nodes during training. Experiments on benchmark datasets showed improvements in recall by up to five times and statistically significant gains in F1 score.

IMPACT Enhances model performance on fine-grained classification tasks by improving the detection of rare categories.
- arXiv
- Hugging Face
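
As a rough illustration of how imbalance weighting and focal weighting can be combined, the sketch below multiplies an inverse-frequency weight per node by the standard focal term on a binary cross-entropy loss. This is an assumption-laden simplification: the paper derives its focal component from ensemble uncertainties, whereas this version uses the model's own predicted probability, and all names and defaults here are hypothetical.

```python
import math


def imbalance_weights(positive_counts, total: int):
    """Inverse-frequency weight per node, normalised so the weights average to 1."""
    raw = [total / max(count, 1) for count in positive_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_focal_bce(probs, labels, weights, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean over nodes of w_i * (1 - p_t)^gamma * (-log p_t).

    p_t is the probability assigned to the true label of node i.
    gamma = 0 recovers ordinary weighted binary cross-entropy.
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p
        total += w * (1.0 - p_t) ** gamma * -math.log(p_t)
    return total / len(probs)
```

The imbalance weight raises the cost of mistakes on rare nodes, while the focal factor shrinks the contribution of nodes the model already classifies confidently.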
TOOL · arXiv cs.AI English(EN) · 2d

The Latent Color Subspace: Emergent Order in High-Dimensional Chaos

Researchers have identified a "Latent Color Subspace" within the latent space of the FLUX.1 text-to-image model. This subspace reflects Hue, Saturation, and Lightness, offering a new understanding of how semantic information is encoded in image generation. The team demonstrated that this subspace can be used to predict and control image colors without requiring additional training, solely through manipulation of the latent space.

IMPACT Identifies a new method for fine-grained control over text-to-image generation, potentially improving artistic and design applications.
- Mateusz Pach
TOOL · arXiv cs.CL (CA) · 2d

LifeSentence: Language models can encode human life course trajectories from longitudinal panel data

Researchers have developed LifeSentence, a novel model that adapts large language models for analyzing longitudinal human life course data. This model can encode complex life trajectories from limited panel data by treating life events as natural-language records. LifeSentence demonstrates superior performance compared to traditional statistical methods and other deep learning approaches, showing significant improvements in predicting events and their timing, as well as reconstructing chronological order. Notably, it can identify social stratification patterns like the gender wage gap and motherhood penalty without explicit supervision, offering new avenues for biographical research and counterfactual exploration.

IMPACT Enables deeper analysis of human life trajectories from limited data, potentially improving social science research and personalized interventions.
TOOL · arXiv cs.CL English(EN) · 2d

ProHiFlo: Hierarchical Flow Matching with Functional Guidance for De Novo Protein Generation

Researchers have developed ProHiFlo, a new hierarchical flow matching framework for de novo protein generation. This method improves efficiency and accuracy by modeling backbone geometry before refining to all-atom coordinates. ProHiFlo also incorporates functional guidance using pretrained predictors to steer generation toward desired properties without retraining. Experiments show state-of-the-art performance, including a higher success rate in enzyme active site scaffolding compared to existing methods.

IMPACT Introduces a more efficient and targeted approach to protein design, potentially accelerating therapeutic and enzyme engineering.
- arXiv
- ProHiFlo
TOOL · arXiv cs.CL English(EN) · 2d

Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Researchers have introduced Vector Quantized Latent Concept (VQLC), a new framework for interpreting large language models by extracting latent concepts from their hidden states. This method aims to overcome the limitations of existing clustering techniques, which either scale poorly or produce less coherent concepts. VQLC offers a computationally efficient and scalable alternative that demonstrates competitive faithfulness and interpretability, particularly for decoder-only models.

IMPACT Provides a more scalable and interpretable method for understanding LLM internal representations.
TOOL · arXiv cs.CL English(EN) · 2d

Cross-modal Consistency Guidance for Robust Emotion Control in Auto-Regressive TTS Models

Researchers have developed a new method called Cross-modal Consistency Guided Classifier-Free Guidance (CCG-CFG) to improve emotion control in auto-regressive Text-to-Speech (TTS) models. This technique dynamically adjusts guidance scales based on the conflict between textual and desired speech emotions, enhancing emotional alignment. When applied to the CosyVoice2 model, this approach led to significant improvements in emotion recognition accuracy and subjective quality scores, outperforming existing methods like HierSpeech++ and Qwen3-TTS.

IMPACT Enhances TTS expressiveness and accuracy, potentially leading to more natural and emotionally resonant AI-generated speech.
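
The sketch below shows one way a conflict-dependent guidance scale could plug into the standard classifier-free guidance formula, applied to next-token logits. Only the guidance formula itself is standard; the conflict measure (total variation distance between two emotion distributions), the linear scale schedule, and every name and default are assumptions, not the CCG-CFG definition.

```python
def conflict(text_emotion, target_emotion) -> float:
    """Total variation distance between two emotion distributions, in [0, 1].

    Hypothetical conflict measure: 0 when the text's implied emotion matches
    the requested speech emotion, 1 when they do not overlap at all.
    """
    return 0.5 * sum(abs(p - q) for p, q in zip(text_emotion, target_emotion))


def guidance_scale(conflict_value: float, base: float = 1.5, gain: float = 2.0) -> float:
    """Assumed linear schedule: guide harder as text and target emotion disagree."""
    return base + gain * conflict_value


def guided_logits(cond, uncond, scale: float):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    # Text reads as mostly "happy", but the requested speech emotion is "sad".
    text_emotion = [0.8, 0.1, 0.1]    # happy, sad, neutral
    target_emotion = [0.0, 1.0, 0.0]
    s = guidance_scale(conflict(text_emotion, target_emotion))
    print(guided_logits([2.0, 0.0], [1.0, 1.0], s))
```

With a scale of 1 the guided logits equal the conditional logits; larger scales push further along the direction that separates conditional from unconditional predictions.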
TOOL · arXiv cs.LG English(EN) · 2d

DeepRHP: A Hybrid Variational Autoencoder for Designing Random Heteropolymers as Protein Mimics

Researchers have developed DeepRHP, a hybrid variational autoencoder designed to aid in the creation of synthetic random heteropolymers that can mimic protein functions. This model uses a semi-supervised framework, incorporating both chemical features and sequence patterns within its latent space. DeepRHP's effectiveness was demonstrated by successfully predicting monomer compositions that stabilize membrane proteins, with predictions validated against existing research.

IMPACT This AI model could accelerate the design of novel biomaterials and protein-like structures for various applications.
TOOL · arXiv cs.LG English(EN) · 2d

Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

Researchers have developed a new pretraining framework called Probabilistic Contrastive Pretraining (PCP) to enhance the prediction of ADME properties crucial for drug discovery. This method combines chemistry-specific self-supervision with contrastive mutual information learning, encoding molecular graphs into latent variables and reconstructing SMILES strings. The framework integrates reconstruction, contrastive discrimination, and chemistry-specific tasks into a single probabilistic objective, showing significant improvements over existing baselines on multiple datasets.

IMPACT Enhances AI's utility in accelerating drug discovery pipelines by improving prediction accuracy for critical molecular properties.
TOOL · arXiv cs.LG English(EN) · 2d

Beyond the Golden Teacher: Enhancing Graph Learning through LLM-GNN Co-teaching

Researchers have developed a new method called LLM-GNN Co-Teaching to improve few-shot graph learning. This approach avoids designating one model as a "golden teacher," instead allowing a Graph Neural Network (GNN) and a Large Language Model (LLM) to learn collaboratively. The models exchange confident pseudo-labels and update each other, with supervision derived from their agreement over time. This co-teaching framework consistently outperforms previous methods on six benchmarks, showing significant gains in accuracy for tasks like node classification.

IMPACT Enhances few-shot learning capabilities for graph-based AI systems, potentially improving performance in areas like recommendation engines and social network analysis.
TOOL · arXiv cs.LG English(EN) · 2d

SPADE: Split-and-Delay Embeddings for Autoregressive High-Granularity Calorimeter Simulation

Researchers have developed SPADE, a novel autoregressive transformer model designed for simulating high-granularity calorimeter data in particle physics. Unlike previous methods that embed multiple features jointly, SPADE embeds them independently and introduces a delay between feature streams. This approach allows the standard self-attention mechanism to learn intra-token correlations effectively. SPADE demonstrates competitive performance against existing models for photon shower generation in the ILD detector and offers a new pathway for applying LLM-style pretraining to complex, multi-feature datasets.

IMPACT Introduces a new transformer architecture applicable to complex scientific simulation, potentially enabling LLM-style pretraining for high-dimensional data.
TOOL · arXiv cs.CV English(EN) · 2d

PT-WNO: Point Transformer with Wavelet Neural Operator for 3D Point Cloud Semantic Segmentation

Researchers have developed PT-WNO, a novel architecture for 3D point cloud semantic segmentation that enhances global context understanding. The model integrates a Wavelet Neural Operator (WNO) alongside a point cloud transformer backbone. This WNO branch captures multi-scale global spectral context through wavelet decomposition and reconstruction, complementing existing skip connections. Experiments show PT-WNO improves performance on benchmarks like S3DIS and DALES.

IMPACT Enhances 3D point cloud understanding, potentially improving applications in robotics, autonomous driving, and augmented reality.
TOOL · arXiv cs.CV English(EN) · 2d

How Auxiliary Reasoning Unleashes GUI Grounding in VLMs

Researchers have developed three zero-shot auxiliary reasoning methods that improve GUI grounding, the ability of vision-language models (VLMs) to locate elements within graphical user interfaces (GUIs). These methods add explicit spatial cues such as axes, grids, and labeled intersections to the input image, enabling VLMs to better articulate their implicit spatial understanding without costly fine-tuning. Experiments across four GUI grounding benchmarks and seven VLMs demonstrated significant performance gains, with one method, Mark-Grid Scaffold, boosting Gemini-3.1-Pro's accuracy on ScreenSpot-v2 from 11.72% to 95.20% and achieving state-of-the-art results on ScreenSpot.

IMPACT Enhances VLM capabilities for GUI interaction, potentially accelerating the development of autonomous agents.
SIGNIFICANT · 36氪 (36Kr) 中文(ZH) · 3d · [2 sources]

Tencent Cloud and ASUS Computer Reach Strategic Cooperation

Anthropic has released its most powerful model to date, Claude Fable 5, which is reportedly capable of handling complex tasks. However, the company advises caution for general users due to its advanced capabilities. This release marks a significant step in Anthropic's development of cutting-edge AI.

IMPACT Sets a new benchmark for AI model capabilities, potentially influencing future development and applications in the field.
- Anthropic
- Claude Fable 5
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

Multi-Agent Reasoning with Adaptive Worker Allocation for Stance Detection

Researchers have developed a multi-agent reasoning framework for stance detection, which aims to improve accuracy by synthesizing explanations from multiple AI agents rather than relying on simple label aggregation. This Manager-Worker architecture adaptively assigns agents based on input complexity, with each worker providing a reasoning-only analysis. The framework demonstrated significant gains on challenging implicit and context-dependent stance detection tasks, achieving high Macro-F1 scores on datasets like COVID-19 Stance and SemEval-2016.

IMPACT Enhances LLM capabilities in nuanced text analysis, potentially improving applications requiring understanding of authorial intent.
SIGNIFICANT · 36氪 (36Kr) 中文(ZH) · 3d · [2 sources]

Shenzhen Component Index and ChiNext Index both fell more than 2%

ByteDance's AI pharmaceutical division is initiating a spin-off for fundraising, signaling a move towards industrialization in the AI for Science (AI4S) sector. Separately, Anthropic has released its latest and most powerful model, Claude Fable 5, which is priced exceptionally high. In other news, the 'Qingyu-11' liquid oxygen methane engine, developed by Dahang Yueqian, has successfully completed several key component tests and is progressing to full system trials.

IMPACT ByteDance's AI pharma spin-off signals industrialization in AI4S, while Anthropic's high-priced Claude Fable 5 may set new benchmarks for advanced models.
RESEARCH · arXiv cs.CL English(EN) · 3d · [3 sources]

Lius: Translation Model Based Instructional Linguistic Using Continual Instruction Tuning In Kupang Malay

Researchers have developed a new translation model named Lius, specifically designed to improve translation for low-resource languages like Kupang Malay. The model utilizes a novel Continual Instruction Tuning (CIT) method, which iteratively trains the model with various instruction types. This approach significantly outperforms standard instruction-tuned models and existing Neural Machine Translation (NMT) and multilingual LLM models, demonstrating a promising way to overcome the limitations of scarce parallel data.

IMPACT Enhances translation capabilities for underrepresented languages, potentially enabling wider access to information and communication.
TOOL · arXiv cs.CL English(EN) · 2d

Neuron-based Personality Trait Induction in Large Language Models

Researchers have developed a novel method to imbue large language models with specific personality traits without requiring model retraining. This approach involves identifying key neurons within the LLM that correlate with personality dimensions, based on the Big Five personality traits framework. By manipulating these identified neurons, the system can induce desired personality characteristics in the model's output, demonstrating comparable effectiveness to fine-tuned models but with greater efficiency and flexibility.

IMPACT Enables more nuanced and controllable AI interactions by allowing specific personality traits to be induced in LLMs without extensive retraining.
TOOL · arXiv cs.AI English(EN) · 2d

TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

Researchers have introduced TouchThinker, a new framework designed to enhance tactile commonsense reasoning for embodied agents. This system addresses limitations in existing datasets and representation methods by introducing a million-scale dataset, TouchThinker-1M, covering 415 objects and various scenarios. Additionally, it incorporates an action-aware modeling mechanism to improve the efficiency and semantic expressiveness of tactile representations, enabling better open-world generalization.

IMPACT Enhances embodied agents' ability to interact with and understand the physical world through touch.
TOOL · arXiv cs.AI English(EN) · 2d

Physics-Distilled Neural Network enabled by Large Language Models for Manufacturing Process-Property Predictive Modeling

Researchers have developed a new knowledge distillation framework that uses Large Language Models (LLMs) to extract physics principles from scientific literature. This framework creates a 'teacher' model that imbues a 'student' model with predictive capabilities for manufacturing processes, even with limited data. The resulting student model is lightweight, capable of high-frequency inference for real-time deployment, and shows robustness even when the LLM-derived physics knowledge is imperfect.

IMPACT This framework could enable more accurate and efficient AI-driven predictive modeling in manufacturing, especially in data-scarce environments.
- Large Language Models
- Physics-Distilled Neural Network
TOOL · arXiv cs.AI English(EN) · 2d

MLaGA: Multimodal Large Language and Graph Assistant

Researchers have developed MLaGA, a novel model designed to enhance Large Language Models' (LLMs) ability to process and reason over multimodal graphs. This system addresses the challenge of graphs containing diverse attribute types, such as text and images, which have been underexplored by existing LLM-based graph methods. MLaGA employs a structure-aware multimodal encoder and a multimodal instruction-tuning approach to integrate these varied attributes and graph structures into LLMs.

IMPACT Enables LLMs to analyze complex graphs with mixed text and image data, potentially improving applications in areas like knowledge discovery and recommendation systems.
TOOL · arXiv cs.AI English(EN) · 2d

A New Perspective on Precision and Recall for Generative Models

Researchers have introduced a novel framework for estimating precision and recall curves in generative models, moving beyond single scalar metrics. This approach frames the estimation as a binary classification problem, offering a more detailed analysis of model performance. The framework also provides a minimax upper bound on estimation risk and unifies several existing precision-recall metrics.

IMPACT Provides a more nuanced evaluation method for generative models, potentially leading to better model development and comparison.
- Benjamin Sykes
TOOL · arXiv cs.AI English(EN) · 2d

Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Researchers have introduced OMAD, a novel framework for online multi-agent reinforcement learning (MARL) that utilizes diffusion policies to enhance agent coordination. This approach addresses the challenge of intractable likelihoods in diffusion models, which typically hinder exploration in online MARL settings. OMAD employs a relaxed policy objective that maximizes scaled joint entropy and a joint distributional value function for decentralized policy optimization, leading to significant improvements in sample efficiency.

IMPACT Introduces a novel approach to multi-agent reinforcement learning, potentially improving coordination and sample efficiency in complex AI systems.
- Zhuoran Li
TOOL · arXiv cs.AI English(EN) · 2d

Unifying Learning Dynamics and Generalization in Transformers Scaling Law

Researchers have developed a theoretical framework that unifies learning dynamics and generalization in transformer models. The work formalizes transformer training as a system of ordinary differential equations and approximates it by kernel behavior. The analysis yields a two-stage scaling law for generalization error: exponential decay at first, then power-law decay once a resource threshold is met. The authors also prove that this two-stage law is tight.

IMPACT Provides a theoretical foundation for understanding and predicting transformer performance as resources scale.
- Transformer
- Chiwun Yang
TOOL · arXiv cs.AI English(EN) · 2d

Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs

Researchers have developed a new technique called Neural FOXP2 to improve the performance of large language models in non-English languages. This method works by identifying and steering "language neurons" within the model, which are responsible for controlling language defaultness. The process involves localizing these neurons, defining steering directions, and then applying targeted activation shifts to make languages like Hindi or Spanish primary, thereby reducing English dominance.

IMPACT Enables more equitable performance across languages in LLMs, reducing English bias.
- LLMs
- Vinija Jain
TOOL · arXiv cs.CL English(EN) · 2d

LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

Researchers have developed LatticeBridge, a novel method for structured sequence generation that addresses the challenge of satisfying multiple input-derived constraints within a single output. This approach frames the problem as a rare-event sequential inference task, combining a prefix language model with instance-compiled surface automata and a specialized Monte Carlo decoder. LatticeBridge aims to improve the faithfulness of generated sequences by ensuring all required anchors are jointly realized, outperforming baseline methods on benchmarks like CommonGen and WikiBio. AI

IMPACT Enhances faithfulness in structured sequence generation, potentially improving applications requiring precise output constraints.
TOOL · arXiv cs.CV English(EN) · 2d

VL-DINO: Leveraging CLIP Vision-Language Knowledge for Open-Vocabulary Object Detection

Researchers have developed VL-DINO, a new object detection model that effectively integrates knowledge from CLIP, a vision-language model. The model uses novel modules to construct better training samples and fuse visual and textual information. In zero-shot tests on the LVIS benchmark, VL-DINO achieved state-of-the-art results, outperforming previous methods. AI

IMPACT Sets new SOTA on zero-shot object detection benchmarks, potentially improving image analysis capabilities.
- LVIS benchmark
- VL-DINO
TOOL · arXiv cs.CL English(EN) · 2d

Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

Researchers have developed Gumbel-BEARD, a novel framework designed to improve the performance of speech foundation models in low-resource domains. This method automates the selection of Whisper encoder layers using a trainable Gumbel-Softmax selector and a self-supervised adaptation objective. Experiments show that Gumbel-BEARD can match fully supervised baselines with significantly less labeled data and establishes new state-of-the-art word error rates on challenging datasets like MyST and CORAAL. AI

IMPACT Enhances speech model performance in low-resource settings, potentially broadening AI accessibility for diverse linguistic communities.
- OGI Spontaneous
- MyST
- Whisper
- Gumbel-BEARD
- CORAAL
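
As a rough illustration of the Gumbel-Softmax layer selection mentioned in the summary above, the sketch below draws soft selection weights over a handful of encoder layers and mixes their features. It uses plain Python rather than a deep-learning framework, and the logits, temperature, and feature vectors are hypothetical; in the actual method the selector is trained, which this sketch does not do.

```python
import math
import random


def gumbel_softmax_weights(logits, temperature=1.0, rng=None):
    """Sample soft selection weights with the Gumbel-Softmax trick."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise; clamp u away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_features, weights):
    """Weighted sum of per-layer feature vectors of equal length."""
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical selector logits for four encoder layers.
    layer_logits = [0.0, 0.5, 2.0, -1.0]
    weights = gumbel_softmax_weights(layer_logits, temperature=0.5,
                                     rng=random.Random(0))
    features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    print(weights)
    print(mix_layers(features, weights))
```

Lower temperatures push the weights toward a near one-hot choice of layer, while higher temperatures spread them more evenly across layers.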
TOOL · arXiv cs.AI English(EN) · 2d

Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning

Researchers have developed Lung-R1, a novel large language model designed for pulmonary disease diagnosis. This model is guided by LungKG, a comprehensive knowledge graph containing over 59,000 nodes and 164,000 edges related to pulmonary medicine. Lung-R1 demonstrated state-of-the-art performance in a 20-system evaluation, particularly in EMR diagnosis, outperforming previous baselines. AI

IMPACT This model's knowledge graph integration could improve diagnostic accuracy for complex diseases by enhancing LLM reasoning capabilities.
- LungKG
- Lung-R1
- LLM
TOOL · arXiv cs.LG English(EN) · 2d

LakeFM: Toward a Foundation Model for Aquatic Ecosystems Using Irregular Multivariate Multi-depth Time Series Data

Researchers have developed LakeFM, a new foundation model designed to understand and forecast aquatic ecosystem dynamics. Unlike previous models, LakeFM can handle irregular time series data and generalize across lakes with varying characteristics. It was pre-trained on a large dataset of simulated and observed lakes, demonstrating strong forecasting performance and the ability to produce physically plausible predictions. AI

IMPACT Enables more accurate forecasting of lake dynamics and water quality monitoring.
- LakeFM
TOOL · arXiv cs.LG English(EN) · 2d

GLACIER: A Multimodal Student-Teacher Foundation Model for Molecular Property Prediction

Researchers have developed GLACIER, a novel student-teacher framework designed for molecular property prediction. This model integrates multiple data types, including molecular graphs, SMILES strings, and physicochemical descriptors, to create more robust and efficient molecular embeddings. By employing a three-stage process involving pretraining student encoders, fusing modalities with a Finsler geometry-aware module, and distilling knowledge from larger teacher models, GLACIER achieves high predictive performance while reducing computational burden. AI

IMPACT Introduces a more efficient framework for molecular property prediction, potentially accelerating drug discovery and materials science research.
- MolFormer
- GLACIER
TOOL · arXiv cs.LG English(EN) · 2d

Scaling Laws of Global Weather Models

A new research paper explores the scaling laws of data-driven global weather models, analyzing how performance relates to model size, dataset size, and compute budget. The study found that weather models favor wider architectures over deeper ones and that increasing training data yields greater performance gains than increasing model size under fixed compute budgets. Specifically, the Aurora model showed strong data-scaling behavior, with a 10x increase in training data leading to a 3.2x reduction in validation loss. AI

IMPACT Provides insights into optimizing AI model development for weather forecasting, suggesting wider architectures and larger datasets are key.
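
To put the reported data-scaling figure in perspective: if validation loss followed a pure power law in dataset size (an assumption made here for illustration; the summary does not state the fitted form), a 3.2x loss reduction per 10x of data would correspond to an exponent of about 0.505. The helper names below are hypothetical.

```python
import math


def implied_exponent(data_factor, loss_reduction_factor):
    """Exponent alpha such that loss scales as data ** (-alpha).

    Assumes a pure power law, which is an illustrative simplification.
    """
    return math.log(loss_reduction_factor) / math.log(data_factor)


def predicted_reduction(data_factor, alpha):
    """Loss reduction factor predicted for a given increase in data."""
    return data_factor ** alpha


if __name__ == "__main__":
    alpha = implied_exponent(10.0, 3.2)  # about 0.505
    print(f"implied exponent: {alpha:.3f}")
    print(f"predicted reduction at 100x data: "
          f"{predicted_reduction(100.0, alpha):.2f}x")
```

Under the same assumption, two successive 10x increases in data would compound to roughly a 10.24x reduction (3.2 squared), though real scaling curves often flatten before that.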
TOOL · arXiv cs.LG English(EN) · 2d

Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

Researchers have developed a Lie-algebraic framework to analyze the expressivity and error bounds of parallelizable sequence models like Transformers. Their theory establishes a direct link between a model's depth and its expressivity, showing that increasing depth exponentially reduces approximation error. This theoretical insight was validated through experiments on symbolic and continuous-valued state-tracking tasks, confirming the empirical performance of deep sequence models. AI

IMPACT Provides a theoretical foundation for understanding and improving the performance of deep sequence models.
- Transformer
- Gyuryang Heo
TOOL · arXiv cs.CV English(EN) · 2d

Lighting-aware Unified Model for Instance Segmentation

Researchers have developed a new adapter module called Lighting Convolutional-Attention (LCA) to improve the robustness of foundation models like SAM for instance segmentation under varied lighting conditions. LCA processes RGB features alongside contrast maps to distinguish structural changes from illumination artifacts, enhancing segmentation accuracy without needing to fine-tune the entire model. The module is trained using a pairwise strategy with a specific loss term to penalize discrepancies between clean and illuminated images, and its effectiveness is validated on existing benchmarks and a new synthetic dataset designed for complex lighting. AI

IMPACT Enhances robustness of foundation models for instance segmentation, potentially improving real-world AI applications in computer vision.