Brief

last 24h

[50/8388] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 3d

Tractogram foundation model

Researchers have developed TractFM, a foundation model that learns representations directly from diffusion MRI tractograms. It combines a local streamline encoder with a permutation-equivariant tractogram encoder, so it can process all of a subject's streamlines at once. Pretrained on anatomical parcellation, TractFM produces reusable embeddings for individual streamlines as well as compact subject-level descriptors. The model generalizes across tractography algorithms and datasets, achieving accurate tract parcellation and predicting subject phenotypes such as age and sex.

IMPACT Enables more robust and generalizable analysis of brain white-matter pathways, potentially improving diagnostic and research capabilities in neuroscience.
- Human Brain
- TractFM
SIGNIFICANT · Hugging Face Blog English(EN) · 4d

Introducing North Mini Code: Cohere’s First Model For Developers

Cohere has released North Mini Code, a 30-billion-parameter Mixture-of-Experts model with 3 billion active parameters, designed for agentic software engineering tasks. It is the first model in a new Cohere family and is available on Hugging Face under the Apache 2.0 license. Benchmarks indicate that North Mini Code is competitive with other open-source coding models of similar size and surpasses some larger models on coding benchmarks.

IMPACT Adds a competitive, Apache 2.0-licensed option among open-source coding models, potentially accelerating agentic software development.
RESEARCH · Hugging Face Daily Papers English(EN) · 4d · [4 sources]

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

A new approach called Dexterity-BEV addresses the data challenges of embodied intelligence by adapting the Bird's-Eye View (BEV) methodology from autonomous driving. The method aims to unify heterogeneous robot data, including visual inputs, sensor readings, and action commands, in a common spatial reference frame. This unified representation is intended to make robot training more scalable and transferable, moving beyond simple data aggregation toward a foundational data infrastructure for embodied AI.

IMPACT New frameworks like Dexterity-BEV and Embodied-R1.5 aim to standardize robot data and improve generalization, potentially accelerating the development of more capable and adaptable embodied AI systems.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

A Unifying Framework for Concept-Based Representational Similarity

Researchers have introduced a framework that unifies and clarifies concept-based representational similarity in machine learning models. The framework decomposes alignment along two axes, representation vs. concept and instance-wise vs. distributional, and identifies four key properties. The authors also developed an intervention-based benchmark called InterVenchA to measure these properties and proposed the Coupled Sparse Autoencoder (CoSAE) method, which shows that strong alignment emerges when multiple objectives are jointly enforced, even with minimal paired data.

IMPACT Clarifies concept alignment in ML, potentially leading to more robust and interpretable models.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

A new research paper investigates whether video foundation models understand intuitive physics. The study probes frozen representations of models such as V-JEPA, VideoMAE, and LTX-Video using benchmarks including IntPhys2 and Minimal Video Pairs. V-JEPA performs best, particularly with temporal dynamics probes; VideoMAE is competitive; and LTX-Video shows weaker but present signals. The study also finds that physics knowledge is more accessible in the intermediate-to-late layers of these models.

IMPACT Reveals emergent physics understanding in video models, potentially improving their real-world interaction capabilities.
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

Researchers have developed Hypnos, a new foundation model for sleep physiology that utilizes next-token prediction for representation learning. Trained on eight different sensing modalities from over 20,000 polysomnography recordings, Hypnos tokenizes physiological signals and uses an auto-regressive RQ-Transformer to predict future data points. This approach significantly outperforms existing models on various benchmarks, including sleep stage classification and atrial fibrillation detection, while requiring substantially less labeled data.

IMPACT Demonstrates a novel self-supervised learning approach for multi-modal physiological data, potentially improving healthcare diagnostics with less labeled data.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Automated IEP Generation from Traditional Chinese Parent-Teacher Interviews via Corpus-Grounded Feature Diffusion

Researchers have developed a novel method for automatically generating Individualized Education Programs (IEPs) in Traditional Chinese, addressing a significant gap in special-education NLP. The proposed Corpus-Grounded Feature Diffusion (CGFD) pipeline utilizes a low-resource fine-tuning approach with a modified Breeze-7B model. This system achieves state-of-the-art results on a held-out test set, outperforming several leading LLMs in zero-shot performance while ensuring privacy-preserving, local inference.

IMPACT Addresses a gap in special-education NLP for Traditional Chinese, offering a privacy-preserving local inference solution.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Assessing Sample Quality in Conditional Generation under Compositional Shift

Researchers have developed a method for evaluating the quality of samples generated by conditional models, particularly under novel or unobserved conditions. The approach uses a post-hoc trust score that combines global realism and attribute faithfulness and requires only the original training distribution. The score can be used to filter or rank generations, or to abstain from them, and it improves downstream predictive performance on biological imaging and vision benchmarks.

IMPACT Enables more reliable evaluation of AI-generated content, especially in scientific domains where real-world data is scarce.
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Integrating gene regulatory priors into Transformer attention with scTransformer for interpretable scRNA-seq analysis

Researchers have developed scTransformer, a novel approach that integrates gene regulatory information into Transformer models for analyzing single-cell RNA sequencing data. This method enhances interpretability and robustness by incorporating prior biological knowledge into the model's attention mechanisms. Evaluations show scTransformer improves cell-type classification accuracy and produces more biologically meaningful representations compared to standard Transformers.

IMPACT Enhances interpretability of AI models in genomics, potentially leading to new biological discoveries.
RESEARCH · arXiv cs.AI English(EN) · 5d · [3 sources]

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Researchers have developed "Reasoning Arena," a framework designed to enhance the reasoning capabilities of large language models. It addresses a limitation of reinforcement learning with verifiable rewards (RLVR): when every reasoning trace in a group receives the same reward, the group provides no gradient signal. Reasoning Arena converts these uninformative reward groups into useful training data by running trace tournaments, head-to-head comparisons that yield richer relative reward signals. The method improves training efficiency and benchmark performance, outperforming standard RLVR by 7.6% on average.

IMPACT Enhances LLM reasoning by converting uninformative reward signals into useful training data, potentially accelerating development.
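
The failure mode described in this item can be illustrated with a minimal sketch. It assumes a GRPO-style group-normalized advantage (reward minus group mean, divided by group standard deviation), which the summary does not name; the function `group_advantages` and the toy rewards are hypothetical.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumption: this mirrors a GRPO-style estimator; the paper's exact
    formulation may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Mixed outcomes: passing traces get a positive advantage, failing ones negative.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])

# Identical rewards (all traces fail, or all pass): every advantage is zero,
# so this prompt contributes no policy-gradient signal.
all_fail = group_advantages([0.0, 0.0, 0.0, 0.0])
all_pass = group_advantages([1.0, 1.0, 1.0, 1.0])
```

Trace tournaments target exactly these all-zero groups. How tournament outcomes are converted into rewards is not specified in the summary and is not modeled here.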
RESEARCH · arXiv cs.LG English(EN) · 5d · [3 sources]

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

Researchers have developed Conan-embedding-v3, a new framework designed to create a unified embedding space for multiple data modalities including text, images, video, documents, and audio. The approach involves training modality-specific models independently, then fusing their task vectors into a single backbone. A key challenge addressed is "Projector Drift," which occurs when fusing models with external encoders, leading to performance degradation in specific modalities like audio. Conan-embedding-v3 employs "Projector Recovery" and multi-modal rehearsal to mitigate this issue, achieving strong performance on benchmarks like MMEB and MAEB.

IMPACT Introduces a novel framework for unifying diverse data types into a single embedding space, potentially improving cross-modal retrieval and understanding.
- Conan-embedding-v3
TOOL · Mastodon — fosstodon.org Polski(PL) · 1d

Discovery of a critical vulnerability in Zcash using the Claude Opus model shows that AI spots errors invisible to humans for years. The incident caused a 41 percent

A critical vulnerability in Zcash was discovered using Anthropic's Claude Opus model, highlighting AI's ability to find flaws that had gone unnoticed by humans for years. The discovery reportedly led to a 41% drop in Zcash's value and is presented as the start of a new era for automated cybersecurity.

IMPACT Demonstrates AI's potential to uncover complex security flaws, potentially revolutionizing cybersecurity practices.
SIGNIFICANT · r/StableDiffusion English(EN) · 1d

CEO Thoughts: What's Next at LTX

LTX, a company focused on generative AI models, is preparing to release its next-generation model, LTX-2. This update will feature architectural innovations, including a mixture-of-experts (MoE) approach for improved efficiency and quality, alongside a dense model option. The company is also enhancing the text encoder for better prompt understanding and optimizing performance for broader hardware compatibility. LTX plans to maintain open-source weights and provide new training infrastructure and tooling to enable community and enterprise fine-tuning for domain-specific applications.

IMPACT New MoE architecture and open-source commitment could accelerate specialized model development and adoption.
- Zeev
- LTX-2
TOOL · Medium — Claude tag English(EN) · 3d

Claude Mythos Just Broke the Benchmarks — Here’s What That Actually Means

A new AI model named Claude Mythos has reportedly surpassed existing benchmarks, signaling a significant advancement in AI capabilities. This development is presented as particularly relevant for small business owners considering AI adoption. The implications of this benchmark breakthrough are being analyzed for their real-world impact.

IMPACT This advancement could lower the barrier for AI adoption by demonstrating tangible performance gains relevant to business applications.
- Claude Mythos
- Medium
TOOL · dev.to — MCP tag English(EN) · 3d

OpenAI MCP: Use GPT-4o, DALL-E, and Whisper Directly in Claude or Cursor

A dev.to post describes OpenAI MCP, a Model Context Protocol (MCP) server that exposes GPT-4o, DALL-E 3, and Whisper to AI clients such as Claude and Cursor. With it, an AI agent can call OpenAI's models for specific tasks, such as image generation or audio transcription, without leaving its primary workflow. Setup is described as quick, enabling multi-model orchestration within a single AI environment.

IMPACT Streamlines multi-model workflows by allowing AI agents to directly access OpenAI's capabilities within other platforms.
- Cursor
- OpenAI
- GPT-4o
- DALL-E 3
- Whisper
- Claude
RESEARCH · 36氪 (36Kr) 中文(ZH) · 3d

Yingli Co., Ltd.: If high-performance AIPC brings a wave of replacement, the company will directly benefit

SpaceX is preparing to demonstrate its space-based AI infrastructure by late 2027, with initial orbital computing capabilities planned for next year. The company's roadmap was presented by President Gwynne Shotwell and CFO Bret Johnsen. Separately, Anthropic has released its most powerful model to date, Claude Fable 5, which is reportedly so advanced that the general public is advised to use it with caution.

IMPACT SpaceX's venture into space-based AI infrastructure could enable new applications and data processing capabilities beyond Earth, while Anthropic's advanced model release pushes the boundaries of AI performance.
TOOL · arXiv cs.AI English(EN) · 3d

TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition

Researchers have introduced TD-Grokking, a novel framework designed to enable large language models to learn from zero-reward problems. This method recursively breaks down complex, intractable problems into smaller, verifiable subproblems. These subproblems form a hierarchy, with solvable leaves providing the necessary optimization signals for model improvement. Evaluations on mathematical and medical tasks demonstrated that TD-Grokking significantly outperforms existing baseline approaches.

IMPACT Enables LLMs to learn from previously unsolvable zero-reward problems, potentially expanding their capabilities in complex reasoning tasks.
TOOL · arXiv cs.AI English(EN) · 3d

Beyond Static Evaluation: Co-Evolutionary Mechanisms for LLM-Driven Strategy Evolution in Adversarial Games

Researchers have developed a new framework called FAMOU to improve LLM-driven strategy evolution in adversarial games. This framework addresses the challenge of shifting evaluation landscapes by incorporating co-evolutionary mechanisms, hierarchical deep evaluation, and dynamic weakness pressure. Tested on the MCTF 2026 3v3 maritime capture-the-flag task, FAMOU demonstrated superior performance over existing methods, achieving the highest combined score and best generalization to unseen opponents. The evolved strategies also showcased novel algorithmic innovations, validating the approach's effectiveness and real-world transferability.

IMPACT Enhances LLM capabilities in complex strategic environments, potentially leading to more sophisticated AI agents in games and simulations.
- ShinkaEvolve
- OpenEvolve
- LLM
- MCTF 2026
- AAMAS 2026
TOOL · arXiv cs.AI English(EN) · 3d

Does Normalization Choice Matter for Causal Large Time-Series Models?

Researchers have investigated how the choice of normalization technique affects causal large time-series models, particularly transformer architectures that use patching and efficient causal strategies. They find that normalization significantly affects both training convergence speed and forecasting accuracy. The study highlights potential information leakage when standard normalization is used in causal settings and evaluates newer alternatives designed to mitigate it.

IMPACT Understanding normalization's effect is crucial for optimizing time-series forecasting models, potentially improving their accuracy and efficiency in real-world applications.
- arXiv
- Samy-Melwan Vilhes
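
The leakage concern in this item can be made concrete with a minimal sketch. The summary does not say which normalization schemes the paper compares, so the two functions below (`global_zscore`, `causal_zscore`) are generic, hypothetical stand-ins: one uses statistics of the whole series, the other only values seen so far.

```python
from statistics import mean, pstdev

def global_zscore(series, eps=1e-8):
    """Normalize with statistics of the full series (uses future values)."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / (sigma + eps) for x in series]

def causal_zscore(series, eps=1e-8):
    """Normalize each point using only values up to and including it."""
    out = []
    for t in range(len(series)):
        prefix = series[: t + 1]
        mu, sigma = mean(prefix), pstdev(prefix)
        out.append((series[t] - mu) / (sigma + eps))
    return out

# Two toy series that share the same past but have different futures.
past = [1.0, 2.0, 3.0, 4.0]
calm = past + [5.0, 6.0]
spike = past + [50.0, 60.0]

# Global statistics make the normalized past depend on the future.
leaks = global_zscore(calm)[:4] != global_zscore(spike)[:4]

# Prefix-only statistics leave the normalized past unchanged.
no_leak = causal_zscore(calm)[:4] == causal_zscore(spike)[:4]
```

With full-series statistics, a model trained causally on the first four points still receives inputs shaped by values it should not have seen; the prefix-only variant avoids this at the cost of noisier early statistics.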
TOOL · arXiv cs.LG English(EN) · 3d

Domain Adapted Large Language Models for Additive Manufacturing

Researchers have developed specialized large language models for additive manufacturing by adapting open-weight models like Gemma 3, Qwen 3, and Gemma 4. These models were trained on approximately 50 million tokens of additive manufacturing journal articles, incorporating both text and visual data. Evaluations using the Additive-Manufacturing-Benchmark show these domain-adapted models achieve over 90% accuracy on additive manufacturing knowledge tasks, demonstrating an effective method for LLM specialization.

IMPACT Demonstrates a viable method for specializing LLMs for niche industrial applications, potentially improving efficiency and knowledge access in fields like additive manufacturing.
TOOL · arXiv cs.LG English(EN) · 3d

Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks

Researchers have developed a new formula to calculate an upper bound for local learning coefficients in three-layer neural networks. This formula addresses singular points, which were a limitation in previous methods. The new approach offers a counting rule based on budget, demand, and supply constraints and extends to a broader range of activation functions, including swish and polynomial types under specific conditions.

IMPACT Provides a new theoretical framework for understanding the learning behavior of specific neural network architectures.
- Yuki Kurumadani
TOOL · arXiv stat.ML English(EN) · 3d

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

Researchers have developed a new deep learning model called the Deep Convolutional Interpreter for Time Series (DCIts). This architecture is designed to analyze nonlinear multivariate time series data and provides sample-specific, locally interpretable descriptions of interaction structures. DCIts achieves competitive forecasting accuracy while prioritizing intrinsic interpretability by explicitly learning a time- and lag-dependent transition tensor.

IMPACT Introduces a novel interpretable deep learning architecture for time series analysis, potentially improving model transparency in complex systems.
- DCIts
- Deep Convolutional Interpreter for Time Series
TOOL · arXiv cs.LG Italiano(IT) · 3d

PRISM: Parallel Residual Iterative Sequence Model

Researchers have developed PRISM, a novel sequence modeling architecture designed to balance the expressivity of Transformers with the efficiency of linear models. PRISM addresses the serial dependencies found in iterative methods like Test-Time Training by reconstructing the iterative process in a parallelizable form. This is achieved through a Write-Forget Decoupling strategy and a two-stage proxy architecture, enabling significantly higher throughput compared to existing optimization methods.

IMPACT Introduces a new parallelizable architecture that significantly boosts throughput for sequence modeling tasks.
TOOL · arXiv cs.AI English(EN) · 3d

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

Researchers have introduced LongMoE, a novel framework designed to tackle the complexities of multimodal clinical learning. This approach effectively addresses two key challenges: missing data across different patient modalities and the temporal dynamics of disease progression. By integrating context-aware imputation with trajectory-aware encoding and a sparse Mixture-of-Experts system, LongMoE can model disease evolution over time even with incomplete or inconsistent patient data.

IMPACT Establishes a new foundation for multimodal clinical learning by addressing data missingness and temporal dynamics.
- MIMIC-IV
- LongMoE
- Maxx Richard Rahman
- ADNI
- OASIS-3
TOOL · arXiv cs.CV English(EN) · 3d

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Researchers have developed WorldPlay, a novel streaming video diffusion model designed for real-time interactive world modeling. This model addresses the speed-memory trade-off in current systems by employing a Dual Action Representation for robust input control and a Reconstituted Context Memory with temporal reframing to maintain long-term geometric consistency. Additionally, Context Forcing, a distillation method, ensures the model can effectively utilize long-range information, enabling real-time 720p video generation at 24 FPS with improved consistency and generalization.

IMPACT Introduces a new method for real-time interactive video generation with improved consistency, potentially impacting content creation and simulation tools.
- Wenqiang Sun
TOOL · arXiv cs.CV English(EN) · 3d

FoA-SR: Faithful or Aesthetic? Profile-Aware Preference Optimization for Real-World Image Super-Resolution

Researchers have developed a new approach called FoA-SR for image super-resolution that can generate distinct restoration profiles. This method allows for either faithful reconstructions that prioritize structural integrity and reference consistency, or aesthetic reconstructions that focus on visually pleasing details. The system uses a supervised SR adapter trained with various losses, then fine-tunes separate LoRA adapters using profile-specific rewards to achieve these different objectives.

IMPACT Enables more nuanced control over image generation, allowing users to prioritize either accuracy or visual appeal.
- FoA-SR
- Flux2SR
- LoRA
- RealSR
- DIV2K
- Amjad Mahdi Alqarni
TOOL · arXiv cs.AI English(EN) · 3d

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

Researchers have developed a new sampling method called Entropy-Guided Power Sampling (EGPS) to improve the reasoning capabilities of base language models. This method addresses the inefficiencies of traditional Metropolis-Hastings samplers by focusing on high-entropy regions within sequences, leading to faster and more effective sampling. EGPS demonstrated strong performance on benchmarks like MATH500, HumanEval, and GPQA, achieving significant speedups over existing techniques.

IMPACT Enhances LLM reasoning capabilities and sampling efficiency, potentially leading to more capable AI systems without costly retraining.
TOOL · arXiv cs.AI English(EN) · 3d

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

A new research paper explores the critical role of synthetic data composition in pretraining time series foundation models. The study found that the choice of synthetic data generator can lead to a twofold difference in forecasting error, and these generator rankings are not consistent across different model architectures. Researchers propose that mixing multiple generators with real data creates the strongest pretraining corpora, framing the problem as one of corpus composition rather than generator selection.

IMPACT Highlights the importance of synthetic data composition for time series models, potentially improving forecasting accuracy and model development.
- Moirai-Small
- Chronos-T5-Mini
TOOL · arXiv cs.AI English(EN) · 3d

Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment

Researchers have developed a new forward-only learning algorithm for convolutional neural networks (CNNs) that improves upon existing methods. This approach introduces a learnable mechanism for assigning channels to classes, allowing for more adaptive and data-driven specialization. Additionally, a loss-aware layer contribution strategy weights intermediate predictions based on their validation performance, enhancing inference. When integrated into residual CNNs, this method achieves state-of-the-art performance among forward-only models on several image datasets, significantly closing the gap with traditional backpropagation techniques.

IMPACT Introduces a more efficient learning paradigm for CNNs, potentially narrowing the performance gap with backpropagation.
TOOL · arXiv stat.ML English(EN) · 3d

Post-Training Augmentation Invariance

Researchers have developed a new framework for post-training augmentation invariance, allowing pretrained neural networks to gain new invariance properties without affecting their performance on original data. This method uses lightweight adapter networks appended to the latent space, trained with novel Markov-Wasserstein minimization or Wasserstein correlation maximization losses. Empirical results show significant improvements in classification accuracy for rotated and noisy images, with minimal corruption to the original features and no fine-tuning of the base network.

IMPACT Enables models to generalize better to augmented data without performance degradation on original inputs.
- Keenan Eikenberry
- DINO
TOOL · arXiv cs.AI English(EN) · 3d

Temporal Context Conditioning for Seasonality-Aware Precipitation Nowcasting of High-Intensity Rainfall

Researchers have developed a new deep learning model called the Time-Aware Small-Attention U-Net (TA-SmaAt-UNet) to improve precipitation nowcasting, particularly for high-intensity rainfall events. This model incorporates lightweight temporal conditioning layers that use cyclical encodings of time-of-day and time-of-year to enhance feature representations. Experiments demonstrated that this temporal context is most beneficial for rare, intense rainfall, while also improving the representation of seasonal variability and rainfall intensity distributions.

IMPACT Enhances deep learning models for weather forecasting, potentially improving accuracy for extreme weather events.
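
Cyclical encodings of the kind mentioned in this item are commonly built from sine/cosine pairs. The sketch below is a generic illustration, not the paper's implementation; the function name `cyclical_time_features` and the 365.25-day year are assumptions.

```python
import math
from datetime import datetime

def cyclical_time_features(ts: datetime):
    """Return sin/cos encodings of time-of-day and time-of-year."""
    # Fraction of the day elapsed, in [0, 1).
    day_frac = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400.0
    # Day-of-year is 1-based; 365.25 approximates the mean year length.
    year_frac = (ts.timetuple().tm_yday - 1) / 365.25
    return (
        math.sin(2 * math.pi * day_frac),
        math.cos(2 * math.pi * day_frac),
        math.sin(2 * math.pi * year_frac),
        math.cos(2 * math.pi * year_frac),
    )

# 23:55 and 00:05 are far apart as raw hour values but close on the circle.
late = cyclical_time_features(datetime(2024, 1, 1, 23, 55))
early = cyclical_time_features(datetime(2024, 1, 1, 0, 5))
gap = math.dist(late, early)
```

Using a sin/cos pair rather than a raw hour or day index keeps midnight and the turn of the year continuous, which is the usual motivation for this encoding in conditioning layers.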
TOOL · arXiv cs.AI English(EN) · 3d

Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation

Researchers have introduced ELF-S2T, a novel approach to speech-to-text systems that operates in a continuous latent space rather than discrete text tokens. This model, built on the Embedded Language Flows (ELF) backbone, uses audio conditioning and flow-matching denoising for both speech recognition and translation tasks. Experiments on standard datasets demonstrate competitive performance and reveal that errors in both recognition and translation stem from similar confusions within this continuous latent space.

IMPACT This research suggests a unified approach to speech recognition and translation by leveraging continuous latent spaces, potentially simplifying future model development.
TOOL · arXiv cs.AI English(EN) · 3d

MoE Enhanced Federated Learning for Spatiotemporal Prediction

Researchers have developed MoE-FedTP, a new framework for spatiotemporal prediction that uses a Mixture-of-Experts (MoE) approach within a federated learning system. This method aims to improve traffic prediction accuracy, especially in cities with limited data, by enabling knowledge transfer from data-rich cities without compromising privacy. Experiments show MoE-FedTP outperforms existing cross-city and federated learning techniques.

IMPACT This framework could improve traffic management and urban planning in data-scarce regions by enabling more accurate predictions.
TOOL · arXiv cs.AI English(EN) · 3d

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Researchers have developed Whisper-GPT, a novel language model designed for generating speech and music. This model uniquely integrates continuous audio representations, like spectrograms, with discrete tokens derived from neural compression algorithms. This hybrid approach aims to overcome the context length limitations often encountered with purely discrete token models, while retaining the predictive benefits of discrete spaces for tasks like sampling.

IMPACT Introduces a hybrid approach to audio generation that may improve context handling and predictive capabilities.
- Whisper-GPT
- Prateek Verma
TOOL · arXiv cs.AI English(EN) · 3d

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Researchers have developed Whisfusion, a novel non-autoregressive system for automatic speech recognition (ASR) that utilizes masked diffusion models. This approach aims to match the accuracy of traditional autoregressive models while significantly improving inference speed. Whisfusion achieves this by training a diffusion decoder on top of frozen Whisper-large-v3 audio embeddings, enabling parallel decoding and outperforming existing models in both speed and accuracy across multiple languages.

IMPACT Establishes masked diffusion as a viable, high-throughput alternative for multilingual ASR, potentially accelerating real-time transcription applications.
TOOL · arXiv cs.CL English(EN) · 3d

Streaming Knowledge Compilation: Proactive Materiality-Scored Pinning for Time-Evolving LLM Wikis

Researchers have developed a new method called Streaming Knowledge Compilation to update LLM wikis with evolving information. This technique uses a "materiality signal" to proactively pin important documents within a fixed token budget, aiming to minimize regret against future queries. The system was tested on financial news and Wikipedia, demonstrating its ability to adapt to new information and providing a more reliable evaluation metric than standard QA scores for post-training knowledge.

IMPACT Introduces a novel approach for dynamic LLM knowledge updates, potentially improving the relevance and accuracy of LLM-generated information in rapidly changing domains.
- Llama 3.1 8B
TOOL · arXiv cs.CV English(EN) · 3d

Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

Researchers have identified a "prompt forgetting" issue in Multimodal Diffusion Transformers (MMDiTs) used for text-to-image generation. This phenomenon occurs because the text prompt's semantic representation degrades as it passes through deeper layers of the model. To address this, a new training-free method called "prompt reinjection" has been proposed, which reintroduces early-layer prompt representations into later layers. Experiments on models like SD3, SD3.5, and FLUX.1 demonstrate that this technique improves instruction-following capabilities and overall generation quality.

IMPACT This research offers a technique to enhance the instruction-following capabilities of current text-to-image diffusion models.
TOOL · arXiv cs.CV English(EN) · 3d

SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models

Researchers have developed SARA, a new method for improving video diffusion models by focusing supervision on semantically relevant parts of the video. This approach uses text-conditioned saliency to determine which token pairs in the video generation process are most important for aligning with the prompt. SARA demonstrates improved text alignment and motion quality compared to existing methods in evaluations.

IMPACT Enhances video generation quality by improving prompt adherence and semantic accuracy in diffusion models.
TOOL · arXiv cs.LG English(EN) · 3d

Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors

Researchers have developed a new method called S3-GFN for generating molecules that are both synthesizable and possess desirable properties. This approach uses a sequence-based Generative Flow Network (GFlowNet) with soft regularization, incorporating rich molecular priors learned from large datasets. By employing contrastive learning with separate buffers of synthesizable and unsynthesizable molecules, S3-GFN effectively guides the generation process towards high-reward chemical spaces, achieving over 95% synthesizability in experiments.

IMPACT Introduces a more flexible and scalable approach to generating synthesizable molecules, potentially accelerating drug discovery.
TOOL · arXiv cs.AI English(EN) · 3d

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

Researchers have developed a new framework called SHAPE for pruning experts in sparse Mixture-of-Experts (MoE) large language models. Unlike previous methods that evaluated experts independently, SHAPE considers the cooperative nature of MoE inference, where experts work in coalitions. The framework uses a Shapley-style attribution to identify experts crucial for high-utility collaborations, leading to more effective pruning. Experiments on models like Qwen3-30B-A3B, GPT-OSS-20B, and DeepSeek-V2-Lite demonstrated that SHAPE can significantly reduce memory footprint without substantial accuracy loss, even with up to 40% expert pruning.

IMPACT Enables more efficient deployment of large MoE models by reducing memory requirements without sacrificing accuracy.
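
To make the idea of Shapley-style attribution concrete, here is a minimal sketch of exact Shapley values on a toy set of four experts. It is not SHAPE's algorithm: the utility function, the expert names, and the pairwise bonus are all hypothetical, chosen only to show how a coalition-aware score differs from scoring each expert in isolation.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal gain over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_utility(coalition):
    # Hypothetical utility (an assumption, not taken from the paper):
    # every kept expert adds 1.0, and "e0" and "e1" add a 2.0 bonus
    # only when both are kept, which models a useful coalition.
    base = float(len(coalition))
    bonus = 2.0 if {"e0", "e1"} <= coalition else 0.0
    return base + bonus


experts = ["e0", "e1", "e2", "e3"]
values = shapley_values(experts, toy_utility)

# Experts with the lowest attribution are the first pruning candidates.
prune_order = sorted(experts, key=lambda e: values[e])
print(values)
print(prune_order)
```

Scored alone, every expert in this toy looks identical (each is worth 1.0 by itself). The Shapley values split the coalition bonus between "e0" and "e1", so "e2" and "e3" become the pruning candidates. Exact enumeration grows factorially with the number of players, so any practical use at MoE scale would need sampling or another approximation.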
TOOL · arXiv cs.LG English(EN) · 3d

Uncertainty-aware Multi-fidelity Closure via Conditional Normalizing Flows

Researchers have developed a framework for improving the accuracy of reduced-order models (ROMs) used in complex multiscale systems. The uncertainty-aware approach uses conditional normalizing flows to learn a probabilistic mapping between low-fidelity and high-fidelity model coefficients. The method aims to improve predictive accuracy while quantifying the uncertainty in the learned closure, which is crucial for applying ROMs reliably. Experiments on a vortex merging problem showed that the technique significantly improves ROM accuracy over uncorrected models.

IMPACT Enhances accuracy and uncertainty quantification for complex system modeling, potentially improving scientific simulations.
- Conditional Normalizing Flows
- Navier-Stokes equations
TOOL · arXiv cs.LG English(EN) · 3d

Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing

Researchers have developed SPGCL, a method for improving Graph Contrastive Learning (GCL). They found that existing GCL methods often fail to learn effectively from positive samples because of the message-passing mechanism in graph encoders. SPGCL addresses this by selectively propagating high-energy features and using low-energy features for more reliable positive sampling, which led to better performance in experiments.

IMPACT Enhances graph representation learning, potentially improving downstream AI tasks that rely on graph data.
- Graph Contrastive Learning
TOOL · arXiv cs.LG English(EN) · 3d

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

Researchers have developed a two-stage framework for time-series forecasting that improves accuracy by explicitly modeling and correcting systematic residual biases. A base transformer model produces initial predictions, and a dedicated meta-corrector then learns to refine them. The method achieved state-of-the-art performance on eight benchmark datasets, with significant improvements in standard metrics such as MSE and MAE.

IMPACT This framework could lead to more accurate time-series predictions, benefiting applications in finance, weather forecasting, and demand planning.
- arXiv
- Transformer
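
The two-stage idea (a base forecaster followed by a corrector fitted to its residuals) can be sketched in a few lines. This is a toy under stated assumptions, not the paper's pipeline: a persistence forecast stands in for the base transformer, a single learned mean bias stands in for the meta-corrector, and the linear series is invented for illustration.

```python
def persistence_forecast(history):
    """Stage 1 (stand-in base model): predict that the next value equals the last one."""
    return history[-1]


def fit_bias_corrector(series, base_model):
    """Stage 2: learn the mean one-step residual left by the base model."""
    residuals = [series[t] - base_model(series[:t]) for t in range(1, len(series))]
    return sum(residuals) / len(residuals)


def mean_abs_error(predictions, targets):
    return sum(abs(p - y) for p, y in zip(predictions, targets)) / len(targets)


# Hypothetical series with a steady upward trend, so the persistence
# forecast is systematically too low by the slope.
series = [2.0 * t for t in range(10)]
split = 7

# Fit the corrector on the training prefix only.
bias = fit_bias_corrector(series[:split], persistence_forecast)

# Evaluate one-step-ahead forecasts on the held-out tail.
base_preds = [persistence_forecast(series[:t]) for t in range(split, len(series))]
corrected_preds = [p + bias for p in base_preds]
targets = series[split:]

base_mae = mean_abs_error(base_preds, targets)
corrected_mae = mean_abs_error(corrected_preds, targets)
print(base_mae, corrected_mae)
```

The base model's error here is entirely systematic, so a constant correction removes it. On real data the residual would depend on context, which is presumably why the paper uses a learned corrector and not a fixed offset.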
TOOL · arXiv cs.LG English(EN) · 3d

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

Researchers have developed ANCHOR, an autoregressive model for incremental speech quality assessment. Unlike previous methods that require complete utterances, ANCHOR can estimate quality from partial audio streams, making it suitable for real-time applications. The model uses a dual-resolution token system and a hierarchical structure to refine quality predictions from coarse to fine, and it significantly reduces error on short audio prefixes.

IMPACT Enables real-time speech quality monitoring in streaming and generative AI systems.
TOOL · arXiv cs.LG English(EN) · 3d

Integrating Biological-Informed Recurrent Neural Networks for Glucose-Insulin Dynamics Modeling

Researchers have developed the Biological-Informed Recurrent Neural Network (BIRNN), a framework for modeling glucose-insulin dynamics in Type 1 Diabetes management. The approach combines a Gated Recurrent Unit (GRU) architecture with physics-informed loss functions that embed physiological constraints. In validation on the UVA/Padova simulator, BIRNN predicted glucose more accurately than traditional linear models, even when accounting for circadian variations in insulin sensitivity.

IMPACT This new BIRNN framework could lead to more personalized and adaptive artificial pancreas systems for diabetes management.
TOOL · arXiv cs.AI English(EN) · 3d

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

Researchers have identified a phenomenon called "model plasticity loss" that makes Reinforcement Learning (RL) less effective after Supervised Fine-Tuning (SFT) of large language models. Excessive SFT can produce over-confident token distributions and difficult optimization landscapes, limiting RL's ability to further improve the model. To address this, the authors propose "Rejuvenation", which uses base-anchored model fusion and targeted neuron resets to restore plasticity while retaining the benefits of SFT, and which improved performance on reasoning and agentic tasks.

IMPACT Addresses a key limitation in LLM training pipelines, potentially improving model performance on complex tasks.
TOOL · arXiv cs.AI English(EN) · 3d

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

Researchers have developed VFUSE, a method that uses sparse autoencoders to interpret generative models for protein design. The approach audits models such as RoseTTAFold3 and RFDiffusion3 for potentially hazardous features. Analyzing the latent space of these models improved the detection of dangerous protein designs and identified, with high accuracy, specific features that activate only for hazardous outputs.

IMPACT Provides a new tool for ensuring safety and interpretability in generative AI for scientific applications like protein design.
TOOL · arXiv cs.AI English(EN) · 3d

Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs

Researchers have introduced Conditional-Vendi and Conditional-RKE, metrics for evaluating the diversity of outputs from generative AI models that are guided by text prompts. The metrics build on existing diversity measures by isolating the variability that originates from the model itself and not from the prompts. The scores proved effective in text-to-image generation, image captioning, and large language model tasks: they accurately reflect ground-truth diversity and can even guide models to produce more varied outputs.

IMPACT Provides new tools for evaluating and improving the diversity of AI-generated content across various modalities.
TOOL · arXiv cs.CL English(EN) · 3d

CodeAlchemy: Synthetic Code Rewriting at Scale

Researchers have developed CodeAlchemy, a framework for generating large-scale synthetic code data to improve AI model training. The system employs five strategies: code rewriting, question answering, developer tasks, conversational dialogues, and execution traces. Together they produce over 500 billion tokens of synthetic code and 350 billion reasoning tokens. The dataset aims to address the limitations of current models in understanding real-world code tasks, and new benchmarks such as DevEval and TraceEval highlight significant gaps in semantic comprehension even among frontier models.

IMPACT This extensive synthetic dataset could significantly improve AI code generation capabilities and understanding of complex programming tasks.
TOOL · arXiv cs.CL English(EN) · 3d

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

Researchers have developed Density Field State Space Models (DF-SSM), a framework for compressing large SSMs into a 1-bit scaffold with minimal performance loss. Applied to Mamba-2 1.3B, the method produced a model that is over nine times smaller and significantly faster at inference, while retaining performance close to that of a 1.58-bit model. The distillation process is efficient, requiring limited data and computational resources. Beyond compression, the study also analyzed the model's internal knowledge organization and found distinct phases for intent classification, knowledge retrieval, and output formatting, suggesting that representational structure can develop independently of strong factual recall.

IMPACT Introduces a highly efficient compression technique for SSMs, potentially enabling wider deployment on resource-constrained devices.
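
For readers unfamiliar with 1-bit weights, the sketch below shows a generic sign-plus-scale binarization of a weight matrix, where each row is stored as one floating-point scale and one bit per weight. This is a common textbook scheme shown for illustration only; it is not the DF-SSM distillation procedure, and the weight values are invented.

```python
def binarize_row(row):
    """Return (scale, signs): scale is the mean absolute weight, signs are in {-1, +1}."""
    scale = sum(abs(w) for w in row) / len(row)
    signs = [1 if w >= 0 else -1 for w in row]
    return scale, signs


def dequantize_row(scale, signs):
    """Rebuild an approximate row from its scale and sign bits."""
    return [scale * s for s in signs]


# Hypothetical full-precision weights (two rows of four values each).
weights = [
    [0.5, -1.5, 1.0, -1.0],
    [0.2, 0.4, -0.6, 0.8],
]

packed = [binarize_row(row) for row in weights]
restored = [dequantize_row(scale, signs) for scale, signs in packed]

for original, approx in zip(weights, restored):
    print(original, "->", approx)
```

Only the sign of each weight and one scale per row survive, so the approximation error can be large for any single weight. Methods in this family typically rely on training or distillation to recover accuracy after binarization.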