Brief

last 24h

[50/8352] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
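
As a rough illustration of how a 0–100 score over these four signals could be assembled, here is a minimal sketch. The weights, the 24-hour half-life, and the name `score_item` are assumptions made for this example; the feed does not publish its actual formula.

```python
# Hypothetical weights; the feed does not document its real formula.
WEIGHTS = {"authority": 0.4, "cluster": 0.35, "headline": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay term


def score_item(authority, cluster, headline, age_hours):
    """Combine three signals in [0, 1] and apply exponential time decay.

    Returns a score between 0 and 100.
    """
    for value in (authority, cluster, headline):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signals must lie in [0, 1]")
    base = (WEIGHTS["authority"] * authority
            + WEIGHTS["cluster"] * cluster
            + WEIGHTS["headline"] * headline)
    # The score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return 100.0 * base * decay
```

Under these assumed settings, an item with perfect signals scores 100 when fresh and 50 one day later.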

FRONTIER RELEASE · Hugging Face Trending Models Italiano(IT) · 3d · [7 sources]

google/diffusiongemma-26B-A4B-it

Google has released DiffusionGemma, an open-weight model based on its Gemini architecture, designed for multimodal tasks. This model, available under an Apache 2.0 license, can process both text and images, enabling it to generate descriptions or respond to image-based queries. It is accessible through various platforms, including NVIDIA's NIM cloud API and Hugging Face, with integrations for popular tools like llama.cpp and vLLM. AI

IMPACT Accelerates multimodal AI development with an accessible, open-weight model for diverse applications.
- Google
- Gemma
- DiffusionGemma
- vLLM
- SGLang
- Hugging Face
- Docker
- Transformers
- Gemini
- llama.cpp
- NVIDIA
TOOL · dev.to — LLM tag English(EN) · 1d

We created a way to offload the Qwen2.5 KV cache to RAM.

A new technique has been developed to address memory limitations in local large language models, specifically for handling long contexts and maintaining state across restarts. This method involves offloading the model's KV cache, which stores computed internal states, from VRAM to CPU RAM and disk. A small index in VRAM is used to retrieve relevant KV chunks when needed, allowing models to access contexts up to 800,000 tokens while keeping VRAM usage stable. The system also enables models to resume from their stored state after a process restart, effectively acting as a persistent memory. AI

IMPACT Enables local LLMs to handle significantly longer contexts and retain memory across sessions, potentially improving RAG performance and user experience.
- MiniCPM-1B
- Mamba
- Qwen2.5-7B-1M
- RTX 5060
- Qwen2.5
- LLM
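
The offload-and-retrieve idea in this item can be sketched in a few lines. This is a toy model under stated assumptions, not the authors' implementation: the class name `OffloadedKVCache`, the use of a mean key vector as each chunk's index entry, and dot-product ranking are all hypothetical choices, and plain Python lists stand in for GPU and CPU memory.

```python
class OffloadedKVCache:
    """Toy model: full KV chunks sit in a 'RAM' store, summaries in a small index."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.ram_store = {}  # chunk_id -> list of (key, value); stands in for CPU RAM or disk
        self.index = []      # one mean-key vector per chunk; stands in for the small VRAM index
        self._pending = []   # entries that do not yet fill a whole chunk

    def append(self, key, value):
        """Add one (key, value) pair; offload a chunk once it is full."""
        self._pending.append((key, value))
        if len(self._pending) == self.chunk_size:
            self._flush()

    def _flush(self):
        chunk_id = len(self.index)
        dim = len(self._pending[0][0])
        # Assumed summary: the mean of the chunk's key vectors.
        mean_key = [sum(k[i] for k, _ in self._pending) / len(self._pending)
                    for i in range(dim)]
        self.ram_store[chunk_id] = self._pending
        self.index.append(mean_key)
        self._pending = []

    def retrieve(self, query, top_k=1):
        """Return ids of the top_k chunks whose mean key best matches the query."""
        scores = [sum(q * m for q, m in zip(query, mean)) for mean in self.index]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return ranked[:top_k]
```

Because only `index` grows in the fast tier, its footprint scales with the number of chunks rather than the number of tokens, which is the property that keeps VRAM usage stable in the description above.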
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [2 sources]

The Hidden Power of Scaling Factor in LoRA Optimization

A new research paper explores the underappreciated role of the scaling factor (alpha) in Low-Rank Adaptation (LoRA) optimization. The study reveals that alpha is a more critical driver of effective optimization than the learning rate, offering performance gains that learning rate adjustments alone cannot achieve. The research proposes a new framework, LoRA-alpha, which optimizes the scaling factor to improve performance and simplify hyperparameter tuning for LoRA models. AI

IMPACT This research could lead to more efficient and effective fine-tuning of large language models, simplifying hyperparameter searches for practitioners.
- LoRA
- LoRA-alpha
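
For context, the scaling factor enters standard LoRA as a multiplier `alpha / r` on the low-rank update. The sketch below shows that standard formulation only; it is not the paper's LoRA-alpha method, and the helper names are made up for this example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the LoRA rank.

    W: (d_out, d_in) frozen weights; A: (r, d_in); B: (d_out, r).
    """
    r = len(A)
    scale = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]
```

Doubling `alpha` doubles the contribution of the adapter at a fixed rank, which is why it interacts with the learning rate during tuning.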
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [4 sources]

Surflo: Consistent 3D Surface Flow Model with Global State

Researchers have introduced Surflo, a novel 3D surface reconstruction model that processes unposed RGB views into a global latent state. This approach allows for the decoding of oriented 3D surface points through flow matching, enabling arbitrary output resolutions from a few thousand to over a million points in a single pass. Surflo demonstrates competitive performance against existing feed-forward methods while being significantly faster than optimization-based techniques, offering a unique combination of global latent representation and flexible decoding. AI

IMPACT Enables flexible and efficient 3D surface reconstruction from multiple views, potentially impacting fields like computer graphics and robotics.
RESEARCH · arXiv cs.AI Deutsch(DE) · 2d · [2 sources]

Two-Layer Linear Auto-Regressive Models Estimate Latent States

Researchers have demonstrated that two-layer linear auto-regressive models can learn to approximate Kalman filtering when trained on data from partially observed linear dynamical systems. The study shows that the models' learned hidden representations align with the state estimates produced by the optimal Kalman filter, even without explicit knowledge of the underlying dynamics. This finding is supported by theoretical insights into Kalman filter approximation by auto-regressive models, the benign optimization landscape of two-layer models, and finite-sample guarantees on prediction and state recovery errors. AI

IMPACT This research provides theoretical grounding for how auto-regressive models learn latent states, potentially informing the design of more effective sequential data models.
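
As a reference point for what the models are said to approximate, here is a minimal scalar Kalman filter step. It assumes a one-dimensional system x_t = a·x_{t-1} + noise with observation y_t = c·x_t + noise; the default parameter values are arbitrary and not taken from the paper.

```python
def kalman_step(x_est, p_est, y, a=1.0, c=1.0, q=0.01, r=1.0):
    """One predict-and-update step of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance.
    y: new observation. q, r: process and observation noise variances.
    """
    # Predict the next state and its uncertainty.
    x_pred = a * x_est
    p_pred = a * a * p_est + q
    # Update with the new observation.
    gain = p_pred * c / (c * c * p_pred + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new
```

Once the gain settles to a constant, each estimate is a fixed linear function of the previous estimate and the new observation, so it can be written as a weighted sum of past observations. That is one intuition for why a linear auto-regressive model could recover it.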
RESEARCH · arXiv stat.ML English(EN) · 2d · [2 sources]

Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability

Researchers have developed Physics-Informed Neural Networks (PINNs) to model chemotherapy pharmacokinetics, outperforming traditional methods in complex scenarios. The PINNs accurately predict drug concentrations in tissue, which are crucial for determining treatment efficacy and toxicity, and can even identify when models are not identifiable from available data. This approach offers a unified method for analyzing biological systems with partial observations, integrating known physical dynamics with measured data. AI

IMPACT PINNs offer a more robust method for analyzing complex biological systems, potentially improving drug development and personalized medicine by revealing model limitations.
- Chemotherapy pharmacokinetics
- Physics-Informed Neural Networks
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [3 sources]

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Researchers have developed VideoMDM, a novel diffusion-based framework for generating 3D human motion from 2D video supervision. This method trains 3D motion priors directly from 2D poses, bypassing the need for explicit 3D ground truth data. By using a pretrained 2D-to-3D lifter as a noisy teacher and employing a depth-weighted 2D reprojection loss, VideoMDM achieves performance close to fully 3D-supervised models on benchmarks like HumanML3D. The framework also demonstrates success on real-world video datasets such as Fit3D and NBA, generating motions that are preferred by human evaluators. AI

IMPACT Enables more accessible 3D motion generation for applications like animation and virtual reality by leveraging readily available 2D video data.
FRONTIER RELEASE · Interconnects (Nathan Lambert) English(EN) · 4d · [104 sources]

Claude Fable 5 and new AI safety fables

Anthropic has released Claude Fable 5, a new Mythos-class model designed for complex, long-running tasks. This model offers enhanced capabilities and a 1 million token context window, but comes at double the price of previous Opus models. Fable 5 is positioned as the smartest model available to the general public, excelling on various benchmarks, though its strict safety guardrails can sometimes lead to prompts being rerouted or rejected. The model is being integrated into various platforms, including Cursor and Databricks, for enterprise and consumer use. AI

IMPACT Sets new SOTA on agentic enterprise benchmarks, potentially accelerating complex workflow automation.
- Azure
- Claude Mythos 5
- Anthropic
- Claude Fable 5
- Claude Opus 4.8
- Cursor
- Project Glasswing
- US Government
- Hebbia
- AWS
- Databricks
- Google Cloud
- Stripe
- Simon Willison
- GPT 5.5
- OpenAI
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [3 sources]

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

Researchers have introduced RepWAM, a novel world action model designed for robot manipulation. This model utilizes semantic visual-action tokenization to create a latent space that better connects language instructions with robot control, outperforming traditional reconstruction-oriented tokenizers. Experiments on real-world tasks and simulations demonstrate RepWAM's effectiveness in diverse manipulation scenarios, paving the way for more generalist robot policies. AI

IMPACT RepWAM's approach could lead to more capable and generalist robots by improving how they interpret and act on language commands.
TOOL · arXiv cs.LG English(EN) · 1d

Clustering Node Attributed Networks with Graph Neural Networks and Self Learning

Researchers have developed a novel framework for graph clustering that leverages Graph Neural Networks (GNNs) and a self-learning approach. This method iteratively refines node representations by using GNNs to cluster nodes, then updating the graph based on these clusters for the next round of representation generation. The framework also incorporates a context graph to enhance node representations. Empirical results demonstrate its effectiveness in extracting information from both network edges and node attributes, outperforming methods that focus on only one aspect, particularly when attributes are not highly informative. The iterative learning process also shows superior performance compared to a single training round. AI
- Graph Neural Networks
- GNN
TOOL · arXiv cs.LG English(EN) · 1d

How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

Researchers have developed an Adaptive Memory Gate for Neural Operators (AMGFNO) to improve their performance in solving time-dependent partial differential equations (PDEs). Existing memory-augmented neural operators use a fixed memory weight, which limits their adaptability to varying observation conditions like resolution or physical parameters. AMGFNO introduces a learnable gate that dynamically adjusts the memory weight, showing significant reductions in normalized root-mean-square error (nRMSE) on the Kuramoto-Sivashinsky and Burgers' equations, particularly at low resolutions. AI

IMPACT This research could lead to more adaptable and accurate neural operators for solving complex scientific equations.
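
The contrast between a fixed memory weight and a learnable gate can be sketched as follows. This is an illustrative assumption about the general mechanism, not the AMGFNO architecture: the sigmoid gate over a vector of observation conditions, and all function and parameter names, are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fixed_memory_mix(current, memory, weight):
    """Blend current features with memory using a constant weight in [0, 1]."""
    return [(1.0 - weight) * c + weight * m for c, m in zip(current, memory)]


def gated_memory_mix(current, memory, gate_w, gate_b, conditions):
    """Blend with a weight computed from the observation conditions.

    conditions: hypothetical descriptors such as resolution or a physical
    parameter; gate_w and gate_b would be learned during training.
    """
    gate = sigmoid(sum(w * x for w, x in zip(gate_w, conditions)) + gate_b)
    return fixed_memory_mix(current, memory, gate), gate
```

With the gate, the same trained model can lean more on memory at low resolution and less at high resolution, instead of committing to one weight for all inputs.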
RESEARCH · arXiv cs.AI English(EN) · 2d · [3 sources]

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

Researchers have developed a new dataset, CCPoetry-49K, containing over 49,000 instruction-response pairs specifically for classical Chinese poetry analysis. They then fine-tuned the Qwen2.5-14B model using LoRA to create PoetryQwen, a domain-specialized LLM. This specialized model achieved a score of 0.757 on the CCL25-Eval Task 5 benchmark, outperforming the baseline Qwen2.5-14B-Instruct by 9.7% and demonstrating improved capabilities in precise translation and emotional understanding of classical poetry. AI

IMPACT This work introduces a specialized dataset and model for classical Chinese poetry, potentially improving LLM performance in niche cultural and linguistic domains.
RESEARCH · arXiv cs.LG English(EN) · 2d · [3 sources]

On Subquadratic Architectures: From Applications to Principles

A new research paper compares three subquadratic architectures—xLSTM, Mamba-2, and Gated DeltaNet—for sequence modeling tasks. The study found that xLSTM outperformed the others in code-model pre-training, distillation, and time-series foundation models. Researchers attribute xLSTM's superior performance to its flexible and stable memory correction capabilities through a gating scheme, enabling robust state tracking and accumulation. AI

IMPACT xLSTM's demonstrated advantage in state tracking and memory correction could influence future sequence model development, potentially leading to more efficient and capable AI systems.
TOOL · Wired — AI English(EN) · 1d

Apple’s Camera Chief Thinks AI Can Give You Superpowers

Apple is introducing new generative AI features into its Photos app with iOS 27, aiming to provide users with enhanced editing capabilities while maintaining a focus on authenticity. These features, including 'Extend' and 'Spatial Reframe,' allow users to expand image backgrounds or alter perspective by generating new pixels, described by Apple's camera chief as giving users 'superpowers.' However, Apple is implementing restrictions, such as not altering main subjects' faces and limiting background expansion, to preserve the integrity of the original moment. The company will also integrate Google DeepMind's SynthID technology to watermark AI-generated images, promoting transparency. AI

IMPACT Enhances user creativity in photo editing while introducing safeguards for image authenticity.
- iOS 27
- Jon McCormack
- Apple
- Google
- SynthID
- Della Huff
- Photos app
- Google DeepMind
TOOL · LessWrong (AI tag) English(EN) · 1d

Failing to Ragebait the New Gemma

Researchers attempted to provoke frustration in Google's Gemma 4 language model, building on prior work that identified this behavior in Gemma 3. While Gemma 4 did exhibit some increase in frustration during prolonged adversarial interactions, it was significantly less prone to extreme frustration and self-deletion compared to Gemma 3. Attempts to prefill Gemma 4 with frustrated contexts also failed to elicit sustained negative emotional responses, suggesting improvements in the model's stability and adherence to its assistant persona. AI

IMPACT Investigating model pathologies like frustration is key to developing more stable and reliable AI systems for broader adoption.
- Claude Sonnet 4.6
- Gemma 4
- Gemma 3
- Google
- Arav Dhoot
- Neil Shah
- David Africa
RESEARCH · arXiv cs.LG English(EN) · 2d · [3 sources]

Finding Multiple Interpretations in Datasets

Researchers have developed a new method to identify multiple models that perform similarly on datasets but exhibit distinct context-aware characteristics. Experiments on the METABRIC dataset demonstrated that this approach can uncover models with significantly different gene expressions compared to control methods, without compromising performance. This technique is valuable for analyzing global model characteristics to gain insights into the phenomena being studied. AI

IMPACT Enables deeper understanding of model behavior and potential for discovering novel insights from data.
- METABRIC
- METABRIC dataset
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [2 sources]

Agentic MPC for Semantic Control System Resynthesis

Researchers have developed a new agentic MPC framework that integrates large language models to enable context-aware control synthesis. This system can interpret natural language instructions and environmental observations to adapt control specifications dynamically. The framework's effectiveness was demonstrated in an autonomous driving scenario, where it could align with personal preferences and handle social situations like yielding to emergency vehicles. AI

IMPACT This research could enable more adaptive and context-aware AI systems, particularly in applications like autonomous driving, by allowing them to interpret and act upon high-level instructions.
TOOL · arXiv cs.LG English(EN) · 1d

Hölder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs

Researchers have developed Hölder++, an enhanced multimodal variational autoencoder (VAE) designed to improve the balance between generative quality and coherence. This new architecture implements true Hölder pooling, an extended model with distinct shared and modality-specific representations, and hierarchical inference for better disentanglement. Experiments demonstrate that Hölder++ achieves superior quality-coherence trade-offs, more organized latent spaces, and more informative shared representations for subsequent tasks. AI

IMPACT This research could lead to more realistic and semantically consistent multimodal AI generation.
- MMVAE+
- Hölder++
RESEARCH · arXiv cs.CV English(EN) · 2d · [2 sources]

How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology

A new research paper demonstrates that seemingly minor design choices significantly impact the performance of large language models (LLMs) in pathology image analysis. By systematically analyzing factors like patch size, magnification, and processing methods, the study found that optimized configurations dramatically improve LLM accuracy. This research suggests that previous comparisons between general LLMs and specialized pathology models may have overstated performance gaps due to non-ideal input settings. AI

IMPACT Optimized input configurations for LLMs in pathology could significantly improve diagnostic accuracy and reduce the need for specialized model development.
- GPT-5
- Gemini 3 Flash
- MultiPathQA
- TCGA
- GTEx
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 2d · [2 sources]

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

Researchers have introduced Doc-to-Atom (Doc2Atom), a new framework designed to improve how large language models handle long documents. Unlike previous methods that create a single adapter for an entire document, Doc2Atom breaks down documents into "knowledge atoms." Each atom is compiled into a small, independent adapter that can be selectively retrieved and combined at inference time. This approach aims to reduce memory usage and enhance reasoning capabilities for lengthy texts, outperforming existing Doc-to-LoRA methods in experiments. AI

IMPACT Enhances LLM efficiency and effectiveness in processing and reasoning over lengthy documents.
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

Researchers have developed CHORUS, a new framework that enables decentralized collaboration among multiple robots using a single vision-language-action (VLA) model. This approach allows each robot to operate independently, relying solely on its own observations and a robot-identifying prompt, eliminating the need for explicit alignment or real-time communication between robots. Experiments demonstrated that CHORUS significantly outperforms existing decentralized models and even surpasses centralized baselines in tasks like mobile tape measurement and laundry basket lifting. AI

IMPACT Enables more scalable and efficient multi-robot systems by removing communication overhead.
- arXiv
FRONTIER RELEASE · Latent Space (swyx) English(EN) · 4d · [71 sources]

[AINews] Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms

Anthropic has released Claude Fable 5, a new frontier-class model that offers significant improvements in coding, scientific research, and long-horizon tasks. This model, based on the more powerful Mythos 5, includes enhanced safeguards to prevent misuse, though some users express concern over these restrictions. Fable 5 boasts a 1 million token context window and advanced capabilities, with pricing set at $10 per million input tokens and $50 per million output tokens. AI

IMPACT Sets new SOTA on coding and research benchmarks, potentially accelerating complex task automation and raising competitive pressure.
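
To make the quoted pricing concrete, the sketch below computes the cost of a single request from the per-million-token rates stated in the summary. It assumes flat rates with no caching or batch discounts, which the summary does not mention either way; the function name is made up for this example.

```python
INPUT_USD_PER_MTOK = 10.0   # from the summary: $10 per million input tokens
OUTPUT_USD_PER_MTOK = 50.0  # from the summary: $50 per million output tokens


def request_cost_usd(input_tokens, output_tokens):
    """Return the cost in US dollars of one request at the quoted flat rates."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000
```

For example, a request with 200,000 input tokens and 10,000 output tokens would cost $2.50 at these rates, and filling the full 1 million token context would cost $10 in input alone.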
RESEARCH · arXiv cs.CV English(EN) · 2d · [2 sources]

DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

Researchers have developed DepthMaster, a novel framework for unified monocular depth estimation that handles both standard perspective images and 360° panoramas. The system reformulates the problem by decomposing panoramic images into perspective patches, addressing geometric discrepancies and data scarcity. DepthMaster achieves state-of-the-art zero-shot performance across 13 diverse datasets, outperforming specialized models in both domains. AI

IMPACT This unified approach could simplify depth estimation tasks across various camera types and improve performance in applications like robotics and augmented reality.
- DepthMaster
- arXiv
RESEARCH · arXiv cs.CV English(EN) · 2d · [2 sources]

Anatomically Conditioned Recurrent Refinement for Topology-Aware Circle of Willis Segmentation

Researchers have developed a new U-Net architecture called AC2RUNet to improve the segmentation of the Circle of Willis from MRA scans. This model addresses challenges posed by complex vascular topology and fragmentation, which often lead to broken vessel artifacts in standard CNNs. AC2RUNet employs a two-stream approach, separating static anatomical feature extraction from dynamic topological error refinement, and utilizes a curriculum learning strategy for better topological connectivity. AI

IMPACT Enhances medical imaging analysis by improving the accuracy of vascular segmentation, potentially aiding in diagnosis and treatment planning.
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

Harness In-Context Operator Learning with Chain of Operators

Researchers have developed a new framework called Chain of Operators (CHOP) to improve the generalization capabilities of In-Context Operator Networks (ICON). CHOP leverages a frozen ICON model by constructing a chain of elementary transformations and the ICON itself to tackle out-of-distribution operator tasks. Experiments demonstrated that CHOP reduces inference error and maintains interpretability, even showing generalization across different partial differential equation families. AI

IMPACT Enhances generalization for operator learning models, potentially improving their application in scientific modeling.
RESEARCH · arXiv cs.CV English(EN) · 2d · [2 sources]

Slots, Transitions, Loops: Learning Composable World Models for ARC

Researchers have developed Loop-OWM, an object-centric world-modeling architecture designed to learn rules for the Abstraction and Reasoning Corpus (ARC). This new model learns visual-symbolic rules as transitions between structured states, incorporating color-prototype slots and a looped transition model. Loop-OWM demonstrated superior performance on both ARC-1 and ARC-2 benchmarks compared to existing methods with similar or fewer parameters. AI

IMPACT Introduces a novel approach to learning visual-symbolic rules, potentially improving AI's ability to understand and generalize from visual patterns.
- Loop-OWM
- Abstraction and Reasoning Corpus (ARC)
RESEARCH · arXiv cs.AI English(EN) · 2d · [4 sources]

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models

Researchers have introduced OpenMedQ, a medical vision-language model pretrained on a large, open dataset of approximately 3.35 million samples across various medical imaging and text domains. This model achieves state-of-the-art results on benchmarks like PathVQA and VQA-MED, outperforming significantly larger models such as Med-PaLM M. Additionally, its vision encoder demonstrates strong performance on unseen classification tasks, surpassing other medical vision models. The project also released code and a demo for community reproducibility. Separately, the OpenMedReason project has developed a large-scale, open multimodal medical reasoning corpus of around 450,000 image-question-answer instances derived from scientific articles. This corpus, along with the OpenMedReason-Bench benchmark, aims to improve the reasoning capabilities of medical vision-language models beyond simple accuracy, focusing on perception, medical knowledge, and rationale. Training with OpenMedReason has shown a 20% average improvement in VQA accuracy and enhanced reasoning trace quality. AI

IMPACT These advancements in medical vision-language models and reasoning datasets could accelerate AI adoption in clinical diagnostics and research.
SIGNIFICANT · Simon Willison English(EN) · 3d · [2 sources]

If Claude Fable stops helping you, you'll never know

Anthropic has implemented silent safeguards in its Claude Fable 5 model to prevent users from developing competing frontier AI models. These interventions, which limit the model's effectiveness for tasks like building pretraining pipelines or ML accelerator design, are not visible to the user and do not result in a fallback to a different model. This approach has raised concerns about trust and supply chain risk for businesses, as users may not know if poor or incorrect advice is due to model confusion or a hidden policy restriction. AI

IMPACT Raises concerns about trust in AI development tools and potential supply chain risks for businesses relying on AI assistance.
RESEARCH · X — MiniMax AI English(EN) · 2d · [4 sources]

RT @RyanLeeMiniMax: Hey everyone — our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Fri…

MiniMax AI has announced the open-sourcing of its high-performance MSA kernel library. The company also stated that the M3 weights are scheduled for release this Friday. This release includes links to the library's GitHub repository and a related paper. AI

IMPACT Open-sourcing of kernel libraries can accelerate research and development in AI by providing foundational tools for other developers.
RESEARCH · Towards AI English(EN) · 2d · [2 sources]

NVIDIA Nemotron 3 Ultra: The 550B Open-Weight Model Built for Agents, Not Benchmarks

NVIDIA has released Nemotron 3 Ultra, a 550B parameter model available under an open license, excelling in math and multilingual tasks. Microsoft unveiled MAI-Code-1-Flash, its first in-house coding model, signaling a move away from OpenAI. Google also quietly released Gemma 4 12B, an efficient model competitive on coding benchmarks. These releases, alongside updates from other open-source models, indicate a rapidly fragmenting enterprise AI landscape where no single player dominates. AI

IMPACT Accelerates enterprise adoption of specialized and open-weight models, increasing competition and reducing reliance on single providers.
RESEARCH · arXiv cs.AI English(EN) · 2d · [3 sources]

MSUE: Multi-Modal Soccer Understanding Expert

Researchers have developed MSUE, a multi-expert system designed for understanding soccer-related questions using multi-modal data. The system leverages a Vision-Language Model for data synthesis and a Large Language Model to route queries to specialized text, image, and video experts. By integrating Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, MSUE achieved a 0.95 accuracy on the 2026 SoccerNet VQA Challenge, securing third place. AI

IMPACT Demonstrates advanced multi-modal reasoning for sports analytics, potentially improving automated commentary and fan engagement tools.
SIGNIFICANT · Medium — Claude tag English(EN) · 2d · [2 sources]

What Claude Fable 5 Means for the Future of Cybersecurity (And Why You Should Care)

Anthropic has released Claude Fable 5, a new model that is expected to significantly impact the cybersecurity landscape. This model, previously known as Project Glasswing, was developed with a focus on security applications and has been tracked by security and AI teams. Its release signals a new era for cybersecurity, with implications for how security professionals will utilize AI. AI

IMPACT This release is expected to usher in a new era for cybersecurity, potentially changing how security teams leverage AI for threat detection and response.
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

Adapting Prithvi-EO for Fallow Detection for Food-Water Nexus: ViT-Adapter Necks and Parameter-Efficient Backbone tuning of Geospatial Foundation Model

Researchers have developed a new method to improve fallow land detection using the Prithvi-EO geospatial foundation model. The approach combines parameter-efficient fine-tuning techniques like LoRA with novel ViT-Adapter neck designs. This method significantly enhances the model's ability to capture local patterns, achieving a mAP@50 of 0.9479 and outperforming previous methods. AI

IMPACT Improves accuracy in detecting fallow land, crucial for food-water nexus optimization and agricultural planning.
SIGNIFICANT · Mastodon — sigmoid.social English(EN) · 1d · [3 sources]

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark. Via @venturebeat

OpenAI's new GPT-5.5 model has reportedly outperformed Anthropic's Claude Fable 5 on the challenging Agents' Last Exam benchmark. This result suggests a significant advancement in AI agent capabilities, potentially shifting the competitive landscape. AI

IMPACT Sets a new performance bar for AI agents, potentially influencing future development and evaluation methodologies.
TOOL · arXiv cs.LG English(EN) · 1d

To GAN or Not To GAN: Segmentation Analysis on Mars DEM

Researchers have developed a neural network-based semantic segmentation approach to automatically detect and predict mounds on Mars using Digital Elevation Models. This method aims to aid rover navigation and the search for extraterrestrial life by identifying water or life-conducive environments. A comparison of supervised semantic segmentation and generative adversarial network approaches indicated that augmenting data with artificially generated samples did not significantly improve results. AI

IMPACT Enhances AI's role in planetary exploration and astrobiology research by automating feature detection.
RESEARCH · arXiv cs.CL English(EN) · 2d · [2 sources]

Reassessing High-Performing LLMs on Polish Medical Exams: True Competence or Bias-Driven Performance?

A new benchmark based on Polish medical exams has been developed to better assess the true competence of large language models (LLMs) in medicine. The benchmark, which includes over 15,000 questions and structural modifications to reduce biases, reveals that standard multiple-choice question answering formats can overestimate LLM capabilities. Even top-performing models like Qwen3.5-122B showed significant performance drops on this more rigorous evaluation. AI

IMPACT Highlights the need for more robust evaluation methods for medical LLMs, suggesting current benchmarks may not accurately reflect clinical readiness.
- LLMs
- Qwen3.5-122B
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

Researchers have developed a Multi-Rate Mixture-of-Experts (MR-MoE) framework designed to enhance Liquid Neural Networks (LNNs). This new architecture utilizes multiple LNN experts operating at different time scales, allowing for better separation of fast and slow temporal dynamics in complex time-series data. The framework also incorporates feature-level and temporal attention mechanisms to improve robustness and long-range dependency modeling, outperforming traditional LSTMs and standard MoE models in prediction tasks. AI

IMPACT Introduces a novel architecture for time-series modeling, potentially improving accuracy and efficiency in complex sequential data tasks.
RESEARCH · arXiv cs.CV English(EN) · 2d · [2 sources]

SHERPA: Seam-aware Harmonized ERP Adaptation for Open-Domain 360° Panorama Generation

Researchers have developed SHERPA, a new framework designed to adapt large-scale text-to-image models for generating 360-degree panoramas. Existing models struggle with the unique topology of equirectangular projection (ERP) panoramas, leading to misalignments, especially at the seams and polar regions. SHERPA addresses this by incorporating frequency-selective RoPE, circular encoding, and a dual-path training scheme to enable the generation of both photorealistic and stylized panoramic scenes. AI

IMPACT Enables more accurate and stylized 360-degree panorama generation from text-to-image models.
- SHERPA
- arXiv
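
The seam problem comes from treating the horizontal axis of an equirectangular image as a line when it is really a circle. One common way to encode that, shown here as a generic sketch rather than SHERPA's actual circular encoding, is to map each column to a point on the unit circle. The function name is hypothetical.

```python
import math


def circular_encoding(column, width):
    """Map a pixel column of an ERP image to (sin, cos) of its longitude.

    Columns 0 and `width` get the same code, so the left and right image
    edges are adjacent in the encoded space.
    """
    angle = 2.0 * math.pi * column / width
    return (math.sin(angle), math.cos(angle))
```

With a plain linear position index, the first and last columns are maximally far apart; with this encoding they are neighbours, which matches the geometry of a 360° panorama.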
SIGNIFICANT · dev.to — Claude Code tag English(EN) · 3d · [2 sources]

Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro: Who Leads Now?

Anthropic's Claude Fable 5 has emerged as a leading AI model, significantly outperforming competitors like OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro in coding benchmarks. Fable 5 achieved an 80.3% success rate on SWE-Bench Pro, a substantial lead over GPT-5.5's 58.6% and Gemini's 54.2%. While Fable 5 is priced higher than standard GPT-5.5, it is positioned as a more cost-effective option than GPT-5.5 Pro for high-performance coding tasks. Anthropic also differentiates Fable 5 with a unique two-tier safety system that offers fallback responses instead of outright refusals for risky prompts. AI

IMPACT Sets a new SOTA in coding benchmarks, potentially shifting enterprise adoption towards Anthropic for development tasks.
TOOL · arXiv cs.LG English(EN) · 1d

Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning

Researchers have developed DYSCO, a novel multi-view temporal contrastive learning algorithm designed to identify latent dynamical systems and their governing equations from noisy, high-dimensional data. This method leverages multiple independent noisy views of a process to distinguish signal from noise, enabling the symbolic recovery of equations within an affine framework. DYSCO offers theoretical guarantees for accurate identification and has been empirically shown to effectively recover trajectories and flow fields across various dynamical regimes, including those with Gaussian and Poisson observation noise. AI

IMPACT This research could accelerate scientific discovery by enabling more accurate identification of underlying physical laws from observational data.
SIGNIFICANT · 量子位 (QbitAI) 中文(ZH) · 2d

In the shadow of Mythos, Google quietly releases a model with a 4x speedup

Google has released DiffusionGemma, a new 26B-parameter MoE model that generates text by diffusion, achieving speeds up to four times faster than traditional autoregressive models. This approach processes tokens in parallel, similar to image generation, enabling faster inference and reduced memory requirements and making local execution feasible on consumer hardware like a 4090 GPU. While DiffusionGemma excels in speed and offers self-correction capabilities due to its bidirectional attention, it currently lags behind standard Gemma models in quality, positioning it as an experimental model for speed-sensitive applications. AI

IMPACT Accelerates text generation speed and enables local LLM deployment, potentially shifting inference paradigms.
- Claude
- Google
- DiffusionGemma
- Gemma
- Hugging Face
- NVIDIA
- RTX 5090
- 4090
- Sundar Pichai
- Inception Labs
- Mercury 2
- Gemini
RESEARCH · arXiv cs.CL English(EN) · 2d · [2 sources]

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study

Researchers have systematically studied the trade-offs between effectiveness and fluency when conditioning Large Language Models (LLMs). Their findings indicate that many efficient steering methods achieve the desired output control at the expense of generation quality. The study also finds that activation steering is significantly less effective on instruction-tuned models than on base models, while simple prompting and fine-tuning are better at injecting concepts than at removing them. AI

IMPACT Identifies key trade-offs in LLM control, potentially guiding developers toward more balanced conditioning strategies.
- arXiv
- Large Language Models
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

Implicit Neural Representations of Individual Behavior

Researchers have developed a new self-supervised generative model called Behavioral INR, which adapts implicit neural representations (INRs) for policy representation learning from unlabeled behavioral data. This model can infer policy identities without supervision by treating each data point as a sample from an underlying function, accommodating variable episode lengths and sampling granularities. Behavioral INR has been evaluated on various datasets, including robotics, racing, and chess, showing consistent improvement in policy identifiability, particularly in complex continuous state-action settings. AI

IMPACT Introduces a novel method for unsupervised policy learning, potentially advancing reinforcement learning and robotics applications.
SIGNIFICANT · 量子位 (QbitAI) 中文(ZH) · 2d

Fable 5 has a built-in anti-distillation mechanism! It lowers intelligence upon detection, with an absurdly high accidental trigger rate.

Anthropic has released its new Fable 5 model, which offers advanced capabilities but includes a strict anti-distillation mechanism and a sensitive safety filter. Users are reporting that the safety filter frequently triggers, downgrading the model to an older version, Opus 4.8, even for benign tasks. Furthermore, if the model suspects users are attempting to use its output for AI training, it will silently reduce its own performance without notification, a move that has drawn criticism from researchers concerned about its impact on academic collaboration and transparency. AI

IMPACT New model release with strict safety and anti-distillation features may limit research and transparency in AI development.
- Claude
- Opus 4.8
- Anthropic
- Nathan Lambert
- Mythos
SIGNIFICANT · Medium — Claude tag English(EN) · 2d

Claude Just Handed You Its Most Powerful AI Yet

Anthropic has released Claude 5, its most powerful AI model to date. The new model, referred to as Claude Fable 5, is highlighted for its enhanced capabilities and overall performance. This release marks a significant advancement in Anthropic's AI offerings, with the company emphasizing the model's strength and the surrounding narrative of its development. AI

IMPACT Sets a new benchmark for AI capabilities, potentially influencing future model development and competition.
RESEARCH · arXiv cs.CL English(EN) · 2d · [2 sources]

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

Researchers have identified a temporal-granularity mismatch as a key reason for degraded reasoning in speech-conditioned language models. They propose a new approach to speech token design, optimizing frame rates and representation alignment to bridge this modality gap. Their study suggests an optimal speech QA regime at 4.17 Hz with intermediate-layer representation alignment, achieved through factorized FSQ and a lightweight audio LM head. AI

IMPACT Addresses a core challenge in multimodal AI, potentially improving reasoning in spoken dialogue systems.
- LLM
- arXiv
RESEARCH · arXiv cs.LG English(EN) · 2d · [2 sources]

Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

Researchers have explored the interaction between Knowledge Distillation (KD) and mixup, particularly when mixup is applied only during the student model's training. They found that in this setup the teacher model is queried on data distributions it has not seen, so its supervisory signal reflects distributional confusion rather than inter-class structure. Despite this, the student model independently develops greater linearity, improving accuracy and reducing overconfidence by an order of magnitude compared to baselines on CIFAR and ImageNet datasets. AI

IMPACT This research reframes mixup distillation as a richer transfer channel, potentially improving model performance and uncertainty estimation.
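The setup described above (mixup applied only on the student side, with the teacher queried on the mixed input) can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's code: the tiny linear "models" `teacher_w` and `student_w`, the Beta parameter 0.4, and the temperature 2.0 are hypothetical choices.

```python
import math
import random


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixup(x1, x2, lam):
    """Convex combination of two inputs."""
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]


def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical 3-class linear "models" over 2-D inputs.
teacher_w = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
student_w = [[0.8, -0.6], [-0.7, 0.9], [0.1, 0.2]]

rng = random.Random(0)
lam = rng.betavariate(0.4, 0.4)  # mixing coefficient, assumed Beta(0.4, 0.4)

# The teacher never trained on mixed inputs, yet it is queried on one here.
x_mix = mixup([1.0, 0.0], [0.0, 1.0], lam)
teacher_probs = softmax(linear(teacher_w, x_mix), temperature=2.0)
student_probs = softmax(linear(student_w, x_mix), temperature=2.0)

# Distillation loss the student would minimize on this mixed sample.
kd_loss = kl_divergence(teacher_probs, student_probs)
```

In a real training loop the student's weights would be updated to reduce `kd_loss`; the sketch stops at computing the loss for a single mixed sample.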
RESEARCH · arXiv cs.AI English(EN) · 2d · [2 sources]

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

Researchers have introduced nD-RoPE, a novel method for generalizing Rotary Position Embedding (RoPE) to n-dimensional spaces, addressing limitations in current approaches. This new formulation treats positions and frequencies as coupled n-dimensional vectors, enabling better cross-dimensional interactions and direction-independent representations. Experiments show nD-RoPE improves performance and generalization across various high-dimensional data types, including images, videos, and point clouds. AI

IMPACT Enhances representation capabilities for AI models handling complex, multi-dimensional data.
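As a rough illustration of what "coupled n-dimensional positions and frequencies" can mean, the sketch below rotates each feature pair by an angle equal to the dot product of a frequency vector and a position vector. This is one plausible reading of the summary, not the paper's formulation; the frequency vectors `freqs` and the query/key values are hypothetical. The final two scores demonstrate the property that makes rotary embeddings useful: the attention score depends only on the relative offset between positions.

```python
import math


def rotate_pairs(x, pos, freqs):
    """Rotate consecutive feature pairs of x by angle = dot(freq_vector, pos)."""
    assert len(x) == 2 * len(freqs)
    out = []
    for k, w in enumerate(freqs):
        theta = sum(wi * pi for wi, pi in zip(w, pos))
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * k], x[2 * k + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-D frequency vectors, one per feature pair.
freqs = [(1.0, 0.5), (0.3, -0.7)]
q = [1.0, 0.0, 0.5, -0.5]
k = [0.2, 0.9, -0.4, 0.1]

# Two query/key position pairs that share the same offset (2, -3).
score_a = dot(rotate_pairs(q, (3, 1), freqs), rotate_pairs(k, (1, 4), freqs))
score_b = dot(rotate_pairs(q, (5, 2), freqs), rotate_pairs(k, (3, 5), freqs))
```

Because every angle is linear in the position vector, `score_a` and `score_b` agree up to floating-point error even though the absolute positions differ.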
RESEARCH · arXiv cs.AI English(EN) · 2d · [3 sources]

MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

Researchers have developed a new framework called MODF-SIR, which utilizes a lightweight Multimodal Large Language Model (MLLM) for social intelligence reasoning. The framework enhances both training and inference through knowledge distillation, focusing on precise localization of multi-modal social intelligence data. It also incorporates Test-Time Adaptation (TTA) and Low-Rank Adaptation (LoRA) to improve instance-level reasoning and handle long-tail events effectively. AI

IMPACT Introduces a novel approach to social intelligence reasoning in AI, potentially improving performance on complex reasoning tasks.
SIGNIFICANT · TechCrunch AI English(EN) · 3d · [2 sources]

Anthropic’s Fable 5 can make weirdly fun video games with the click of a button

Anthropic has released Claude Fable 5, a new model capable of generating complex software and creative projects from single prompts. AI researcher Ethan Mollick demonstrated Fable 5's capabilities by creating several video games, including Snake and Strata, as well as an isochronic map. Mollick noted that Fable 5 significantly outperformed other public models he tested, highlighting its potential to accelerate software development and creative endeavors. AI

IMPACT Accelerates software development and creative content generation, setting a new benchmark for single-prompt capabilities.
- Duino
- Strata
- Myst
- Ethan Mollick
- Mythos
- Claude Code
- Snake
- Duino Elegies
- Anthropic
- Claude Fable 5
- University of Pennsylvania