Brief

last 24h

[50/166] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
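The scoring formula itself is not given here. As a rough illustration only, the sketch below combines the four named factors into a 0–100 score. The weights, the 12-hour half-life, and the function name `score_cluster` are all assumptions, not the feed's actual method.

```python
def score_cluster(authority, cluster_strength, headline_signal, age_hours,
                  half_life_hours=12.0):
    """Hypothetical 0-100 score; the three signal inputs are expected in [0, 1]."""
    # Assumed weights (not from the source); they sum to 1.0.
    weights = {"authority": 0.40, "cluster": 0.35, "headline": 0.25}
    base = (weights["authority"] * authority
            + weights["cluster"] * cluster_strength
            + weights["headline"] * headline_signal)
    # Exponential time decay: the score halves every `half_life_hours`.
    decay = 0.5 ** (age_hours / half_life_hours)
    return round(100.0 * base * decay, 1)
```

Under these assumed settings, a story with perfect signals loses half its score after one half-life, so older clusters sink unless their other signals are strong.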

RESEARCH · Hugging Face Blog English(EN) · 12h · [2 sources]

NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain

A developer has fine-tuned Google's Gemma 3 12B into a model called NeuroBait, intended to help people with ADHD overcome task-initiation paralysis. Unlike typical ADHD tools that offer to-do lists, NeuroBait aims to provide a dopamine boost through short, warm, encouraging prompts based on the user's immediate context. The model was trained on a custom dataset and deployed on Hugging Face Spaces, with plans to release the weights and pipeline for community development.

IMPACT Offers a novel approach to AI-assisted task initiation, potentially benefiting individuals with ADHD and those experiencing overwhelm.
- Google
- NeuroBait
- google/gemma-3-12b-it
- ADHD
- Hugging Face
- Gemma 3 12B
- Gradio
- Unsloth
- PEFT
- ZeroGPU
- H100 80GB GPU
RESEARCH · Towards AI Deutsch(DE) · 6h

TAI #208: Open Models Find Their Role as Agent Token Bills Rise

Several AI labs have released new models, including Microsoft's MAI-Thinking-1, Google's Gemma 4 12B, and MiniMax's M3. These releases come as companies face rapidly increasing token consumption due to the rise of long-running AI agents. Cheaper, open-weight models are becoming crucial for handling high-volume, less complex tasks, allowing frontier models to focus on more demanding applications.

IMPACT Accelerates adoption of tiered AI agent architectures, balancing cost and performance.
- Vercel
- OpenAI
- Codex
- Apple
- Core AI
- Anthropic
- Cursor
- Sam Altman
- DeepSeek
- Microsoft
- MAI-Thinking-1
- Google
- Gemma 4 12B
- MiniMax
- NVIDIA
- Nemotron 3 Ultra
RESEARCH · arXiv cs.AI English(EN) · 17h · [2 sources]

CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

Researchers have developed new multimodal large language models for materials science applications. One model, CatalyticMLLM, unifies property prediction and inverse design for catalytic materials by integrating graph and text data within a single framework. Another model, MOF-LLM, enhances spatial reasoning in LLMs for predicting the complex structures of metal-organic frameworks, utilizing a block-level approach and specialized training techniques.

IMPACT These models demonstrate LLMs' growing capability in specialized scientific domains, potentially accelerating materials discovery and design.
RESEARCH · 量子位 (QbitAI) 中文(ZH) · 8h

Li Auto's first autonomous driving startup settles in Beijing Yizhuang

Kunlunxing, a new company founded by former Li Auto executive Lang Xianpeng and former Alibaba executive Ren Geng, has officially launched in Beijing. The startup aims to develop general-purpose embodied intelligence, focusing on both the physical robot (body) and the AI (brain), with a stated goal of competing with Tesla's humanoid robot. Kunlunxing has reportedly achieved a valuation exceeding $1 billion within days of its registration and has secured significant investment from top-tier institutions.

IMPACT Accelerates the race for general-purpose embodied AI, potentially pushing the boundaries of robotics and AI integration.
- Lang Xianpeng
- Ren Geng
- Alibaba
- Tesla
- Beijing
- XinAo Group
- Kunlunxing
RESEARCH · arXiv cs.AI English(EN) · 17h · [3 sources]

EditSR: Enhancing Neural Symbolic Regression via Edit-based Rectification

Researchers are developing new methods for neural symbolic regression, a technique that aims to discover explicit scientific laws from data. EditSR uses a two-layer framework with a neural model and an edit-based rectifier to improve efficiency and accuracy, especially for complex expressions. FunctionEvolve employs an evolutionary framework with expression trees and LLMs to guide the search for symbolic regression, achieving high accuracy on benchmark tasks. Decomposable Neuro Symbolic Regression combines transformer models, genetic algorithms, and genetic programming to generate interpretable multivariate expressions that match the original mathematical structure.

IMPACT These advancements in symbolic regression could lead to more interpretable AI models and accelerate scientific discovery by uncovering underlying mathematical relationships in data.
RESEARCH · arXiv cs.CV English(EN) · 17h · [2 sources]

WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis

Researchers have developed two new methods, WaveDiT and FlowLet, for synthesizing 3D brain MRI data. These techniques utilize wavelet transforms and flow matching to generate high-fidelity images efficiently, even on a single GPU. The generated data can improve the performance of downstream tasks like brain age prediction, particularly for underrepresented age groups, while preserving anatomical detail.

IMPACT Enables more efficient and accessible generation of synthetic medical imaging data for research and model training.
RESEARCH · r/ClaudeAI English(EN) · 9h

Rumor: Anthropic Planning to Release Public Version of Claude Mythos Tomorrow (with Guardrails)

Anthropic is reportedly planning to release a public version of its advanced Claude Mythos model soon, according to tech journalist Alex Heath. This model, previously available only to select partners for cybersecurity research, is expected to offer significant improvements in long-horizon tasks and agentic capabilities. The release will include substantial safety guardrails, addressing earlier concerns that led to its restricted access.

IMPACT Broader access to advanced agentic and reasoning capabilities could accelerate enterprise adoption of AI-powered automation.
RESEARCH · The Verge — AI English(EN) · 7h

Apple’s AI promises are finally, almost, sort of, here

Apple has unveiled its long-awaited AI strategy, centered on an enhanced Siri that integrates across its devices and apps. This new Siri aims to act as an AI agent, capable of understanding context from various sources like emails, texts, and calendars to perform multi-step tasks. While emphasizing on-device processing and privacy, Apple's AI capabilities will be powered by Google's Gemini models, positioning it as a helpful addition rather than a direct competitor to other AI leaders.

IMPACT Apple's AI integration aims to make AI more accessible and helpful for everyday users, potentially increasing adoption of AI agents across consumer devices.
- Apple
- Siri
- Google
- Gemini
- Tim Cook
- Craig Federighi
- Mike Rockwell
RESEARCH · arXiv cs.LG English(EN) · 17h · [2 sources]

Heterophily-Aware Adaptive Knowledge Distillation for Hypergraph Neural Networks

Two new research papers introduce advancements in hypergraph neural networks (HNNs). One paper proposes HADES, a method for knowledge distillation that adapts to node heterophily, improving student model performance and inference speed. The other paper introduces Hypergraph U-Nets, a novel architecture that addresses the challenge of pooling and unpooling operations in HNNs, demonstrating superior performance in reconstruction, classification, and anomaly detection tasks.

IMPACT These advancements in hypergraph neural networks could lead to more efficient and accurate models for complex relational data.
RESEARCH · Mastodon — fosstodon.org Polski(PL) · 15h

OpenAI implements a new memory architecture that automatically synthesizes context from previous conversations. The system eliminates the need for manual fact-saving

OpenAI has introduced a new memory architecture for its AI models that automatically synthesizes context from past conversations. The system aims to eliminate the need for users to manually save facts, drawing on in-depth analysis of chat history to offer a more personalized experience and greater continuity across conversations.

IMPACT Enhances AI conversational continuity and personalization, potentially improving user experience and utility.
- OpenAI
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [5 sources]

Latent Spatial Memory for Video World Models

Researchers have developed a new method for video world models that stores 3D scene information directly in the diffusion latent space, bypassing the need for pixel-space reconstruction. This approach, named Mirage, significantly reduces computational overhead and memory usage, leading to faster video generation. Experiments show substantial improvements in generation speed and memory footprint compared to existing methods, while also achieving state-of-the-art performance on benchmarks like WorldScore.

IMPACT This technique could enable more efficient and faster generation of complex 3D scenes in video, impacting fields like virtual reality and content creation.
RESEARCH · 36氪 (36Kr) 中文(ZH) · 8h

Most large US tech stocks rose in pre-market trading, with Intel up more than 2%.

Apple has unveiled new AI features for its operating system, including an updated Siri. Meanwhile, OpenAI has reportedly filed for an IPO, signaling a significant move towards public markets. The cluster also touches on stock movements for major tech companies like Intel, Tesla, and Nvidia.

IMPACT Apple's AI integration could drive broader consumer adoption, while OpenAI's IPO signals major financial market interest in AI.
- Google
- Apple
- OpenAI
- Siri
- Intel
- Tesla
- Meta
- Oracle
- Amazon
- Nvidia
- Netflix
- Microsoft
RESEARCH · arXiv cs.LG English(EN) · 17h · [3 sources]

In-Context Learning of Stochastic Differential Equations with Foundation Inference Models

Researchers have developed a suite of Foundation Inference Models (FIMs) designed to rapidly estimate parameters for various differential equations from time-series data. These models, including FIM-SDE for stochastic differential equations, FIM-PP for temporal point processes, and FIM-ODE for ordinary differential equations, are pretrained on broad distributions of synthetic data. This pretraining allows them to perform in-context (zero-shot) inference or be quickly fine-tuned to specific datasets, often outperforming traditional methods and specialized models that require extensive training.

IMPACT These foundation models could significantly speed up scientific discovery by enabling faster and more accurate parameter estimation for complex dynamical systems.
- arXiv
- Ramses Sanchez
- FIM-ODE
- FIM-PP
- FIM-SDE
RESEARCH · 36氪 (36Kr) 中文(ZH) · 10h

Haojiang Intelligence: Controlling Shareholder and Actual Controller Pledges Not to Reduce Company Shares Within 12 Months

Databricks is reportedly in talks for a new funding round that could begin as early as next month. The database management software provider aims for a valuation between $165 billion and $175 billion upon completion of this round. Separately, Apple has introduced new Siri AI features, and OpenAI has reportedly filed for an IPO.

IMPACT Databricks' potential new funding round could accelerate its development and deployment of AI infrastructure, impacting the competitive landscape.
- OpenAI
- Databricks
- Apple
- ROKID
RESEARCH · dev.to — Anthropic tag English(EN) · 13h

Anthropic's Data Shows AI Is Now Building AI 8x Faster and the Brand Visibility Implications Are Massive

Anthropic has released data indicating significant advancements in AI development, with their engineers now shipping code eight times faster than in a previous baseline period. The company's AI models, like Claude, are demonstrating rapidly increasing autonomous task capabilities, doubling their performance every four months. These improvements are attributed to recursive self-improvement, where AI systems are increasingly used to design and develop their successors, a trend that has profound implications for how brands will be surfaced and recommended by AI answer engines.

IMPACT Accelerates the timeline for AI systems capable of recursive self-improvement, potentially leading to faster AI capability growth and impacting AI-driven information synthesis.
RESEARCH · 36氪 (36Kr) 中文(ZH) · 11h

Acer's May sales reached NT$26.17 billion, a year-on-year increase of 36.5%

Apple has introduced a new version of Siri with enhanced AI capabilities. ROKID has responded to allegations regarding its smart glasses potentially capturing images of flight attendants. Additionally, OpenAI has reportedly filed confidential documents for an Initial Public Offering (IPO).

IMPACT New AI features in Siri could influence user interaction with Apple devices, while OpenAI's IPO filing signals major financial market activity.
- Siri
- Apple
- ROKID
- OpenAI
RESEARCH · 36氪 (36Kr) 中文(ZH) · 11h

*ST Jintai: Revocation of delisting risk warning and other risk warnings from June 11

Apple has introduced a new Siri powered by AI, aiming to enhance user interaction with its devices. In other tech news, ROKID has addressed allegations of its smart glasses being used for unauthorized recording of flight attendants. Additionally, OpenAI has reportedly filed confidential documents for an Initial Public Offering (IPO), signaling potential future public market entry.

IMPACT Apple's AI-powered Siri could significantly enhance user experience and drive adoption of AI features across its ecosystem.
- OpenAI
- Apple
- Siri
- ROKID
RESEARCH · Axios Technology English(EN) · 12h

Apple finally ships its AI assistant upgrade

Apple has announced a significant upgrade to its AI assistant, Siri, which will be integrated into its devices this fall. This update aims to make Siri more conversational and context-aware, drawing information from texts, emails, and photos to provide personalized assistance. The new Apple Intelligence also includes tools for writing and image generation, with features across Safari, Messages, and Photos. While Apple emphasizes privacy by keeping data on-device, its rivals have already progressed to more advanced agentic AI tools.

IMPACT Apple's AI-powered Siri aims to enhance user experience by offering more personalized and context-aware assistance, potentially setting new standards for on-device AI capabilities.
- Google
- Craig Federighi
- Ray Wang
- Apple
- Siri
- Apple Intelligence
- OpenAI
- Anthropic
RESEARCH · 36氪 (36Kr) 中文(ZH) · 13h

"GIM" Completes Angel Round Financing Exceeding 100 Million Yuan

GIM, a company specializing in AI models for the financial sector, has secured over 100 million RMB in seed and seed+ funding. The latest round was led by SAIF Partners, with participation from the family office of a major internet company CEO. GIM plans to use these funds to further develop its proprietary large language models specifically for financial applications.

IMPACT This funding will accelerate the development of specialized financial AI models, potentially improving efficiency and insights within the financial sector.
- Monolith砺思资本
- SAIF Partners
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction

Researchers have developed POTATR, a new lightweight image-to-graph model for extracting tables from documents. This 29-million-parameter model significantly outperforms existing methods on the PubTables-v2 benchmark, achieving a GriTS_Con score of 0.964. POTATR is also considerably faster and more cost-effective than current large language models, and its output is spatially grounded, which supports verification and further integration.

IMPACT Sets a new standard for efficient and accurate table extraction, potentially accelerating document processing workflows.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

Researchers have developed a novel data synthesis method to create neural machine translation (NMT) models for low-resource Indigenous languages, specifically Q'eqchi' Mayan. By transforming dictionaries into a synthetic corpus and using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters on an mT5-base model, they achieved strong structural acquisition. However, the resulting model showed a significant gap in lexical grounding compared to organic language, indicating that while synthetic data is effective for learning grammar, authentic data is crucial for semantic refinement.

IMPACT Demonstrates a viable method for creating translation models for endangered languages, preserving linguistic data sovereignty.
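For readers unfamiliar with LoRA, the general idea (independent of this paper's specific setup) is to freeze a weight matrix W and learn a low-rank update, W' = W + (alpha / r) · B A. The sketch below shows that update and the resulting parameter savings on a toy matrix. The dimensions, rank, scaling value, and helper names are illustrative assumptions, not the paper's configuration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_merge(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r adapter."""
    return d_out * d_in, r * (d_out + d_in)

# Toy example: a 2x3 frozen weight with a rank-1 adapter (values are arbitrary).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]       # shape (d_out, r)
A = [[0.5, 0.0, 1.0]]    # shape (r, d_in)
merged = lora_merge(W, B, A, alpha=2.0, r=1)
```

For a hypothetical 768×768 projection with rank 8, `lora_param_counts(768, 768, 8)` compares 589,824 fully trainable weights against 12,288 adapter weights, which is why the approach suits low-resource settings.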
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Disentanglement with Holographic Reduced Representations

Researchers have developed a novel unsupervised learning algorithm for neural disentanglement using holographic reduced representations (HRR). This approach treats disentangled representations as symbolic structures, moving away from continuous representations common in prior work. The HRR unbinding operation demonstrates an inductive bias for separating factors, achieving competitive results on disentanglement metrics and showing robustness to noise.

IMPACT Introduces a novel method for disentangling representations, potentially improving model interpretability and robustness.
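As background on the HRR operations mentioned in this item, the sketch below illustrates standard HRR binding (circular convolution) and unbinding (convolution with the involution, an approximate inverse). It is a generic illustration, not the paper's learning algorithm; the vector size, seed, and variable names are assumptions.

```python
import math
import random

def circular_convolution(a, b):
    """Bind two vectors: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def involution(a):
    """Approximate inverse used for unbinding: a*[j] = a[(-j) mod n]."""
    n = len(a)
    return [a[(-j) % n] for j in range(n)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def random_vector(n, rng):
    # Elements drawn from N(0, 1/n), the usual choice for HRR vectors.
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

rng = random.Random(0)
n = 512
role, filler, unrelated = (random_vector(n, rng) for _ in range(3))

bound = circular_convolution(role, filler)                   # bind role with filler
recovered = circular_convolution(involution(role), bound)    # unbind using the role

sim_filler = cosine(recovered, filler)        # noisy, but clearly similar
sim_unrelated = cosine(recovered, unrelated)  # close to zero
```

Unbinding returns a noisy copy of the filler rather than an exact one, so in practice the result is compared against known candidates and the closest match is taken.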
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

Researchers have developed a new diagnostic theory and benchmark to understand how well local score models can extrapolate across different system sizes. They found that architectural locality alone is insufficient for stable size extrapolation, which is instead governed by the quasi-locality of the Gaussian-smoothed score. The study introduces the Finite-Depth Local Flow (FDLF) benchmark to empirically validate these findings, demonstrating that stable extrapolation depends on the interplay between spatial mixing, score quasi-locality, and model receptive fields.

IMPACT Provides a theoretical framework and diagnostic tool to improve the reliability of AI models in scientific generative modeling tasks.
- Finite-Depth Local Flow (FDLF)
- Gaussian-smoothed score
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

A Unifying Framework for Concept-Based Representational Similarity

Researchers have introduced a new framework to unify and clarify concept-based representational similarity in machine learning models. The framework decomposes alignment into representation vs. concept and instance-wise vs. distributional levels, identifying four key properties. They also developed an intervention-based benchmark called InterVenchA to measure these properties and proposed the Coupled Sparse Autoencoder (CoSAE) method, which demonstrates that strong alignment emerges when multiple objectives are jointly enforced, even with minimal paired data.

IMPACT Clarifies concept alignment in ML, potentially leading to more robust and interpretable models.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis

A new research paper investigates whether video foundation models possess an understanding of intuitive physics. The study probes frozen representations of models like V-JEPA, VideoMAE, and LTX-Video using benchmarks such as IntPhys2 and Minimal Video Pairs. Results indicate that V-JEPA performs best, particularly with temporal dynamics probes, while VideoMAE is competitive, and LTX-Video shows weaker but present signals. The research also found that physics knowledge is more accessible in intermediate to late layers of these models.

IMPACT Reveals emergent physics understanding in video models, potentially improving their real-world interaction capabilities.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

Researchers have developed Hypnos, a new foundation model for sleep physiology that utilizes next-token prediction for representation learning. Trained on eight different sensing modalities from over 20,000 polysomnography recordings, Hypnos tokenizes physiological signals and uses an auto-regressive RQ-Transformer to predict future data points. This approach significantly outperforms existing models on various benchmarks, including sleep stage classification and atrial fibrillation detection, while requiring substantially less labeled data.

IMPACT Demonstrates a novel self-supervised learning approach for multi-modal physiological data, potentially improving healthcare diagnostics with less labeled data.
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Assessing Sample Quality in Conditional Generation under Compositional Shift

Researchers have developed a new method to evaluate the quality of generated samples from conditional models, particularly when exploring novel or unobserved conditions. This approach uses a post-hoc trust score that combines global realism and attribute faithfulness, requiring only the original training distribution for assessment. The score can effectively filter, rank, and abstain from generations, demonstrating improvements in downstream predictive performance in biological imaging and vision benchmarks.

IMPACT Enables more reliable evaluation of AI-generated content, especially in scientific domains where real-world data is scarce.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 1d · [2 sources]

TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs

Researchers have introduced TABVERSE, a new benchmark designed to evaluate how well Large Language Models (LLMs) and Vision-Language Models (VLMs) understand tables across different formats. The benchmark standardizes table content while varying its representation, such as HTML, Markdown, LaTeX, and rendered images. Initial findings indicate that model performance is significantly influenced by the table's format, with structured text generally outperforming images, though specific tasks and formats present unique challenges.

IMPACT Highlights the impact of data representation on LLM/VLM performance, suggesting a need for robust cross-format handling in future model development.
- TABVERSE
- LLMs
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Integrating gene regulatory priors into Transformer attention with scTransformer for interpretable scRNA-seq analysis

Researchers have developed scTransformer, a novel approach that integrates gene regulatory information into Transformer models for analyzing single-cell RNA sequencing data. This method enhances interpretability and robustness by incorporating prior biological knowledge into the model's attention mechanisms. Evaluations show scTransformer improves cell-type classification accuracy and produces more biologically meaningful representations compared to standard Transformers.

IMPACT Enhances interpretability of AI models in genomics, potentially leading to new biological discoveries.
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

A new research paper investigates how "thinking" mechanisms in large language models affect instruction following. The study found that while overall performance changes are minor, the "thinking" process alters error patterns, improving some instructions while worsening others. Specifically, "Planning" constraints benefit from thinking, whereas "Precision" constraints consistently degrade. Analysis of model traces revealed differing correlations between trace relevance and final answer compliance across these constraint types.

IMPACT Reveals nuanced effects of internal reasoning mechanisms on LLM instruction following, impacting prompt engineering and model development.
- Qwen3
- Sai Adith Senthil Kumar
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

Automated IEP Generation from Traditional Chinese Parent-Teacher Interviews via Corpus-Grounded Feature Diffusion

Researchers have developed a novel method for automatically generating Individualized Education Programs (IEPs) in Traditional Chinese, addressing a significant gap in special-education NLP. The proposed Corpus-Grounded Feature Diffusion (CGFD) pipeline utilizes a low-resource fine-tuning approach with a modified Breeze-7B model. This system achieves state-of-the-art results on a held-out test set, outperforming several leading LLMs in zero-shot performance while ensuring privacy-preserving, local inference.

IMPACT Addresses a gap in special-education NLP for Traditional Chinese, offering a privacy-preserving local inference solution.
RESEARCH · arXiv cs.AI English(EN) · 1d · [4 sources]

TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics

Researchers have introduced TheoremBench, a new benchmark for evaluating Large Language Models (LLMs) in formal mathematics theorem proving. This benchmark moves beyond competition-style problems to assess model performance on more complex, dependency-rich mathematical developments. Experiments with TheoremBench reveal that LLMs can solve open mathematical problems, with one agent resolving nine Erdős problems and numerous OEIS conjectures, demonstrating the potential of AI-aided formal proof search in advancing mathematical research.

IMPACT This research introduces new evaluation methods for LLMs in formal mathematics, potentially accelerating AI's role in scientific discovery.
RESEARCH · dev.to — LLM tag English(EN) · 18h

Local LLMs Answer 71% of Real Queries: MiMo Sets the Bar

Local large language models have significantly improved, now accurately handling 71.3% of real-world queries, a substantial leap from 23.2% last year, according to Stanford research. This advancement is exemplified by Xiaomi's new MiMo-v2.5-Pro model, a trillion-parameter open-weights model that matches top-tier closed models on coding benchmarks and achieves over 1,000 tokens per second on commodity hardware. The increasing capability and efficiency of local models are beginning to challenge the cost dominance of frontier API-based models, though some complex tasks still require more advanced solutions.

IMPACT Local models are rapidly closing the capability gap with frontier APIs, potentially inverting the cost calculus for millions of tokens processed monthly.
- GPT-5
- Stanford
- Xiaomi
- MiMo-v2.5-Pro
- Clément Delangue
- Claude Opus
- Epoch AI
RESEARCH · 36氪 (36Kr) 中文(ZH) · 14h

Shanghai Futures Exchange: Will conduct full market tests on June 13 and June 27

Apple has introduced a new Siri powered by AI, aiming to enhance user interaction and capabilities. In other tech news, ROKID has addressed allegations concerning its smart glasses potentially recording flight attendants. Meanwhile, OpenAI has reportedly filed confidential documents for an Initial Public Offering (IPO).

IMPACT New AI capabilities in Siri could enhance user experience, while OpenAI's IPO filing signals major market activity.
- Apple
- Siri
- ROKID
- OpenAI
RESEARCH · 36氪 (36Kr) 中文(ZH) · 18h

Hard Science Observation | WWDC 2026: Apple Finally Takes a Small Step in AI, iPhones in China Still Can't Use It

Apple has unveiled its AI strategy at WWDC 2026, focusing on user-centric, personalized, and privacy-respecting features. The company announced a partnership with Google to develop its foundational AI model, which will operate both on device and in the cloud. Key upgrades include enhanced personal context understanding, world knowledge integration, app tool utilization, and screen awareness, all designed to be accessible across Apple's hardware ecosystem. However, these AI features will not be available in mainland China due to regulatory requirements, and they also face limitations in the EU.

IMPACT Apple's integration of AI across its ecosystem could accelerate mainstream adoption and set new standards for on-device and privacy-focused AI.
- Craig Federighi
- Tim Cook
- Siri
- iPhone
- Apple Vision Pro
- EU
- China
- WWDC 2026
- Google
- Gemini
- Apple
RESEARCH · 量子位 (QbitAI) 中文(ZH) · 18h

A newcomer in the first tier of domestic general large models?!

Chinese AI company Unisound has launched its new foundational model, U2, which focuses on "intelligence density times token value" rather than simply increasing parameter count. This approach aims to reduce the cost and token consumption associated with large language models, particularly in the era of AI agents. U2 reportedly achieves performance comparable to much larger models with significantly fewer active parameters and reduced thinking token usage, making it more efficient for practical applications and development.

IMPACT This model's focus on "intelligence density" and reduced token cost could significantly lower operational expenses for AI applications and agents.
- U2
RESEARCH · 36氪 (36Kr) 中文(ZH) · 18h

Ministry of Transport: In the first 5 months, the national waterway passenger transport volume reached 120 million person-times

Xiaomi's MiMo technical team has launched MiMo-V2.5-Pro-UltraSpeed, a new mode for their model inference system. This upgrade significantly boosts inference speed to 1000 tokens/s without compromising model capabilities. Notably, it achieves this performance using only general-purpose GPUs, eliminating the need for custom hardware.

IMPACT Accelerates AI model deployment and accessibility by improving inference speed on standard hardware.
RESEARCH · 雷峰网 (Leiphone) 中文(ZH) · 18h

"Superstar Startup" Kunlun Xing Robot Officially Surfaces

Kunlunxing Robotics, a new company focused on embodied intelligence, has officially launched in Beijing's Yizhuang Economic Development Zone. Founded by former Alibaba executive Ren Geng and ex-Li Auto executive Lang Xianpeng, the startup has attracted significant early-stage investment. Kunlunxing aims to develop general-purpose embodied intelligence by focusing on physically grounded causal reasoning, positioning itself as a competitor to industry benchmarks like Tesla's humanoid robot.

IMPACT Establishes a new player in the embodied intelligence race, potentially accelerating development towards general-purpose robots.
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

Closing the Prior-Posterior Loop: Self-Reflective Molecular Design with Analysis-Driven LLM Iteration

Researchers have developed a novel method for molecular design using large language models (LLMs) that moves beyond simple trial-and-error. By feeding detailed physicochemical rationales, such as orbital energies and atomic charges, back into the LLM instead of just numerical scores, the system acts as a causal reasoner. This self-reflective approach achieved a 100% success rate on moderate tasks for targeting HOMO-LUMO gaps and proved effective for dipole-moment design across multiple LLM backbones.

IMPACT Enables more mechanistic and precise molecular design by providing LLMs with causal reasoning capabilities.
- HOMO-LUMO gap
- LLM
RESEARCH · arXiv cs.AI English(EN) · 1d · [2 sources]

A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

Researchers have developed a SpeechLLM designed for assessing L2 speech proficiency across multiple granularities and providing natural language rationales. This model, trained using a hybrid approach of supervised fine-tuning and Bounded Direct Preference Optimization, can predict sentence-level labels for accuracy, fluency, and prosody, as well as word/phoneme-level accuracy. While the model demonstrates strong performance and plausible sentence-level rationales, its faithfulness degrades at the word/phoneme level due to sparse and weakly aligned references.

IMPACT Introduces a novel approach to automated L2 speech assessment with explainability, potentially improving language learning tools.
RESEARCH · arXiv cs.CV English(EN) · 1d · [3 sources]

Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration

Researchers have developed new methods for open-vocabulary semantic segmentation, a task that allows models to identify and segment novel concepts based on text descriptions. One approach, the Semantic Calibration Network (SCN), refines mask classification by modeling semantic correlations between classes to improve discrimination while retaining the generalization abilities of pre-trained models like CLIP. Another method, Open-V, offers a training-free framework that combines existing models like SAM3 and CLIP for generalized few-shot segmentation, demonstrating significant performance gains without task-specific adaptation. AI

IMPACT These advancements could lead to more flexible and powerful image analysis tools capable of understanding and segmenting a wider range of concepts without extensive retraining.
RESEARCH · arXiv cs.LG English(EN) · 1d · [2 sources]

Graph Mamba Operator: A Latent Simulator for Interacting Particle Systems

Researchers have developed the Graph Mamba Operator (GraMO), a novel approach for simulating interacting particle systems. GraMO integrates state-space models with graph-based learning to simultaneously handle spatial interactions and long-range temporal dependencies. This method aims to overcome limitations of existing models that often separate these dynamics, leading to error accumulation over extended prediction horizons. AI

IMPACT Introduces a new method for simulating complex dynamical systems, potentially improving long-horizon predictions in fields like robotics and motion capture.
RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

PriFT: Prior-Support Guided Supervised Fine-Tuning

Researchers have introduced PriFT, a novel supervised fine-tuning method designed to improve model generalization. PriFT addresses limitations in standard fine-tuning by deriving token weights from a frozen pretrained model, providing a stable reweighting signal. This approach, which estimates "prior support" for target tokens, consistently enhances performance across various tasks and serves as a superior initialization for reinforcement learning. AI

IMPACT Enhances model generalization and provides better initialization for RL, potentially improving performance on complex tasks like reasoning and code generation.
- Reinforcement Learning
- Supervised Fine-Tuning
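
To make the reweighting idea concrete, here is a minimal sketch of a token-weighted SFT loss. The summary only says that weights come from a frozen pretrained model's "prior support" for each target token. The specific weighting used here (the frozen model's probability of the target token, rescaled to mean 1) and the names `weighted_sft_loss` and `temperature` are assumptions for illustration, not the paper's formula.

```python
import math

def weighted_sft_loss(policy_logprobs, prior_logprobs, temperature=1.0):
    """Token-weighted negative log-likelihood.

    policy_logprobs: log p_theta(y_t | context) from the model being trained.
    prior_logprobs:  log p_frozen(y_t | context) from a frozen pretrained model.
    ASSUMPTION: weight is proportional to exp(prior_logprob / temperature).
    """
    # "Prior support" proxy: how much mass the frozen model puts on each target token.
    support = [math.exp(lp / temperature) for lp in prior_logprobs]
    total = sum(support)
    # Rescale so weights average to 1, keeping the loss scale comparable to plain SFT.
    weights = [s * len(support) / total for s in support]
    loss = -sum(w * lp for w, lp in zip(weights, policy_logprobs)) / len(weights)
    return loss, weights

# Toy example: the frozen model finds the second target token unlikely,
# so that token contributes less to the loss under this assumed weighting.
policy = [math.log(0.4), math.log(0.1)]
prior = [math.log(0.8), math.log(0.2)]
loss, weights = weighted_sft_loss(policy, prior)
```

Because the weights depend only on the frozen model, they stay fixed as the trained model changes, which is the stability property the summary refers to.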
RESEARCH · Hugging Face Daily Papers English(EN) · 1d · [3 sources]

TUDSR: Twice Upsampling-Diffusion for Higher Super-Resolution

Researchers have developed TUDSR, a novel framework for image super-resolution that utilizes a two-stage diffusion process to achieve higher resolutions than previously possible. This method addresses the limitations of current diffusion models in handling large upsampling ratios and native resolutions by employing a looped chunk-based training strategy. The TUDSR framework, built upon SD2.1-base, demonstrates state-of-the-art performance, generating high-quality images at resolutions up to 2048×2048, surpassing existing techniques. AI

IMPACT Enables higher-resolution image generation from diffusion models, potentially improving detail in AI-generated imagery.
- SD2.1-base
- TUDSR
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

vesselFM-CT: Segmenting All Blood Vessels in CT Images for System-Level Cardiovascular Analysis

Researchers have developed vesselFM-CT, a novel model designed to segment all blood vessels within CT images. This advancement aims to overcome the limitations of previous studies that focused on isolated vascular segments, enabling a more comprehensive analysis of the entire cardiovascular system. The model utilizes an iterative training process and a new TubeLoss function to handle the diverse structural variations of blood vessels, from large arteries to minuscule mesenteric vessels. AI

IMPACT Enables comprehensive cardiovascular system analysis from CT scans, potentially improving disease classification and understanding of vascular physiology.
- vesselFM-CT
- Bastian Wittmann
RESEARCH · arXiv cs.AI English(EN) · 1d · [3 sources]

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Researchers have developed "Reasoning Arena," a new framework designed to enhance the reasoning capabilities of large language models. This system addresses a limitation of reinforcement learning with verifiable rewards (RLVR): when every reasoning trace in a group receives the same reward, the group provides no gradient signal. Reasoning Arena converts these uninformative reward groups into valuable training data by using trace tournaments for head-to-head comparisons, thereby generating richer relative reward signals. The method improves training efficiency and performance on benchmarks, outperforming standard RLVR by 7.6% on average. AI

IMPACT Enhances LLM reasoning by converting uninformative reward signals into useful training data, potentially accelerating development.
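
The failure mode and the fix can be illustrated with a small sketch. It assumes group-normalized advantages (as in GRPO-style training) and a round-robin tournament. Both are assumptions, and the judge `shorter_is_better` is a hypothetical toy stand-in, not the comparison procedure from the paper.

```python
from itertools import combinations
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def tournament_scores(traces, judge):
    """Round-robin tournament: each trace's score is its win rate.

    judge(a, b) returns 0 if trace a wins and 1 if trace b wins.
    """
    wins = [0.0] * len(traces)
    for i, j in combinations(range(len(traces)), 2):
        wins[i if judge(traces[i], traces[j]) == 0 else j] += 1.0
    return [w / (len(traces) - 1) for w in wins]

def shorter_is_better(a, b):
    # Hypothetical toy judge that prefers the more concise trace.
    return 0 if len(a) <= len(b) else 1

# All four traces reached the correct answer, so the verifiable reward is identical
# and every advantage is zero: the group carries no learning signal.
flat = group_advantages([1.0, 1.0, 1.0, 1.0])

# Pairwise comparisons separate the same traces into distinct relative scores.
traces = ["aaaa", "aa", "a", "aaa"]
scores = tournament_scores(traces, shorter_is_better)
relative = group_advantages(scores)
```

In practice the judge would be a learned or model-based comparator of reasoning quality. The point of the sketch is only that pairwise outcomes break ties that a binary verifier cannot.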
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning

Researchers have developed CapRL++, a novel framework for training image and video captioning models using reinforcement learning with verifiable rewards. This approach moves beyond traditional supervised fine-tuning by using a vision-free language model to assess caption quality based on its ability to answer questions about the visual content. Evaluations across numerous benchmarks demonstrate that CapRL++ enhances caption quality and pretraining, leading to significant downstream performance gains and enabling smaller models to match the capabilities of much larger ones. AI

IMPACT This new training framework could lead to more capable and efficient vision-language models, improving accessibility and downstream applications.
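
A rough sketch of the reward described above: a caption is scored by how many questions about the image a text-only answerer gets right when it sees only the caption. The question format and the `toy_answerer` below are hypothetical placeholders. In the actual framework a vision-free language model would play that role.

```python
def caption_reward(caption, questions, answer_fn):
    """Fraction of questions answered correctly from the caption alone."""
    if not questions:
        return 0.0
    correct = 0
    for q in questions:
        predicted = answer_fn(caption, q["question"], q["options"])
        if predicted.strip().lower() == q["answer"].strip().lower():
            correct += 1
    return correct / len(questions)

def toy_answerer(caption, question, options):
    # Hypothetical stand-in for a text-only LM: returns the first option
    # that literally appears in the caption, or an empty string if none does.
    text = caption.lower()
    for option in options:
        if option.lower() in text:
            return option
    return ""

questions = [
    {"question": "What color is the car?", "options": ["blue", "red"], "answer": "red"},
    {"question": "How many people are visible?", "options": ["two", "five"], "answer": "two"},
]

detailed = caption_reward("A red car parked beside a tree", questions, toy_answerer)
vague = caption_reward("A car", questions, toy_answerer)
```

The reward is verifiable because it reduces to checking answers against known labels, and a denser, more accurate caption lets the answerer get more questions right.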
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

Echo-DM: Ultrasound Marker Removal via Conditional Latent Diffusion and Region-Aware Fusion

Researchers have developed Echo-DM, a novel framework for removing artificial markers from clinical ultrasound images. This method utilizes a conditional latent diffusion model combined with region-aware fusion to restore images without relying on masks, preserving anatomical details. Experiments on the Echo-PAIR dataset show Echo-DM outperforms existing methods in marker removal and anatomical fidelity, offering efficient deployment options. AI

IMPACT This new method could improve the accuracy of automated analysis in clinical ultrasound imaging by removing distracting artificial markers.
- Echo-DM
- Echo-PAIR
RESEARCH · arXiv cs.CV English(EN) · 1d · [2 sources]

ExDet: Open-Domain Open-Vocabulary Detection with Cross-modal Extrapolation and Rectification

Researchers have introduced ExDet, a novel framework designed to improve open-domain open-vocabulary detection (ODOVD) capabilities. This lightweight system enhances the generalization of existing detectors to new categories and unseen domains without requiring training from scratch. ExDet utilizes text-guided extrapolation to infer visual prototypes and a detector-compatible rectification module to adjust representations, achieving state-of-the-art results on several benchmark datasets. AI

IMPACT Enhances generalization for object detection models, potentially improving performance in real-world applications with novel objects and diverse environments.
- MSOSB
- ExDet
- arXiv
- OV-LVIS
- OD-LVIS
- Objects365
RESEARCH · arXiv cs.LG English(EN) · 1d · [3 sources]

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

Researchers have developed Conan-embedding-v3, a new framework designed to create a unified embedding space for multiple data modalities including text, images, video, documents, and audio. The approach involves training modality-specific models independently, then fusing their task vectors into a single backbone. A key challenge addressed is "Projector Drift," which occurs when fusing models with external encoders, leading to performance degradation in specific modalities like audio. Conan-embedding-v3 employs "Projector Recovery" and multi-modal rehearsal to mitigate this issue, achieving strong performance on benchmarks like MMEB and MAEB. AI

IMPACT Introduces a novel framework for unifying diverse data types into a single embedding space, potentially improving cross-modal retrieval and understanding.
- Conan-embedding-v3
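
The fusion step can be sketched with plain task-vector arithmetic: each modality-specific model contributes its difference from the shared base, and the differences are added back onto the base. This is a minimal sketch under assumptions. The uniform `scale` factor and the dict-of-lists parameter layout are illustrative, and the paper's Projector Recovery and rehearsal steps are not modeled.

```python
def task_vector(base, tuned):
    """Per-parameter difference between a fine-tuned model and the shared base."""
    return {name: [t - b for t, b in zip(tuned[name], base[name])] for name in base}

def fuse(base, tuned_models, scale=1.0):
    """Add every model's task vector onto the base weights.

    ASSUMPTION: all models share the base's parameter names and shapes,
    and a single scalar `scale` is applied to every task vector.
    """
    fused = {name: list(values) for name, values in base.items()}
    for tuned in tuned_models:
        delta = task_vector(base, tuned)
        for name in fused:
            fused[name] = [f + scale * d for f, d in zip(fused[name], delta[name])]
    return fused

# Toy weights: the text model moved the first parameter, the audio model the second.
base = {"w": [1.0, 2.0]}
text_model = {"w": [1.5, 2.0]}
audio_model = {"w": [1.0, 3.0]}
fused = fuse(base, [text_model, audio_model])
```

Only the shared backbone is merged this way. Components outside it, such as a projector attached to an external encoder, are not covered by the sum, which is consistent with the drift problem the summary describes.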