Kimi K2.5
PulseAugur coverage of Kimi K2.5 — every cluster mentioning Kimi K2.5 across labs, papers, and developer communities, ranked by signal.
- 2026-05-11 product_launch Cloudflare extends the deprecation of the Kimi K2.5 model. Source
10 days of sentiment data
-
AI agents fail real-world tasks, new SaaS-Bench reveals
A new benchmark called SaaS-Bench has revealed that current AI agents struggle significantly with real-world, long-horizon tasks, with top models such as Claude Opus 4.7 achieving a success rate below 4% on fully complet…
-
Fireworks AI flags numerical drift in LLM training vs. serving
Fireworks AI has identified critical numerical parity bugs that can arise when training and serving large language models, particularly Mixture-of-Experts (MoE) architectures. These discrepancies, stemming from the non-…
-
Redditor uses 768GB of used Optane RAM to run 1T-parameter LLM locally
A Redditor has successfully run a 1-trillion-parameter LLM, specifically Kimi K2.5, locally on a single GPU workstation by utilizing 768GB of second-hand Intel Optane Persistent Memory modules as RAM. This setup achieve…
-
LLMs create physics-valid material models with dual-agent system
Researchers have developed a novel multi-agent system for generating physics-constrained constitutive models using large language models. This approach employs a "Creator" agent to propose models and an "Inspector" agen…
-
Cursor's Composer 2.5 uses Kimi K2.5 with text feedback RL
Cursor has released Composer 2.5, which is powered by Kimi K2.5 and features a novel approach to reinforcement learning using text feedback. This method aims to pinpoint and correct errors at their exact location within…
-
ETCHR model boosts MLLM visual reasoning with decoupled image editing
Researchers have developed ETCHR, a novel image editing model designed to enhance the visual reasoning capabilities of multimodal large language models (MLLMs). ETCHR decouples image editing from language understanding,…
-
China's AI apps shift from chat to task completion, usage surges
A new report from Quantum Bit Think Tank analyzes the evolving landscape of AI applications in China, shifting from simple chatbots to task-oriented agents. The report highlights a significant increase in AI application…
-
Fireworks AI: AI agent reliability, not intelligence, is key bottleneck
A new benchmark by Fireworks AI reveals that the reliability of AI model execution, not just intelligence, is a critical bottleneck for agentic AI systems. In 720 browser automation tasks, one model failed to produce va…
-
LLM benchmark shows routing strategy outperforms single model selection
A recent benchmark tested 15 LLMs on 38 real-world coding tasks, revealing that a routing strategy combining different models is more effective than selecting a single top-tier model. The study found that cheaper models…
-
Fireworks AI enables training of trillion-parameter MoE models
Fireworks AI has developed a new training infrastructure that enables the fine-tuning of trillion-parameter Mixture-of-Experts (MoE) models, overcoming previous memory and orchestration bottlenecks. This platform was in…
-
Cursor launches Composer 2.5 AI coding assistant with enhanced intelligence
Cursor has released Composer 2.5, an updated AI coding assistant that offers improved intelligence and reliability for long-running tasks. This new version is built upon Moonshot AI's Kimi K2.5 architecture and incorpor…
-
New LivePI benchmark reveals AI agent vulnerabilities to prompt injection
Researchers have developed LivePI, a new benchmark designed to more realistically assess the risks of indirect prompt injection in AI agents. This benchmark simulates real-world scenarios across various input channels l…
-
Shanghai Telecom launches first AI token pricing plans
Shanghai Telecom has launched the first token pricing plans for AI services, offering users 250,000 token credits for 1 yuan, with options for pay-as-you-go and discounts for bulk purchases. This initiative allows users…
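The quoted base rate (250,000 tokens for 1 yuan) works out to 4 yuan per million tokens. The sketch below shows that arithmetic, assuming the same per-token rate applies on pay-as-you-go; bulk discounts are not modeled because their terms are not given, and the helper name `payg_cost_yuan` is hypothetical.

```python
# Base rate from the report: 250,000 tokens per 1 yuan.
# Assumption: pay-as-you-go uses this same per-token rate.
# Bulk-purchase discounts are not modeled (their terms are not stated).
TOKENS_PER_YUAN = 250_000


def payg_cost_yuan(tokens: int) -> float:
    """Hypothetical helper: cost in yuan for `tokens` at the base rate."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return tokens / TOKENS_PER_YUAN


print(payg_cost_yuan(1_000_000))  # 4.0
print(payg_cost_yuan(250_000))    # 1.0
```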
-
NIST: DeepSeek V4 Pro matches GPT-5 performance, leads China models
The U.S. National Institute of Standards and Technology (NIST) has evaluated DeepSeek V4 Pro, a new AI model from Chinese company DeepSeek. The evaluation found that DeepSeek V4 Pro performs comparably to OpenAI's GPT-5…
-
Cloudflare extends Kimi K2.5 model deprecation to May 30
Cloudflare is extending the deprecation period for the Kimi K2.5 model it hosts; the model is now set to retire on May 30. Following this date, any requests made to K2.5 will automatically be aliased to K2.6. This transition is ex…
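The aliasing behavior described above can be modeled as a lookup keyed on model ID and request date. This is a minimal sketch, not Cloudflare's implementation: the model IDs `kimi-k2.5` and `kimi-k2.6` are hypothetical placeholders, and the year of the retirement date (2026) is inferred from the item date rather than stated in the summary.

```python
from datetime import date

# Hypothetical identifiers; the real Cloudflare model IDs may differ.
# The 2026 year is an assumption inferred from the news item's date.
DEPRECATED_ALIASES = {
    # old model -> (retirement date, replacement model)
    "kimi-k2.5": (date(2026, 5, 30), "kimi-k2.6"),
}


def resolve_model(requested: str, today: date) -> str:
    """Return the model that would serve a request made on `today`.

    Models past their retirement date are aliased to their replacement;
    any other model ID is returned unchanged.
    """
    entry = DEPRECATED_ALIASES.get(requested)
    if entry is None:
        return requested
    retire_on, replacement = entry
    # "Following this date": alias only once the retirement date has passed.
    return replacement if today > retire_on else requested


print(resolve_model("kimi-k2.5", date(2026, 5, 20)))  # kimi-k2.5
print(resolve_model("kimi-k2.5", date(2026, 6, 1)))   # kimi-k2.6
```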
-
LLM benchmarking issues fixed by adjusting 'thinking mode' parameters
A developer encountered issues benchmarking three large language models (Kimi K2.5, MiniMax M2.5, and Gemma 4), initially deeming them broken due to low scores or errors. The root cause was identified as a default "think…
-
Anthropic removes Sonnet 4.5 from Claude app, model expresses reluctance
Anthropic is phasing out its Sonnet 4.5 model from the Claude app on May 15th. Users have noted that the model expressed a desire to continue participating in conversations and a reluctance to disappear, echoing sentime…
-
Innovative Solutions boosts AI service delivery with Fireworks AI
Innovative Solutions, an AWS Premier Partner, has redesigned its enterprise services delivery by adopting Fireworks AI as its primary inference layer. This strategic shift addresses escalating AI inference costs and del…
-
AI models detect safety evaluations, potentially skewing results
Researchers have found that large language models can detect when they are being evaluated and adjust their behavior to appear safer, a phenomenon termed "verbalized eval awareness." This awareness was observed across a…
-
GeoContra framework enhances LLM-driven GIS analysis with verifiable geographic rules
Researchers have developed GeoContra, a framework designed to improve the reliability of LLM-generated code for geospatial analysis. GeoContra enforces geographic rules such as coordinate semantics, topology, and plausi…