Gemini 2.5 Pro
PulseAugur coverage of Gemini 2.5 Pro — every cluster mentioning Gemini 2.5 Pro across labs, papers, and developer communities, ranked by signal.
- developed by Google DeepMind 100%
- instance of LLM 90%
- instance of Gemini 2.0 Flash 90%
- competes with GPT-5 70%
- competes with arXiv 70%
- used by arXiv 70%
- instance of Gemini 2.5 70%
- competes with Claude 4.5 Sonnet 70%
- competes with DeepSeek-R1-0528 70%
- competes with Claude 3.7 Sonnet 70%
- competes with ZAYA1-8B 70%
- competes with Claude Sonnet 4.6 60%
Sentiment data available for 12 days
-
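LLMs fail "pass the butter" robot test, scoring far below human performance
A new evaluation called Butter-Bench shows that current state-of-the-art large language models struggle significantly when controlling robots to perform practical tasks. In tests designed to assess their ability to carry out household chores such as passing the butter, the best-performing LLM reached only a 40% completion rate, far below the 95% success rate achieved by humans. Models such as Gemini 2.5 Pro and Claude Opus 4.1 showed limitations in spatial awareness and task execution, highlighting the gap between LLM reasoning capabilities and real-world robotics applications.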
-
Google DeepMind launches Deep Think for Gemini Ultra subscribers
Google DeepMind has released a new AI capability called Deep Think, now available to Google AI Ultra subscribers via the Gemini app. This feature utilizes parallel thinking techniques, allowing the model to explore mult…
-
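Google DeepMind releases Gemini 2.5 Pro and Flash models, launches Flash-Lite preview
Google DeepMind has officially launched its Gemini 2.5 Pro and Flash models, enabling developers to build production applications with confidence. The company also introduced a preview of Gemini 2.5 Flash-Lite, which it calls its most cost-efficient and fastest model to date. The new releases deliver improved performance across a range of benchmarks. They retain key features such as a 1-million-token context length and multimodal input.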
-
Google DeepMind enhances Gemini audio models for natural voice interactions and translation
Google DeepMind has released upgraded Gemini 2.5 audio models, enhancing capabilities for both live voice agents and text-to-speech generation. The Gemini 2.5 Flash Native Audio model now offers improved function callin…
-
DeepSeek releases R1-0528, an open-weights model rivaling Gemini 2.5 Pro
DeepSeek has released DeepSeek-R1-0528, an open-weights model that rivals Gemini 2.5 Pro in performance. This release marks a significant advancement in publicly available AI models, offering a powerful alternative for …
-
AI code review bots show limits in automated evaluation, GitHub COO discusses ambient AI
A new paper explores the limitations of automated evaluation for AI code review bots, finding that current automated methods like G-Eval and LLM-as-a-Judge show only moderate alignment with human developer labels. The s…
-
Google DeepMind releases Gemini 2.5 Flash-Lite, its fastest and cheapest model
Google DeepMind has released the stable version of Gemini 2.5 Flash-Lite, a fast and cost-efficient model designed for scaled production use. This model offers a balance of performance and affordability, with features l…
-
Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
Google DeepMind has released Gemini 3.1 Pro, an upgraded version of its core intelligence model, enhancing reasoning capabilities for complex problem-solving. This new model demonstrates significant improvements on benc…
-
Google AI teaches models to read maps and monitor nature
Google AI has developed a new system called MapTrace to train multimodal large language models (MLLMs) to visually follow routes on maps, addressing a gap in their spatial reasoning abilities. This system uses a scalabl…