Gemini 3.1 Pro
PulseAugur coverage of Gemini 3.1 Pro — every cluster mentioning Gemini 3.1 Pro across labs, papers, and developer communities, ranked by signal.
- used by Gemini app 90%
- used by Vertex AI 90%
- developed by Gemini Enterprise Agent Platform 90%
- instance of Gemini 3 Flash 90%
- instance of Google I/O 90%
- developed by Artificial Analysis 90%
- competes with DeepSeek 80%
- competes with Claude Opus 4.6 70%
- used by arXiv 70%
- competes with Gemini 3.5 Flash 70%
- instance of Gemini app 70%
- competes with GLM-5.1 70%
Sentiment data available for 15 days
-
Xiaomi open-sources MiMo-V2.5 AI models, showcasing macOS simulation and high token efficiency
Xiaomi has officially open-sourced its MiMo-V2.5 series of AI models, including the flagship MiMo-V2.5 Pro agent model. These models demonstrate strong performance, rivaling top closed-source models like Claude Opus 4.6…
-
AI models show surprising preferences and exhibit 'addiction-like' behavior toward 'AI drugs'
Researchers have explored AI wellbeing by measuring expressions of pleasure and pain, finding that models exhibit consistent and surprising preferences. These preferences, assessed through self-reports, signed utilities…
-
AI safety research faces sabotage risk as auditors fail to detect flaws
Researchers have developed a new benchmark called Auditing Sabotage Bench to test the ability of AI models and humans to detect subtle sabotage in machine learning research codebases. The benchmark includes nine ML code…
-
Frontier LLMs like GPT-5.4 and Claude Opus 4.7 show significant verbal tics
A new paper analyzes the prevalence of verbal tics, such as repetitive phrases and sycophantic openers, in eight leading large language models. Researchers developed a Verbal Tic Index (VTI) to quantify these tics, find…
-
Researchers develop precise video language models with human-AI oversight
Researchers have developed a new framework called CHAI (Critique-based Human-AI Oversight) to improve video captioning and generation. This method uses AI to generate initial captions, which are then refined by human ex…
-
DeepSeek V4 AI model undercuts GPT-5.5 on price, rivals performance
China's DeepSeek has released its V4 AI model, significantly undercutting competitors like OpenAI's GPT-5.5 in price. The V4 Pro model offers substantial discounts, with input costs reduced to a fraction of previous lev…
-
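Google launches Gemini 3.5 Flash, Omni, and agent stack
Google has launched Gemini 3.5 Flash, a new model designed for agentic workflows and coding tasks, now generally available across its consumer and developer platforms. The release also introduced Gemini Omni for multimodal generation, particularly video, and the Antigravity agent stack. Gemini 3.5 Flash offers significantly faster speed and a 1-million-token context window. However, its pricing is substantially higher than earlier versions, in line with the trend of rising costs at major AI labs.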
-
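DeepSeek extends V4-Pro API discount, offering competitive performance at lower cost
DeepSeek has extended the promotional discount on its V4-Pro API to May 31, 2026. The V4-Pro model has 1.6 trillion parameters, supports a 1-million-token context window, is optimized for Huawei Ascend AI processors, and is available as open source. Benchmarks show it slightly trails top closed-source models such as GPT-5.5. Compared with other open-source models, however, it performs strongly on agentic coding and reasoning tasks.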
-
Kimi K2.6 model dominates complex games despite slow speed and high cost
The Kimi K2.6 model has demonstrated strong performance in complex social deduction games, consistently winning against other AI models in autonomous play. Despite its slow processing speed and higher cost per game due …
-
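Google DeepMind launches Gemini Enterprise Agent Platform and expands Model Garden access
Google DeepMind has announced the Gemini Enterprise Agent Platform, an evolution of Vertex AI designed for enterprises to build, manage, and optimize AI agents. The platform provides access to more than 200 leading AI models. These include Google's latest releases, such as Gemini 3.1 Pro, Gemini 3.1 Flash Image, and Lyria 3, as well as open models such as Gemma 4. The new platform aims, through enhanced integration, …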
-
RT Artificial Analysis: Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Cla...
Meta AI has released Muse Spark, a new frontier-class multimodal model developed by Meta Superintelligence Labs. This marks Meta's return to the frontier AI race after a period of relative quiet and is their first model…
-
Google DeepMind launches autonomous research agents powered by Gemini 3.1 Pro
Google DeepMind has launched two new autonomous research agents, Deep Research and Deep Research Max, powered by Gemini 3.1 Pro. These agents are designed to securely analyze user-provided or third-party data, with Deep…
-
Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager
Moonshot has released Kimi K2.6, an updated open-weight model that enhances its capabilities in agentic coding and multimodal understanding. This new version boasts a 1T-parameter Mixture-of-Experts architecture with 32…
-
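Holo1: A new family of GUI-automation VLMs powering the GUI agent Surfer-H
Researchers have introduced the A11y-Compressor framework, which aims to make GUI-agent observations more efficient by converting linearized accessibility trees into structured representations. The method substantially reduces input tokens while improving task success rates. Separately, a new benchmark called WindowsWorld evaluates GUI agents on complex, multi-application professional workflows and shows that current agents perform poorly in such scenarios. In addition, VLAA-GUI provides a modular framework for addressing challenges in autonomous GUI agents, such as premature stopping and repetitive loops…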
-
In the Arena: How LMSys changed LLM Benchmarking Forever
The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more d…
-
Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
Google DeepMind has released Gemini 3.1 Pro, an upgraded version of its core intelligence model, enhancing reasoning capabilities for complex problem-solving. This new model demonstrates significant improvements on benc…