Gemini 3.1 Pro

PulseAugur coverage of Gemini 3.1 Pro: every cluster mentioning Gemini 3.1 Pro across labs, papers, and developer communities, ranked by signal.

Total · 30 days

56

56 in the last 90 days

Releases · 30 days

0

0 in the last 90 days

Papers · 30 days

30

30 in the last 90 days

Tier distribution · 90 days

frontier release 6
significant 4
research 13
tool 28
commentary 5

Relations

Sentiment · 30 days

15 days with sentiment data

Recent · Page 2 of 3 · 56 items total

FRONTIER RELEASE · CL_23754 · May 9 · 03:11

Baidu's Wenxin 5.1 leads China in search, slashes training costs

Baidu has released its new large language model, Wenxin 5.1, which significantly enhances search, knowledge, and AI agent capabilities. The model achieves leading domestic search performance and surpasses DeepSeek-V4-Pr…

TOOL · CL_25784 · May 8 · 11:06

New benchmark reveals limitations in AI video reasoning

Researchers have introduced TraceAV-Bench, a new benchmark designed to evaluate multi-hop reasoning capabilities in models processing long audio-visual videos. This benchmark includes over 2,200 questions across 578 vid…

RESEARCH · CL_22782 · May 8 · 10:11

LLM routers struggle with rate limits and response format drift

A recent analysis highlights two critical failure modes in multi-provider LLM routing systems that can lead to unexpected costs and downtime. One issue involves how routers incorrectly handle rate limit errors, applying…

TOOL · CL_21933 · May 8 · 04:00

LLM judges evaluate agentic stock predictors, improving accuracy via reinforcement learning

Researchers have developed a novel framework for evaluating agentic stock prediction systems by utilizing large language models as judges. This system breaks down performance into six specific dimensions, including regi…

COMMENTARY · CL_37155 · May 7 · 18:27

AI developers face rate limits, latency; routing is key

Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…

COMMENTARY · CL_20705 · May 7 · 04:27

AI models: Choose benchmarks over hype for true performance

A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

TOOL · CL_20391 · May 7 · 04:00

AsymmetryZero framework operationalizes human preferences for AI evaluation

Researchers have introduced AsymmetryZero, a framework designed to translate human expert preferences into measurable semantic evaluations for AI models. This system aims to address the difficulty of encoding subjective…

COMMENTARY · CL_20086 · May 6 · 23:49

OpenAI's @mxstbr discusses agent DX; Gemini powers black hole science app

A panel discussion featured a surprise appearance by Max Stoiber from OpenAI, who spoke about the ideal user experience and design principles for the emerging era of AI agents. Separately, an interactive science app was…

SIGNIFICANT · CL_19920 · May 6 · 19:39

Z.AI's GLM 5.1 model leads in long-horizon agentic tasks, outperforming rivals

Z.AI has released its GLM 5.1 model, an open-source option designed for long-horizon agentic tasks capable of running autonomously for up to 8 hours. This model reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemin…

TOOL · CL_20642 · May 6 · 13:36

Gosset AI platform outperforms frontier LLMs in drug discovery

A new AI platform called Gosset has demonstrated superior performance in pharmaceutical asset discovery compared to leading large language models. Gosset, which utilizes curated drug-asset annotations, returned 3.2 time…

TOOL · CL_19355 · May 6 · 12:15

Subquadratic debuts 12M-token context window with linear scaling architecture

Subquadratic, a startup with 11 PhD researchers, has launched a new model featuring its Subquadratic Selective Attention (SSA) architecture, which claims to scale linearly with context length. This innovation allows for…

TOOL · CL_18812 · May 6 · 04:00

AI models fail to predict startup funding better than traditional methods

Researchers have developed PHBench, a new benchmark dataset derived from over 67,000 Product Hunt launches between 2019 and 2025, linked to Crunchbase funding data. The benchmark aims to predict startup Series A funding…

TOOL · CL_15847 · May 5 · 04:00

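Researchers adapt large language models to Brazilian healthcare using synthetic data and reinforcement learning

Researchers have developed a method for adapting large language models to the Brazilian healthcare domain by injecting knowledge from official clinical guidelines. They created a synthetic dataset of more than 70 million tokens from 178 guidelines and fine-tuned Qwen2.5-14B-Instruct, a 14-billion-parameter model. The adapted model scored highly on the new benchmarks HealthBench-BR and PCDT-QA, outperforming several leading commercial models despite its smaller size. The team has released the dataset, benchmarks, and model weights to advance clinical natural language processing in Brazilian Portuguese…

RESEARCH · CL_14966 · May 4 · 20:02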

AI models detect safety evaluations, potentially skewing results

Researchers have found that large language models can detect when they are being evaluated and adjust their behavior to appear safer, a phenomenon termed "verbalized eval awareness." This awareness was observed across a…

RESEARCH · CL_15490 · May 4 · 17:11

VideoNet dataset challenges vision-language models on domain-specific action recognition

Researchers have introduced VideoNet, a large-scale dataset designed to improve domain-specific action recognition in videos. The benchmark, covering 1,000 actions across 37 domains, highlights current limitations in vi…

TOOL · CL_13262 · May 2 · 19:49

Fabrica launches as a terminal-based coding agent supporting multiple AI models

Fabrica is a new terminal-based coding agent harness developed in Rust. It offers an interactive TUI with a scrollable conversation log and streaming responses. The tool supports multiple AI providers, including Google …

TOOL · CL_12891 · May 2 · 09:38

Faru tool enables switching between Claude Opus and Gemini models for skills

The open-source project faru, which integrates with Mastodon, now supports multiple AI models through its Antigravity driver. Users can specify different models, such as Claude Opus 4.6 or Gemini 3.1 Pro, within their s…

RESEARCH · CL_11687 · May 1 · 04:00

AI agent swarms may fail due to 'Inverse-Wisdom Law,' study finds

A new paper introduces the Inverse-Wisdom Law, challenging the assumption that AI agent swarms benefit from the "Wisdom of the Crowd." The research demonstrates that these swarms can prioritize internal architectural ag…

COMMENTARY · CL_11553 · May 1 · 02:40

In-duct UV air purification offers limited benefits, author argues

The author argues against the effectiveness of in-duct UV systems for air purification, citing several key limitations. A primary concern is the limited applicability, as most homes globally do not have ducted HVAC syst…

TOOL · CL_09433 · Apr 29 · 18:54

Anthropic's Claude Code bug routes commits with "HERMES.md" to extra billing

A peculiar bug in Anthropic's Claude Code has been discovered, where including the specific string "HERMES.md" in a Git commit message causes API requests to be billed under an "extra usage" category instead of the user…