GPT-5
PulseAugur coverage of GPT-5 — every cluster mentioning GPT-5 across labs, papers, and developer communities, ranked by signal.
- instance of LLM 95%
- instance of GPT-Realtime-2 95%
- developed by GPT-Realtime-2 95%
- developed GPT-3 90%
- developed by GPT-3 90%
- competes with Opus 4.7 90%
- instance of large-language models 90%
- used by Microsoft Copilot for Microsoft 365 90%
- used by arXiv 70%
- competes with Claude Sonnet 4.5 70%
- instance of GPT-4o mini 70%
- developed by Microsoft Copilot for Microsoft 365 70%
- 2025-08-07 product_launch OpenAI launched GPT-5, its latest AI model, offering enhanced capabilities for businesses.
14 days of sentiment data
- GPT-5 leads AI model usage rankings, outpacing benchmark champions
A new ranking system based on actual user adoption and discussion, rather than solely benchmark scores, reveals a significant divergence in AI model popularity. GPT-5 emerges as the top-ranked model by usage, despite ne…
- AgentTape index ranks AI models by usage, not just benchmarks
A new open-source index called AgentTape ranks AI models based on a blend of benchmark performance, actual usage, cost, and speed. Currently, OpenAI's GPT-5 models dominate the top rankings, with GPT-5.5 specifically ex…
- LLM reasoning effort settings boost cost, offer limited task benefits
The `reasoning_effort` setting in LLMs like OpenAI's GPT-5 and Anthropic's models controls the amount of internal chain-of-thought processing before an answer is generated. While higher settings can improve performance …
- Fei-Fei Li's team launches ESI-Bench for embodied spatial intelligence
A new benchmark called ESI-Bench has been released by Fei-Fei Li's team to evaluate embodied spatial intelligence in AI. Unlike previous benchmarks that assumed optimal observation, ESI-Bench requires AI agents to activ…
- New benchmark reveals perception, spatiotemporal modeling as MLLM weaknesses
Researchers have introduced BEAR, a new benchmark designed to evaluate and diagnose the skill-level capabilities of embodied multimodal large language models (MLLMs). This benchmark decomposes embodied tasks into 14 dis…
- Language models can now forecast research success, outperforming GPT-5
Researchers have developed a method for language models to predict the success of scientific research ideas before experimentation. By training models on a dataset of comparative idea evaluations, they achieved signific…
- DrugRAG pipeline boosts LLM accuracy in pharmacy Q&A
Researchers have developed DrugRAG, a novel retrieval-augmented generation pipeline designed to enhance the performance of large language models (LLMs) on pharmacy-related question-answering tasks. In their study, they …
- MLLM jailbreak vulnerability differs across languages and modalities
A new study reveals that the vulnerability of frontier multimodal large language models (MLLMs) to jailbreak attacks is significantly influenced by language and modality. Researchers found that while linguistic framing …
- AI chatbots struggle with news accuracy, regional bias, and false premises
A new study evaluated six major AI chatbots on their ability to accurately report emerging news facts. While top models achieved over 90% accuracy on multiple-choice questions, their performance dropped significantly in…
- New TTBYS framework boosts LLM persuasive dialogue with dual knowledge
Researchers have introduced a new framework called Think Thrice Before You Speak (TTBYS) to enhance the Theory of Mind (ToM) capabilities in large language models for persuasive dialogue. This framework addresses limita…
- OpenAI model disproves 80-year-old math problem for under $1000
OpenAI has announced that an internal model, speculated to be a version of GPT-5, has disproven an 80-year-old mathematical conjecture known as the Erdős planar unit distance problem. This general-purpose reasoning mode…
- Ricoh develops GPT-5-level Japanese LLM; Needswell launches Copilot training
Ricoh has developed a new Japanese large language model that matches GPT-5's performance, particularly in reasoning capabilities. This advanced model is designed to enhance AI applications and services. Separately, Need…
- DeepSeek V4 validates on Huawei Ascend 950, testing China's AI chip ecosystem
DeepSeek's V4 model has successfully validated inference on Huawei's Ascend 950 chip, marking a significant step for China's domestic AI hardware. This validation required substantial engineering effort, including rewri…
- Claude Haiku 4.5 leads in cost-effective JSON extraction benchmark
A recent benchmark evaluated six large language models on their ability to extract structured data, specifically JSON, from customer support emails. The analysis found that Anthropic's Claude Haiku 4.5 offered the best …
- AI Agents Advance with New Coding Tools and Reasoning Capabilities
Several recent posts explore advancements and applications in AI agents, particularly for coding and reasoning tasks. Topics include building autonomous coding agents that can open GitHub pull requests, using patterns l…
- Microsoft launches AI certs amid xAI payment dispute and Copilot turnaround
Microsoft has introduced four new AI-related certifications to address the growing demand for AI professionals. Separately, there are reports that Elon Musk's xAI may have failed to pay a $420 fee for tax data. Addition…
- OpenAI model disproves 80-year-old math conjecture
OpenAI's general-purpose reasoning model has disproved an 80-year-old conjecture in discrete geometry, known as the unit distance problem. This marks a significant advancement for AI in mathematics, as the model autonom…
- New FineBench benchmark highlights VLM struggles with human activity
Researchers have introduced FineBench, a new benchmark designed to evaluate the fine-grained human activity understanding capabilities of vision-language models (VLMs). The benchmark includes nearly 200,000 question-ans…
- Human engineers outperform GPT-5 and Gemini in system failure diagnosis
A new benchmark called ARFBench reveals that human engineers still significantly outperform AI models like GPT-5 and Gemini in diagnosing system failures. The results challenge the marketing claims of AI's full autonomy…
- CodePercept boosts LLM visual perception using code, not just reasoning
Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, a novel approach to enhance large language models' visual perception capabilities, particularly for STEM tasks. Their researc…