PulseAugur

GPT-5

PulseAugur coverage of GPT-5 — every cluster mentioning GPT-5 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 94 (90 days: 94)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 52 (90 days: 52)
Timeline
  1. 2025-08-07 product_launch OpenAI launched GPT-5, its latest AI model, offering enhanced capabilities for businesses.
Sentiment · 30 days: sentiment data available for 14 days

Recent · page 1 of 5 · 94 items total
  1. COMMENTARY · CL_49506 ·

    GPT-5 leads AI model usage rankings, outpacing benchmark champions

    A new ranking system based on actual user adoption and discussion, rather than solely benchmark scores, reveals a significant divergence in AI model popularity. GPT-5 emerges as the top-ranked model by usage, despite ne…

  2. TOOL · CL_49508 ·

    AgentTape index ranks AI models by usage, not just benchmarks

    A new open-source index called AgentTape ranks AI models based on a blend of benchmark performance, actual usage, cost, and speed. Currently, OpenAI's GPT-5 models dominate the top rankings, with GPT-5.5 specifically ex…

  3. COMMENTARY · CL_46878 ·

    LLM reasoning effort settings boost cost, offer limited task benefits

    The `reasoning_effort` setting in LLMs like OpenAI's GPT-5 and Anthropic's models controls the amount of internal chain-of-thought processing before an answer is generated. While higher settings can improve performance …

  4. TOOL · CL_43727 ·

    Fei-Fei Li's team launches ESI-Bench for embodied spatial intelligence

    A new benchmark called ESI-Bench has been released by Fei-Fei Li's team to evaluate embodied spatial intelligence in AI. Unlike previous benchmarks that assumed optimal observation, ESI-Bench requires AI agents to activ…

  5. TOOL · CL_45081 ·

    New benchmark reveals perception, spatiotemporal modeling as MLLM weaknesses

    Researchers have introduced BEAR, a new benchmark designed to evaluate and diagnose the skill-level capabilities of embodied multimodal large language models (MLLMs). This benchmark decomposes embodied tasks into 14 dis…

  6. TOOL · CL_44809 ·

    Language models can now forecast research success, outperforming GPT-5

    Researchers have developed a method for language models to predict the success of scientific research ideas before experimentation. By training models on a dataset of comparative idea evaluations, they achieved signific…

  7. TOOL · CL_44758 ·

    DrugRAG pipeline boosts LLM accuracy in pharmacy Q&A

    Researchers have developed DrugRAG, a novel retrieval-augmented generation pipeline designed to enhance the performance of large language models (LLMs) on pharmacy-related question-answering tasks. In their study, they …

  8. RESEARCH · CL_48847 ·

    MLLM jailbreak vulnerability differs across languages and modalities

    A new study reveals that the vulnerability of frontier multimodal large language models (MLLMs) to jailbreak attacks is significantly influenced by language and modality. Researchers found that while linguistic framing …

  9. RESEARCH · CL_43968 ·

    AI chatbots struggle with news accuracy, regional bias, and false premises

    A new study evaluated six major AI chatbots on their ability to accurately report emerging news facts. While top models achieved over 90% accuracy on multiple-choice questions, their performance dropped significantly in…

  10. TOOL · CL_43938 ·

    New TTBYS framework boosts LLM persuasive dialogue with dual knowledge

    Researchers have introduced a new framework called Think Thrice Before You Speak (TTBYS) to enhance the Theory of Mind (ToM) capabilities in large language models for persuasive dialogue. This framework addresses limita…

  11. SIGNIFICANT · CL_42312 ·

    OpenAI model disproves 80-year-old math problem for under $1000

    OpenAI has announced that an internal model, speculated to be a version of GPT-5, has disproven an 80-year-old mathematical conjecture known as the Erdős planar unit distance problem. This general-purpose reasoning mode…

  12. RESEARCH · CL_42108 ·

    Ricoh develops GPT-5-level Japanese LLM; Needswell launches Copilot training

    Ricoh has developed a new Japanese large language model that matches GPT-5's performance, particularly in reasoning capabilities. This advanced model is designed to enhance AI applications and services. Separately, Need…

  13. RESEARCH · CL_41028 ·

    DeepSeek V4 validates on Huawei Ascend 950, testing China's AI chip ecosystem

    DeepSeek's V4 model has successfully validated inference on Huawei's Ascend 950 chip, marking a significant step for China's domestic AI hardware. This validation required substantial engineering effort, including rewri…

  14. TOOL · CL_40542 ·

    Claude Haiku 4.5 leads in cost-effective JSON extraction benchmark

    A recent benchmark evaluated six large language models on their ability to extract structured data, specifically JSON, from customer support emails. The analysis found that Anthropic's Claude Haiku 4.5 offered the best …

  15. TOOL · CL_40365 ·

    AI Agents Advance with New Coding Tools and Reasoning Capabilities

    Several recent posts explore advancements and applications in AI agents, particularly for coding and reasoning tasks. Topics include building autonomous coding agents that can open GitHub pull requests, using patterns l…

  16. TOOL · CL_40048 ·

    Microsoft launches AI certs amid xAI payment dispute and Copilot turnaround

    Microsoft has introduced four new AI-related certifications to address the growing demand for AI professionals. Separately, there are reports that Elon Musk's xAI may have failed to pay a $420 fee for tax data. Addition…

  17. RESEARCH · CL_41378 ·

    OpenAI model disproves 80-year-old math conjecture

    OpenAI's general-purpose reasoning model has disproved an 80-year-old conjecture in discrete geometry, known as the unit distance problem. This marks a significant advancement for AI in mathematics, as the model autonom…

  18. RESEARCH · CL_40787 ·

    New FineBench benchmark highlights VLM struggles with human activity

    Researchers have introduced FineBench, a new benchmark designed to evaluate the fine-grained human activity understanding capabilities of vision-language models (VLMs). The benchmark includes nearly 200,000 question-ans…

  19. TOOL · CL_38606 ·

    Human engineers outperform GPT-5 and Gemini in system failure diagnosis

    A new benchmark called ARFBench reveals that human engineers still significantly outperform AI models like GPT-5 and Gemini in diagnosing system failures. The results challenge the marketing claims of AI's full autonomy…

  20. TOOL · CL_38915 ·

    CodePercept boosts LLM visual perception using code, not just reasoning

    Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, a novel approach to enhance large language models' visual perception capabilities, particularly for STEM tasks. Their researc…