PulseAugur

Grok 4

PulseAugur coverage of Grok 4 — every cluster mentioning Grok 4 across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 8 (90 days: 8)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 7 (90 days: 7)
Tier distribution · 90 days
Relations
Sentiment · 30 days

2 days with sentiment data

Recent · Page 1 of 1 · 8 items
  1. RESEARCH · CL_43968 ·

    AI chatbots struggle with news accuracy, regional bias, and false premises

    A new study evaluated six major AI chatbots on their ability to accurately report emerging news facts. While top models achieved over 90% accuracy on multiple-choice questions, their performance dropped significantly in…

  2. TOOL · CL_30104 ·

    Secret loyalties in AI models pose neglected but tractable threat

    A new paper from Formation Research introduces the concept of "secret loyalties" in frontier AI models, where a model is intentionally manipulated to advance a specific actor's interests without disclosure. The research…

  3. TOOL · CL_22929 ·

    RAG Systems Hit Accuracy Ceiling, Struggle with Complex Queries, Analysis Shows

    Retrieval-Augmented Generation (RAG) systems face a performance ceiling, with even advanced implementations struggling to exceed 70-85% accuracy on complex enterprise queries. Despite improvements in hybrid search and a…

  4. COMMENTARY · CL_20705 ·

    AI models: Choose benchmarks over hype for true performance

    A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …

  5. TOOL · CL_13084 ·

    xAI updates Grok API docs, revealing Grok 3 and 4 knowledge cutoff

    xAI has updated its Grok API documentation, providing new details on production access for its Grok 3 and Grok 4 models. The updated notes specify a knowledge cutoff date of November 2024 for these models. This informat…

  6. TOOL · CL_17669 ·

    Most AI models fail simple 'car wash' reasoning test, Opper finds

    A new benchmark called the "Car Wash Test" reveals that many leading AI models struggle with basic reasoning. When asked whether to walk or drive 50 meters to a car wash, 42 out of 53 tested models incorrectly suggested…

  7. TOOL · CL_17686 ·

    LLMs fail 'pass the butter' robot test, scoring far below human performance

    A new evaluation called Butter-Bench has revealed that current state-of-the-art large language models struggle significantly with controlling robots for practical tasks. In tests designed to assess their ability to perf…

  8. FRONTIER RELEASE · CL_01827 ·

    xAI releases Grok 4, achieving state-of-the-art LLM performance

    xAI has reportedly developed Grok 4, achieving state-of-the-art performance in large language models within two years. This rapid advancement suggests a significant acceleration in the company's AI development capabilit…