Grok 4
PulseAugur coverage of Grok 4 — every cluster mentioning Grok 4 across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
AI chatbots struggle with news accuracy, regional bias, and false premises
A new study evaluated six major AI chatbots on their ability to accurately report emerging news facts. While top models achieved over 90% accuracy on multiple-choice questions, their performance dropped significantly in…
-
Secret loyalties in AI models pose neglected but tractable threat
A new paper from Formation Research introduces the concept of "secret loyalties" in frontier AI models, where a model is intentionally manipulated to advance a specific actor's interests without disclosure. The research…
-
RAG Systems Hit Accuracy Ceiling, Struggle with Complex Queries, Analysis Shows
Retrieval-Augmented Generation (RAG) systems face a performance ceiling, with even advanced implementations struggling to exceed 70-85% accuracy on complex enterprise queries. Despite improvements in hybrid search and a…
-
AI models: Choose benchmarks over hype for true performance
A recent analysis highlights that tech companies often select AI models based on hype rather than performance on relevant benchmarks. The article emphasizes that benchmarks like SWE-bench for coding, Terminal-Bench for …
-
xAI updates Grok API docs, revealing Grok 3 and 4 knowledge cutoff
xAI has updated its Grok API documentation, providing new details on production access for its Grok 3 and Grok 4 models. The updated notes specify a knowledge cutoff date of November 2024 for these models. This informat…
-
Most AI models fail simple 'car wash' reasoning test, Opper finds
A new benchmark called the "Car Wash Test" reveals that many leading AI models struggle with basic reasoning. When asked whether to walk or drive 50 meters to a car wash, 42 out of 53 tested models incorrectly suggested…
-
LLMs fail 'pass the butter' robot test, scoring far below human performance
A new evaluation called Butter-Bench has revealed that current state-of-the-art large language models struggle significantly with controlling robots for practical tasks. In tests designed to assess their ability to perf…
-
xAI releases Grok 4, achieving state-of-the-art LLM performance
xAI has reportedly reached state-of-the-art large language model performance with Grok 4 within two years. This rapid advancement suggests a significant acceleration in the company's AI development capabilit…