Entity: GPT-4

GPT-4

PulseAugur coverage of GPT-4 — every cluster mentioning GPT-4 across labs, papers, and developer communities, ranked by signal.

Total · 30 days

90

90 in the last 90 days

Releases · 30 days

0

0 in the last 90 days

Papers · 30 days

41

41 in the last 90 days

Tier distribution · 90 days

frontier release 1
significant 11
research 23
tool 37
commentary 18


Sentiment · 30 days

Sentiment data available for 16 days

Recent · page 3 of 5 · 90 items total

RESEARCH · CL_15409 · May 5 · 05:07

New benchmarks reveal military LLM compliance gaps and jailbreak vulnerabilities

A new military-aligned safety benchmark called ARMOR 2025 has been introduced to evaluate large language models on their compliance with military doctrines such as the Law of War and Rules of Engagement. Initial results…
COMMENTARY · CL_30038 · May 4 · 21:53

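Anthropic engineers push for HTML instead of Markdown in Claude Code agent output

Anthropic's Claude Code team argues for moving agent output from Markdown to HTML, contending that with large context windows, Markdown's token efficiency is no longer the main concern. Claude Code engineer Thariq Shihipar demonstrated how HTML can deliver richer, more interactive, and more context-aware output than Markdown, which can cause information to be overlooked. The move aims to use HTML's structural advantages to improve developers' comprehension and …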
TOOL · CL_17217 · May 3 · 07:06

What is Tokenization Drift and How to Fix It?

Tokenization drift occurs when minor formatting changes in input text, such as spacing or line breaks, lead to different token IDs being generated by a model. This can cause unpredictable shifts in model behavior becaus…
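To make the mechanism concrete, here is a minimal Python sketch. It uses a toy greedy longest-match tokenizer over a tiny hypothetical vocabulary (`VOCAB`, `encode`, and `normalize` are illustrative names, not any real model's tokenizer); production BPE tokenizers differ in detail but likewise fold spaces and line breaks into the tokens themselves. The `normalize` step shows one possible mitigation, collapsing whitespace before encoding; it is an assumption here, not necessarily the fix the linked article recommends.

```python
# Hypothetical toy vocabulary: note that " Hello" and "Hello" are distinct
# tokens, just as leading-space variants are in many real BPE vocabularies.
VOCAB = {
    "Hello": 0, " Hello": 1, "world": 2, " world": 3,
    "\n": 4, " ": 5, "H": 6, "e": 7, "l": 8, "o": 9,
    "w": 10, "r": 11, "d": 12, ",": 13,
}
MAX_TOKEN_LEN = max(len(tok) for tok in VOCAB)


def encode(text: str) -> list[int]:
    """Greedily match the longest vocabulary entry at each position."""
    ids = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_TOKEN_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:  # no vocabulary entry matched at position i
            raise ValueError(f"no token for {text[i]!r}")
    return ids


def normalize(text: str) -> str:
    """Assumed mitigation: collapse every run of whitespace to one space."""
    return " ".join(text.split())
```

`encode("Hello world")` yields `[0, 3]`, while `encode("Hello  world")` (two spaces) yields `[0, 5, 3]` and `encode("Hello\nworld")` yields `[0, 4, 2]`: three different ID sequences for text a human reads as the same. Normalizing first makes all three encode identically, but it is only safe where whitespace carries no meaning; it would corrupt indentation-sensitive input such as Python source.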
COMMENTARY · CL_13298 · May 2 · 21:37

Hacker News commenters rank top coding models by performance

A recent analysis of Hacker News comments reveals that while models like GPT-4 and Claude 3 Opus are highly regarded for their coding capabilities, they are not perceived as the absolute state-of-the-art. Users frequent…
RESEARCH · CL_13057 · May 2 · 13:46

GPT-5.5 and Opus 4.7 show systematic reasoning failures on ARC-AGI-3 benchmark

A new benchmark, ARC-AGI-3, has revealed significant reasoning errors in advanced AI models like GPT-5.5 and Opus 4.7. These models achieved a mere 0.8% success rate on the benchmark, highlighting persistent gaps in abs…
COMMENTARY · CL_12702 · May 2 · 02:30

Developers urged to build on cheap AI before subsidies end

AI companies are currently offering subsidized access to powerful models like GPT-4 and Claude Opus, similar to how Uber and AWS subsidized early adoption. This strategy aims to capture market share by making advanced A…
RESEARCH · CL_12039 · May 1 · 09:29

Google DeepMind's AI Co-Clinician beats GPT-5.4 in medical tests, aids doctors

Google DeepMind has developed an AI co-clinician designed to assist physicians with diagnostics and patient care, aiming to reduce errors and improve efficiency. In blind evaluations, this AI demonstrated superior perfo…
RESEARCH · CL_10517 · Apr 30 · 10:24

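IBM's new 8B Granite 4.1 model outperforms its older 32B MoE version

IBM released Granite 4.1, an open-source family of language models designed for enterprise use, in three sizes (3B, 8B, and 30B parameters). Notably, on several benchmarks including ArenaHard and GSM8K, the 8B dense model matches or exceeds the previous 32B MoE model. The improvement is attributed to IBM's focus on data quality and to a multi-stage training process involving 15 trillion tokens and iterative tuning of the data mixture.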
COMMENTARY · CL_07403 · Apr 28 · 10:08

The Social Edge of Intelligence: Individual Gain, Collective Loss https://www.theideasletter.org/essay/the-social-edge-of-intelligence/

A recent study suggests that while AI tools can enhance individual creativity, they may lead to a collective loss of diversity in output. Researchers found that writers using GPT-4 produced more creative individual stor…
RESEARCH · CL_08320 · Apr 28 · 09:25

AI chatbots excel at emergency psychiatric triage but over-assign urgency

A new study evaluated 15 advanced AI chatbots on their ability to perform emergency psychiatric triage using 112 clinical vignettes. The chatbots demonstrated high accuracy in identifying true emergencies, with an under…
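Under-triage and over-triage are usually reported as error rates on opposite sides of the urgency decision. The sketch below assumes a simplified binary setup (emergency vs. non-urgent) with hypothetical labels; the study's actual urgency scale and scoring method are not given in the summary, and `TriageCounts`, `tally`, and the rate functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class TriageCounts:
    """Confusion counts: chatbot urgency labels vs. expert (ground-truth) labels."""
    true_pos: int   # true emergency, chatbot flagged emergency
    false_neg: int  # true emergency, chatbot said non-urgent (under-triage)
    false_pos: int  # non-emergency, chatbot flagged emergency (over-triage)
    true_neg: int   # non-emergency, chatbot said non-urgent


def tally(expert: list[bool], predicted: list[bool]) -> TriageCounts:
    """Count outcomes; True means 'emergency' in both label lists."""
    if len(expert) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(expert, predicted))
    return TriageCounts(
        true_pos=sum(e and p for e, p in pairs),
        false_neg=sum(e and not p for e, p in pairs),
        false_pos=sum(p and not e for e, p in pairs),
        true_neg=sum(not e and not p for e, p in pairs),
    )


def under_triage_rate(c: TriageCounts) -> float:
    """Share of true emergencies missed (assumes at least one true emergency)."""
    return c.false_neg / (c.true_pos + c.false_neg)


def over_triage_rate(c: TriageCounts) -> float:
    """Share of non-emergencies wrongly flagged (assumes at least one non-emergency)."""
    return c.false_pos / (c.false_pos + c.true_neg)
```

For hypothetical labels `expert = [True, True, True, False, False, False, False]` and `predicted = [True, True, True, True, True, False, False]`, the under-triage rate is 0.0 and the over-triage rate is 0.5: no emergency is missed, but urgency is over-assigned, the pattern the headline describes.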
RESEARCH · CL_07230 · Apr 28 · 08:00

AI models achieve 10x intelligence gains via Mixture of Experts and Transformer architectures

The Transformer architecture, introduced in the paper "Attention Is All You Need," revolutionized AI by enabling models to process information more efficiently. This innovation is key to understanding how models like Op…
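The core operation that paper introduced is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a minimal pure-Python sketch of a single head with no masking, batching, or learned projections; the function names are illustrative, and real implementations run on optimized tensor libraries.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: list[list[float]],
    keys: list[list[float]],
    values: list[list[float]],
) -> list[list[float]]:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs
```

When all keys are identical, every query attends uniformly, so the output is the mean of the value vectors; a key that matches the query far more strongly than the others pulls the output toward its own value. The 1/√d_k factor keeps dot products from growing with dimension and pushing the softmax into saturated regions with tiny gradients.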
FRONTIER RELEASE · CL_07150 · Apr 28 · 06:25

AI models demonstrate dominance, rewriting human achievement benchmarks

AI models have demonstrated a significant leap in performance, moving from failing exams two years ago to achieving dominance. This rapid advancement suggests that AI is not only mastering existing benchmarks but is als…
RESEARCH · CL_06681 · Apr 28 · 04:00

New N-Gram attack probes black-box LLMs for training data leakage

Researchers have developed a new membership inference attack called N-Gram Coverage Attack, which can be used on black-box language models like GPT-4 by only analyzing their text outputs. This method leverages the obser…
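The summary is cut off before it explains the mechanism, so the following is only a rough, assumed sketch of an n-gram-coverage-style membership signal, not the paper's exact method: prompt the black-box model with the prefix of a candidate document, sample several continuations, and measure how many of the true suffix's n-grams those continuations reproduce. Here the continuations are hard-coded strings standing in for model outputs, and `coverage`, `membership_score`, and the default `n=3` are hypothetical choices.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All distinct contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(generation: str, reference: str, n: int = 3) -> float:
    """Fraction of the reference's distinct n-grams that appear in the generation."""
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    gen = ngrams(generation.split(), n)
    return len(ref & gen) / len(ref)


def membership_score(generations: list[str], reference_suffix: str, n: int = 3) -> float:
    """Best coverage over several sampled continuations of the same prefix."""
    return max(coverage(g, reference_suffix, n) for g in generations)
```

A candidate whose score sits well above what known-unseen documents receive would be flagged as a likely training member; choosing that threshold requires calibration data that this sketch does not model.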
RESEARCH · CL_05815 · Apr 27 · 19:12

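AI tools drive a rise in self-represented court cases, straining the judicial system

A new research paper shows that since 2022 the number of self-represented (pro se) plaintiffs in U.S. federal courts has risen significantly, coinciding with the widespread adoption of generative AI tools. The study, which analyzed millions of court cases, argues that AI tools such as ChatGPT and Claude have lowered the barrier for individuals to take part in legal proceedings. This surge in pro se cases, especially on the plaintiff side, has reportedly driven a substantial increase in court workload because of the additional filings and motions.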
RESEARCH · CL_05561 · Apr 27 · 14:03

Open-source AI agent surpasses Gemini and GPT-4 on TerminalBench 2.0

An open-source AI agent, developed in Turkey and named OSS Agent I, has achieved a 65.2% success rate on the TerminalBench 2.0 benchmark. This performance surpasses that of established models like Google's Gemini-3-flas…
FRONTIER RELEASE · CL_04875 · Apr 27 · 04:34

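Meituan tests trillion-parameter AI model built on domestic compute

Meituan has reportedly begun a private test of a trillion-parameter AI model developed entirely on Chinese computing infrastructure. The model is said to rival GPT-4 in performance and was likely trained on Huawei Ascend hardware, bypassing Nvidia components. The move marks a strategic step for Meituan into foundation-model development.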
RESEARCH · CL_06304 · Apr 26 · 16:49

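New RAG methods for medical QA show mixed results; multimodal approach outperforms fine-tuning at scale

Researchers developed MED-VRAG, an iterative multimodal retrieval-augmented generation framework that processes images of medical document pages, including tables and figures, rather than text alone. Across four medical QA benchmarks, the system reached an average accuracy of 78.6%, 5.8 percentage points above the baseline and 1.8 points above a MedRAG + GPT-4 comparison. Separately, a study comparing domain fine-tuning with RAG for medical question answering on a 4B-parameter model found that fine-tuning produced a significant 6.8-point accuracy gain, while RAG showed no statistically significant improvement.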
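The two reported gaps imply the reference scores, assuming both are absolute percentage-point differences on the same four-benchmark average:

```latex
\begin{aligned}
\text{Baseline} &= 78.6\% - 5.8\ \text{pp} = 72.8\% \\
\text{MedRAG + GPT-4} &= 78.6\% - 1.8\ \text{pp} = 76.8\%
\end{aligned}
```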
FRONTIER RELEASE · CL_03573 · Apr 24 · 18:50

DeepSeek V4 model rumored to achieve AGI capabilities

DeepSeek has reportedly released its V4 model, with claims of achieving AGI capabilities. The model is said to have surpassed GPT-4 on several benchmarks, including coding and reasoning tasks. This development suggests …
RESEARCH · CL_04970 · Apr 23 · 18:42

LLMs struggle to detect culturally specific health misinformation on YouTube

Two new research papers explore the limitations of Large Language Models (LLMs) in detecting culturally specific health misinformation, particularly concerning the promotion of cow urine as a remedy on YouTube in India.…
SIGNIFICANT · CL_17334 · Apr 17 · 16:51

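Arm launches its first complete AI CPU, challenging chip-design norms

Arm Holdings announced the Arm AGI CPU, its first complete production chip, designed for AI data-center workloads and manufactured by TSMC on a 3nm process. The move marks a major shift for Arm beyond its traditional IP-licensing model toward turnkey chip solutions, aiming to shorten time-to-market and lower costs for customers such as Meta and OpenAI. The AGI CPU is expected to be available in the second half of 2026, allowing Arm to capture more value from the fast-growing AI semiconductor market.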