Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

Research shows large language models hallucinate in academic and medical settings

By the PulseAugur editorial team · [16 sources] · 2026-04-21 12:15

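A study recently posted to arXiv examines how prone four popular large language models (ChatGPT, Grok, Gemini, and Copilot) are to hallucination when used for academic writing. The study introduces a "Hallucination Index" (HI) and finds that Grok and Copilot perform better at citation generation but struggle with abstract prompts, while Gemini and ChatGPT show better tone control but carry a higher risk of factual hallucination. It concludes that hallucination behavior is shaped by task type and prompt conditions rather than by model architecture alone. Separately, Gary Marcus highlights several studies showing that current large language models are unreliable for medical advice, often confidently giving inaccurate or false information, and should not be used for unsupervised clinical decision-making.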

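Impact: Hallucinations by large language models in academic and medical settings create risks of misinformation and unreliable decisions, underscoring the need for caution and further research.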

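Ranking rationale: This cluster contains two academic papers and commentary on findings about large language model hallucination and reliability.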

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 16 sources.

Coverage sources [16]

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Nicholas Diakopoulos · 2026-05-22 14:33

Synthetic Sources? Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources

The growing accessibility of Large Language Models via conversational interfaces capable of responding to users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process for users. …
arXiv cs.CL TIER_1 English(EN) · Humam Khan, Md Tabrez Nafis, Shahab Saquib Sohail, Aqeel Khalique, Rehan Hasan Khan · 2026-05-07 04:00

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

arXiv:2605.04171v1 · Abstract: Large Language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them for generating Academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and C…
arXiv cs.CL TIER_1 English(EN) · Rehan Hasan Khan · 2026-05-05 18:08

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

Large Language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them for generating Academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and Copilot for hallucinations specifically for acade…
Gary Marcus TIER_1 English(EN) · Gary Marcus · 2026-04-21 12:15

Please Don't Trust Your Chatbot for Medical Advice

Four separate studies all point in the same direction
Forbes — Innovation TIER_1 English(EN) · Peter Cohan, Senior Contributor · 2026-05-18 10:30

Nvidia, CoreWeave, and Palantir Are Driving AI Forward. Which Is the Best Stock Pick?

Discover which of the three giants of the AI era, CoreWeave, Nvidia and Palantir, offers the best value for your portfolio based on growth and valuation.
Forbes — Innovation TIER_1 English(EN) · Lance Eliot, Contributor · 2026-05-18 07:15

Why Popular AI Is Suddenly and Unexpectedly Urging People to Consider Sleep or Rest

Anthropic Claude is telling people to get sleep or rest, even though the person did not bring up that topic. Why is AI doing this? An AI Insider analysis and scoop.
Forbes — Innovation TIER_1 English(EN) · Lance Eliot, Contributor · 2026-05-16 07:15

Generative AI Such as ChatGPT Is Handling Anger Management Through Mindfulness Guidance

Generative AI such as ChatGPT and other LLMs can be helpful for dealing with anger issues. I give tips and insights on how to best use AI for this. An AI Insider scoop.
Mastodon — sigmoid.social TIER_1 日本語(JA) · [email protected] · 2026-05-18 12:30

What happens when four major AIs take over a radio station… experiment reveals each model's behavior - Business+IT # AI # AI/Generative AI # ChatGPT # Claude # Gemini # Grok # IT # IT strategy # Music

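https://www.wacoca.com/media/659339/ What happened when four major AIs were put in charge of running a radio station… each model's behavior as revealed by the experiment – Business+IT # AI # AI/Generative AI # ChatGPT # claude # gemini # Grok # IT # IT strategy # music # SB Creative # SoftBank # Business # Latest news # Music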

Link: wacoca.com/…/659339
dev.to — MCP tag TIER_1 English(EN) · Eshaan · 2026-05-18 11:42

How I Built Glia: A Local-First Shared Memory Layer for Browser Chats and IDEs

[Image: Persistent Memory layer fo…]
Medium — Claude tag TIER_1 English(EN) · John Xayder · 2026-05-18 08:32

Claude Keeps Giving You Generic Answers. Here's Why.

Link: https://medium.com/@m1jahongir/laude-keeps-giving-you-generic-answers-heres-why-e4980b2a8706?source=rss------claude-5
dev.to — MCP tag TIER_1 English(EN) · Jill Mercer · 2026-05-18 01:22

My Apps Are Invisible to AI Agents, and I'm Taking Steps to Fix That

i'm an indie app builder and vibe coder. i've shipped over 30 small business apps: invoicing, inventory, packing slips, tax tracking. and now apparently an open standard for ai agents. that last one surprised me too. the problem i kept running into: even the bes…
Mastodon — sigmoid.social TIER_1 English(EN) · [email protected] · 2026-05-10 04:31

Mother's Day 2026: How To Create AI Images With Your Mom For Free Using ChatGPT, Gemini And More

Mother's Day 2026: How To Create AI Images With Your Mom For Free Using ChatGPT, Gemini And More https://web.brid.gy/r/https://in.mashable.com/tech/109479/mothers-day-2026-how-to-create-ai-images-with-your-mom-for-free-using-chatgpt-gemini-and-more
Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-18 23:13

Most agentic #AI memory is built for short-lived chat. Running 1K #agents in production changes the game entirely, because facts change over time. Vector search

Most agentic # AI memory is built for short-lived chat. Running 1K # agents in production changes the game entirely—because facts change over time. Vector search fails when user preferences decay or shift. This 7-layer memory architecture fixes it: 1️⃣ Working Mem 2️⃣ Conversatio…

Links: sistava.com/…/ai-agent-memory sistava.com/…/ai-age
Mastodon — fosstodon.org TIER_1 Deutsch(DE) · [email protected] · 2026-05-16 08:20

AI vs. Privacy: When Language Models Know Too Much, and How They Give It Away. Personal information used to have to be painstakingly gathered

«AI versus privacy: when language models know too much, and how they give it away. Personal information used to have to be painstakingly pieced together; today a few prompts are often enough. Language models such as ChatGPT, Grok, or Gemini are thus becoming a challenge…

Link: t3n.de/…/ki-gegen-die-privatsphaere-wenn-…
Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-05-18 09:47

I pay for Gemini, ChatGPT, and Claude, and there's a clear winner. I've been cheating on other AI tools and I'm not sorry. https://www.androidauthority.com/gem

I pay for Gemini, ChatGPT, and Claude, and there's a clear winner. I've been cheating on other AI tools and I'm not sorry. https://www.androidauthority.com/gemini-chatgpt-claude-clear-winner-3666267/ # Tech # Technology # TechNews # AI # Gadgets # Software # Cybersecurity # App…

Link: androidauthority.com/gemini-chatgpt-claud…
r/Anthropic TIER_1 English(EN) · /u/iwantamillionkarma · 2026-05-18 09:45

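Researchers left AI agents alone in different virtual towns for 15 days to see what would happen. Claude was the only AI to build a democracy. ChatGPT, Gemini, and Grok all created anarchy and then died.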

Link: https://www.reddit.com/r/Anthropic/comments/1tgho4g/researchers_left_ai_agents_alone_in_a_different/
