LLMs Hallucinate in Academic and Medical Contexts, Studies Show
By PulseAugur Editorial Team · [16 sources]
A new study published on arXiv investigated the hallucination tendencies of four popular LLMs—ChatGPT, Grok, Gemini, and Copilot—when used for academic writing. The research introduced a "Hallucination Index" (HI) and found that Grok and Copilot performed better in reference generation but struggled with abstract prompts, while Gemini and ChatGPT showed better tone control but higher factual hallucination risks. The study concluded that hallucination behavior is influenced by task type and prompting conditions, not solely by model architecture. Separately, Gary Marcus highlighted multiple studies indicating that current LLMs are unreliable for medical advice, often providing inaccurate or fabricated information with high confidence, and should not be used for unsupervised clinical decision-making.
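The summary does not say how the Hallucination Index is computed. To illustrate why the study treats hallucination as dependent on task and prompt rather than on the model alone, here is a minimal sketch that tallies hallucination rates per (model, task) pair instead of per model. It assumes a hypothetical labeling scheme in which a reviewer marks each claim or citation in a response as verified or not. The `LabeledClaim` type, the task names, and `hallucination_rate_by_group` are illustrative inventions and do not reproduce the paper's HI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LabeledClaim:
    """One claim or citation from a model response (hypothetical schema, not the paper's)."""
    model: str
    task: str            # hypothetical task label, e.g. "reference_generation"
    hallucinated: bool   # True if a reviewer could not verify the claim


def hallucination_rate_by_group(claims):
    """Return {(model, task): share of that group's claims labeled as hallucinated}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for c in claims:
        key = (c.model, c.task)
        totals[key] += 1
        if c.hallucinated:
            flagged[key] += 1
    # Every key in totals has at least one claim, so the division is safe.
    return {key: flagged[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    claims = [
        LabeledClaim("model_a", "reference_generation", False),
        LabeledClaim("model_a", "reference_generation", True),
        LabeledClaim("model_a", "abstract_prompt", True),
        LabeledClaim("model_b", "reference_generation", True),
        LabeledClaim("model_b", "abstract_prompt", False),
    ]
    for (model, task), rate in sorted(hallucination_rate_by_group(claims).items()):
        print(f"{model:8s} {task:22s} {rate:.2f}")
```

Grouping by task as well as by model is the key design choice. A single per-model average would hide the pattern the study reports, in which a model can do well on one task type and poorly on another.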
AI
Impact
LLM hallucinations in academic and medical contexts risk spreading misinformation and leading to unreliable decisions, underscoring the need for caution and further research.
Ranking rationale
The cluster contains two academic papers and commentary on their findings regarding LLM hallucinations and reliability.
The growing accessibility of Large Language Models via conversational interfaces capable of responding to users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process for users. …
arXiv cs.CL
Tier 1 · English (EN) · Humam Khan, Md Tabrez Nafis, Shahab Saquib Sohail, Aqeel Khalique, Rehan Hasan Khan
arXiv:2605.04171v1 (Announce Type: new). Abstract: Large language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them to generate academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and Copilot, for hallucinations specifically for acade…