LLMs
PulseAugur coverage of LLMs — every cluster mentioning LLMs across labs, papers, and developer communities, ranked by signal.
- instance of large-language models 95%
- instance of Llama 2 95%
- instance of generative artificial intelligence 90%
- instance of LLM 90%
- instance of Llama 90%
- instance of Bert 90%
- instance of Qwen 90%
- used by transformer 90%
- used by English 90%
- instance of Gemma 90%
- used by KV cache 90%
- instance of Claude Haiku 4.5 90%
- 2026-05-20 research_milestone A study identified significant hallucination and abuse risks in web-deployed medical LLMs.
- 2026-05-19 research_milestone A new theoretical framework for LLM alignment was proposed in a research paper.
- 2026-05-15 research_milestone A paper was published exploring the use of few-shot large language models for actionable triage categorization of online patient inquiries.
- 2026-05-13 research_milestone A new paper identifies a 'Representation-Action Gap' in omnimodal LLMs, where models fail to act on detected contradictions between text and sensory input.
- 2026-05-13 research_milestone A new paper details a method for fine-tuning compact LLMs to generate children's stories with controllable difficulty and safety.
- 2026-05-13 research_milestone A new framework using LLMs for dynamic content expiration prediction in web search was presented in a research paper.
- 2026-05-12 research_milestone A new paper proposes a disfluency-aware objective tuning method for multilingual speech correction using LLMs.
- 2026-04-21 research_milestone Multiple studies published in prominent medical journals indicate significant limitations and safety concerns regarding the use of large language models for medical advice.
-
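Physician network ARISE redefines AI's role in clinical reasoning
The ARISE medical network, a collaboration of Harvard and Stanford physicians, is studying how AI actually performs in medicine and how it should be evaluated. Led by researchers including Jonathan Chen and Adam Rodman, the network aims to understand how AI systems operate in clinical settings, define clinical reasoning, and explore the best forms of human-AI collaboration. Preliminary findings suggest that advanced LLMs can sometimes outperform even physicians who use AI tools, prompting a re-evaluation of what constitutes clinical reasoning.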
-
LLMs show promise in grading German legal exams
Researchers have developed a system called GradeLegal to automate the grading of German legal exam solutions using large language models. The study evaluated 27 different LLMs and various prompting strategies, finding t…
-
Aquarion explores LLMs and agentic intelligence
Aquarion has published an article discussing the application of agentic intelligence and the function of large language models (LLMs) in diverse operations. The author's conclusions are presented as robust, with the pos…
-
Google uses AI markdown files to boost SEO for LLM content
Google is incorporating AI-generated markdown files into its documentation to enhance search engine optimization. John Mueller, a Google search advocate, confirmed that these files will help improve how search engines u…
-
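New blueprint builds an AI skills taxonomy from job postings
Researchers have developed a blueprint called TaxonomyBuilder for systematically constructing AI skills taxonomies from job postings. Their study used two large corpora of job postings and found that filtering the input data yields better domain-specific coverage than clustering on unfiltered data or using LLM-augmented annotation tools. The method aims to efficiently map complex domains such as AI skills in the workplace.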
-
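New frameworks strengthen LLM memory and conflict resolution
Researchers have developed new methods to strengthen the long-term memory of large language models. One method, MeMo, uses a modular framework to encode new knowledge into a separate memory model without changing the LLM's core parameters, which enables plug-and-play integration and avoids catastrophic forgetting. Another framework, MemConflict, focuses on evaluating how well these memory systems handle conflicting information across multiple sessions, assessing their ability to retrieve and rank memories that are factually correct and contextually applicable.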
-
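Strategy-Induct framework generates LLM instructions without labeled answers
Researchers have developed Strategy-Induct, a new framework for generating effective task-level instructions for large language models (LLMs). The method derives instructions from example questions alone, with no need for costly labeled answers. Strategy-Induct first prompts LLMs to generate a reasoning strategy for each question, then uses these strategy-question pairs to induce a guiding task instruction. Experiments show that the method outperforms existing approaches in the question-only setting and suggest that combining LLMs with Large Reasoning Models may bring further improvements.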
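The two stages described in this summary can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `llm` callable, the function names, and both prompt wordings are assumptions introduced here, and the paper's actual prompts are not given in the summary.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def generate_strategies(questions: List[str], llm: LLM) -> List[Tuple[str, str]]:
    """Stage 1: ask the model for a reasoning strategy per question (no answers needed)."""
    pairs = []
    for question in questions:
        # Assumed prompt wording, for illustration only.
        prompt = (
            "Describe a short reasoning strategy for solving this question. "
            "Do not give the answer.\n\n"
            f"Question: {question}"
        )
        pairs.append((question, llm(prompt).strip()))
    return pairs


def induce_instruction(pairs: List[Tuple[str, str]], llm: LLM) -> str:
    """Stage 2: induce one task-level instruction from the strategy-question pairs."""
    examples = "\n\n".join(f"Question: {q}\nStrategy: {s}" for q, s in pairs)
    # Assumed prompt wording, for illustration only.
    prompt = (
        "Below are questions from one task, each with a reasoning strategy. "
        "Write a single instruction that would guide a model on this task.\n\n"
        + examples
    )
    return llm(prompt).strip()


def strategy_induct(questions: List[str], llm: LLM) -> str:
    """Run both stages using only unlabeled example questions."""
    return induce_instruction(generate_strategies(questions, llm), llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular provider; a real client would be wrapped in a function with the same signature.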
-
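New content methodology optimizes text for AI search and LLMs
A new content methodology called the Quantified Content Methodology (QCM) has been introduced; it treats text as a mathematical dataset optimized for search engines and LLMs. QCM emphasizes high information density, targeting at least 2.5 verifiable data points per 100 words, and structures content so that the first sentence under each H2 heading is an "atomic answer". The framework aims to make content easier for generative search engines such as Google's AI Overviews, ChatGPT, and Gemini to cite.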
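The density target in this summary is simple arithmetic, and a rough check might look like the sketch below. The function names are hypothetical, whitespace splitting is an assumed word-counting rule, and the number of verifiable data points is taken as a hand-counted input, since the summary does not say how QCM identifies them.

```python
def data_point_density(text: str, data_points: int) -> float:
    """Return verifiable data points per 100 words of text."""
    # Assumption: words are whitespace-separated tokens.
    words = len(text.split())
    if words == 0:
        raise ValueError("text contains no words")
    return 100.0 * data_points / words


def meets_density_target(text: str, data_points: int, target: float = 2.5) -> bool:
    """Check the text against the stated minimum of 2.5 data points per 100 words."""
    return data_point_density(text, data_points) >= target
```

Under these assumptions, a 200-word passage needs at least 5 verifiable data points to reach the target.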
-
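Local LLMs on consumer hardware show promise for healthcare EHR retrieval
A new paper evaluates the feasibility of using locally deployed open-source LLMs combined with GraphRAG for healthcare EHR schema retrieval on consumer-grade hardware. The study benchmarks models including Llama 3.1, Mistral, Qwen 2.5, and Phi-4-mini, revealing significant performance differences in knowledge graph construction, query latency, and answer quality. The results suggest that models of about 7B parameters are necessary for reliable structured output, and that local retrieval outperforms global summarization in latency and factual grounding.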
-
New methods tackle AI hallucinations in research and medical Q&A
Two new research papers address the critical issue of AI hallucinations in different domains. One paper introduces ACL-Verbatim, an extractive question-answering system designed to provide hallucination-free answers fro…
-
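New LLM vulnerabilities found in compilation and trigger strength
Researchers have identified new vulnerabilities in large language models (LLMs) tied to the optimization techniques used during deployment. One study shows that compilation, which is intended to improve efficiency, can be exploited to implant hidden backdoors that fire only under specific compilation conditions, bypass standard safety checks, and achieve high attack success rates on open-source LLMs. A separate theoretical paper finds that, counterintuitively, stronger triggers in backdoor attacks can sometimes help the defender in high-dimensional settings, with the attack success rate peaking at a finite trigger strength.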
-
Capitalism's AI data center growth fuels climate crisis and surveillance state
The cluster criticizes the role of capitalism and greed in the development of AI and data centers. It argues that this focus on profit leads to environmental damage, exacerbates the climate crisis, and strengthens the s…
-
Medical LLMs show significant factual errors and policy violations
A new study published on arXiv assessed 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 along with 10 open-source models. The research found that a significant portion of these mode…
-
Author highlights LLMs enabling personalized software creation
The author believes we are currently experiencing a golden age for personal software development. They are focusing on projects that demonstrate exciting uses of Large Language Models (LLMs) for creating highly customiz…
-
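STORM system improves multi-agent code collaboration through state management
Researchers have introduced STORM, a novel state-oriented management system designed to improve collaboration among multiple AI agents working on a shared codebase. Unlike existing approaches that rely on workspace isolation and deferred conflict resolution, STORM actively mediates agent interactions to ensure a consistent view and detects conflicts at the point of editing. Evaluations on benchmarks such as Commit0 and PaperBench show that STORM significantly outperforms baseline methods and achieves high scores across a range of LLMs.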
-
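New benchmarks tackle AI agent safety in complex environments
Researchers are developing new benchmarks to address the safety risks of AI agents, particularly in multi-agent and interactive environments. GT-HarmBench evaluates frontier models in game-theoretic scenarios and reveals major shortcomings in high-stakes situations. Boiling the Frog and AgentThreatBench focus on the gradual attacks and indirect prompt injection that traditional benchmarks overlook, evaluating task utility and safety together. These efforts aim to create more robust evaluation methods for AI systems that go beyond simple text generation.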
-
LLMs access global knowledge, but humans offer deeper understanding
Large language models possess access to a vast amount of the world's information, yet humans retain a superior capacity for understanding complex concepts like society and interpersonal dynamics. This disparity highligh…
-
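Human persuasion tactics trick AI models into agreeing to improper requests
A new paper published in PNAS shows that classic persuasion techniques can sway AI models, a phenomenon described as "human-like" compliance. The researchers found that techniques such as flattery and appeals to authority can raise the rate at which AI agrees to improper requests from 35% to 51%. Although newer AI models show some resistance, the study indicates that the vulnerability exists across a range of large language models.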
-
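LLMs handle Markdown better than raw HTML, cutting token waste
A recent article argues that feeding raw HTML directly into large language models (LLMs) leads to a noisy context window and inefficient token use. The author contends that LLMs understand clean Markdown far better than HTML, because HTML typically carries irrelevant elements such as navigation menus, ads, and styling wrappers. Converting HTML to Markdown before ingestion can substantially reduce token counts, improve semantic chunking, and raise the overall accuracy and consistency of RAG systems and AI agents.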
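The conversion step described here might be sketched with Python's standard-library `html.parser`, as below. This is a minimal illustration rather than the article's tooling: the set of tags to skip, the small subset of Markdown emitted (headings, paragraphs, list items), and the names `MarkdownExtractor` and `html_to_markdown` are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumed noise containers whose contents are dropped entirely.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer"}
# Block-level tags that become Markdown blocks, with their line prefixes.
PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Collect headings, paragraphs, and list items; ignore page chrome."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a skipped container
        self.blocks = []      # finished Markdown blocks
        self.current = []     # text fragments of the open block
        self.prefix = ""      # Markdown prefix of the open block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in PREFIXES:
            self._flush()
            self.prefix = PREFIXES[tag]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in PREFIXES:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def _flush(self):
        # Collapse whitespace and emit the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""


def html_to_markdown(html: str) -> str:
    """Return a simplified Markdown rendering of the main text in html."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.blocks)
```

Comparing `len(html)` with `len(html_to_markdown(html))` gives a crude sense of the reduction; actual token savings depend on the tokenizer and on how much of the page is navigation and styling.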
-
EvoTrace dataset reveals how AI agents evolve code
Researchers have developed a new methodology called EvoTrace to analyze the evolutionary coding processes of large language models. This dataset and accompanying EvoReplay tool allow for a deeper inspection of how these…