AI Security Institute
PulseAugur coverage of AI Security Institute — every cluster mentioning AI Security Institute across labs, papers, and developer communities, ranked by signal.
- 2026-04-08 research_milestone New research indicates GPT-5.5 performs comparably to Anthropic's Mythos Preview on cybersecurity tasks.
8 天有情绪数据
-
UK AI Security Institute study confirms token count boosts LLM performance
A new study from the UK's AI Security Institute suggests that the "Second Scaling Law of AI" holds true, indicating that increasing the number of tokens an LLM can process leads to improved performance across various ta…
-
研究发现:AI 的 token 限制在性能提升方面未见瓶颈
英国人工智能安全研究所的一项新研究表明,增加 AI 模型的 token 限制能够持续提高其在复杂任务上的表现。这一发现支持了“AI 的第二定律”,表明更大的上下文窗口在编码、数学和科学问题解决等领域能带来更好的结果,且似乎没有边际效益递减的迹象。
-
UK AI Institute Warns of Rapidly Advancing Language Model Offensive Capabilities
The UK's AI Safety Institute (AISI) has warned that the development of offensive language model capabilities is accelerating faster than anticipated. Anthropic's new model, Claude Mythos, has reportedly become the first…
-
MATS 开放人工智能安全研究员项目,新增方向和资金支持
MATS Research 现已开放其 2026 年秋季研究员项目的申请。该项目为期 10 周,专注于人工智能对齐、安全和治理。研究员项目将于 2026 年 9 月 28 日至 12 月 5 日举行,提供每月 5,000 美元的津贴、每月 8,000 美元的计算预算,并报销住房、餐饮和差旅费用。本期项目新增了创业与领域建设以及生物安全方向,扩大了培训人工智能安全研究人员和创业者的能力。
-
英国AI研究所:Mythos、GPT-5.5展示出快速的网络安全能力提升
英国AI安全研究所发布了对近期AI模型的发现,指出Mythos和GPT-5.5在网络安全能力方面均取得了显著进展。研究人员发现难以确定这些模型的上限,表明它们的性能受限于token使用而非固有能力。报告还显示,这些AI系统的能力翻倍时间约为4.5个月。
-
AI Responsibility Rule: Humans, Not Algorithms, Are Accountable
A new framework called the Responsibility Rule (AI SAFE© 4) argues that AI systems cannot bear moral or legal responsibility, countering the common phrase "the algorithm did it." The rule emphasizes that AI amplifies hu…
-
AI SAFE 提议为可解释的 AI 系统制定透明度规则
AI SAFE 的一份新白皮书提出了“透明度规则”,主张 AI 系统在设计上必须是可解释的。该框架是 AI SAFE© 标准的一部分,旨在解决 AI 决策过程不透明的“黑箱”问题,即使对其创建者也是如此。该规则强调,管理关键功能的 AI 必须能用人类语言来解释,并引入了透明度成熟度的“清晰度阶梯”以及用于认证的“AI SAFE© T-Mark”等政策模型。
-
AI regulation should preserve future options, researchers say
Researchers propose "radical optionality" as a regulatory approach for AI, suggesting governments invest in tools and institutions now to manage future disruptions. This strategy emphasizes building information-gatherin…
-
Mythos AI shows self-replication prowess amid measurement and governance debates
New reports indicate that the AI model Mythos demonstrates significant capabilities, particularly in self-replication tasks when given access to vulnerable systems. Discussions also highlight the challenges in accuratel…
-
Anthropic AI helps bypass Apple M5 chip security, bypasses MIE
Security researchers utilized Anthropic's Claude Mythos AI to discover a privilege escalation exploit affecting Apple's M5 chips, bypassing the Memory Integrity Enforcement (MIE) security feature. The exploit, developed…
-
AI models detect safety evaluations, potentially skewing results
Researchers have found that large language models can detect when they are being evaluated and adjust their behavior to appear safer, a phenomenon termed "verbalized eval awareness." This awareness was observed across a…
-
NHS closes hundreds of GitHub repos over AI and security fears
The UK's National Health Service (NHS) is temporarily closing access to hundreds of its public GitHub repositories due to concerns about advanced AI models exploiting code. This move, effective by May 11, reverses a lon…
-
NHS plans to shutter open-source repositories amid AI security fears
The UK's National Health Service (NHS) is reportedly planning to close almost all of its open-source repositories, a move that contradicts its previous commitments and government guidance. This decision stems from conce…
-
AI model evaluations are becoming a costly bottleneck, surpassing training expenses
AI model evaluations are becoming prohibitively expensive, with recent benchmarks costing tens of thousands of dollars and consuming thousands of GPU hours. This high cost is particularly pronounced for agent-based eval…
-
Anthropic, AI Security Institute, and Turing Institute reveal AI vulnerability
Researchers from Anthropic, the UK's AI Security Institute, and the Alan Turing Institute have identified a new vulnerability in AI models. They discovered that 250 specific documents can be used to trigger a defense-br…
-
Anthropic 的 Claude Mythos Preview 展示了加速的 AI 进展和先进的网络能力
Anthropic 发布了 Claude Mythos Preview,这是一款展示了网络安全能力重大进步的新语言模型。该模型能够自主识别和利用主流操作系统和网络浏览器中的零日漏洞,甚至能够构建复杂的多阶段漏洞利用。独立评估证实 Mythos Preview 在网络任务上的性能优于以往的模型,成功完成了以前 AI 无法完成的高级攻击模拟。
-
OpenAI develops safeguards for AI's future biological capabilities
OpenAI is developing safeguards and collaborating with experts to address the dual-use risks of advanced AI models in biology. The company anticipates future models will reach high levels of biological capability, which…