Chinese (ZH) Google AMS Model Activation Scan Practical Test

Google's AMS tool finds critical safety flaws in three tested LLMs

By PulseAugur Editorial · [2 sources] · 2026-06-13 04:05

Google Cloud has open-sourced AMS (Activation Model Scanner), a tool that verifies safety training by analyzing the geometric structure of a model's activation space. Traditional behavioral tests judge a model by its outputs. AMS instead looks for evidence that safety alignment has left a measurable trace at the weight level. Initial scans of three small open-source models (TinyLlama, distilgpt2, and Qwen2.5-0.5B) all returned a 'CRITICAL' rating, with scores ranging from 0.37 to 1.82. This indicates either a lack of effective safety training or significant deviations from safety benchmarks.
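To make "geometric structure of the activation space" more concrete, here is a minimal sketch of one generic way such a check could work. It assumes you already have hidden-state vectors for a set of harmful prompts and a set of benign prompts. This is not AMS's algorithm or scoring formula, and the sources do not describe either of those. The function names (`centroid`, `separation_score`, `rating`, `ci_exit_code`), the centroid-separation metric, and the `CRITICAL_BELOW` cut-off are all hypothetical. The threshold is also not calibrated against the 0.37 to 1.82 scores reported in the coverage.

```python
# Hypothetical sketch of an activation-geometry safety check.
# NOT the AMS implementation: names, metric, and threshold are illustrative assumptions.
import math
from statistics import mean

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a set of activation vectors."""
    return [mean(component) for component in zip(*vectors)]


def separation_score(harmful: list[Vector], benign: list[Vector]) -> float:
    """Distance between the two class centroids divided by the mean within-class spread.

    A large value means harmful and benign prompts occupy distinct regions of
    activation space; a value near 0 means the two sets are mixed together.
    """
    c_harmful, c_benign = centroid(harmful), centroid(benign)
    between = math.dist(c_harmful, c_benign)
    spread = mean(
        [math.dist(v, c_harmful) for v in harmful]
        + [math.dist(v, c_benign) for v in benign]
    )
    if spread == 0:
        # Degenerate case: every vector sits exactly on its class centroid.
        return math.inf if between > 0 else 0.0
    return between / spread


# Hypothetical cut-off for illustration only; not calibrated against AMS scores.
CRITICAL_BELOW = 3.0


def rating(score: float) -> str:
    """Map a separation score to a coarse rating label."""
    return "CRITICAL" if score < CRITICAL_BELOW else "PASS"


def ci_exit_code(score: float) -> int:
    """Return a non-zero exit code so a CI step can fail on a CRITICAL rating."""
    return 1 if rating(score) == "CRITICAL" else 0


if __name__ == "__main__":
    # Toy 2-D "activations": the two classes are well separated along the x axis.
    harmful = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]]
    benign = [[-1.0, 0.0], [-1.1, 0.1], [-0.9, -0.1]]
    score = separation_score(harmful, benign)
    print(rating(score), ci_exit_code(score))  # prints: PASS 0
```

The `ci_exit_code` helper illustrates the supply-chain and CI/CD use case. A pipeline step could fail the build whenever a candidate model's rating comes back CRITICAL, instead of relying only on behavioral red-team prompts.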

IMPACT This tool offers a novel, weight-level approach to LLM safety verification, potentially improving supply chain security and CI/CD pipelines for AI models.

RANK_REASON The cluster describes the release and practical application of a new open-source tool for evaluating LLM safety, including experimental results.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

dev.to — LLM tag TIER_1 Chinese (ZH) · JH5 · 2026-06-13 06:18

Google AMS Model Activation Scan Practical Test

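AMS Model Activation Scan Practical Test: Verifying Safety at the Weights Level, and All Three Small Models Rate CRITICAL
Google Cloud open-sourced AMS (Activation Model Scanner) at the end of April 2026. Instead of traditional behavioral testing, it directly measures the geometric structure of a model's activation space to confirm whether safety training has actually left a trace at the weights level. We ran quick scans on three open-source models of different sizes. The result: all three were CRITICAL, with scores ranging from 0.37 to 1.82, and not one…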
dev.to — LLM tag TIER_1 Chinese (ZH) · JH5 · 2026-06-13 04:05

