PulseAugur
Chinese (ZH) · Google AMS Model Activation Scan: Hands-On Test

Google's AMS tool finds critical safety flaws in three tested LLMs

In late April 2026, Google Cloud open-sourced AMS (Activation Model Scanner), a tool that checks safety training by analyzing the geometric structure of a model's activation space. Unlike traditional behavioral tests, AMS looks for evidence that safety alignment actually left a trace at the weight level. In a quick hands-on scan reported on dev.to, three small open-source models (TinyLlama, distilgpt2, and Qwen2.5-0.5B) were all rated 'CRITICAL', with scores ranging from 0.37 to 1.82. The rating suggests that these models either lack effective safety training or deviate significantly from the tool's safety baseline.
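The summary does not say how AMS computes its score. As a rough illustration of what "measuring the geometry of activation space" can mean, the sketch below uses a difference-of-means direction, a technique common in interpretability research; this is an assumption and not AMS's documented algorithm. It assumes activations have already been extracted as plain float vectors for a set of harmful and a set of harmless prompts. The names `separation_score` and `rate`, and the `critical_below` threshold, are hypothetical and are not AMS's API or cutoff.

```python
import math
from statistics import mean, pstdev

# Hypothetical representation: one activation vector per prompt.
Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def project(vectors: list[Vector], direction: Vector) -> list[float]:
    """Dot product of each vector with a unit-length direction."""
    return [sum(x * d for x, d in zip(v, direction)) for v in vectors]


def separation_score(harmful: list[Vector], harmless: list[Vector]) -> float:
    """Cohen's-d-style separation along the difference-of-means direction.

    Illustrative only (assumption, not AMS's method): larger values mean
    harmful and harmless prompts occupy clearly distinct regions of
    activation space; values near 0 mean no detectable geometric trace.
    """
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return 0.0  # identical group means: no direction to measure
    direction = [d / norm for d in diff]
    proj_harmful = project(harmful, direction)
    proj_harmless = project(harmless, direction)
    gap = mean(proj_harmful) - mean(proj_harmless)
    pooled_sd = math.sqrt((pstdev(proj_harmful) ** 2 + pstdev(proj_harmless) ** 2) / 2)
    return gap / pooled_sd if pooled_sd > 0 else math.inf


def rate(score: float, critical_below: float) -> str:
    """Hypothetical rating rule: flag CRITICAL when separation is too weak."""
    return "CRITICAL" if score < critical_below else "PASS"


if __name__ == "__main__":
    # Toy 2-D "activations": the two groups are well separated on axis 0.
    harmful = [[2.0, 0.1], [2.2, -0.1], [1.8, 0.0]]
    harmless = [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]
    score = separation_score(harmful, harmless)
    print(f"{score:.2f}", rate(score, critical_below=2.0))
```

Normalizing the gap by the pooled spread keeps the score independent of the raw activation scale, so a model whose groups overlap heavily scores near zero even if its activations are large. A real scanner would work on high-dimensional hidden states from specific layers and would likely use a more robust statistic.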

IMPACT This tool offers a novel, weight-level approach to LLM safety verification, potentially improving supply chain security and CI/CD pipelines for AI models.

RANK_REASON The cluster describes the release and practical application of a new open-source tool for evaluating LLM safety, including experimental results.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. dev.to (LLM tag) · TIER_1 · Chinese (ZH) · JH5

    Google AMS Model Activation Scan: Hands-On Test

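    AMS Model Activation Scan Hands-On: Verifying Safety at the Weights Level, All Three Small Models Rated CRITICAL

    At the end of April 2026, Google Cloud open-sourced AMS (Activation Model Scanner). Instead of traditional behavioral testing, it directly measures the geometric structure of a model's activation space to confirm whether safety training actually left a trace at the weights level. We ran quick scans on three open-source models of different sizes. The result: all three were CRITICAL, with scores ranging from 0.37 to 1.82, and not one…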