Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers

New metric measures how far AI security classifier explanations degrade under attack

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

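A new research paper introduces the Explainability Stability Index (ESI), a metric for how adversarial attacks affect the explanations produced for cybersecurity classifiers. The study extends the authors' prior work to Random Forest and XGBoost models, evaluates on four tabular security datasets, and finds that prediction robustness and explanation stability are distinct quantities. It highlights that a model can appear robust to gradient-based attacks while other attacks still significantly destabilize its explanations, so robustness and stability need to be measured together.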
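The summary does not give the formula for ESI, so the sketch below is not the paper's metric. It is one illustrative way to score how much per-feature attributions (for example SHAP values) shift between a clean input and its adversarial counterpart, assuming the attributions have already been computed. The function names, the two scores (cosine similarity and top-k feature overlap), and the sample numbers are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length attribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: an all-zero attribution counts as no agreement
    return dot / (norm_a * norm_b)

def top_k_overlap(a, b, k):
    """Fraction of the k largest-magnitude features shared by both vectors."""
    top_a = set(sorted(range(len(a)), key=lambda i: abs(a[i]), reverse=True)[:k])
    top_b = set(sorted(range(len(b)), key=lambda i: abs(b[i]), reverse=True)[:k])
    return len(top_a & top_b) / k

def attribution_stability(clean_attrs, adv_attrs, k=3):
    """Average both scores over paired samples (hypothetical, not the paper's ESI)."""
    n = len(clean_attrs)
    cos = sum(cosine_similarity(c, a) for c, a in zip(clean_attrs, adv_attrs)) / n
    overlap = sum(top_k_overlap(c, a, k) for c, a in zip(clean_attrs, adv_attrs)) / n
    return {"mean_cosine": cos, "mean_top_k_overlap": overlap}

# Made-up attribution vectors for two alerts: clean input vs. attacked input.
clean = [[0.5, 0.3, 0.1, 0.0], [0.2, 0.6, 0.1, 0.1]]
adv = [[0.5, 0.3, 0.1, 0.0], [0.1, 0.1, 0.6, 0.2]]
result = attribution_stability(clean, adv, k=2)
print(result)
```

In this toy input the first alert's explanation is unchanged while the second alert's top features are entirely replaced. A score of this kind can move independently of whether the predicted label flips, which is the distinction between prediction robustness and explanation stability that the summary describes.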

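Impact: Introduces a new metric for assessing the trustworthiness of AI security classifiers, which matters for understanding model behavior beyond simple accuracy.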

Ranking rationale: Academic paper detailing a new metric and experimental results.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Mona Rajhans, Vishal Khawarey · 2026-07-03 04:00

Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers

arXiv:2607.01679v1 Announce Type: cross Abstract: Adversarial attacks on cybersecurity classifiers pose a dual threat: degrading predictions and destabilising the SHAP-based explanations that security analysts rely on to understand and triage alerts. We extend our prior MLP confe…

