Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

Local LLM TorchSight Reaches 95% Accuracy in Security Document Classification

By the PulseAugur editorial team · [1 source] · 2026-05-22 04:00

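Researchers have developed TorchSight, an open-source, locally run system that classifies security documents using a fine-tuned Qwen 3.5 27B large language model. On a 1,000-document benchmark the system reached 95.0% accuracy, significantly outperforming commercial models, which scored between 75.4% and 79.9%. The fine-tuned local model accurately identified sensitive information across a range of security categories and subcategories while keeping the data private.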
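To put the reported figures in concrete terms, the short sketch below converts the quoted accuracies into implied document counts and percentage-point gaps. It assumes accuracy means correctly classified documents divided by the 1,000-document benchmark size; the summary does not spell out the metric, and the function names are illustrative, not taken from TorchSight.

```python
# Figures quoted in the summary; everything else here is an illustrative assumption.
BENCHMARK_SIZE = 1000             # documents in the benchmark
LOCAL_ACCURACY = 95.0             # TorchSight accuracy, in percent
COMMERCIAL_RANGE = (75.4, 79.9)   # lowest and highest commercial scores, in percent


def implied_correct(accuracy_pct: float, total: int = BENCHMARK_SIZE) -> int:
    """Documents classified correctly, assuming accuracy = correct / total."""
    return round(accuracy_pct / 100 * total)


def gap_points(local_pct: float, other_pct: float) -> float:
    """Difference between two accuracies, in percentage points."""
    return round(local_pct - other_pct, 1)


if __name__ == "__main__":
    low, high = COMMERCIAL_RANGE
    print("local model correct:", implied_correct(LOCAL_ACCURACY))
    print("commercial correct:", implied_correct(low), "to", implied_correct(high))
    print("gap in points:", gap_points(LOCAL_ACCURACY, high), "to", gap_points(LOCAL_ACCURACY, low))
```

Under that assumption, the local model got about 950 documents right against roughly 754 to 799 for the commercial models, a lead of 15.1 to 19.6 percentage points.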

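Impact: The result shows that a fine-tuned local LLM can match or even surpass commercial models at classifying sensitive data, which improves privacy.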

Ranking rationale: This cluster contains an academic paper that details a new open-source system and benchmark data for security document classification.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Ivan Dobrovolskyi · 2026-05-22 04:00

Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

arXiv:2605.20368v1 Announce Type: cross Abstract: Organizations that scan documents for sensitive information face a practical problem. Cloud services require data to be sent to external infrastructure, while rule-based tools often miss threats that depend on context. This study …
