Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System
Researchers have developed TorchSight, an open-source system that classifies security documents locally using a fine-tuned Qwen 3.5 27B large language model. On a benchmark of 1,000 documents, the system achieved 95.0% accuracy, outperforming commercial models, which scored between 75.4% and 79.9%. Because the fine-tuned model runs locally, it can preserve data privacy while accurately identifying sensitive information across security categories and subcategories.
IMPACT: Demonstrates that fine-tuned local LLMs can match or exceed commercial models for sensitive data classification, enabling better privacy.
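
To put the reported percentages in concrete terms, the following minimal sketch converts them into document counts. It assumes that accuracy here means the plain fraction of the 1,000 benchmark documents classified correctly; the summary does not state how the metric was computed. The function names and the sample labels are hypothetical and are not taken from TorchSight.

```python
def accuracy(predicted, expected):
    """Fraction of documents whose predicted label equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one document is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def implied_correct(accuracy_pct, total):
    """Number of correct documents implied by a percentage accuracy."""
    return round(accuracy_pct / 100 * total)


BENCHMARK_SIZE = 1000  # benchmark size reported in the summary

# Percentages reported in the summary, converted to document counts
# (assumption: accuracy is correct / total).
fine_tuned_correct = implied_correct(95.0, BENCHMARK_SIZE)
commercial_low_correct = implied_correct(75.4, BENCHMARK_SIZE)
commercial_high_correct = implied_correct(79.9, BENCHMARK_SIZE)

# Gap between the fine-tuned model and the best and worst commercial scores.
gap_vs_best = fine_tuned_correct - commercial_high_correct
gap_vs_worst = fine_tuned_correct - commercial_low_correct

# Tiny illustrative run with hypothetical labels (not real benchmark data).
sample_predicted = ["credentials", "pii", "safe", "pii"]
sample_expected = ["credentials", "pii", "safe", "safe"]
sample_accuracy = accuracy(sample_predicted, sample_expected)
```

Under that assumption, 95.0% corresponds to 950 of 1,000 documents, against 754 to 799 for the commercial models, a difference of 151 to 196 documents.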