Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

Researchers have developed a method that applies singular value decomposition (SVD) to a large language model's weight matrix to reveal interpretable semantic subspaces. The technique requires only a few lines of code and no model inference, and it can expose how a model's training data was composed and curated. An analysis of GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B found systematic differences in their learned subspaces, with Qwen's containing ethically inappropriate vocabulary. The study proposes SVD analysis as a standard pre-release safety-auditing step and suggests it could also inform tokenizer optimization and more controllable LLM design.
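The brief does not reproduce the paper's code, so the following is only a minimal sketch of the general idea, written with NumPy on a toy matrix rather than a real checkpoint. It assumes the matrix being decomposed is an output-projection matrix with one row per vocabulary token (the `lm_head` tag suggests this, but it is an assumption), and the function name `top_tokens_per_component`, the six-word vocabulary, and the synthetic weights are all hypothetical illustrations.

```python
import numpy as np


def top_tokens_per_component(W, vocab, n_components, top_k):
    """Return, for each leading singular component, the top_k tokens
    with the largest absolute loading in the left singular vector.

    W is assumed to have shape (vocab_size, hidden_size), one row per token.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, with U of shape (vocab_size, r).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    groups = []
    for k in range(n_components):
        # Rank tokens by how strongly they load on component k.
        order = np.argsort(-np.abs(U[:, k]))[:top_k]
        groups.append([vocab[i] for i in order])
    return groups


# Toy stand-in for a weight matrix (hypothetical data, not from any model):
# animal tokens share hidden direction 0, color tokens share direction 1.
vocab = ["cat", "dog", "horse", "red", "green", "blue"]
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4))
W[:3, 0] += 3.0  # animals
W[3:, 1] += 2.0  # colors

groups = top_tokens_per_component(W, vocab, n_components=2, top_k=3)
for k, tokens in enumerate(groups):
    print(f"component {k}: {tokens}")
```

On this toy input the first component groups the animal tokens and the second groups the color tokens, which is the kind of token cluster one would inspect when auditing a vocabulary. For a real model, `W` would be replaced by the model's own weight matrix (in a Hugging Face checkpoint this is typically exposed as something like `model.lm_head.weight`, though that depends on the setup and is not stated in the brief), and `vocab` by the tokenizer's token strings.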

IMPACT Offers a novel, low-overhead method for auditing LLM training data and identifying potential ethical risks before deployment.

transformer
Gemma-2-2B
lm_head
GPT-OSS-120B
Qwen2.5-1.5B
singular value decomposition
PyTorch