A new research paper challenges the assumption that a given representational direction in a large language model (LLM) refers to the same content across different operational regimes, such as prompt conditioning, fine-tuning, and inference-time steering. The authors report experiments on Qwen3-4B-Instruct and Mistral-7B-Instruct-v0.2 suggesting, among other findings, that vectors extracted from prompts are not collinear with those extracted from fine-tuning basins. They propose a framework called regime-indexed individuation, in which the identity of representational content is defined by a (vehicle, regime) pair rather than by the vehicle alone.
AI IMPACT This research could lead to a more nuanced understanding of how LLMs represent and process information, with possible consequences for how future models are developed and evaluated.
RANK_REASON The item is an academic paper published on arXiv that presents theoretical and empirical findings on the individuation of representational content in LLMs.
AI-generated summary · Google Gemini · from 1 source.