Researchers have uncovered how transformers learn modular multiplication by analyzing their internal representations in a basis matched to the task. Contrary to earlier assumptions that the learned spectra are dense in the Fourier basis, the study reports that under the multiplicative character transform the transformer's embedding becomes sparse, with a small number of key frequencies dominating. This suggests the model effectively reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm.
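The reduction from multiplication to addition can be illustrated with a small sketch. This is not the paper's code or the transformer's learned circuit; it only shows the underlying arithmetic, assuming a prime modulus (p = 113 is chosen here purely for illustration) and using hypothetical helper names (`primitive_root`, `dlog_tables`, `clock_multiply`, `character`). For a prime p, every nonzero residue is a power of a generator g, so a·b mod p can be computed by adding exponents mod p − 1, and a multiplicative character turns that exponent into an angle on a "clock".

```python
import cmath


def primitive_root(p):
    """Smallest generator of the multiplicative group mod a prime p (brute force)."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:  # g hits every nonzero residue
            return g
    raise ValueError("no primitive root found; p is assumed to be an odd prime")


def dlog_tables(p):
    """Return (log, exp) with exp[k] = g**k mod p and log[exp[k]] = k."""
    g = primitive_root(p)
    log, exp = {}, []
    x = 1
    for k in range(p - 1):
        exp.append(x)
        log[x] = k
        x = x * g % p
    return log, exp


def clock_multiply(a, b, p, log, exp):
    """Compute a*b mod p by adding discrete logs mod p-1."""
    a, b = a % p, b % p
    if a == 0 or b == 0:  # zero has no discrete log; handle it separately
        return 0
    return exp[(log[a] + log[b]) % (p - 1)]


def character(a, k, p, log):
    """Multiplicative character chi_k(a) = exp(2*pi*i*k*log_g(a)/(p-1)); 0 at a = 0."""
    a %= p
    if a == 0:
        return 0j
    return cmath.exp(2j * cmath.pi * k * log[a] / (p - 1))


if __name__ == "__main__":
    p = 113  # illustrative modulus, an assumption not taken from the source
    log, exp = dlog_tables(p)
    # Adding discrete logs reproduces ordinary modular multiplication.
    assert clock_multiply(5, 7, p, log, exp) == (5 * 7) % p
    # Characters are multiplicative: chi(a*b) = chi(a) * chi(b),
    # i.e. multiplying inputs adds angles on the clock.
    lhs = character(5 * 7, 3, p, log)
    rhs = character(5, 3, p, log) * character(7, 3, p, log)
    assert abs(lhs - rhs) < 1e-9
```

Because each character depends on the input only through its discrete log, a function that is simple in discrete-log space can concentrate on a few character frequencies even when its ordinary (additive) Fourier spectrum looks dense, which is the kind of sparsity the summary describes.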
The finding comes from a research paper published on arXiv.