The Discrete-Log Clock: How a Transformer Learns Modular Multiplication
Researchers have uncovered how transformers learn modular multiplication by analyzing the model's internal representations in the basis of multiplicative characters rather than the ordinary Fourier basis. Contrary to earlier assumptions that the Fourier spectrum is dense, the study finds that the transformer's embedding becomes sparse under the multiplicative character transform, with a small set of key frequencies dominating. This suggests the model reduces multiplication to addition in discrete-log space, where multiplying two residues corresponds to adding their discrete logarithms, and so implements what the authors call a "Discrete-Log Clock" algorithm.
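To make the idea concrete, here is a minimal sketch in Python. It is not the study's code or the trained model's embedding. It assumes a small prime modulus (p = 13), a brute-force primitive root, and a hand-built "clock" feature with frequency k = 1; the helper names are hypothetical. It first checks that multiplication mod p equals addition of discrete logs mod p - 1, then compares how many spectral coefficients the toy feature needs in the multiplicative character basis versus the ordinary additive Fourier basis.

```python
import numpy as np

def primitive_root(p):
    """Smallest generator of the multiplicative group mod prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")

def dlog_table(p, g):
    """Map each nonzero residue a to the exponent j with g**j % p == a."""
    return {pow(g, j, p): j for j in range(p - 1)}

def mul_via_dlog(a, b, p, g, log):
    """Multiply nonzero residues by adding their discrete logs mod p - 1."""
    return pow(g, (log[a] + log[b]) % (p - 1), p)

p, k = 13, 1                      # assumed toy modulus and clock frequency
g = primitive_root(p)
log = dlog_table(p, g)

# 1) Multiplication mod p is addition in discrete-log space.
reduction_ok = all(
    mul_via_dlog(a, b, p, g, log) == (a * b) % p
    for a in range(1, p) for b in range(1, p)
)

# 2) A hand-built clock feature: one cosine of the discrete log (0 at a = 0).
f = np.zeros(p)
for a in range(1, p):
    f[a] = np.cos(2 * np.pi * k * log[a] / (p - 1))

# Multiplicative character transform: reorder residues by discrete log,
# then take a length-(p - 1) DFT over the exponents.
by_log = [pow(g, j, p) for j in range(p - 1)]
char_spectrum = np.fft.fft(f[by_log])

# Ordinary additive Fourier transform over the residues 0..p-1.
add_spectrum = np.fft.fft(f)

tol = 1e-6
n_char = int(np.sum(np.abs(char_spectrum) > tol))
n_add = int(np.sum(np.abs(add_spectrum) > tol))
print("reduction holds:", reduction_ok)
print("nonzero coefficients (character basis):", n_char)
print("nonzero coefficients (additive Fourier basis):", n_add)
```

In this toy setting the feature occupies only a conjugate pair of frequencies in the character basis, while the same vector spreads across most frequencies in the additive basis. That mirrors, under the stated assumptions, why a spectrum that looks dense in one basis can be sparse in the other.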