Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 1w

Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer

Researchers have proposed the "Discrete Transformer," an architecture designed so that interpretable algorithms can be extracted from trained models. It targets representation entanglement in standard Transformers, where overlapping features make symbolic expressions hard to recover. The architecture introduces discreteness through temperature-annealed sampling, which makes it possible to synthesize human-readable programs from the trained model. On discrete tasks its performance is comparable to existing methods, and it extends extraction to tasks with continuous intermediate computations. It also offers fine-grained control over the synthesized programs, positioning it as a platform for algorithm extraction and Transformer interpretability research.
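The brief does not say how the temperature-annealed sampling is implemented. As a rough illustration of the general idea only, the sketch below uses Gumbel-softmax sampling with an exponentially decaying temperature, a common way to push soft distributions toward near one-hot (discrete) choices during training. The technique choice, the function names `annealed_temperature` and `gumbel_softmax_sample`, and the schedule values are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def annealed_temperature(step, total_steps, t_start=5.0, t_end=0.1):
    """Exponentially interpolate from t_start down to t_end (assumed schedule)."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def gumbel_softmax_sample(logits, temperature, rng):
    """Return a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, -1.0]  # hypothetical scores over three discrete options
    total_steps = 100
    for step in (0, 50, 100):
        temp = annealed_temperature(step, total_steps)
        sample = gumbel_softmax_sample(logits, temp, rng)
        print(f"step={step:3d} temperature={temp:.3f} sample={sample}")
```

At high temperature the samples stay spread across the options; as the temperature falls they concentrate on a single option, which is the property that makes a discrete, symbolic reading of the model plausible.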

IMPACT Introduces a novel architecture for improving AI model interpretability and algorithm extraction.

Transformer
Yifan Zhang
Discrete Transformer