MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Researchers have introduced MesaNet, a sequence modeling architecture built on locally optimal test-time training. Its layer is derived from an in-context loss that is minimized with a conjugate gradient solver, and the design supports chunkwise parallelization, which lets it scale. In experiments, MesaNet achieves lower perplexity and better downstream performance than previous recurrent neural networks, particularly on tasks requiring long context, at the cost of more computation at inference time.
IMPACT: Introduces a sequence modeling method that improves performance on long-context tasks by spending more computation at test time.
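
To make the phrase "an in-context loss minimized by a conjugate gradient solver" concrete, here is a minimal sketch in plain Python. It rests on assumptions not stated in the summary: the loss is taken to be a ridge-regularized least-squares fit from keys to values, and the names `conjugate_gradient`, `ridge_readout`, `keys`, `values`, `query`, and `lam` are hypothetical. It is an illustration of the general technique, not MesaNet's implementation, and it omits the chunkwise parallelization.

```python
def conjugate_gradient(A, b, iters=50, tol=1e-10):
    """Solve A x = b for a symmetric positive-definite A (list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x, with x = 0
    p = list(r)                      # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rs_old < tol:             # squared residual norm small enough
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


def ridge_readout(keys, values, query, lam=1.0):
    """Return W* q, where W* minimizes sum_t ||W k_t - v_t||^2 + lam ||W||^2.

    Assumed loss form (hypothetical). The closed form is
    W* q = sum_t v_t (k_t . x), with (sum_t k_t k_t^T + lam I) x = q.
    """
    d = len(query)
    H = [[sum(k[i] * k[j] for k in keys) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    x = conjugate_gradient(H, query)  # solve instead of inverting H
    out_dim = len(values[0])
    return [sum(v[c] * sum(k[i] * x[i] for i in range(d))
                for k, v in zip(keys, values))
            for c in range(out_dim)]


# Toy context: two key/value pairs seen so far, then a query.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0], [3.0]]
print(ridge_readout(keys, values, [1.0, 0.0], lam=1.0))
```

The solver runs for every query, which is one way to see why a method of this kind spends more computation at inference time than a layer with a fixed update rule.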