Researchers have developed a Graph Memory Transformer (GMT) that replaces the standard Feed-Forward Network (FFN) sublayer in decoder-only language models with an explicit learned memory graph. The latest iteration, GMT v7, uses 128 centroids and a directed transition matrix within each of its 16 transformer blocks. While the 82.2M-parameter GMT model performs comparably to a larger GPT-style baseline on zero-shot benchmarks, it trails in validation loss and perplexity, suggesting room for further optimization and scaling.
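The summary gives only the headline numbers (128 centroids, a directed transition matrix, 16 blocks), so the following is a minimal PyTorch sketch of what such a memory-graph sublayer could look like in place of the FFN. The class name GraphMemorySublayer, the soft-assignment read, the single row-stochastic propagation step, and the output projection are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMemorySublayer(nn.Module):
    """Hypothetical sketch of a GMT-style FFN replacement.

    Each token soft-assigns over a bank of learned centroids, a
    directed transition matrix routes that mass one step along the
    memory graph, and the routed mixture of centroids is read back
    into model space. Wiring is an assumption for illustration.
    """

    def __init__(self, d_model: int = 512, n_centroids: int = 128):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_centroids, d_model) * 0.02)
        # Logits for directed edges between centroids (the transition matrix).
        self.transition_logits = nn.Parameter(torch.zeros(n_centroids, n_centroids))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Soft-assign each token to centroids.
        scores = x @ self.centroids.T / self.centroids.shape[-1] ** 0.5
        assign = F.softmax(scores, dim=-1)                      # (B, S, K)
        # One propagation step along the directed memory graph.
        transition = F.softmax(self.transition_logits, dim=-1)  # row-stochastic (K, K)
        routed = assign @ transition                            # (B, S, K)
        # Read the routed centroid mixture back into model space.
        read = routed @ self.centroids                          # (B, S, d_model)
        return self.out_proj(read)
```

Under these assumptions, `GraphMemorySublayer(512, 128)(torch.randn(2, 16, 512))` returns a tensor of the input's shape, the drop-in property any FFN replacement must preserve; repeating the sublayer across 16 blocks would mirror the per-block memory graph the summary describes.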
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Explores an alternative to dense FFNs, potentially offering more interpretable and efficient transformer architectures.
RANK_REASON Academic paper introducing a novel transformer architecture variant.