PulseAugur

Graph Memory Transformer replaces FFNs with learned memory graphs for interpretability

Researchers have developed the Graph Memory Transformer (GMT), which replaces the standard Feed-Forward Network (FFN) sublayer in decoder-only language models with an explicit learned memory graph. The reported configuration, GMT v7, uses 128 centroids and a directed transition matrix within each of its 16 transformer blocks. While the 82.2M-parameter GMT model matches a larger GPT-style baseline on zero-shot benchmarks, it trails in validation loss and perplexity, suggesting room for further optimization and scaling.
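The description above suggests a simple sublayer shape: each token's hidden state is soft-assigned to a small set of learned centroid memories, routed one step along a learned directed transition matrix, and read out as a mixture of centroid values. The PyTorch sketch below illustrates that shape under stated assumptions; the class name, the key/value parameterization, and the single propagation step are illustrative guesses, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMemorySublayer(nn.Module):
    """Hypothetical drop-in replacement for the FFN sublayer: routes each
    token through a small learned memory graph instead of a dense MLP.
    A sketch only; the GMT paper's actual design may differ."""

    def __init__(self, d_model: int = 512, n_centroids: int = 128):
        super().__init__()
        # Memory graph nodes: one learnable key/value vector per centroid.
        self.keys = nn.Parameter(torch.randn(n_centroids, d_model) * d_model ** -0.5)
        self.values = nn.Parameter(torch.randn(n_centroids, d_model) * d_model ** -0.5)
        # Directed transition matrix over centroids (logits, row-normalized).
        self.transition_logits = nn.Parameter(torch.zeros(n_centroids, n_centroids))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        # 1) Soft-assign each token to centroids via key similarity.
        addr = F.softmax(x @ self.keys.T, dim=-1)           # (B, S, N)
        # 2) One step of propagation along the directed graph edges.
        trans = F.softmax(self.transition_logits, dim=-1)   # (N, N)
        addr = addr @ trans                                 # (B, S, N)
        # 3) Read out a mixture of centroid values as the sublayer output.
        return addr @ self.values                           # (B, S, d_model)

In a 16-block decoder, one such module would stand in for each block's FFN, with the surrounding causal self-attention, residuals, and normalization kept as-is; d_model and depth would be chosen so the full stack lands near the reported 82.2M parameters.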

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Explores an alternative to dense FFNs, potentially offering more interpretable and efficient transformer architectures.

RANK_REASON Academic paper introducing a novel transformer architecture variant.

Read on arXiv cs.CL →

COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Nicola Zanarini, Niccolò Ferrari

    Graph Memory Transformer (GMT)

    arXiv:2604.23862v1 Announce Type: cross Abstract: We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory graph while preserving the surrounding autoregressive architecture. The proposed Graph Memo…

  2. arXiv cs.CL TIER_1 · Niccolò Ferrari

    Graph Memory Transformer (GMT)

    We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory graph while preserving the surrounding autoregressive architecture. The proposed Graph Memory Transformer (GMT) keeps causal self-attention i…