PulseAugur

Researchers Tighten Sample Complexity Bounds for Transformers

Researchers have tightly characterized the VC dimension of depth-L Transformers with W total parameters that map an input sequence of length T to a single output, establishing an upper bound of O(LW log(TW)) and a nearly matching lower bound of Ω(LW log(TW/L)). The study also characterizes the sample complexity of chain-of-thought learning with these Transformers: teacher forcing achieves a sample complexity of O(LW log((T+T')W)), while any learning rule that uses chain-of-thought data requires at least Ω(LW log((T+T')W/L)) examples.
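To get a feel for how close the upper and lower bounds are, the growth terms inside the O(·) and Ω(·) can be evaluated directly. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the constants hidden by the asymptotic notation are dropped, and the natural logarithm is assumed. The outputs therefore indicate scaling only and are not actual sample counts. Setting `cot_len=0` gives the VC-dimension expressions; a positive `cot_len` plays the role of T' in the chain-of-thought expressions.

```python
import math


def upper_growth(depth: int, params: int, seq_len: int, cot_len: int = 0) -> float:
    """Growth term of the upper bound: L * W * log((T + T') * W).

    Constants hidden by the O(.) notation are omitted (assumption).
    """
    return depth * params * math.log((seq_len + cot_len) * params)


def lower_growth(depth: int, params: int, seq_len: int, cot_len: int = 0) -> float:
    """Growth term of the lower bound: L * W * log((T + T') * W / L).

    Constants hidden by the Omega(.) notation are omitted (assumption).
    """
    ratio = (seq_len + cot_len) * params / depth
    if ratio <= 1:
        # The log term is non-positive here, so the expression is uninformative.
        raise ValueError("need (T + T') * W > L for a positive lower-bound term")
    return depth * params * math.log(ratio)


def gap_ratio(depth: int, params: int, seq_len: int, cot_len: int = 0) -> float:
    """Ratio of the upper growth term to the lower growth term (always >= 1)."""
    return upper_growth(depth, params, seq_len, cot_len) / lower_growth(
        depth, params, seq_len, cot_len
    )


if __name__ == "__main__":
    # Illustrative values only: L = 24 layers, W = 10**8 parameters, T = 2048 tokens.
    print(gap_ratio(depth=24, params=10**8, seq_len=2048))
    # Same model with a hypothetical chain-of-thought length T' = 4096.
    print(gap_ratio(depth=24, params=10**8, seq_len=2048, cot_len=4096))
```

The two growth terms differ by exactly L·W·log(L), so the gap is governed by depth alone, which is why the bounds are described as nearly matching.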

IMPACT Provides near-tight theoretical bounds on the number of training examples Transformers need, which may inform future model design and data efficiency.

RANK_REASON The cluster contains an academic paper detailing theoretical research on the sample complexity of Transformers.


AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Chenxiao Yang, Nathan Srebro, Zhiyuan Li

    Tight Sample Complexity of Transformers

    arXiv:2606.09731v1. Abstract: We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching …

  2. arXiv cs.LG TIER_1 English(EN) · Zhiyuan Li

    Tight Sample Complexity of Transformers

    We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching lower bound of $\Omega(L W \log (T W / L))$. We furth…