PulseAugur

Transformer sample complexity tightly characterized in new paper

Researchers have tightly characterized the VC dimension of depth-L Transformers with a total of W parameters that map an input sequence of length T to a single output. They establish an upper bound of O(LW log(TW)) and a nearly matching lower bound of Ω(LW log(TW/L)). They also characterize the sample complexity of chain-of-thought learning with these Transformers: teacher forcing can learn with O(LW log((T+T')W)) samples, while any learning rule that uses chain-of-thought data requires at least Ω(LW log((T+T')W/L)) examples.
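To see what "nearly matching" means numerically, the sketch below evaluates the orders of the bounds quoted above. It is an illustration under stated assumptions, not code from the paper: the function names are hypothetical, the hidden big-O and big-Omega constants are set to 1, the natural log is used, the (L, W, T) settings are arbitrary examples, and T' is passed through as the paper's chain-of-thought length parameter without further interpretation.

```python
import math

# Orders of the bounds from the summary, with the unknown big-O / big-Omega
# constants set to 1 (an assumption; the summary does not state them).
# All function names here are hypothetical, chosen for illustration.

def vc_upper(L: int, W: int, T: int) -> float:
    """VC-dimension upper bound O(L W log(T W)), constants dropped."""
    return L * W * math.log(T * W)

def vc_lower(L: int, W: int, T: int) -> float:
    """VC-dimension lower bound Omega(L W log(T W / L)), constants dropped."""
    return L * W * math.log(T * W / L)

def cot_teacher_forcing(L: int, W: int, T: int, T_prime: int) -> float:
    """Teacher-forcing CoT sample bound O(L W log((T + T') W)), constants dropped."""
    return L * W * math.log((T + T_prime) * W)

def cot_lower(L: int, W: int, T: int, T_prime: int) -> float:
    """CoT lower bound Omega(L W log((T + T') W / L)), constants dropped."""
    return L * W * math.log((T + T_prime) * W / L)

def gap_ratio(L: int, W: int, T: int) -> float:
    """Upper/lower ratio; the L*W factors cancel, leaving log(TW) / log(TW/L)."""
    return vc_upper(L, W, T) / vc_lower(L, W, T)

# Hypothetical (L, W, T) settings, chosen only to show the size of the gap.
for L, W, T in [(12, 10**8, 1024), (96, 10**11, 8192)]:
    print(f"L={L:>3} W={W:.0e} T={T:>5}  upper/lower = {gap_ratio(L, W, T):.3f}")
```

Because the L·W factors cancel, the gap between the two VC bounds reduces to log(TW) versus log(TW/L), so the ratio stays close to 1 whenever L is small compared with TW.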

IMPACT Provides theoretical bounds on Transformer sample complexity, informing future model design and training efficiency.

RANK_REASON Academic paper detailing theoretical properties of a model architecture.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Chenxiao Yang, Nathan Srebro, Zhiyuan Li ·

    Tight Sample Complexity of Transformers

    arXiv:2606.09731v1 Announce Type: new Abstract: We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching …

  2. arXiv cs.LG TIER_1 English(EN) · Zhiyuan Li ·

    Tight Sample Complexity of Transformers

    We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching lower bound of $\Omega(L W \log (T W / L))$. We furth…
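The step from a VC-dimension bound to a sample-complexity bound goes through the standard fundamental theorem of PAC learning. The derivation below is a textbook illustration, not a statement quoted from the paper; it assumes the realizable setting, and the symbols ε (target error), δ (failure probability), d (VC dimension), and the constants C₁, C₂ are introduced here rather than taken from the source.

```latex
\begin{align*}
C_1\,\frac{d + \log(1/\delta)}{\epsilon}
  &\le m(\epsilon,\delta)
  \le C_2\,\frac{d\log(1/\epsilon) + \log(1/\delta)}{\epsilon}
  && \text{(realizable PAC, VC dimension } d\text{)} \\
d = O\bigl(LW\log(TW)\bigr)
  &\;\Longrightarrow\;
  m(\epsilon,\delta) = O\!\left(\frac{LW\log(TW)\,\log(1/\epsilon) + \log(1/\delta)}{\epsilon}\right) \\
d = \Omega\bigl(LW\log(TW/L)\bigr)
  &\;\Longrightarrow\;
  m(\epsilon,\delta) = \Omega\!\left(\frac{LW\log(TW/L) + \log(1/\delta)}{\epsilon}\right)
\end{align*}
```

The lower-bound line holds for small enough ε and δ and applies to any learner for this class, which is why a tight VC dimension translates into tight sample complexity up to the log factors shown.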