Transformer sample complexity tightly characterized in new paper

By PulseAugur Editorial · [2 sources] · 2026-06-08 16:56

Researchers have tightly characterized the VC dimension of depth-L Transformers with W total parameters that map an input sequence of length T to a single output, establishing an upper bound of O(LW log(TW)) and a nearly matching lower bound of Ω(LW log(TW/L)). They also characterized the sample complexity of chain-of-thought learning with these Transformers, showing that teacher forcing can learn with O(LW log((T+T')W)) samples, and that any learning rule using chain-of-thought data requires at least Ω(LW log((T+T')W/L)) examples.
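To get a feel for how close the two VC-dimension bounds are, the growth terms can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names and the example values of L, W, and T are hypothetical, and the constants hidden by the O and Ω notation are dropped, so only the ratio between the two terms is meaningful.

```python
import math


def upper_term(depth, params, seq_len):
    """Growth term L * W * log(T * W) from the stated upper bound.

    The constant hidden by O(.) is dropped (an assumption of this sketch).
    """
    return depth * params * math.log(seq_len * params)


def lower_term(depth, params, seq_len):
    """Growth term L * W * log(T * W / L) from the stated lower bound.

    The constant hidden by Omega(.) is dropped (an assumption of this sketch).
    """
    return depth * params * math.log(seq_len * params / depth)


def gap_ratio(depth, params, seq_len):
    """Ratio of the two growth terms, which equals log(TW) / log(TW / L)."""
    return upper_term(depth, params, seq_len) / lower_term(depth, params, seq_len)


# Hypothetical model size, chosen only for illustration.
L, W, T = 12, 10**8, 2048
print(f"upper term: {upper_term(L, W, T):.3e}")
print(f"lower term: {lower_term(L, W, T):.3e}")
print(f"ratio:      {gap_ratio(L, W, T):.3f}")
```

The two terms differ by exactly L * W * log(L), so the gap stays small whenever log(L) is small relative to log(TW). Passing T + T' as `seq_len` gives the corresponding terms for the chain-of-thought bounds quoted in the summary.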

IMPACT Provides theoretical bounds on Transformer sample complexity, informing future model design and training efficiency.

RANK_REASON Academic paper detailing theoretical properties of a model architecture.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Chenxiao Yang, Nathan Srebro, Zhiyuan Li · 2026-06-09 04:00

Tight Sample Complexity of Transformers

arXiv:2606.09731v1. Abstract: We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching …

arXiv cs.LG TIER_1 English(EN) · Zhiyuan Li · 2026-06-08 16:56

Tight Sample Complexity of Transformers

We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching lower bound of $\Omega(L W \log (T W / L))$. We furth…
