Researchers have shown that two layers are the minimum transformer depth required to compute PARITY, the task of determining whether a binary sequence contains an even or odd number of ones. They prove that a one-layer transformer cannot solve the problem because its average sensitivity grows more slowly than that of PARITY. They also give a new construction showing that PARITY can be computed by a four-layer transformer without impractical assumptions such as length-dependent positional encodings or hardmax, and that the construction is compatible with causal masking.
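For reference, the PARITY task itself is simple to state in code. This is a minimal sketch (not from the paper) of the target function the transformer constructions are meant to compute:

```python
def parity(bits):
    """PARITY: return 1 if the binary sequence contains an odd
    number of ones, 0 if it contains an even number."""
    return sum(bits) % 2

# Example: [1, 0, 1, 1] has three ones (odd), so parity is 1.
print(parity([1, 0, 1, 1]))
```

The difficulty for transformers is not the function's complexity but its sensitivity: flipping any single input bit always flips the output.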
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides theoretical insights into the computational capabilities and limitations of transformer architectures.
RANK_REASON Academic paper detailing theoretical limitations and new constructions for transformer models on a specific computational task.