The Information-Theoretic Benefit of Shared Representations under Orthogonality Constraints
Researchers have developed a theoretical framework that formalizes the benefit of shared representations in multi-task deep learning under orthogonality constraints. The work establishes lower and upper bounds on the description lengths of separate versus joint approximation classes. Using a class of orthogonal functions built from Rademacher-Haar wavelet series with Sawtooth-Walsh readouts, the authors show that joint approximation requires fewer bits than separate approximation when the tasks share a latent hard feature, which provides theoretical support for compositional multi-output architectures.
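The orthogonality that this kind of construction relies on can be checked numerically. The sketch below uses the standard textbook definitions on [0, 1): Rademacher functions r_k(x) = (-1)^floor(2^k x), Paley-ordered Walsh functions (products of Rademacher functions selected by the binary digits of the index), and Haar wavelets psi_{j,k}(x) = 2^(j/2) psi(2^j x - k). It is not the authors' construction, and the helper names (`dyadic_grid`, `rademacher`, `walsh`, `haar`, `gram`) are hypothetical. Sampling at the midpoints of the 2^m dyadic intervals makes the discrete inner products exact, so both Gram matrices should come out as the identity.

```python
import numpy as np

def dyadic_grid(m):
    """Midpoints of the 2**m dyadic intervals of [0, 1)."""
    n = 2 ** m
    return (np.arange(n) + 0.5) / n

def rademacher(k, x):
    """Rademacher function r_k(x) = (-1)^floor(2^k x), for k >= 1."""
    return np.where(np.floor(2 ** k * x) % 2 == 0, 1.0, -1.0)

def walsh(n, x):
    """Paley-ordered Walsh function: product of r_{i+1} over the set bits i of n."""
    out = np.ones_like(x)
    i = 0
    while n:
        if n & 1:
            out = out * rademacher(i + 1, x)
        n >>= 1
        i += 1
    return out

def haar(j, k, x):
    """L2-normalized Haar wavelet psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k)."""
    t = 2 ** j * x - k
    up = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)    # +1 on the left half of the support
    down = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)  # -1 on the right half of the support
    return 2 ** (j / 2) * (up - down)

def gram(funcs, x):
    """Gram matrix of inner products <f, g> = mean(f * g) over the grid."""
    F = np.stack([f(x) for f in funcs])
    return F @ F.T / len(x)

m = 4
x = dyadic_grid(m)

# 2**m Walsh functions w_0, ..., w_{2**m - 1}.
walsh_fns = [lambda x, n=n: walsh(n, x) for n in range(2 ** m)]

# Constant function plus Haar wavelets at levels 0..m-1: also 2**m functions.
haar_fns = [lambda x: np.ones_like(x)] + [
    lambda x, j=j, k=k: haar(j, k, x) for j in range(m) for k in range(2 ** j)
]

G_walsh = gram(walsh_fns, x)
G_haar = gram(haar_fns, x)
print(np.allclose(G_walsh, np.eye(2 ** m)))  # True
print(np.allclose(G_haar, np.eye(2 ** m)))   # True
```

Because every Rademacher function squares to 1, the product of two Walsh functions is again a Walsh function, w_a * w_b = w_(a XOR b), which is why distinct indices have zero inner product on the grid.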