TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes
Researchers have introduced Transportation Birkhoff Polytope (TBP) parameterizations, a method for constructing exactly doubly stochastic mixing matrices in hyper-connections. The approach covers the full Birkhoff polytope while using significantly fewer degrees of freedom than previous methods. In language model pre-training, TBP parameterizations performed competitively and showed improved stability and scalability.
AI IMPACT: Introduces a more stable and scalable method for training language models by improving hyper-connection expressivity.
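For readers unfamiliar with the terms, the sketch below illustrates what "doubly stochastic" and "Birkhoff polytope" mean. It does not reproduce the TBP parameterization, whose construction is not described in this summary. It relies only on the Birkhoff-von Neumann theorem (every doubly stochastic matrix is a convex combination of permutation matrices) and on the standard fact that the polytope of n x n doubly stochastic matrices has dimension (n - 1)^2. The function names are hypothetical and chosen for illustration.

```python
import numpy as np

def random_birkhoff_point(n, k, seed=0):
    """Illustrative only (not TBP): convex combination of k random
    n x n permutation matrices, which is doubly stochastic by construction."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(k))  # nonnegative weights summing to 1
    mix = np.zeros((n, n))
    for w in weights:
        perm = rng.permutation(n)
        mix += w * np.eye(n)[perm]       # row-permuted identity = permutation matrix
    return mix

def is_doubly_stochastic(m, tol=1e-9):
    """True if all entries are nonnegative and every row and column sums to 1."""
    return bool(
        np.all(m >= -tol)
        and np.allclose(m.sum(axis=0), 1.0, atol=tol)
        and np.allclose(m.sum(axis=1), 1.0, atol=tol)
    )

def birkhoff_dimension(n):
    """Dimension of the polytope of n x n doubly stochastic matrices."""
    return (n - 1) ** 2

if __name__ == "__main__":
    mix = random_birkhoff_point(n=4, k=6)
    print(is_doubly_stochastic(mix))
    print(birkhoff_dimension(4))
```

A mixing matrix with these properties preserves the total weight across residual streams in both directions, which is the constraint the summary refers to as "exactly doubly stochastic".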