ENSEMBITS: an alphabet of protein conformational ensembles
Researchers have developed Ensembits, a novel tokenizer for protein conformational ensembles, which capture dynamic movements and alternative states beyond static structures. The method addresses two challenges, encoding variable-sized ensembles and working with sparse dynamics data, by using a Residual VQ-VAE with a frame distillation objective. Ensembits shows superior performance in predicting protein dynamics and matches or exceeds static tokenizers on various prediction tasks despite using less pretraining data, paving the way for incorporating dynamics into protein language modeling and design.
AI IMPACT Enables the incorporation of protein dynamics into language models, advancing protein design and analysis.
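
For readers unfamiliar with the "Residual" part of a Residual VQ-VAE, the sketch below illustrates generic residual vector quantization: each codebook level quantizes whatever the previous levels left unexplained, so one embedding becomes a short sequence of discrete tokens. This is not the Ensembits implementation. The function names, codebook sizes, and random codebooks are assumptions made for illustration, and the sketch does not cover the frame distillation objective or how variable-sized ensembles are handled.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Quantize one vector with a stack of codebooks, coarse to fine.

    Hypothetical helper for illustration; not taken from Ensembits.
    """
    residual = np.asarray(x, dtype=float).copy()
    tokens = []
    for codebook in codebooks:
        # Pick the codeword nearest to what earlier levels left unexplained.
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        tokens.append(index)
        # Pass the remaining error on to the next level.
        residual = residual - codebook[index]
    return tokens, residual


def rvq_decode(tokens, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    return sum(cb[i] for i, cb in zip(tokens, codebooks))


# Assumed toy setup: 3 levels, 16 codewords each, 8-dimensional embeddings.
rng = np.random.default_rng(0)
codebooks = [rng.normal(scale=1.0 / 2**level, size=(16, 8)) for level in range(3)]

x = rng.normal(size=8)          # stand-in for an encoder output
tokens, residual = rvq_encode(x, codebooks)
x_hat = rvq_decode(tokens, codebooks)
```

In a trained model the codebooks are learned jointly with an encoder and decoder rather than sampled at random; the token list returned here plays the role of the discrete "letters" that a downstream language model would consume.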