Retrieval and competition: how a protein foundation model starts a protein
A new research paper examines how a protein foundation model, ESM2-8M, makes predictions about protein sequences. The study finds that the model does not directly read biological evidence when it applies common rules, such as proteins starting with the amino acid methionine. Instead, it retrieves a statistical default signal from a reference representation, even when biological reality diverges from that default. This suggests that the model's confidence in a prediction may not reflect an understanding of the underlying biological mechanism, which makes complex biological predictions harder to verify.
IMPACT Reveals that protein foundation models can fail to distinguish statistical defaults from biological evidence, which limits how far their predictions can be trusted.