Protein language models' allergen explanations lack biological grounding

By PulseAugur Editorial · [1 source] · 2026-06-20 18:25

A new study published on arXiv questions whether the explanations produced by protein language models used for allergenicity classification are biologically meaningful. Models like ESM-2 and DeepPlantAllergy predict allergenicity accurately at the protein level, but their residue-level attribution signals do not significantly align with annotated allergen epitopes. The research suggests these models may rely on general sequence features rather than specific immunological mechanisms, and it cautions against reading their explanations as direct immunological insight for safety screening or hypoallergen design without rigorous validation.
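One way to picture what "alignment with annotated epitopes" could mean is a top-k overlap check with a permutation test. The sketch below is illustrative only: the summary does not state which metric the paper uses, and the function names, the toy attribution scores, and the epitope masks are all hypothetical. Shuffling the epitope labels also ignores that real epitopes tend to be contiguous, so this is a simplification rather than a recommended protocol.

```python
import random


def top_k_overlap(attributions, epitope_mask, k):
    """Fraction of the k highest-attribution residues that fall in annotated epitopes."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: attributions[i], reverse=True)
    return sum(epitope_mask[i] for i in ranked[:k]) / k


def permutation_p_value(attributions, epitope_mask, k, n_perm=2000, seed=0):
    """Compare the observed overlap with overlaps under randomly shuffled epitope labels.

    Returns (observed_overlap, p_value). A large p-value means the attributions
    recover epitopes no better than chance would.
    """
    rng = random.Random(seed)
    observed = top_k_overlap(attributions, epitope_mask, k)
    shuffled = list(epitope_mask)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if top_k_overlap(attributions, shuffled, k) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-residue attribution scores for a 10-residue toy sequence.
toy_attributions = [0.9, 0.1, 0.8, 0.2, 0.05, 0.7, 0.3, 0.02, 0.6, 0.01]
# Hypothetical epitope annotations (1 = residue belongs to an annotated epitope).
misaligned_mask = [0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
aligned_mask = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

misaligned_result = permutation_p_value(toy_attributions, misaligned_mask, k=3)
aligned_result = permutation_p_value(toy_attributions, aligned_mask, k=3)
```

In the misaligned toy case none of the three highest-scoring residues is in an epitope, which is the pattern the study reports in aggregate: strong predictions whose attributions point elsewhere.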

IMPACT Challenges the interpretability of protein language models for safety-critical applications like allergen screening.

RANK_REASON Research paper published on arXiv detailing limitations of AI models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Damir Zhakparov · 2026-06-20 18:25

Residue-Level Attributions in Protein Language Models Do Not Recover Allergen Epitopes

Deep allergenicity classifiers are increasingly used in safety screening of novel foods, and recent protein language models have substantially improved protein-level allergenicity prediction. However, whether their explanations capture biologically meaningful information remains …
