VFUSE: Virulent Feature Understanding with Sparse autoEncoders
Researchers have developed VFUSE, a method that uses sparse autoencoders to interpret generative models for protein design. The approach audits models such as RoseTTAFold3 and RFDiffusion3 for potentially hazardous features. By analyzing the latent space of these models, VFUSE improved the detection of dangerous protein designs and identified specific features that activate only for hazardous outputs with high accuracy.
AI IMPACT: Provides a new tool for safety and interpretability in generative AI for scientific applications such as protein design.
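
The summary does not describe VFUSE's architecture or training setup, so the following is only a generic sketch of the underlying idea: a sparse autoencoder maps a model's latent activations to non-negative features, and an audit then looks for features that fire on flagged examples and on nothing else. Every name here (`sae_encode`, `sae_loss`, `selective_features`), the weights, the toy activations, and the `flagged` labels are hypothetical and are not taken from VFUSE, RoseTTAFold3, or RFDiffusion3.

```python
import numpy as np

def sae_encode(x, w_enc, b_enc):
    """ReLU encoder: maps latent activations to non-negative features."""
    return np.maximum(x @ w_enc + b_enc, 0.0)

def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3):
    """Mean reconstruction error plus an L1 penalty that encourages sparsity."""
    f = sae_encode(x, w_enc, b_enc)
    x_hat = f @ w_dec + b_dec
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity

def selective_features(features, flagged, eps=1e-6):
    """Indices of features active on some flagged row and on no unflagged row."""
    active = features > eps
    on_flagged = active[flagged].any(axis=0)
    on_unflagged = active[~flagged].any(axis=0)
    return np.flatnonzero(on_flagged & ~on_unflagged)

# Toy stand-in for latent activations (4 samples, 2 dimensions); assumption only.
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
# Hypothetical audit labels: only the third sample is flagged.
flagged = np.array([False, False, True, False])

# Hand-picked encoder weights (an assumption; a real SAE would be trained).
w_enc = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
b_enc = np.array([0.0, 0.0, -1.5])

features = sae_encode(x, w_enc, b_enc)
print(selective_features(features, flagged))
```

In this toy setup the third feature only clears its bias when both input dimensions are large, so it is the only one that activates exclusively on the flagged sample. In practice the encoder and decoder would be trained by minimizing something like `sae_loss` over activations collected from the audited model, and selectivity would be measured on held-out data.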