FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings
Researchers have developed factorized linear projection (FLiP) models to analyze and interpret sentence embedding spaces. FLiP models recall over 75% of the lexical content of embeddings produced by multilingual, multimodal, and API-based encoders such as LaBSE, SONAR, and Gemini. The technique exposes modality and language biases in these encoders without relying on traditional downstream evaluations.
IMPACT: Provides a new diagnostic tool for understanding biases in multimodal and multilingual sentence encoders.
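
The summary does not describe how FLiP is built, so the following is only a minimal sketch of the general idea. It assumes that a "factorized linear projection" is a low-rank linear map from a sentence embedding to per-word scores. The function names `fit_factorized_projection` and `lexical_recall` are hypothetical, the toy encoder (a mean of random word vectors) stands in for a real model such as LaBSE or SONAR, and the least-squares fit followed by truncated SVD is an illustrative choice that may differ from the authors' training procedure.

```python
import numpy as np

def fit_factorized_projection(X, Y, rank):
    """Fit a linear map W (d x V) by least squares, then factorize it
    as W ~= A @ B using a truncated SVD (an assumed, illustrative procedure)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # d x rank
    B = Vt[:rank]                # rank x V
    return A, B

def lexical_recall(X, Y, A, B):
    """Fraction of true words recovered when, for each sentence,
    the k highest-scoring words are taken (k = number of true words)."""
    scores = X @ A @ B
    hits, total = 0, 0
    for s, y in zip(scores, Y):
        k = int(y.sum())
        if k == 0:
            continue  # nothing to recall for an empty sentence
        top = np.argsort(s)[-k:]
        hits += int(y[top].sum())
        total += k
    return hits / total if total else 0.0

# Toy data (hypothetical): each "sentence" is a set of 5 words, and its
# embedding is the mean of random word vectors. This replaces a real encoder.
rng = np.random.default_rng(0)
vocab_size, dim, n_sentences, words_per_sentence = 40, 32, 500, 5
word_vecs = rng.normal(size=(vocab_size, dim))

Y = np.zeros((n_sentences, vocab_size))
for i in range(n_sentences):
    idx = rng.choice(vocab_size, size=words_per_sentence, replace=False)
    Y[i, idx] = 1.0
X = Y @ word_vecs / words_per_sentence

A, B = fit_factorized_projection(X, Y, rank=8)
recall = lexical_recall(X, Y, A, B)
print(f"lexical recall at rank 8: {recall:.2f}")
```

In a sketch like this, comparing recall across subsets of the data (for example, by language or by input modality) is one way such a probe could surface encoder biases without a downstream task. The numbers from this toy setup say nothing about the reported 75% figure.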