MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts
Researchers have developed MHA-RAG, a framework that encodes domain-specific exemplars as soft prompts rather than as raw text. The approach uses multi-head attention to build these soft prompts, with the aim of making it more efficient and more accurate to adapt foundation models to new domains when data is limited. In the reported experiments, MHA-RAG achieves a 20-point performance gain over standard RAG while reducing inference costs by 10x, and its output does not depend on the order in which the exemplars are supplied.
AI IMPACT This method could significantly reduce the computational cost and improve the performance of adapting large language models to specialized tasks.
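To make the order-independence claim concrete, here is a minimal sketch of how attention pooling can turn a set of exemplar embeddings into a fixed number of soft-prompt vectors. It is an illustration under stated assumptions, not the authors' implementation. The summary does not describe the actual architecture, so the query-conditioned pooling, the function names (`project`, `soft_prompt`), the dimensions, and the random, untrained projection matrices are all hypothetical.

```python
import math
import random


def project(vec, mat):
    """Multiply a length-d vector by a d x k matrix, giving a length-k vector."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_prompt(query, exemplars, heads):
    """Pool exemplar embeddings into one vector per attention head.

    `heads` is a list of (Wq, Wk, Wv) projection matrices. Each head scores
    every exemplar against the query and returns the weighted sum of the
    exemplar values. No positional information is used (an assumption of
    this sketch), so the result cannot depend on exemplar order.
    """
    prompt = []
    for w_q, w_k, w_v in heads:
        q = project(query, w_q)
        keys = [project(e, w_k) for e in exemplars]
        values = [project(e, w_v) for e in exemplars]
        scale = math.sqrt(len(q))
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale
                           for k in keys])
        pooled = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(q))]
        prompt.append(pooled)
    return prompt


# Hypothetical sizes: 16-dim embeddings, 4 heads of width 8, 5 exemplars.
rng = random.Random(0)
d_model, d_head, num_heads, num_exemplars = 16, 8, 4, 5


def rand_matrix(rows, cols):
    """Random stand-in for a trained projection matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


heads = [(rand_matrix(d_model, d_head),
          rand_matrix(d_model, d_head),
          rand_matrix(d_model, d_head)) for _ in range(num_heads)]
query = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
exemplars = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
             for _ in range(num_exemplars)]

prompt = soft_prompt(query, exemplars, heads)
reordered = soft_prompt(query, exemplars[::-1], heads)
```

In this sketch, `prompt` and `reordered` agree up to floating-point rounding, because a softmax-weighted sum over a set is unaffected by permuting the set. The number of soft-prompt vectors is also fixed by the number of heads rather than by the number or length of the exemplars, which is one plausible way an approach like this could lower inference cost compared with placing exemplar text in the prompt.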