Alignment-Aware Decoding
Researchers have developed an inference-time technique called alignment-aware decoding (AAD) to improve the alignment of large language models. AAD requires no training beyond a standard preference optimization setup, such as Direct Preference Optimization (DPO). Empirical results show that AAD consistently surpasses existing baselines on a range of alignment benchmarks and across model sizes. AAD can also generate high-quality synthetic data for alignment tasks when labeled data is scarce.
AI IMPACT This method could improve LLM safety and performance by enhancing alignment at inference time, potentially reducing the need for extensive fine-tuning.