Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation
Researchers have developed BLADE, a new dataset and benchmarking framework that addresses honorific failures in multilingual Bangla text generation. The dataset comprises over 4,000 curated interaction pairs designed to improve how large language models handle cultural nuance and context-dependent communication. Fine-tuning models such as DeepSeek-8B and LLaMA-3.2-3B on BLADE has shown significant improvements in structural fidelity and honorific alignment for low-resource languages.
AI IMPACT: Enhances multilingual LLM capabilities by addressing cultural nuances and honorifics in low-resource languages like Bangla.