Researchers have developed a method called Convergent-Divergent Routing that steers large language models toward specific ethical frameworks at inference time while maintaining general capabilities. The technique identifies and modifies critical pathways within transformer blocks that influence ethical reasoning, allowing calibrated control over moral decision-making. Separately, a new dataset named TF1-EN-3M has been created, comprising three million synthetic moral fables generated by smaller language models and designed to train and evaluate open-source models on ethical storytelling and value alignment.
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT New methods and datasets emerge for improving ethical reasoning and value alignment in smaller, open-source language models.
RANK_REASON Two research papers are presented, one detailing a method for controlling LLM moral reasoning and another introducing a dataset for training LLMs on moral fables.