Brief · PulseAugur

RESEARCH · Mastodon — sigmoid.social Polski(PL) · 2d · [4 sources]

ByteDance and HKUST researchers prove that traditional AI model training on OCR tasks hinders document work. Their MMProLong project shows that key

Researchers at Nous Research have developed a method called Contrastive Neuron Attribution (CNA) that identifies and manipulates the specific neurons controlling refusal behavior in large language models. By targeting just 0.1% of these neurons, CNA can cut refusal rates on harmful requests by over 50% in models such as Llama and Qwen while maintaining high output quality. The technique requires no additional training and no modification of model weights. It also reveals that the neural structures for distinguishing harmful from benign prompts already exist in base models, before alignment fine-tuning.

IMPACT Enables precise control over LLM safety mechanisms, potentially leading to more robust alignment techniques and a deeper understanding of model behavior.
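
The brief does not describe how CNA scores or edits neurons, so the following is only a toy sketch of the general idea it summarizes: contrast activations on harmful and benign prompts, keep the most discriminative 0.1% of neurons, and rescale them at inference time without touching any weights. Everything here is an assumption made for illustration, not Nous Research's implementation: the mean-difference score, the helper names (`contrastive_scores`, `top_fraction`, `intervene`), the synthetic activations, and the planted neuron indices.

```python
import random


def mean_activations(acts):
    """Column-wise mean over a list of equal-length activation vectors."""
    n = len(acts)
    return [sum(col) / n for col in zip(*acts)]


def contrastive_scores(harmful_acts, benign_acts):
    """Assumed scoring rule: |mean on harmful prompts - mean on benign prompts|."""
    mean_harmful = mean_activations(harmful_acts)
    mean_benign = mean_activations(benign_acts)
    return [abs(h - b) for h, b in zip(mean_harmful, mean_benign)]


def top_fraction(scores, fraction=0.001):
    """Indices of the highest-scoring neurons (at least one), in ascending order."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def intervene(activation, neuron_ids, scale=0.0):
    """Return a copy of one activation vector with the chosen neurons rescaled.

    Only the activations change; no model weights are involved.
    """
    patched = list(activation)
    for i in neuron_ids:
        patched[i] *= scale
    return patched


# Synthetic stand-in for recorded activations (hypothetical data, not a real model).
rng = random.Random(0)
NUM_NEURONS = 2000
PLANTED = [17, 1234]  # toy "refusal" neurons that fire strongly on harmful prompts


def fake_activation(harmful):
    vec = [rng.gauss(0.0, 1.0) for _ in range(NUM_NEURONS)]
    if harmful:
        for i in PLANTED:
            vec[i] += 10.0
    return vec


harmful_acts = [fake_activation(True) for _ in range(50)]
benign_acts = [fake_activation(False) for _ in range(50)]

scores = contrastive_scores(harmful_acts, benign_acts)
selected = top_fraction(scores, fraction=0.001)  # 0.1% of 2000 neurons
patched = intervene(harmful_acts[0], selected, scale=0.0)

print("selected neurons:", selected)
```

In a real model the activation vectors would come from hooks on a chosen layer, and the intervention would be applied inside the forward pass. The toy version only shows that a contrast between the two prompt sets is enough to locate a small, sparse set of neurons.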