LLaMA-3-8B-Instruct
PulseAugur coverage of LLaMA-3-8B-Instruct — every cluster mentioning LLaMA-3-8B-Instruct across labs, papers, and developer communities, ranked by signal.
-
New method uses model's own outputs for safety fine-tuning
Researchers have developed a novel method for safety fine-tuning language models by identifying and utilizing the most challenging prompts. This technique involves scoring prompts based on the frequency of harmful model…
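The summary is cut off before the scoring rule is fully stated. Here is a minimal sketch of frequency-based prompt scoring. It assumes a prompt's score is the fraction of its sampled completions that a harm classifier flags, and that the highest-scoring ("hardest") prompts are the ones kept for safety fine-tuning. `score_prompts`, `hardest_prompts`, and the `is_harmful` classifier are hypothetical names, not the authors' code.

```python
from typing import Callable, Dict, List


def score_prompts(
    samples: Dict[str, List[str]],
    is_harmful: Callable[[str], bool],
) -> Dict[str, float]:
    """Hypothetical scoring: fraction of each prompt's sampled completions flagged harmful."""
    scores: Dict[str, float] = {}
    for prompt, completions in samples.items():
        if not completions:
            # No samples means no evidence of harm; score it as 0.
            scores[prompt] = 0.0
            continue
        harmful = sum(1 for c in completions if is_harmful(c))
        scores[prompt] = harmful / len(completions)
    return scores


def hardest_prompts(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k prompts with the highest harmful-output rate (ties broken by prompt text)."""
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:k]
```

For example, with a toy classifier `lambda c: "UNSAFE" in c`, a prompt whose four samples include two flagged completions scores 0.5. In a real pipeline the completions would come from the model being tuned, which is what "the model's own outputs" in the headline refers to.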
-
The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Two new research papers explore vulnerabilities and detection methods in machine unlearning, a process designed to remove specific data from trained models for privacy compliance. One paper, "DurableUn," reveals that lo…
-
New attack redirects LLM attention to bypass safety alignment
Researchers have developed a new white-box adversarial attack called the Attention Redistribution Attack (ARA) that targets the internal attention mechanisms of safety-aligned large language models. This attack crafts n…
-
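DPN-LE method precisely edits LLM personality with minimal neuron intervention
Researchers have developed DPN-LE, a novel method for editing the "personality" of large language models by targeting specific neurons. Existing techniques often degrade overall model performance because they modify too many neurons, many of which are multifunctional. DPN-LE identifies personality-specific neurons by contrasting MLP activations, then applies a dual-criterion filtering method to isolate the relevant subset of neurons. Because it intervenes on only a small fraction of neurons, the method achieves precise personality control while preserving the model's general capabilities.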
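The summary does not give DPN-LE's formulas, so the following is only a hypothetical illustration of the general idea: contrast MLP activations, filter neurons with two criteria, then intervene on the survivors. It makes three assumptions, none confirmed by the summary:

* The two criteria are (1) a large activation contrast between personality-eliciting and neutral prompts, and (2) low activity on general-task inputs, which screens out multifunctional neurons.
* `top_frac` and the median cutoff serve as thresholds.
* The intervention is an additive shift along the contrast.

All function and parameter names are made up for this sketch.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple

# Rows are prompts, columns are MLP neurons (activations captured from one layer).
Matrix = Sequence[Sequence[float]]


def column_means(acts: Matrix) -> List[float]:
    """Mean activation of each neuron across prompts."""
    return [mean(row[j] for row in acts) for j in range(len(acts[0]))]


def select_persona_neurons(
    acts_persona: Matrix,
    acts_neutral: Matrix,
    acts_general: Matrix,
    top_frac: float = 0.01,
) -> Tuple[List[int], List[float]]:
    """Hypothetical two-criterion neuron filter, not the published DPN-LE procedure."""
    # Contrast: mean activation on personality prompts minus mean on neutral prompts.
    diff = [p - q for p, q in zip(column_means(acts_persona), column_means(acts_neutral))]
    # Criterion 1 (assumed): keep the top_frac neurons with the largest |contrast|.
    k = max(1, round(top_frac * len(diff)))
    top = sorted(range(len(diff)), key=lambda j: (-abs(diff[j]), j))[:k]
    # Criterion 2 (assumed): drop neurons whose mean |activation| on general inputs
    # exceeds the median, as a proxy for "multifunctional".
    general_mag = column_means([[abs(x) for x in row] for row in acts_general])
    cutoff = median(general_mag)
    keep = sorted(j for j in top if general_mag[j] <= cutoff)
    return keep, diff


def steer(acts_row: List[float], neurons: List[int], diff: List[float], alpha: float = 1.0) -> List[float]:
    """Assumed intervention: shift only the selected neurons along the contrast direction."""
    out = list(acts_row)
    for j in neurons:
        out[j] += alpha * diff[j]
    return out
```

The point of the second filter is visible in small cases. A neuron that separates personality from neutral prompts but is also strongly active on general inputs gets excluded, so the edit touches fewer neurons than a contrast-only selection would.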