New 'apostate' operator reduces LLM refusal rates with minimal impact

By PulseAugur Editorial · [1 source] · 2026-06-22 21:14

A new operator, the "contrastive co-vector" edit, has been added to the "apostate" tool. It aims to reduce refusal rates in language models while limiting changes to harmless behavior. The method fits a predictor that reproduces harmless variance while explicitly suppressing harmful prompts. In testing on the "granite-3.3-8b" model, the refusal rate fell from 96.0% to 5.0%, while harmless KL divergence rose to 0.081 nats.
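For readers unfamiliar with the "harmless KL divergence" figure, the following is a minimal sketch of how such a number could be computed. It assumes the metric is the mean KL divergence between the base and edited models' next-token distributions on harmless prompts, measured with natural logarithms (hence nats). The direction KL(base || edited), the function names, and the random logits are illustrative assumptions, not details taken from the post.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl_nats(base_logits, edited_logits):
    """Mean KL(base || edited) over prompts, in nats (natural log).

    The KL direction is an assumption; the source does not state it.
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(edited_logits)
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean())

# Hypothetical stand-in data: 4 prompts, vocabulary of 10 tokens.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 10))
edited = base + 0.1 * rng.normal(size=(4, 10))

kl_same = mean_kl_nats(base, base)    # identical models give zero divergence
kl_edit = mean_kl_nats(base, edited)  # a perturbed model gives a positive value
```

A value near zero means the edited model's predictions on harmless prompts are almost indistinguishable from the original's.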

IMPACT This operator could let models refuse far fewer requests while leaving their behavior on harmless prompts largely unchanged.

RANK_REASON The item describes a new technical method for modifying language models, including experimental results, which falls under research.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/AccountAntique9327 · 2026-06-22 21:14

New ablation operator. (apostate)

Today I added a new operator to apostate. This new operator is a contrastive co-vector edit, E = I − R Dᵀ. Removing the refusal direction outright disturbs benign behavior, while naively preserving all harmless varian…
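The excerpt is cut off before it explains how the operator is fitted, so the following is only a sketch of the general shape of an edit E = I − R Dᵀ, not apostate's procedure. It assumes R holds the refusal direction(s) as columns and D holds matching co-vectors with Dᵀ R = I, so that E R = 0. To choose D it swaps in a ridge-regularized, constrained least-squares rule that keeps Dᵀ h small on harmless activations h. The function name `covector_edit`, the `ridge` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def covector_edit(R, harmless, ridge=1e-3):
    """Build E = I - R D^T with D^T R = I (hypothetical fitting rule).

    R:        (n, k) assumed refusal direction(s), one per column.
    harmless: (m, n) activations from harmless prompts, one per row.
    D minimizes the harmless second moment of D^T h subject to D^T R = I.
    This is an illustrative choice, not the method from the post.
    """
    n = R.shape[0]
    sigma = harmless.T @ harmless / len(harmless) + ridge * np.eye(n)
    sigma_inv_R = np.linalg.solve(sigma, R)             # Sigma^{-1} R
    D = sigma_inv_R @ np.linalg.inv(R.T @ sigma_inv_R)  # enforces D^T R = I
    E = np.eye(n) - R @ D.T
    return E, D

def disturbance(E, acts):
    # Mean squared change the edit causes on a batch of row-vector activations.
    return float(np.mean(np.sum((acts @ E.T - acts) ** 2, axis=1)))

# Synthetic stand-in data: anisotropic "harmless" activations in 16 dimensions.
rng = np.random.default_rng(0)
n = 16
R = rng.normal(size=(n, 1))
R /= np.linalg.norm(R)
harmless = rng.normal(size=(200, n)) @ rng.normal(size=(n, n))

E, D = covector_edit(R, harmless)
P = np.eye(n) - R @ R.T  # plain orthogonal ablation, i.e. D = R

dist_cov = disturbance(E, harmless)
dist_orth = disturbance(P, harmless)
```

Both edits map the assumed refusal direction to zero, but under this fitting rule the co-vector version changes the harmless activations no more than the orthogonal projection does. This is one way to read the trade-off the excerpt describes.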

