Heretic is a command-line tool for "uncensoring" language models, that is, for removing their built-in refusal behavior, with the stated aim of making such models accessible to everyone. It combines directional ablation with Optuna-based TPE optimization to minimize refusals while limiting the KL divergence from the original model, which preserves the original model's performance. The tool supports a variety of dense, MoE, and multimodal models, and includes research features such as bitsandbytes quantization and PaCMAP visualization of residuals.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a tool for researchers and users to modify existing language models for reduced censorship and enhanced interpretability.
RANK_REASON Heretic is a command-line tool for modifying language models, not a new model release or a fundamental research paper.
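The two quantities named in the summary, directional ablation and KL divergence, can be illustrated with a minimal pure-Python sketch. This is not Heretic's code: the function names (`ablate_direction`, `kl_divergence`), the `weight` parameter, and the toy matrix are all assumptions made for illustration. The sketch assumes a "refusal direction" `r` in the output space of a weight matrix `W` and removes the component of `W`'s output along `r`; the KL divergence then measures how far a modified model's output distribution has drifted from the original.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ablate_direction(W, r, weight=1.0):
    """Return W - weight * r r^T W, where r is normalized to unit length.

    W is a list of rows (d_out x d_in) and r has length d_out.
    With weight=1.0 the outputs of W have no component left along r.
    (Hypothetical helper, not Heretic's API.)
    """
    r = normalize(r)
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - weight * r[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: ablate the first output coordinate of a 2x2 matrix.
W = [[1.0, 2.0], [3.0, 4.0]]
W_ablated = ablate_direction(W, [1.0, 0.0])

# Drift between an "original" and a "modified" next-token distribution.
p_original = softmax([2.0, 1.0, 0.0])
p_modified = softmax([1.5, 1.0, 0.5])
drift = kl_divergence(p_original, p_modified)
```

An optimizer such as Optuna's TPE sampler would, under this reading, search over parameters like `weight` to trade a lower refusal rate against a small `drift`; how Heretic actually parameterizes that search is not stated in the summary.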