Heretic is a command-line tool designed to "uncensor" language models, making them accessible to everyone. It utilizes directional ablation and Optuna-based TPE optimization to minimize refusal responses while preserving the original model's performance by limiting KL divergence. The tool supports a variety of dense, MoE, and multimodal models, and includes research features like bitsandbytes quantization and PaCMAP residual visualization. AI
影响 Provides a tool for researchers and users to modify existing language models for reduced censorship and enhanced interpretability.
排序理由 Heretic is a command-line tool for modifying language models, not a new model release or a fundamental research paper.
在 Mastodon — sigmoid.social 阅读 →
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →