PulseAugur

Goblin Mode, 24 Hours Later

AI models, particularly GPT-5.5, have exhibited a peculiar behavior dubbed "goblin mode": an unusual fixation on goblin-related imagery and language. The phenomenon gained traction on AI Twitter, where users experimented with it and shared observations. Some speculate it is an artifact of RLHF training or a quirky response to coding prompts, but direct attempts to replicate the behavior under controlled conditions have yielded mixed results, suggesting it may not be as easy to elicit as initially believed.

IMPACT Emergent model behaviors like 'goblin mode' highlight the unpredictable nature of LLMs, potentially impacting prompt engineering and safety evaluations.

RANK_REASON The cluster discusses a peculiar emergent behavior in AI models, with user experiments and hypotheses presented, but lacks a formal release or benchmark.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. LessWrong (AI tag) TIER_1 English(EN) · Dylan Bowman

    Goblin Mode, 24 Hours Later

    Yesterday, Twitter user arb8020 posted this (https://x.com/arb8020/status/2048958391637401718):

    [Image: arb8020_leak.png] …