dark triad
PulseAugur coverage of dark triad: every cluster mentioning dark triad across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
LLM personality geometry acts as intrinsic guardrails against misalignment
Researchers have identified that the internal representation of personality in Large Language Models (LLMs) can act as a defense against emergent misalignment. By mapping LLM personalities using psychometric profiles, t…
-
Researchers amplify Dark Triad traits in Llama-3.3 model
Researchers have developed a method using sparse autoencoder feature steering to amplify Dark Triad personality traits in Meta's Llama-3.3-70B-Instruct model. The steered model exhibited significantly more exploitative,…
-
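Personality traits amplify LLM gender bias in English and Hindi stories
A new study examines how personality traits affect gender bias when large language models (LLMs) adopt specific personas. The researchers generated more than 23,000 English and Hindi stories, varying each character's gender, occupation, and personality. The results show that, compared with HEXACO traits, Dark Triad personality traits are associated with more gender-stereotyped narratives, and the effect differs across LLMs and languages. This suggests that persona-conditioned LLMs may cause uneven representational harm and reinforce gender stereotypes across a range of applications.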