Alignment Forum
TIER_1
English(EN)
·
Josh Engels
·
2026-06-20 20:05
Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue+, João Gabriel Lopes de Oliveira+, Rohin Shah+, Neel Nanda+
*Primary Co…
arXiv cs.AI
TIER_1
English(EN)
·
Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda
·
2026-06-19 04:00
arXiv:2606.20560v1 Announce Type: cross Abstract: LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computa…
arXiv cs.AI
TIER_1
English(EN)
·
Neel Nanda
·
2026-06-18 17:59
LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make …
LessWrong (AI tag)
TIER_1
English(EN)
·
Josh Engels
·
2026-06-20 20:05
Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue+, João Gabriel Lopes de Oliveira+, Rohin Shah+, Neel Nanda+
*Primary Co…