PulseAugur / Brief
EN
LIVE 08:34:43

Brief

last 24h
[1/1] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Harmfulness Directions in OLMo

    Researchers have analyzed the development of harmfulness representations within the OLMo 3 7B model during its training process. They identified distinct but related linear activation directions for various harmfulness subcategories, observing that these directions evolve and stabilize over time. The study found that in-distribution evaluations can be misleading, emphasizing the need for out-of-distribution testing, and demonstrated that late-stage training directions can effectively steer the model's behavior. AI

    Harmfulness Directions in OLMo

    IMPACT Reveals insights into how harmful concepts are represented and evolve during LLM training, potentially informing future safety research.