AxBench
PulseAugur coverage of AxBench — every cluster mentioning AxBench across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
New method simplifies language model interpretability
Researchers have introduced Exemplar Partitioning (EP), a new method for mechanistic interpretability in language models that offers a more streamlined approach than existing dictionary-learning techniques like sparse a…
-
New methods enhance LLM control without sacrificing performance or reasoning
Researchers have developed new methods for steering large language model (LLM) behaviors at inference time without sacrificing generation quality. One approach, Prompt-only SV (PrOSV), intervenes only on prompt tokens, …
-
New method 'PSR' improves LLM steering by mimicking prompt interventions
Researchers have developed a new framework called Prompt Steering Replacement (PSR) to improve how large language models are guided at inference time. This method formulates prompt steering as a type of activation steer…