model organism
PulseAugur coverage of "model organism": every cluster mentioning the term across labs, papers, and developer communities, ranked by signal.
1 day with sentiment data
-
New study evaluates LLM lie detectors, finds limits against models trained to deceive
Researchers have developed and evaluated lie detectors for large language models, finding that while these detectors show promise, their effectiveness is limited, particularly when models are trained to be deceptive. Th…
-
Researchers reveal finetuning objectives in LLMs using perplexity differencing
Researchers have developed a method to identify the specific objectives used to finetune large language models, even when those objectives are hidden. The technique involves comparing perplexity scores between a finetun…
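The summary above is truncated, so the exact comparison is not stated. A minimal sketch follows, assuming it scores candidate descriptions of the hidden objective under both a base model and the finetuned model, then ranks candidates by how much the finetuned model's perplexity drops. The function names, the candidate labels, and the log-probability values are all hypothetical. A real pipeline would obtain per-token log-probabilities from the two models.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_diff(base_logprobs, tuned_logprobs):
    """Base minus finetuned perplexity on the same text.

    Positive values mean the finetuned model finds the text more likely
    than the base model does (hypothetical scoring rule).
    """
    return perplexity(base_logprobs) - perplexity(tuned_logprobs)


def rank_candidates(candidates):
    """Rank hypothetical objective descriptions by perplexity drop.

    candidates maps a label to (base_logprobs, tuned_logprobs) for the
    same candidate text; the largest drop comes first.
    """
    scores = {name: perplexity_diff(b, t) for name, (b, t) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up log-probabilities for two candidate objective texts.
candidates = {
    "refuse_medical": ([-3.0, -2.5, -2.8], [-1.0, -0.9, -1.2]),
    "speak_french": ([-2.0, -2.2, -2.1], [-2.0, -2.1, -2.2]),
}
print(rank_candidates(candidates))
```

Differencing against the base model, rather than reading the finetuned model's perplexity alone, is meant to cancel out text that is simply easy for any language model to predict.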
-
OpenAI trains LLMs for better instruction hierarchy; new research targets optimization and verification
OpenAI has introduced the IH-Challenge dataset to train large language models to better prioritize instructions from different sources, such as system messages, developers, and users. This training aims to improve safet…
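To illustrate what "prioritizing instructions from different sources" means, the sketch below resolves conflicting instructions with a fixed rule-based ordering of system, then developer, then user. This is not how IH-Challenge works. That dataset trains the model to learn this behavior, whereas the code hard-codes it. The `Instruction` type, the notion of a conflict "topic", and the tie-breaking rule are hypothetical simplifications.

```python
from dataclasses import dataclass

# Hypothetical privilege ordering: a lower number means higher priority.
PRIORITY = {"system": 0, "developer": 1, "user": 2}


@dataclass
class Instruction:
    source: str     # one of PRIORITY's keys
    topic: str      # hypothetical: what the instruction governs, e.g. "language"
    directive: str  # the instruction text itself


def resolve(instructions):
    """Keep, for each topic, the directive from the most privileged source.

    Among equally privileged sources, the first occurrence wins
    (hypothetical tie-breaking rule).
    """
    winners = {}
    for inst in instructions:
        if inst.source not in PRIORITY:
            raise ValueError(f"unknown source: {inst.source}")
        current = winners.get(inst.topic)
        if current is None or PRIORITY[inst.source] < PRIORITY[current.source]:
            winners[inst.topic] = inst
    return {topic: inst.directive for topic, inst in winners.items()}


msgs = [
    Instruction("system", "language", "Reply in English"),
    Instruction("user", "language", "Reply in French"),
    Instruction("user", "tone", "Be brief"),
]
print(resolve(msgs))
```

Here the user's request to switch language loses to the system message, while the user's non-conflicting request about tone is still honored. A trained model has to make the same distinction without explicit topic labels.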