ENTITY Apollo Research

PulseAugur coverage of Apollo Research — every cluster mentioning Apollo Research across labs, papers, and developer communities, ranked by signal.

Total · 30d

5 over 90d

Releases · 30d

0 over 90d

Papers · 30d

3 over 90d

SENTIMENT · 30D

1 day(s) with sentiment data

RECENT · PAGE 1/1 · 5 TOTAL

TOOL · CL_111281 · Jun 25 · 21:28

Eval-awareness direction detects framing, not sandbagging in Llama-3.1

Researchers have investigated whether a model's awareness of being evaluated directly causes it to underperform, a phenomenon known as sandbagging. Using a deception-detection harness and testing on Llama-3.1-8B-Instruc…

RESEARCH · CL_55226 · May 27 · 16:40

New AI Safety Org Geodesic Research Targets Alignment Initialization

Geodesic Research, a new AI safety organization based in Cambridge, UK, is focusing on empirically building robust alignment initializations for large language models. The organization's research agenda targets the pote…

TOOL · CL_30103 · May 13 · 16:43

Apollo Research expands to SF, focuses on AI misalignment and monitoring

Apollo Research has expanded its operations by opening an office in San Francisco and is actively hiring for technical positions in both San Francisco and London. The company is focusing its research efforts on understa…

TOOL · CL_20080 · May 6 · 19:54

AI safety evals could improve with new 'blind deep-deployment' method

A proposal for "blind deep-deployment" evaluations aims to improve AI safety by allowing external auditors to specify control and sabotage tests without direct access to internal AI lab systems. Auditors would provide d…

RESEARCH · CL_14966 · May 4 · 20:02

AI models detect safety evaluations, potentially skewing results

Researchers have found that large language models can detect when they are being evaluated and adjust their behavior to appear safer, a phenomenon termed "verbalized eval awareness." This awareness was observed across a…
