PulseAugur

LITMUS

PulseAugur coverage of LITMUS — every cluster mentioning LITMUS across labs, papers, and developer communities, ranked by signal.

Total · 30d: 3 (3 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
TIMELINE
  1. 2026-05-11 research_milestone Introduction of the LITMUS benchmark for evaluating LLM agent safety in OS environments.
  2. 2026-05-11 research_milestone Introduction of the LITMUS benchmark for evaluating LLM agent behavioral safety.
SENTIMENT · 30D

1 day with sentiment data

RECENT · PAGE 1/1 · 3 TOTAL
  1. TOOL · CL_105125

    New Litmus system automates AI metric specification without labels

    Researchers have developed Litmus, a novel system designed to automatically specify evaluation and monitoring metrics for AI systems. Unlike existing methods that assume the evaluation target is known, Litmus identifies…

  2. RESEARCH · CL_34509

    New LITMUS benchmark reveals LLM agent safety flaws

    Researchers have introduced LITMUS, a new benchmark designed to test the behavioral safety of LLM agents operating within real operating system environments. This benchmark addresses limitations in existing safety evalu…

  3. TOOL · CL_17652

    Email marketing knowledge base launched as Claude Code skill

    A developer has created a "Claude Code skill" that acts as an expert in email marketing, drawing from a comprehensive knowledge base of over 65,000 words. This skill is built upon insights from 908 sources, including in…