WildChat

PulseAugur coverage of WildChat — every cluster mentioning WildChat across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 5 (5 over 90d)

SENTIMENT · 30D

3 days with sentiment data

RECENT · PAGE 1/1 · 6 TOTAL

TOOL · CL_96010 · Jun 17 · 03:53

Public chat data explored for AI model safety evaluation

Researchers are exploring the use of public chat data as an alternative to private production data for evaluating frontier AI models. This approach, termed Deployment Simulation, aims to predict undesirable model behavi…

RESEARCH · CL_95252 · Jun 16 · 19:42

OpenAI unveils deployment simulation to predict AI model behavior

OpenAI has developed a new method called Deployment Simulation to predict how AI models will behave in real-world scenarios before they are released. This technique uses de-identified user data to simulate deployment co…

RESEARCH · CL_95829 · Jun 16 · 15:37

Study: Commercial LLMs Outperform Open-Weight Models on Security Prompts

A new study analyzed 14,727 security and privacy prompts from the WildChat dataset, revealing that users frequently seek advice on protecting themselves online. Commercial large language models, such as GPT 5.5, demonst…

RESEARCH · CL_85554 · Jun 11 · 13:00

AI chatbots repeat Elias Thorne stories due to alignment training

A recurring character named Elias Thorne, often depicted as a lighthouse keeper or clockmaker, is appearing in a significant percentage of stories generated by various large language models. Researchers from Cornell Uni…

RESEARCH · CL_27575 · May 10 · 23:06

New research tackles AI agent training with realistic user personas

Two new research papers explore the limitations of current user simulators for training AI agents. The first paper introduces Persona Policies (PPol), a method to generate more realistic and varied user personas for sim…

RESEARCH · CL_15870 · May 5 · 04:00

New benchmark 'Prosa' evaluates LLMs on Brazilian Portuguese chats

Researchers have introduced Prosa, a new benchmark designed to evaluate Large Language Models (LLMs) using real user conversations in Brazilian Portuguese. This benchmark utilizes a rubric-based scoring system with mult…