WildChat
PulseAugur coverage of WildChat — every cluster mentioning WildChat across labs, papers, and developer communities, ranked by signal.
-
Public chat data explored for AI model safety evaluation
Researchers are exploring the use of public chat data as an alternative to private production data for evaluating frontier AI models. This approach, termed Deployment Simulation, aims to predict undesirable model behavi…
-
OpenAI unveils deployment simulation to predict AI model behavior
OpenAI has developed a new method called Deployment Simulation to predict how AI models will behave in real-world scenarios before they are released. This technique uses de-identified user data to simulate deployment co…
-
Study: Commercial LLMs Outperform Open-Weight Models on Security Prompts
A new study analyzed 14,727 security and privacy prompts from the WildChat dataset, revealing that users frequently seek advice on protecting themselves online. Commercial large language models, such as GPT 5.5, demonst…
-
AI chatbots repeat Elias Thorne stories due to alignment training
A recurring character named Elias Thorne, often depicted as a lighthouse keeper or clockmaker, is appearing in a significant percentage of stories generated by various large language models. Researchers from Cornell Uni…
-
New research tackles AI agent training with realistic user personas
Two new research papers explore the limitations of current user simulators for training AI agents. The first paper introduces Persona Policies (PPol), a method to generate more realistic and varied user personas for sim…
-
New benchmark 'Prosa' evaluates LLMs on Brazilian Portuguese chats
Researchers have introduced Prosa, a new benchmark designed to evaluate Large Language Models (LLMs) using real user conversations in Brazilian Portuguese. This benchmark utilizes a rubric-based scoring system with mult…