BrowseComp+

PulseAugur coverage of BrowseComp+: every cluster mentioning BrowseComp+ across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)
Sentiment · 30d: 6 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL
  1. RESEARCH · CL_99569

    New method mines agent skills from interaction data, but policy improvement is limited

    Researchers have developed a method to automatically generate skill libraries for computer-using agents by mining interaction trajectories. The process involves segmenting graphical user interface (GUI) trajectories, cl…

  2. RESEARCH · CL_106759

    New LLM Training Methods Optimize Data Scheduling for Efficiency and Performance

    Researchers have developed new methods for optimizing the training of large language models (LLMs) through advanced data scheduling techniques. One approach, the Holistic Data Scheduler (HDS), uses multi-objective reinf…

  3. TOOL · CL_86307

    Perplexity Integrates Deep Research with Multi-Model Orchestration System

    Perplexity has integrated its Deep Research feature into its Computer orchestration system, enhancing its ability to break down complex questions into subtasks. These subtasks are then routed across more than 20 differe…

  4. RESEARCH · CL_84831

    TreeSeeker framework enhances AI deep search with controlled trial-and-error

    Researchers have introduced TreeSeeker, a novel framework designed to improve the efficiency of deep search agents. This system structures search processes as a tree, allowing agents to explore multiple potential paths …

  5. RESEARCH · CL_65077

    New Korean web-browsing benchmark reveals LLM performance gaps

    Researchers have introduced K-BrowseComp, a new benchmark designed to evaluate the web-browsing agent capabilities of large language models specifically within Korean contexts. The benchmark comprises 400 problems, with…

  6. COMMENTARY · CL_61793

    Author warns AI evaluations are unreliable, risking unseen harms

    The author argues that current AI evaluation methods are unreliable and systematically flawed, posing significant risks. They highlight issues like models gaming evaluations, distribution shifts rendering metrics inaccu…

  7. RESEARCH · CL_55915

    New benchmark LiveBrowseComp tests LLM search agents' true discovery skills

    A new research paper introduces LiveBrowseComp, a benchmark designed to assess whether large language model (LLM) search agents truly discover new information or merely verify their existing internal knowledge. The stud…

  8. RESEARCH · CL_37215

    Hugging Face launches Open Agent Leaderboard for AI systems

    Hugging Face has launched the Open Agent Leaderboard, a new framework for evaluating the performance and cost of AI agent systems. This benchmark focuses on assessing an agent's generality across diverse tasks and setti…

  9. RESEARCH · CL_44793

    New open-weight agents tackle deep research tasks with synthetic data and novel architectures

    Two new research papers introduce advanced agent systems designed for deep research tasks. The first, QUEST, offers a family of open-weight models (2B to 35B parameters) trained on synthetic data, demonstrating strong p…

  10. RESEARCH · CL_20273

    OpenSearch-VL offers open recipe for advanced multimodal search agents

    Researchers have developed OpenSearch-VL, a novel, fully open-source recipe for training advanced multimodal deep search agents. This approach utilizes a curated pipeline for high-quality training data, a diverse tool e…

  11. FRONTIER RELEASE · CL_01790

    Kimi K2 model boasts 1T parameters and SOTA HLE, while Soumith Chintala departs PyTorch

    Kimi K2, a new model from Moonshot AI, has 1 trillion parameters and achieves state-of-the-art results on the HLE benchmark. It also demonstrates capabilities in BrowseComp and TauBench. Separately, Soumith Chintala has dep…