PulseAugur

Humanity's Last Exam

PulseAugur coverage of Humanity's Last Exam: every cluster that mentions it across labs, papers, and developer communities, ranked by signal.

Total · 30d: 11 (11 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 6 (6 over 90d)
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL
  1. TOOL · CL_71823 ·

    Andon Labs stress-tests AI agents in real-world business scenarios

    Andon Labs is developing novel real-world evaluations for AI systems, moving beyond traditional benchmarks to assess model behavior in complex scenarios. Their "Vending-Bench" and "Luna" projects, which involve AI-run p…

  2. SIGNIFICANT · CL_45430 ·

    Google's Gemini 3.5 Flash outperforms 3.1 Pro on coding and agents

    Google's Gemini 3.5 Flash model has surpassed its predecessor, Gemini 3.1 Pro, on several key benchmarks, particularly in coding and agentic tasks. This new tier offers a significant cost reduction of 40% and approximat…

  3. TOOL · CL_30793 ·

    LLMs learn to actively seek external info for better task adaptation

    Researchers have developed a new method for adapting large language models (LLMs) by enabling them to actively seek information from external sources like Wikipedia and web browsers. This approach, termed "active inform…

  4. TOOL · CL_18871 ·

    New RSE strategy recycles LLM search experience for efficient test-time scaling

    Researchers have introduced Recycling Search Experience (RSE), a novel method to improve the efficiency of test-time scaling for large language models. RSE transforms test-time search from isolated trials into a cumulat…

  5. RESEARCH · CL_20273 ·

    OpenSearch-VL offers open recipe for advanced multimodal search agents

    Researchers have developed OpenSearch-VL, a novel, fully open-source recipe for training advanced multimodal deep search agents. This approach utilizes a curated pipeline for high-quality training data, a diverse tool e…

  6. FRONTIER RELEASE · CL_07657 ·

    Xiaomi's MiMo-v2.5-Pro open-source model rivals top AI coding assistants

    Xiaomi has released MiMo-v2.5-Pro, an open-source coding-focused language model that demonstrates impressive capabilities in complex tasks. The model successfully completed a university-level compiler project in hours, …

  7. RESEARCH · CL_06636 ·

    MTRouter cuts LLM costs by 58% on ScienceWorld, 43% on HLE

    Researchers have developed MTRouter, a novel system designed to optimize the cost of multi-turn interactions with large language models. By jointly embedding interaction history and candidate models, MTRouter learns to …

  8. FRONTIER RELEASE · CL_11258 ·

    Google Gemini API adds Deep Research updates with MCP and chart generation

    Google has released two significant updates to its Gemini API, enhancing its Deep Research capabilities. These updates introduce improved quality, support for MCP, and native generation of charts and infographics. The G…

  9. FRONTIER RELEASE · CL_01763 ·

    New Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

    Google DeepMind has released Gemini 3 Deep Think V2, a new reasoning mode for Google AI Ultra subscribers and available via API early access. This model achieves new state-of-the-art results on benchmarks like ARC-AGI-2…

  10. FRONTIER RELEASE · CL_01790 ·

    Kimi K2 model boasts 1T parameters and SOTA HLE, while Soumith Chintala departs PyTorch

    Kimi K2, a new model from Moonshot AI, has 1 trillion parameters and achieves state-of-the-art results on the HLE benchmark. It also reports results on BrowseComp and TauBench. Separately, Soumith Chintala has dep…

  11. FRONTIER RELEASE · CL_01735 ·

    Google DeepMind launches Deep Think for Gemini Ultra subscribers

    Google DeepMind has released a new AI capability called Deep Think, now available to Google AI Ultra subscribers via the Gemini app. This feature utilizes parallel thinking techniques, allowing the model to explore mult…