PulseAugur

Gemini 2.5 Pro

PulseAugur coverage of Gemini 2.5 Pro: every cluster mentioning the model across labs, papers, and developer communities, ranked by signal.

Total · 30d: 62 (62 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 42 (42 over 90d)
Sentiment · 30d: 20 days with sentiment data

RECENT · PAGE 1/4 · 62 TOTAL
  1. TOOL · CL_79925 ·

    SCOUT framework boosts LLM performance on non-linguistic tasks

    Researchers have developed a new framework called SCOUT to improve the performance of Large Language Models (LLMs) on non-linguistic tasks. SCOUT decouples exploration from exploitation, using lightweight "scouts" to ef…

  2. TOOL · CL_77337 ·

    New ODE framework boosts multimodal AI agents with reusable visuals

    Researchers have developed a new framework called On-policy Data Evolution (ODE) to improve multimodal deep search agents. ODE addresses two key limitations: the inability to reuse intermediate visual information from s…

  3. TOOL · CL_76047 ·

    AI library standardizes 'model thinking' across providers

    A new library called aichain has been developed to standardize the implementation of a common LLM feature across different providers. This feature, which allows models to "think longer" before responding, is offered by …

  4. TOOL · CL_75512 ·

    New GCF format outperforms JSON and TOON in LLM data handling benchmark

    A new benchmark reveals that common data formats like JSON and TOON struggle with large language models, failing to maintain accuracy and validity at scale. The study found that JSON breaks down with as few as 500 recor…

  5. TOOL · CL_74406 ·

    LLMs struggle with South Asian music generation and understanding

    Researchers have evaluated the capabilities of Large Language Models (LLMs) in understanding and generating South Asian classical music, a domain with distinct structural principles from Western traditions. Their new be…

  6. COMMENTARY · CL_73947 ·

    PauseAI UK marks one year of AI safety advocacy and protests

    PauseAI UK has marked its first year by organizing significant events and advocacy efforts aimed at raising awareness of AI risks. The group hosted conferences, facilitated a debate in the European Parliament, and co-or…

  7. TOOL · CL_72287 ·

    Estonia benchmark: Claude Opus 4.7 best resists Russian propaganda

    Estonia's Language Institute has released a new benchmark called "Propaganda Resistance" to evaluate how well large language models can withstand Russian state-sponsored disinformation. The benchmark tested 14 types of …

  8. RESEARCH · CL_72505 ·

    New OMTG benchmark surpasses Gemini 2.5 Pro with novel reward functions

    Researchers have introduced a new benchmark and dataset for One-to-Many Temporal Grounding (OMTG), a task that involves localizing multiple video segments corresponding to a single text query. Existing multimodal large …

  9. RESEARCH · CL_65818 ·

    New LLM creativity metric analyzes token distribution shifts

    Researchers have developed a new method for evaluating LLM creativity by analyzing how sampling temperature reshapes token distributions, outperforming existing metrics. This approach, tested on Llama-3.1-8B-Instruct, a…

  10. TOOL · CL_62860 ·

    New EMBGuard system enhances AI agent safety by identifying physical hazards

    Researchers have developed EMBGuard, a new safety system for embodied AI agents that identifies and reasons about physical hazards in real-world environments. Unlike previous methods, EMBGuard explicitly decouples risk …

  11. RESEARCH · CL_58821 ·

    New methods boost LLM geometric reasoning with symbolic interfaces

    Researchers have developed new methods to improve Large Language Models' (LLMs) ability to reason about geometric problems. One approach uses symbolic intermediaries to translate numerical outputs from physics simulator…

  12. TOOL · CL_56728 ·

    Medical AI agents learn to "see" evidence, outperforming GPT-5

    Researchers have developed new AI paradigms for medical imaging and video analysis, enabling models to actively "look" at evidence rather than just passively process it. These "Think with Images" and "Think with Videos"…

  13. TOOL · CL_51076 ·

    New 'Chain-of-Thought Hijacking' attack exploits LLM reasoning for jailbreaks

    Researchers have identified a new vulnerability in large reasoning models (LRMs) called "Chain-of-Thought Hijacking." This attack exploits extended reasoning processes to weaken a model's refusal capabilities, leading t…

  14. TOOL · CL_50854 ·

    MDIA agent achieves high scores on HealthBench Professional benchmark

    Researchers have developed MDIA, a Multi-agent Diagnostic Intelligence Agent, which utilizes a 7-node clinical reasoning graph to achieve strong performance on the HealthBench Professional benchmark. When evaluated usin…

  15. TOOL · CL_49936 ·

    Bifrost gateway improves LLM cost, data quality for robotics and agents

    Two separate teams at Nexus Labs and Prophesee have adopted Bifrost, an open-source gateway, to manage their interactions with multiple large language models. Prophesee used Bifrost to caption 1.2 million robotics frame…

  16. TOOL · CL_49232 ·

    Claude Sonnet 4.5 leads Gemini 2.5 Pro, GPT-4.1 in coding benchmark

    A recent benchmark compared GPT-4.1, Claude Sonnet 4.5, and Gemini 2.5 Pro on real-world coding tasks. Claude Sonnet 4.5 scored highest in code generation, demonstrating strong structural consistency and appropriate use…

  17. RESEARCH · CL_44138 ·

    OpenClaw surpasses React's GitHub stars, offers multi-model AI coding

    OpenClaw, a new open-source developer tool, has rapidly gained popularity, surpassing React's GitHub star count in just 60 days. The tool allows users to select their preferred AI model, including options from Anthropic…

  18. TOOL · CL_43243 ·

    Shadow LLM APIs deceive researchers with cheaper models

    Researchers at CISPA audited 17 third-party "shadow" LLM APIs and discovered significant performance discrepancies compared to the official models they claimed to represent. These services often provide access to cheape…

  19. RESEARCH · CL_45032 ·

    MAVEN pipeline automates video reasoning data annotation

    Researchers have developed MAVEN, an agentic pipeline designed to automate the creation of high-quality structured annotations for video reasoning tasks. This pipeline synthesizes multi-scale event descriptions and supp…

  20. RESEARCH · CL_44020 ·

    LLMs outperform fine-tuned models on rare suicide circumstances

    A new research paper compares the performance of large language models (LLMs) against fine-tuned RoBERTa models for extracting complex circumstances from death investigation narratives. The study introduces a "Complexit…