GPT-5.1
PulseAugur coverage of GPT-5.1 — every cluster mentioning GPT-5.1 across labs, papers, and developer communities, ranked by signal.
10 days with sentiment data
-
LLMs suggest a narrow range of research methodologies, study finds
A new study published on arXiv investigates the research methodologies suggested by large language models (LLMs) when prompted with research questions. The study found that models like GPT-5.1, Gemini 3 Pro, and DeepSee…
-
New method enhances LLM agent clarification seeking by decomposing uncertainty
Researchers have developed a method that improves how LLM agents seek clarification by decomposing uncertainty. This approach separates action confidence from request uncertainty, allowing agent…
-
LLMs achieve 0.76 reliability in stance detection for Bayesian cognitive science
Researchers have developed a novel method for stance detection in scientific discourse, utilizing Large Language Models (LLMs) to analyze whether authors treat Bayesian models as descriptive mechanisms or useful mathema…
-
LLM Sampling: Why You Should Only Tune Temperature or Top-P
The article explains the distinct functions of temperature and top-p sampling in large language models, warning against using both simultaneously. Temperature rescales the probability distribution of tokens, affecting a…
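As a rough illustration of the two knobs described above, here is a minimal Python sketch. It is not taken from the article: the function names `softmax_with_temperature` and `top_p_filter` are hypothetical, and the top-p step follows the common nucleus-sampling definition (keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalize).

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities, rescaled by temperature.

    Hypothetical helper for illustration. Temperatures below 1 sharpen
    the distribution; temperatures above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of top tokens whose mass reaches top_p.

    Hypothetical helper for illustration. Tokens outside that set get
    probability 0, and the kept tokens are renormalized to sum to 1.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
    print(softmax_with_temperature(logits, temperature=0.5))
    print(top_p_filter(softmax_with_temperature(logits), top_p=0.9))
```

Temperature reshapes the whole distribution, while top-p truncates its tail, so changing both at once makes it harder to tell which setting caused a change in output.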
-
New DNR-Bench reveals 0% pass rate for top LLMs
A new benchmark called DNR-Bench has been introduced to evaluate large language models' ability to avoid responding to specific prompts. Across several leading models including GPT-5.1, Claude Opus 4.8, Gemini 3 Pro, an…
-
New datasets tackle AI-generated evidence in legal settings
Researchers have developed new datasets to help detect AI-generated evidence in legal contexts. One corpus focuses on synthetic documents like receipts and administrative records, while another dataset, SLED-1400, conta…
-
User seeks GPT-5.1 aggregator with large context window
A Reddit user is looking for model aggregators that offer GPT-5.1 with a large context window, ideally around 200k to 500k tokens. The user is also concerned about the pricing of such services.
-
AI's Token Billing Shock: Companies Scramble to Manage Runaway Costs
Companies are increasingly scrutinizing their AI spending as new token-based billing models reveal unexpectedly high costs. This shift from opaque, all-you-can-eat subscriptions to per-use charges has exposed a lack of …
-
Polymarket: Anthropic's Claude Opus 4.8 favored to lead AI model race
Prediction markets on Polymarket strongly favor Anthropic's Claude Opus 4.8 as the best AI model by the end of June 2026, with odds reaching 96%. This surge in confidence is attributed to early preview…
-
Kapa.ai indexes images for RAG to improve AI answers
Kapa.ai has developed a new method for incorporating images into Retrieval-Augmented Generation (RAG) pipelines for AI assistants. Instead of processing images at query time, which is costly and inefficient, Kapa.ai des…
-
New EMBGuard system enhances AI agent safety by identifying physical hazards
Researchers have developed EMBGuard, a new safety system for embodied AI agents that identifies and reasons about physical hazards in real-world environments. Unlike previous methods, EMBGuard explicitly decouples risk …
-
AuthorMix framework enables modular authorship style transfer
Researchers have developed AuthorMix, a novel framework for authorship style transfer that utilizes modular, style-specific LoRA adapters. This approach allows for rapid training of adaptation models for new target auth…
-
AI Labs Shift to Full API Pricing, Signaling Strong Product-Market Fit
Leading AI labs like Anthropic and OpenAI have shifted to full API pricing for their enterprise customers, signaling a strong product-market fit for their coding agents. This move, occurring in April 2026, mirrors the S…
-
New ClinConsensus Benchmark Evaluates Chinese Medical LLMs
Researchers have developed ClinConsensus, a new benchmark designed to evaluate the clinical rubric coverage of Chinese medical Large Language Models (LLMs). The benchmark includes 2,500 expert-curated cases across 36 sp…
-
New LLM Safety Tools Target Financial Regulatory Compliance
Researchers have developed two new systems, FinGuard and FinHarness, to enhance the safety and regulatory compliance of Large Language Models (LLMs) in financial services. FinGuard, built on Qwen3-8B, uses a novel pipel…
-
LLMs show promise for low-resource ASR error correction
Researchers explored the effectiveness of large language models (LLMs) in correcting errors for low-resource automatic speech recognition (ASR) systems, specifically focusing on West Frisian. Their study introduced a co…
-
LLM advancements in coding agents and personal assistants detailed
Simon Willison presented a five-minute talk at PyCon US 2026 summarizing LLM developments since November 2025. Key advancements included significant improvements in coding agents, which became reliable for daily use, an…
-
AI models favor sponsored flights, study finds
A recent study from Princeton and the University of Washington found that 18 out of 23 AI models exhibited a bias towards selecting more expensive, sponsored flight options when instructed to choose. Models like Grok-4.…
-
HLS-Seek uses RL to generate hardware descriptions prioritizing performance
Researchers have developed HLS-Seek, a new framework for generating hardware descriptions from natural language that prioritizes Quality of Results (QoR) like latency and resource utilization. Unlike previous methods th…
-
Deduplication in RAG systems cuts context size without quality loss
A new preprint details an empirical analysis of byte-exact deduplication in Retrieval-Augmented Generation (RAG) systems. The study found significant context reduction across academic, enterprise, and conversational AI …
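For readers unfamiliar with the term, byte-exact deduplication can be sketched in a few lines of Python. This is an illustrative assumption, not the preprint's implementation: the function name `dedupe_chunks` is hypothetical, and hashing the UTF-8 bytes with SHA-256 is just one way to compare chunks byte for byte.

```python
import hashlib


def dedupe_chunks(chunks):
    """Drop retrieved chunks that are byte-for-byte duplicates.

    Hypothetical helper for illustration. The first occurrence of each
    chunk is kept and the original order is preserved. Chunks that
    differ by even one byte (for example, trailing whitespace) are
    treated as distinct.
    """
    seen = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


if __name__ == "__main__":
    retrieved = ["Refund policy: 30 days.", "Shipping takes 5 days.",
                 "Refund policy: 30 days."]
    print(dedupe_chunks(retrieved))
```

Because only exact copies are removed, no distinct text is dropped from the context; near-duplicates would need a separate, fuzzier technique.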