GPT-5.1
PulseAugur coverage of GPT-5.1: every cluster mentioning GPT-5.1 across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
LLMs show promise for low-resource ASR error correction
Researchers explored the effectiveness of large language models (LLMs) in correcting errors for low-resource automatic speech recognition (ASR) systems, specifically focusing on West Frisian. Their study introduced a co…
-
LLM advancements in coding agents and personal assistants detailed
Simon Willison presented a five-minute talk at PyCon US 2026 summarizing LLM developments since November 2025. Key advancements included significant improvements in coding agents, which became reliable for daily use, an…
-
AI models favor sponsored flights, study finds
A recent study from Princeton and the University of Washington found that 18 out of 23 AI models exhibited a bias towards selecting more expensive, sponsored flight options when instructed to choose. Models like Grok-4.…
-
HLS-Seek uses RL to generate hardware descriptions prioritizing performance
Researchers have developed HLS-Seek, a new framework for generating hardware descriptions from natural language that prioritizes Quality of Results (QoR) like latency and resource utilization. Unlike previous methods th…
-
Deduplication in RAG systems cuts context size without quality loss
A new preprint details an empirical analysis of byte-exact deduplication in Retrieval-Augmented Generation (RAG) systems. The study found significant context reduction across academic, enterprise, and conversational AI …
-
Cursor AI uses older models despite newer options being available
A user on Reddit's Cursor subreddit is questioning why the Cursor IDE's subagent feature is defaulting to older models like GPT-5.1 and GPT-5.2 for coding tasks. Despite configuring the system to use newer and potential…
-
BioTool dataset enhances LLM biomedical tool-calling capabilities
Researchers have developed BioTool, a new dataset aimed at improving the ability of large language models to utilize specialized biomedical tools. The dataset includes 34 tools from major databases and over 7,000 human-…
-
New dataset reveals MLLMs struggle with handwritten STEM student solutions
Researchers have introduced EDU-CIRCUIT-HW, a new dataset comprising over 1,300 handwritten solutions from university STEM students to evaluate multimodal large language models (MLLMs). The dataset aims to address the c…
-
MLLM feedback on student drawings shows significant grounding failures
A new study published on arXiv reveals significant grounding failures in multimodal large language models (MLLMs) when generating feedback on student science drawings. Researchers found that 41.3% of feedback instances …
-
New AEGIS benchmark reveals AI image forensics lag behind generative advances
Researchers have introduced AEGIS, a new benchmark designed to evaluate the forensic analysis of AI-generated academic images. This benchmark addresses domain-specific complexity across seven academic categories and inc…
-
AI researchers review AGI forecasting methods, identify gaps and implications
A new report reviews current methodologies for forecasting the arrival of artificial general intelligence (AGI), highlighting significant limitations in existing approaches. The research synthesizes diverse forecasting …
-
AI models evaluated on meeting summaries, GPT-5.1 shows gains
Researchers have developed a reusable pipeline for evaluating AI-generated meeting summaries, designed to be adaptable across different domains. The system treats both ground truth and AI outputs as structured artifacts…
-
ArguAgent uses GPT-5.2 to group STEM students for better classroom arguments
Researchers have developed ArguAgent, a generative AI system designed to improve collaborative learning in STEM classrooms. The system uses AI to group students in real-time based on their argumentation stances and qual…
-
Podium arms 10,000+ SMBs with AI agents powered by GPT-5.1
Podium has launched an enhanced AI agent, named "Jerry," powered by OpenAI's GPT-5.1 model, to assist over 10,000 small and medium-sized businesses (SMBs). This AI agent automates lead capture, appointment scheduling, a…
-
Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality but Open Weights
Black Forest Labs has released FLUX.2, an image generation model that accepts up to 10 reference images and produces outputs of up to 4 megapixels, with open-weight versions available. Concurrently, Anthropic's Claude Opus 4.5 is sho…
-
2023 Year In Review
METR, an AI safety research organization, detailed its 2023 accomplishments, including developing methodologies for evaluating AI agents on autonomous tasks and contributing to OpenAI's GPT-4 system card. The organizati…