GPT-4 Turbo
PulseAugur coverage of GPT-4 Turbo — every cluster mentioning GPT-4 Turbo across labs, papers, and developer communities, ranked by signal.
5 天有情绪数据
-
新的ERM框架在无标签情况下批判LLM的因果推理
一个名为认知遗憾最小化(ERM)的新框架已被引入,以改进大型语言模型的因果推理能力。与只奖励正确答案的传统方法不同,ERM批判的是其底层的推理过程本身。这种无标签的方法能够识别并纠正模型思维过程中诸如混淆相关性与因果性以及未经验证的混淆变量等问题。实验表明,ERM显著增强了GPT-4 Turbo和GPT-5.2等模型的因果推理能力,其表现优于标准的测试时纠正方法。
-
Cursor 用户就 AI 编码助手在小任务上的成本效益展开辩论
Cursor subreddit 的用户正在讨论使用 AI 编码助手处理小任务的经济可行性。讨论的焦点在于,为小型编码工作运行 GPT-4 Turbo 或 Claude 3 Opus 等模型的成本是否超过了节省的时间。一些用户建议使用更便宜、更快的模型,或者为更简单的任务禁用 AI 功能,以控制开支。
-
LLM clinical accuracy varies significantly by prompting language, study finds
A new study published on arXiv reveals that the language used to prompt large language models significantly impacts their diagnostic reasoning and accuracy in clinical settings. Researchers found that four out of five e…
-
Developers cut AI costs by running LLMs locally
Developers are increasingly running large language models locally to reduce costs and latency, with one developer reportedly cutting their OpenAI bill from $2,400 to $180 per month by shifting 80% of their workload to a…
-
Vector RAG vs. LLM Wiki: Study reveals trade-offs in research synthesis
A new research paper compares Vector Retrieval-Augmented Generation (RAG) against an LLM-compiled wiki for answering questions over a small corpus of 24 research papers. While the wiki excelled at synthesizing informati…
-
Prompt engineering guide details LLM interaction techniques
Prompt engineering is crucial for optimizing large language model outputs, involving techniques like zero-shot and few-shot prompting to guide the AI. Advanced methods include chain-of-thought prompting for complex reas…
-
LLM costs surge in 2026 due to complex factors beyond token pricing
By 2026, the cost of using large language models like Claude 3.5 Sonnet and GPT-4 Turbo will become significantly more complex than simple per-token pricing. Developers must account for factors such as prompt caching, b…
-
ReCode框架通过奖励推理过程来增强AI代码生成
研究人员开发了ReCode,一个新颖的强化学习框架,旨在通过关注推理过程来改进代码生成。该框架使用对比推理过程奖励学习(CRPL)在合成的推理变体上训练奖励模型,并使用一致性门控GRPO(CG-GRPO)来整合这些奖励,同时通过执行结果缓解奖励攻击。ReCode应用于一个7B模型时,比其基础版本提高了16.1%,并在各种基准测试上取得了与GPT-4-Turbo相当的性能。
-
LLMs simulate survey respondents, offering new social science research tools
Researchers have developed a new benchmark called LLM-S^3 to evaluate how well large language models can simulate human respondents in surveys. The benchmark includes 11 real-world datasets across various sociological d…
-
METR finds GPT-4o shows impressive agent skills but suffers fixable failures
METR has released preliminary findings from an evaluation of GPT-4o's autonomous capabilities across 77 tasks. The model demonstrated impressive skills like systematic exploration but also exhibited failure modes such a…
-
OpenAI releases GPT-4o with fine-tuning and enhanced multimodal capabilities
OpenAI has released fine-tuning capabilities for its GPT-4o model, allowing developers to customize its performance and tone for specific applications. This feature, available on paid tiers, offers developers the chance…
-
OpenAI launches GPT-4 Turbo with larger context, lower prices, and new tools
OpenAI announced several updates at its DevDay event, including the new GPT-4 Turbo model with a 128K context window and knowledge up to April 2023, offered at a reduced price. The company also introduced an Assistants …
-
Replit 发布 Teams、Code Repair AI 及 Workspace 升级
Replit 在其年度开发者日活动上宣布了重要的平台更新和新的人工智能功能。该公司通过推出 Replit Teams 来扩展其团队产品,旨在增强协作和简化开发工作流程。此外,Replit 还推出了 Code Repair,这是一个自动化调试的人工智能模型,据报道在特定基准测试中表现优于 GPT-4 Turbo 和 Claude 3 Opus 等领先模型。该平台还公布了其 Workspace 的改进,包括增加 RAM 和 CPU 限制、…
-
OpenAI launches new embedding models with price cuts and performance boosts
OpenAI has released new embedding models, text-embedding-3-small and text-embedding-3-large, offering significant improvements in performance and efficiency over previous models like text-embedding-ada-002. These new mo…