GPT-4.1 mini
PulseAugur coverage of GPT-4.1 mini — every cluster mentioning GPT-4.1 mini across labs, papers, and developer communities, ranked by signal.
3 days with sentiment data
-
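LLMs perform unevenly in psychiatric screening, need validation
A new study published on arXiv evaluated five large language models on psychiatric screening, using a benchmark of 555 interviews. Accuracy varied across models, with GPT-4.1 Mini and GPT-5 Mini showing the most consistent results. The researchers found that LLMs tend to underestimate symptom evidence when patients report intact functioning or social support, underscoring the need for careful validation before clinical use.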
-
LLMs struggle with Bangla medical visual questions, new dataset shows
Researchers have developed BanglaMedVQA, a new dataset designed to evaluate Large Language Models (LLMs) and Large Vision Language Models (LVLMs) on medical visual question answering in the Bangla language. Their benchm…
-
Study: Stale code context actively harms AI code completion
A new study published on arXiv investigates the impact of outdated information on code generation models. Researchers found that providing stale repository context can actively lead models to produce incompatible code, …
-
AI-native graduates showcase groundbreaking projects, reshaping higher education
OpenAI has launched its "ChatGPT Futures" program to recognize students who have effectively integrated AI into their university education. The program highlights 26 individuals and teams, aged around 20, who have used …
-
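AICoFe system uses multiple LLMs to provide AI-assisted student feedback in higher education
Researchers have developed AICoFe, an AI system designed to strengthen collaborative feedback in higher education. The system uses a multi-LLM pipeline that integrates GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1 to turn rubric data and qualitative comments into refined feedback. A key component is a "teacher-in-the-loop" workflow that lets educators review and edit AI-generated drafts through a learning analytics dashboard before they are delivered to students. The system's data infrastructure combines SQL and MongoDB to manage feedback versions and ensure traceability.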
-
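New research tackles LLM jailbreaking with dynamic evaluation and robust defense strategies
Multiple research papers explore advanced techniques for hardening large language models (LLMs) against jailbreak attacks. The studies introduce new frameworks and methods for evaluating and defending against adversarial prompts designed to elicit harmful outputs. The work focuses on developing more comprehensive evaluation metrics, adaptive attack generation strategies, and robust detection mechanisms that can identify subtle patterns in model behavior.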
-
AI Help Desk uses RAG and GPT-4.1-mini for protein structure deposition support
Researchers have developed an AI-powered Help Desk system to assist structural biologists with depositing macromolecular structures into the Protein Data Bank (PDB). The system utilizes Retrieval-Augmented Generation (R…
-
AI models evaluated on meeting summaries, GPT-5.1 shows gains
Researchers have developed a reusable pipeline for evaluating AI-generated meeting summaries, designed to be adaptable across different domains. The system treats both ground truth and AI outputs as structured artifacts…
-
AI code review bots show limits in automated evaluation, GitHub COO discusses ambient AI
A new paper explores the limitations of automated evaluation for AI code review bots, finding that current automated methods like G-Eval and LLM-as-a-Judge show only moderate alignment with human developer labels. The s…
-
Introducing gpt-realtime and Realtime API updates
OpenAI has released GPT-4.1, a new series of models for its API that offer significant improvements in coding, instruction following, and long context comprehension, outperforming previous models like GPT-4o. The compan…