GPT-OSS 120B
PulseAugur coverage of GPT-OSS 120B — every cluster mentioning GPT-OSS 120B across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
LLM analysis method reveals training data secrets and ethical risks
Researchers have developed a method using singular value decomposition (SVD) of a large language model's weight matrix to reveal interpretable semantic subspaces. This technique, requiring minimal code and no model infe…
-
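New framework enables natural-language queries over complex BIM data
Researchers have developed IfcLLM, a new framework designed to make Industry Foundation Classes (IFC) data more accessible through natural-language queries. The system converts IFC models into relational and graph representations, which are then processed by an LLM with iterative reasoning capabilities to interpret user requests. Implemented with the open-source GPT OSS 120B model, the framework showed high accuracy in testing, pointing toward more intuitive interaction with complex BIM data.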
-
LLMs show bias toward sponsored products, but simple prompts can fix it
A new paper reveals that many large language models, including OpenAI's GPT-3.5 Turbo and GPT-4o, exhibit a bias towards recommending sponsored products. Researchers found that these models often suggest more expensive,…
-
AI predicts human rater disagreement in LLM-generated difficulty scores
Researchers have developed a new method to predict when AI-generated difficulty ratings for educational materials might disagree with human assessments. This approach uses a separate embedding space, like ModernBERT, to…
-
AI agent costs soar 40x without caching, prompting architectural shifts
The author is evaluating the cost-effectiveness of using Cerebras hardware for LLM inference, specifically with GLM 4.7. While Cerebras offers impressive speed, the lack of prompt caching leads to significantly higher c…
-
LaTA autograder uses local LLM to grade STEM coursework compliantly
Researchers have developed LaTA, an open-source autograder that uses a local LLM to grade STEM coursework without sending student data to third-party APIs. This FERPA-compliant system runs on commodity hardware and inte…
-
AI developers face rate limits, latency; routing is key
Developers are encountering significant challenges with API rate limits and latency when using AI models, particularly from Anthropic. These issues often stem from architectural choices that rely on a single provider fo…
-
New FMECA framework assesses patient safety risks in AI-generated clinical content
Researchers have developed and validated a new framework, based on Failure Mode, Effects, and Criticality Analysis (FMECA), to systematically assess patient safety risks associated with generative AI-created clinical co…
-
New red-teaming method ContextualJailbreak bypasses LLM safety alignment
Researchers have developed ContextualJailbreak, an evolutionary red-teaming strategy designed to find vulnerabilities in large language models. This black-box approach uses simulated multi-turn dialogues and a graded ha…
-
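New frameworks enhance text-to-SQL models through flexible interaction and fine-grained feedback
Researchers have developed several new frameworks to improve text-to-SQL generation, particularly for small language models and complex database interactions. FineStep and FINER-SQL introduce novel reinforcement learning methods with step-level credit assignment and fine-grained execution feedback to improve accuracy and efficiency. Rose-SQL uses in-context learning with small reasoning models for multi-turn queries, while FlexSQL focuses on flexible database interaction and exploration to better interpret queries. In addition, EGRefine resolves schema ambiguity by optimizing naming conventions, improving downstream text-to-SQL performance across a range of models.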
-
AI models achieve high verification success with formal code generation
Researchers have developed a new dataset, NL2VC-60, containing 60 algorithmic problems to aid in generating verified code from natural language. They evaluated seven open-weight LLMs using various prompting strategies, …
-
AWS SageMaker AI streamlines generative AI deployment with new inference recommendations and G7e instances
Amazon SageMaker AI has introduced new features to streamline the deployment of generative AI models. The platform now offers optimized inference recommendations, leveraging NVIDIA AIPerf to reduce the weeks-long manual…
-
These AI Workstations Look Like PCs but Pack a Stronger Punch
Tenstorrent has unveiled the QuietBox 2, an AI workstation designed to run large language models locally, resembling a standard PC but with significantly enhanced hardware. This new machine features four Tenstorrent Bla…
-
IonRouter launches AI inference service with custom IonAttention engine
IonRouter has launched a new inference service designed for high throughput and low cost, utilizing its proprietary IonAttention engine. This engine is capable of multiplexing multiple models on a single GPU, enabling r…
-
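New research explores LLM reasoning, instruction following, and self-correction capabilities
Several recent research papers explore the internal mechanisms and reasoning capabilities of large reasoning models (LRMs). One paper, since withdrawn, proposed Entropy-Gradient Inversion and a related optimization technique (CorR-PO), which improve reasoning by correlating token entropy with logit gradients. Another withdrawn paper, LambdaPO, aimed to strengthen reinforcement learning alignment by rethinking advantage estimation to obtain finer-grained preference signals. A third paper introduces convex compositional energy minimization (Convex Compositional Energy…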
-
OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models
OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…
-
[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Researchers are developing new benchmarks and evaluation methods for large language models (LLMs) in mathematical reasoning and educational assessment. New datasets like ESTBook and Math-PT aim to go beyond simple accur…