Massive Multitask Language Understanding

PulseAugur coverage of Massive Multitask Language Understanding — every cluster mentioning Massive Multitask Language Understanding across labs, papers, and developer communities, ranked by signal.

Total · 30 days

Within 90 days: 30

Releases · 30 days

Within 90 days: 0

Papers · 30 days

Within 90 days: 27

Tier distribution · 90 days

frontier release 2
significant 1
research 12
tool 11
commentary 4

Relations

instance of helmet 90%
instance of HumanEval 70%
instance of GPQA: A Graduate-Level Google-Proof Q&A Benchmark 70%
instance of large-language models 70%
used by GSM8K 70%

Sentiment · 30 days

7 days with sentiment data

Recent · Page 2 of 2 · 30 items total

RESEARCH · CL_05211 · Apr 27 · 04:00

Language agents use auction to cut communication costs and boost reasoning

Researchers have developed a new framework called DALA (Dynamic Auction-based Language Agent) to improve communication efficiency in multi-agent systems powered by large language models. This system treats communication…

RESEARCH · CL_00834 · Nov 1 · 15:31

In the Arena: How LMSys changed LLM Benchmarking Forever

The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more d…

FRONTIER RELEASE · CL_01020 · Sep 12 · 10:02

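OpenAI's o1 model demonstrates advanced reasoning, while Google and Apple explore new LLM training methods

OpenAI has released an early version of its new model, OpenAI o1-preview, which shows a marked improvement in reasoning over GPT-4o. The model performs strongly on competitive programming, advanced math exams, and complex science benchmarks, exceeding human expert performance in some domains. The gains are attributed to a large-scale reinforcement learning algorithm that teaches the model to think productively using chain of thought, with performance scaling with both train-time and test-time compute.

COMMENTARY · CL_01323 · Sep 9 · 17:28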

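How good are large language models at correcting mistakes? A chatbot arena experiment using Keras and TPUs

Current methods for evaluating large language models, such as MMLU and HumanEval, may not capture the nuances of interactive, goal-oriented dialogue. A more effective approach is to evaluate chatbots on their ability to interact with users over multi-turn conversations to achieve a specific goal, which mirrors how humans interact. Such "purposeful dialogue" can improve the user experience and unlock new capabilities, even in areas such as code generation and personalized assistants.

FRONTIER RELEASE · CL_01024 · May 13 · 22:58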

OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models

OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…

RESEARCH · CL_17729 · Apr 4 · 19:11

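A visual tour of machine learning (2015)

This collection of resources offers a broad overview of machine learning, ranging from basic concepts and visual tours to theoretical foundations and practical applications. It includes a visual guide to a classification task, a discussion of the science and ethics of machine learning benchmarks, and links to comprehensive textbooks and course materials. It also highlights tools for interpretable machine learning and the engineering practices needed to deploy models in production.

COMMENTARY · CL_04674 · Oct 9 · 00:00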

Eugene Yan shares insights on LLM system building and AI engineering trends

Eugene Yan presented key learnings from building with Large Language Models (LLMs) at the AI Engineer World's Fair 2024. The keynote, co-authored with others, focused on practical aspects of LLM system development, incl…

RESEARCH · CL_32532 · Sep 18 · 00:00

3D Gaussian Splatting advances scene representation and editing

Researchers are advancing 3D Gaussian Splatting (3DGS) with new methods for improved scene representation, editing, and compression. Innovations include Skew-Normal Splatting for better modeling of asymmetric structures…

RESEARCH · CL_01274 · May 24 · 00:00

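Hugging Face introduces advanced quantization techniques for efficient LLMs

Researchers are developing advanced quantization techniques to make large language models (LLMs) more efficient. New methods such as AutoRound, LATMiX, and GSQ aim to reduce model size and compute requirements, enabling deployment on less powerful hardware. These methods focus on optimizing how model weights and activations are represented at lower bit widths, and some have reached accuracy comparable to higher-precision models. Innovations include novel calibration strategies for post-training quantization and learnable affine transformations for improved robustness.

FRONTIER RELEASE · CL_02508 · Mar 14 · 07:00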

OpenAI launches GPT-4, a multimodal model showing human-level performance on benchmarks

OpenAI has released GPT-4, a large multimodal model capable of processing both text and image inputs to generate text outputs. This new model demonstrates human-level performance on various professional and academic ben…
