PulseAugur
实时 03:32:19
实体 Artificial Analysis

Artificial Analysis

PulseAugur coverage of Artificial Analysis — every cluster mentioning Artificial Analysis across labs, papers, and developer communities, ranked by signal.

Show in brief
总计 · 30天
13
90 天内 13
发布 · 30天
0
90 天内 0
论文 · 30天
2
90 天内 2
层级分布 · 90 天
情绪 · 30 天

5 天有情绪数据

最近 · 第 1/1 页 · 共 13 条
  1. SIGNIFICANT · CL_45509 ·

    阿里巴巴的 Qwen3.7-Max 领跑中国 LLM,全球排名第五

    第三方评估平台 Artificial Analysis 将阿里巴巴的 Qwen3.7-Max 评为中国表现最佳的大型语言模型,并位列全球第五。这款新的旗舰模型得分 56.6,超越了其他国内模型,并接近 GPT、Claude 和 Gemini 等国际领先模型的性能。Qwen3.7-Max 专为智能体任务设计,在编程、推理和工具利用方面取得了显著进步,能够处理具有大量工具调用的复杂、长期任务。

  2. COMMENTARY · CL_37856 ·

    LLM benchmarks mislead on inference speed for long contexts

    Current LLM inference benchmarks are misleading because they primarily measure short-context performance, which does not reflect real-world usage involving longer contexts. This discrepancy arises from the differing com…

  3. TOOL · CL_46562 ·

    Together AI models lead speech-to-text speed benchmarks

    Together AI's speech-to-text models have achieved the top two positions on the Artificial Analysis leaderboard for transcription speed. The NVIDIA Parakeet TDT 0.6B V3 model, running on Together AI, is currently ranked …

  4. RESEARCH · CL_27294 ·

    Luma Agents gain multimodal support; new coding agent benchmark released

    Artificial Analysis has released the Coding Agent Index, a benchmark evaluating AI coding agents across three key benchmarks, considering token usage and cost to aid in selection. Separately, Luma Labs has enhanced its …

  5. TOOL · CL_48108 ·

    AI 评估工具 IFBench 衡量提示遵循度

    Artificial Analysis 开发了 IFBench,这是一个旨在衡量 AI 模型在多大程度上遵循用户指令的评估工具。与许多很快就会饱和的其他基准测试不同,IFBench 保持有效,因为它评估了那些经常被忽视并持续挑战即使是先进 AI 模型方面的能力。该工具对于理解模型在标准性能指标之外的行为至关重要。

  6. SIGNIFICANT · CL_26035 ·

    Alibaba's Happy Horse-1.0 video model aims for cinematic storytelling

    Alibaba's Happy Horse-1.0 video generation model has entered a closed beta, aiming to advance beyond basic visual output to cinematic storytelling. Early tests show promise in maintaining character consistency across mu…

  7. TOOL · CL_25385 ·

    GPT-5.5 edges out Claude Opus on intelligence benchmark

    A recent analysis by Artificial Analysis indicates that GPT-5.5 has surpassed Claude Opus by three points on their intelligence benchmark. This benchmark evaluates models across categories like agents, coding, general k…

  8. TOOL · CL_18480 ·

    Artificial Analysis offers MiniMax-M2.7 with SambaNovaAI leading inference speed

    Artificial Analysis has made its MiniMax-M2.7 model available through six different inference providers, highlighting significant differences in speed and cost. SambaNovaAI leads in performance, achieving 435 tokens per…

  9. TOOL · CL_11949 ·

    AI advancements span robot manufacturing, dev tools, and cost-effective models

    A discussion is emerging around the potential for integrated model handoff stacks to serve as new Integrated Development Environments (IDEs), particularly for multimodal workflows involving image, vision, and 3D models.…

  10. FRONTIER RELEASE · CL_11237 ·

    X launches Grok 4.3 with improved agentic performance and lower price

    xAI has released Grok-4.3, a new iteration of its AI model, which offers improved agentic performance and a lower price point compared to its predecessor. The model achieved a significant increase of 321 ELO points on t…

  11. FRONTIER RELEASE · CL_10344 ·

    Alibaba's HappyHorse 1.0 leads AI video generation, with new GPT models and tools emerging

    Alibaba has released HappyHorse 1.0, an open-source AI video generator capable of producing 1080p videos with synchronized audio. This model, powered by a 15 billion parameter transformer, is being touted as the top AI …

  12. FRONTIER RELEASE · CL_40891 ·

    Google launches Gemini 3.5 Flash, Omni, and agent stack

    Google has launched Gemini 3.5 Flash, a new model designed for agentic workflows and coding tasks, available immediately across its consumer and developer platforms. This release also introduces Gemini Omni for multimod…

  13. SIGNIFICANT · CL_01759 ·

    Google DeepMind launches autonomous research agents powered by Gemini 3.1 Pro

    Google DeepMind has launched two new autonomous research agents, Deep Research and Deep Research Max, powered by Gemini 3.1 Pro. These agents are designed to securely analyze user-provided or third-party data, with Deep…