PulseAugur
Turkish (TR) We compared 8 vision LLMs on screenshot grounding for browser agents. Surprising finding: Qwen 3.5-9B, a dropdown that the 308B-parameter MiMo V2.5 missed…

Qwen 3.5-9B LLM outperforms MiMo V2.5 in browser agent screenshot grounding

Eight vision large language models (LLMs) were compared on screenshot grounding for browser agents. The surprising finding was that Qwen 3.5-9B correctly classified a dropdown affordance that MiMo V2.5, a model with 308 billion parameters, missed. The author concludes that affordance recognition does not scale with parameter count.

Impact: Highlights the potential for smaller models to outperform larger ones in specific visual grounding tasks for agents.

Ranking rationale: Comparison of multiple LLMs on a specific task, presented as a research finding.

Read on Mastodon (sigmoid.social) →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. Mastodon (sigmoid.social) · Tier 1 · Turkish (TR)

    We compared 8 vision LLMs on screenshot grounding for browser agents. Surprising finding: Qwen 3.5-9B correctly classifies a dropdown affordance that the 308B-parameter MiMo V2.5 missed. Affordance recognition does not scale with parameter count. Only 1 of the 8 models (Qwen 3.6-35B…