Whispers

last 72h

[16/16]

The long tail — singletons that escape Brief because nobody else has noticed yet. High novelty, narrow audience, AI-relevant. The opposite signal of consensus.

RESEARCH · Mastodon — mastodon.social Polski(PL) · 4d · [3 sources]

The latest Claude Mythos Preview model has reached the limits of METR's evaluation methodology, demonstrating capabilities beyond current measurement standards.

Anthropic's Claude Mythos Preview model has demonstrated capabilities that push the boundaries of current evaluation methodologies, according to METR. The model's estimated time horizon exceeds 16 hours at a 50% success rate and reaches 3 hours at an 80% success rate, surpassing previous models. This highlights the rapid progress in AI capabilities and raises questions about the adequacy of existing assessment tools.

IMPACT Demonstrates AI models are outpacing current evaluation benchmarks, signaling a need for new assessment tools.
TOOL · Mastodon — sigmoid.social · 3d

Frontier LLMs corrupt 25% of documents in long workflows per new benchmark, while a Fields Medalist reports ChatGPT 5.5 Pro solving PhD-level math. Mayo Clinic

A new benchmark reveals that frontier large language models degrade approximately 25% of documents during extended workflows. Separately, a Fields Medal winner has reported that ChatGPT 5.5 Pro is capable of solving complex PhD-level mathematics problems.

IMPACT New benchmarks highlight potential data corruption issues with frontier LLMs, while advanced models demonstrate capabilities in complex academic domains.
TOOL · Medium — Claude tag · 1d

Welcome, Mythos.

A Medium post introduces Mythos, a new AI model, under the tagline "The Day AI Sat on Bedrock." Further details are available in the linked post.

IMPACT Introduction of a new AI model, potentially impacting future AI development and applications.
RESEARCH · 雷峰网 (Leiphone) 中文(ZH) · 2d

MagicLab Lands in Silicon Valley, Releases Industry's First 'Self-Evolving Embodied Brain'

MagicLab, a Chinese embodied AI company, hosted the Global Embodied Intelligence Summit (GEIS) in Silicon Valley, launching its "self-evolving embodied brain" called Magic-Mix. This new world model aims to address key industry challenges such as robots lacking physical common sense and precise manipulation. MagicLab also unveiled the H01 dexterous hand with advanced sensing and the MagicBot X1 humanoid robot, designed for heavy-duty industrial tasks and expected to reach mass commercial delivery by 2026.

IMPACT Sets new benchmarks for embodied AI capabilities, potentially accelerating the development and deployment of advanced robotics in industrial and consumer applications.
RESEARCH · 雷峰网 (Leiphone) 中文(ZH) · 2d

8 million intelligent driving units in three years, 300,000 Robotaxis by 2030: what are Yin Qi and Zhao Ming basing this on?

Challenger startup Qianli Technology, co-founded by AI veteran Yin Qi and former Honor CEO Zhao Ming, aims to become a top global autonomous driving supplier within three years. The company is pursuing an aggressive strategy of deploying L4-level autonomous driving architecture into L2 production vehicles, leveraging a unified technical framework and a proprietary foundational model developed with StepFun (Jieyue Xingchen). Qianli Technology has set ambitious targets, including delivering 8 million sets of intelligent driving solutions in three years and having 300,000 Robotaxis on the road by 2030, with early commercial successes seen in the Zeekr 8X model.

IMPACT Sets aggressive targets for L4-level autonomous driving in consumer vehicles, potentially accelerating the adoption of advanced driver-assistance systems and Robotaxi services.
TOOL · arXiv cs.CV (TL) · 2d

Count Anything at Any Granularity

Researchers have introduced a new framework for open-world object counting, addressing the brittleness of current vision-language models in accurately identifying and counting objects based on user intent. They propose redefining counting as a multi-grained problem, where both visual examples and detailed text prompts, including negative prompts, specify the target appearance and semantic granularity. To overcome the data limitations for this approach, they developed an automated pipeline using 3D synthesis and VLM filtering to create KubriCount, the largest dataset for counting tasks. Their new model, HieraCount, leverages both text and visual exemplars to significantly improve multi-grained counting accuracy and generalize to real-world scenarios.

IMPACT Introduces a more robust method for object counting, potentially improving applications that rely on visual scene understanding and quantification.
TOOL · arXiv cs.CL Suomi(FI) · 2d

Key-Value Means

Researchers have introduced Key-Value Means (KVM), a new attention mechanism for transformers that can handle both fixed-size and growing states. When implemented with a fixed-size cache, KVM functions as an O(N) chunked RNN with minimal parameter additions. A growable KVM cache version demonstrates competitive performance on long-context tasks, offering subquadratic prefill time and sublinear state growth. This approach is compatible with standard operations, supports chunk-wise parallelizable training, and provides a flexible trade-off between prefill time complexity and memory usage.

IMPACT Introduces a novel attention mechanism that improves transformer efficiency for long-context tasks.
TOOL · dev.to — LLM tag · 4d

I fine-tuned a bias judge for $30. The training was the easy part.

A developer fine-tuned Google's Gemma 4 E4B model into a bias judge for approximately $30. The project took two weeks, with most of the effort going into data pipeline construction rather than GPU time. The resulting model, capable of running locally in 30 seconds, evaluates pairs of responses to identify social bias using the Bias Benchmark for QA (BBQ) dataset. The developer encountered challenges with classification leaks, data ceilings imposed by the BBQ dataset, and disagreements among different LLMs used for labeling, ultimately leading to a refined data construction strategy.

IMPACT Demonstrates cost-effective fine-tuning of open-source models for specialized tasks like bias detection, potentially lowering barriers for AI safety research.
TOOL · 36氪 (36Kr) 中文(ZH) · 14h

Hanvon Technology Releases Handwriting Pen M6

Hanvon Technology has launched the M6, a device that combines recording, note-taking, and reading functionalities. The M6 supports real-time translation for 51 languages, enabling seamless cross-lingual meeting experiences. It integrates Hanvon's proprietary 'Tiandi' large model, along with other models like DeepSeek and Tongyi Qianwen, to provide AI assistance for tasks such as summarizing meeting highlights and drafting documents.

IMPACT Integrates existing large language models into a hardware device to enhance productivity for cross-lingual communication.
RESEARCH · 36氪 (36Kr) 中文(ZH) · 23h

Scotiabank Canada: Global copper market expected to see a deficit of 350,000 tons in 2027

Xunfei's Doubao LLM is reportedly receiving enhanced capabilities, though specific details remain undisclosed. Separately, Scenovation Technology has secured nearly $100 million in Series C funding, led by Suzhou Industrial Park Investment Group, to advance its automotive and embodied AI chip development. Additionally, a report from Scotiabank predicts a global copper deficit of 350,000 tons by 2027, driven by robust demand and supply-side challenges.

IMPACT AI advancements in chip technology and LLMs continue, while market predictions highlight resource constraints impacting future AI development.
TOOL · Email — The Neuron Daily · 3d

😺 Hermes is eating OpenClaw's lunch

Nous Research has released version 0.13.0 of its Hermes Agent, a personal AI assistant that learns user workflows over time. The release, dubbed "The Tenacity Release," comprised 864 commits from 295 contributors in a single week and patched eight critical security vulnerabilities. Early adoption indicates about 30% of users have migrated from the previous OpenClaw assistant, citing improved setup, memory management, and a self-improving learning capability.

IMPACT Personal AI agents are becoming more capable, enabling users to build complex applications with natural language and learn user workflows.
TOOL · dev.to — LLM tag · 4d

Fed 15 papers into Gemma 4. Got back a hypothesis none of them actually state — with a null hypothesis, experiment design, and a confidence score that drops when the model reviews itself.

A developer fed 15 scientific papers into Google's Gemma 4 model to test its hypothesis generation capabilities. The model produced a hypothesis that was not explicitly stated in any of the provided papers, along with a null hypothesis, an experiment design, and a confidence score. When the model was asked to review its own generated hypothesis, its confidence score decreased.

IMPACT Demonstrates potential for LLMs to assist in scientific discovery by generating novel hypotheses from existing literature.
RESEARCH · 36氪 (36Kr) 中文(ZH) · 1d

Lantu Motors: Dongfeng Hong Kong increases holdings by 20.192 million H shares

Samsung Electronics is set to begin providing samples of its next-generation CXL 3.1 memory modules (CMM-D) to major server and data center manufacturers in the third quarter. Following customer quality certification, the company plans to initiate mass production preparations, including finalizing production scale and schedules for the fourth quarter. Separately, Google's new Gemini Omni model has been previewed, showcasing its ability to accurately interpret and process video content, including complex academic scenarios.

IMPACT Samsung's CXL 3.1 memory module samples will enable faster data processing for AI workloads, while Gemini Omni's video capabilities could enhance AI's understanding of complex real-world scenarios.
RESEARCH · 量子位 (QbitAI) 中文(ZH) · 1d

SenseTime's "Goodwill" Siu Mai Robot Store Opens in Shanghai, Bringing Robots to Offline Retail

SenseMartGo, a new robotic convenience store solution from SenseTime's SenseTime Huihui, has opened its first three locations in Shanghai. These stores utilize embodied AI to handle all retail tasks, from sales to inventory management, and can operate autonomously 24/7. The system aims to redefine offline retail by integrating AI-driven operations, diverse product offerings including non-standard items, and personalized customer interactions.

IMPACT This launch signifies a step towards autonomous, AI-powered retail operations, potentially impacting efficiency and customer experience in the sector.
RESEARCH · 雷峰网 (Leiphone) 中文(ZH) · 2d

After releasing the A10, can Leapmotor challenge 100,000 monthly sales?

Chinese electric vehicle maker Leapmotor has launched its new A10 model, priced between 65,800 and 86,800 yuan, aiming to capture a significant share of the sub-100,000 yuan market segment. The A10 differentiates itself with advanced intelligent driving features, including lidar and high-end Qualcomm chips, alongside a pure electric range of up to 500 km. Leapmotor plans to release several new models this year, targeting a total sales goal of 1 million vehicles by 2026, while also aiming to elevate its brand image.

IMPACT Accelerates the integration of advanced driver-assistance systems into the mass-market EV segment.
RESEARCH · 36氪 (36Kr) 中文(ZH) · 4d

Sulfur prices have risen by about 80% this year, and downstream titanium dioxide and phosphate fertilizer companies are taking multiple measures to control costs

DeepSeek, a prominent AI research lab, is reportedly planning a significant fundraising round, aiming to secure up to 50 billion yuan. This move signals continued investment and growth within the AI sector, particularly for companies focused on advanced model development. The news comes amidst broader market trends of rising costs in raw materials like sulfur, impacting downstream industries.

IMPACT Confirms continued strong investor appetite for AI research labs, potentially fueling further frontier model development.