Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
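The digest does not say how its four signals are combined. Here is a minimal sketch of one plausible approach. It assumes each content signal is normalized to the range 0 to 1 and blended with hypothetical weights. It also assumes time decay is an exponential with a 24-hour half-life, applied as a multiplier. None of these names or numbers come from the digest itself.

```python
from dataclasses import dataclass

# Hypothetical weights; the digest does not publish its formula.
WEIGHTS = {"authority": 0.35, "cluster": 0.30, "headline": 0.35}
HALF_LIFE_HOURS = 24.0  # assumed decay half-life


@dataclass
class Story:
    authority: float  # 0..1, e.g. reputation of the strongest source (assumed)
    cluster: float    # 0..1, e.g. normalized count of sources in the cluster (assumed)
    headline: float   # 0..1, e.g. strength of headline keywords (assumed)
    age_hours: float


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life)


def score(story: Story) -> int:
    """Blend the content signals, then scale by freshness into a 0-100 score."""
    base = (WEIGHTS["authority"] * story.authority
            + WEIGHTS["cluster"] * story.cluster
            + WEIGHTS["headline"] * story.headline)
    return round(100 * base * time_decay(story.age_hours))


print(score(Story(1.0, 1.0, 1.0, age_hours=0)))   # fresh, maximal signals
print(score(Story(1.0, 1.0, 1.0, age_hours=24)))  # one half-life later
```

With these assumed settings, a story that maxes out every signal scores 100 when fresh and 50 a day later. Because decay is a multiplier rather than a fourth additive term, even a highly authoritative story eventually drops down the list.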

TOOL · 量子位 (QbitAI) Chinese (ZH) · 21h

Ant Lingbo's LingBot-VA Paper Accepted at Top Robotics Conference RSS 2026, Enabling Robots to Reason While Acting

Ant Group's LingBot-VA, a causal world-modeling framework for robot control, has been accepted at the prestigious Robotics: Science and Systems (RSS) 2026 conference. The framework lets a robot predict how its environment will change before it acts, mirroring the human sequence of observing, judging, and then acting. LingBot-VA uses a Mixture-of-Transformers architecture and has achieved high success rates on both simulated and real-world robotic tasks, with strong data efficiency and generalization. The research aims to move robots beyond simple instruction following toward better environmental understanding and autonomous decision-making.
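"Predicting environmental changes before acting" is the core idea of model-based control. The sketch below is not LingBot-VA's method, whose Mixture-of-Transformers world model is far richer. It is a generic one-step lookahead. It assumes a hypothetical `toy_world_model` that predicts the next position of a point robot, plus a squared-distance goal cost. The controller imagines the outcome of every candidate action and then executes the one whose predicted outcome lies closest to the goal.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned model: predicts the next state."""
    return (state[0] + action[0], state[1] + action[1])


def goal_cost(goal: State) -> Callable[[State], float]:
    """Squared distance to the goal; lower is better."""
    def cost(state: State) -> float:
        return (state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2
    return cost


def choose_action(state: State, actions: Sequence[Action], world_model, cost) -> Action:
    # Imagine each candidate's outcome first, then pick the one with the lowest cost.
    return min(actions, key=lambda a: cost(world_model(state, a)))


def rollout(start: State, goal: State, actions: Sequence[Action],
            world_model=toy_world_model, max_steps: int = 10) -> List[State]:
    cost = goal_cost(goal)
    state, path = start, [start]
    for _ in range(max_steps):
        if cost(state) == 0:
            break
        action = choose_action(state, actions, world_model, cost)
        # A real robot would execute the action and then observe the new state;
        # in this toy the prediction is exact, so the prediction is reused.
        state = world_model(state, action)
        path.append(state)
    return path


MOVES = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
path = rollout((0.0, 0.0), (2.0, 1.0), MOVES)
print(path)
```

Starting from `(0, 0)` with the goal at `(2, 1)`, `rollout` reaches the goal in three steps. In a real system, the prediction would come from a learned model and could differ from what the robot then observes.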

IMPACT Advances robot control by enabling predictive world modeling, potentially leading to more autonomous and adaptable robotic systems.

RESEARCH · arXiv cs.AI English (EN) · 1w · [2 sources]

Key-Gram: Extensible World Knowledge for Embodied Manipulation

Researchers have developed Key-Gram, a framework that improves embodied control by separating linguistic knowledge from visual reasoning. A conditional-memory module stores and retrieves knowledge derived from the instruction. This leaves the main model backbone free to focus on visual processing and action inference. By strengthening compositional grounding and transfer learning, Key-Gram delivers significant performance gains on robotic manipulation tasks, including the RoboTwin2.0 benchmark and real-world dual-arm scenarios.
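The summary does not detail Key-Gram's memory design. The hypothetical `NGramMemory` class below is a hedged illustration of the general idea of moving instruction knowledge out of the backbone. It stores a vector under each word bigram of an instruction. At read time, it averages the vectors of any bigrams it recognizes, and that average would be passed to the backbone as conditioning. The bigram keying, overwrite-on-write policy, and averaging are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


class NGramMemory:
    """Hypothetical conditional memory mapping instruction n-grams to vectors."""

    def __init__(self, n: int = 2, dim: int = 2) -> None:
        self.n = n
        self.dim = dim
        self.table: Dict[Tuple[str, ...], List[float]] = {}

    def _ngrams(self, instruction: str) -> List[Tuple[str, ...]]:
        tokens = instruction.lower().split()
        return [tuple(tokens[i:i + self.n]) for i in range(len(tokens) - self.n + 1)]

    def write(self, instruction: str, vector: Sequence[float]) -> None:
        # Store (or overwrite) the vector under every n-gram of the instruction.
        for gram in self._ngrams(instruction):
            self.table[gram] = list(vector)

    def read(self, instruction: str) -> List[float]:
        # Average the vectors of all recognized n-grams; return zeros if none match.
        hits = [self.table[g] for g in self._ngrams(instruction) if g in self.table]
        if not hits:
            return [0.0] * self.dim
        return [sum(column) / len(hits) for column in zip(*hits)]


memory = NGramMemory(n=2, dim=2)
memory.write("pick up the red cup", [1.0, 0.0])
memory.write("open the drawer", [0.0, 1.0])
blended = memory.read("put the red cup in the drawer")
print(blended)
```

Here the unseen instruction "put the red cup in the drawer" retrieves a blend of what was stored for the two earlier instructions, roughly `[0.67, 0.33]`. This is a toy version of the compositional reuse the summary describes.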

IMPACT Externalizing linguistic memory in embodied AI could lead to more adaptable and efficient robotic systems capable of complex instruction following.

TOOL · arXiv cs.AI English (EN) · 4d

VLANeXt: Recipes for Building Strong VLA Models

Researchers have introduced VLANeXt, a Vision-Language-Action (VLA) model built by systematically analyzing and optimizing the design choices behind existing VLA architectures. Using a unified framework and evaluation setup, they distilled 12 key findings into a practical recipe for building strong VLA models. VLANeXt achieves superior performance on benchmarks such as LIBERO and LIBERO-plus and proves effective in real-world applications. The team has also released a comprehensive codebase to support reproduction and further development in the VLA space.

IMPACT Provides a structured approach and reproducible codebase for developing more capable Vision-Language-Action models.
