PulseAugur

RhinoVLA model enables real-time robotic control on edge hardware

Researchers have developed RhinoVLA, a Vision-Language-Action (VLA) model designed for real-time robotic manipulation on edge hardware. The model pairs a token-efficient Qwen3-VL backbone with a continuous Action Expert to reduce computational load and latency. RhinoVLA also introduces a unified interface for cross-robot learning and is optimized for hardware deployment; it achieves downstream performance comparable to existing models while meeting a 10 Hz real-time control target.
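For context on the 10 Hz figure: a control rate of 10 Hz leaves 100 ms for each full perceive-and-act step. The sketch below only illustrates that arithmetic. The function names and the sample latencies are hypothetical and are not taken from the RhinoVLA report.

```python
def control_period_ms(rate_hz: float) -> float:
    """Time available per control step, in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def meets_rate(step_latencies_ms, rate_hz: float) -> bool:
    """True if every measured step fits inside one control period."""
    budget = control_period_ms(rate_hz)
    return all(latency <= budget for latency in step_latencies_ms)


if __name__ == "__main__":
    # Illustrative per-step latencies in ms (assumed values, not reported results).
    hypothetical_latencies = [82.0, 91.5, 97.3]
    print(control_period_ms(10.0))
    print(meets_rate(hypothetical_latencies, 10.0))
```

A worst-case check like this is stricter than an average-latency check: a single step over budget would miss a control tick even if the mean latency is under 100 ms.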

IMPACT Enables real-time robotic manipulation on edge devices, potentially accelerating autonomous systems.

RANK_REASON The cluster contains a technical report detailing a new model and its performance on specific hardware.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Huixi Intelligence: Chen Zhang, Chenyang Zhou, Guanglei Ding, Guanghui He, Haibin Gao, Jiajia Chen, Jianyong Zhang, Lianyi Yu, Ningyi Xu, Ping Xu, Qingchen Li, Yingjun Hu, Yijia Zhang, Yuxi Liu

    RhinoVLA Technical Report

    arXiv:2606.07383v1 · Announce Type: cross · Abstract: Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of …

  2. arXiv cs.LG TIER_1 English(EN) · Yuxi Liu

    RhinoVLA Technical Report

    Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection …