RhinoVLA model enables real-time robotic control on edge hardware

By PulseAugur Editorial · [2 sources] · 2026-06-05 15:21

Researchers have developed RhinoVLA, a Vision-Language-Action (VLA) model designed for real-time robotic manipulation on edge hardware. The model uses a token-efficient Qwen3-VL backbone and a continuous Action Expert to reduce computational load and latency. RhinoVLA also introduces a unified interface for cross-robot learning and is optimized for hardware deployment, achieving downstream performance comparable to existing models while meeting a 10 Hz real-time control target.
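To make the 10 Hz target concrete, the short Python sketch below works through the arithmetic: a 10 Hz control loop leaves 100 ms per step, and the cost of a dense projection (a GEMM) grows linearly with the number of tokens passed through it. The function names, token counts, and hidden size are illustrative assumptions, not figures from the RhinoVLA report.

```python
def control_period_ms(rate_hz: float) -> float:
    """Time available per control step, in milliseconds."""
    return 1000.0 / rate_hz


def projection_gemm_flops(num_tokens: int, d_in: int, d_out: int) -> int:
    """Approximate FLOPs for one dense projection:
    a (num_tokens x d_in) matrix times a (d_in x d_out) matrix,
    counting one multiply and one add per term."""
    return 2 * num_tokens * d_in * d_out


def meets_target(latency_ms: float, rate_hz: float = 10.0) -> bool:
    """True if one inference step fits inside the control period."""
    return latency_ms <= control_period_ms(rate_hz)


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the scaling.
    full = projection_gemm_flops(num_tokens=256, d_in=1024, d_out=1024)
    reduced = projection_gemm_flops(num_tokens=64, d_in=1024, d_out=1024)
    print(f"budget per step at 10 Hz: {control_period_ms(10.0)} ms")
    print(f"projection cost ratio (256 vs 64 tokens): {full / reduced}")
    print(f"80 ms step meets 10 Hz target: {meets_target(80.0)}")
```

Under these assumptions, cutting the token count by a factor of four cuts the projection cost by the same factor, which is why token efficiency bears directly on whether a step fits inside the 100 ms budget.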

IMPACT Enables real-time robotic manipulation on edge devices, potentially accelerating the deployment of autonomous systems.

RANK_REASON The cluster contains a technical report detailing a new model and its performance on specific hardware.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.LG TIER_1 English(EN) · Huixi Intelligence: Chen Zhang, Chenyang Zhou, Guanglei Ding, Guanghui He, Haibin Gao, Jiajia Chen, Jianyong Zhang, Lianyi Yu, Ningyi Xu, Ping Xu, Qingchen Li, Yingjun Hu, Yijia Zhang, Yuxi Liu · 2026-06-08 04:00

RhinoVLA Technical Report

arXiv:2606.07383v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of …

arXiv cs.LG TIER_1 English(EN) · Yuxi Liu · 2026-06-05 15:21

RhinoVLA Technical Report

Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection …
