I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.
A recent test of five small multimodal models on a Jetson device for an industrial edge AI runtime kept Gemma 4 E2B as the baseline even though it was not the fastest model. SmolVLM2 was the quickest, but its outputs were too generic. Qwen2.5-VL performed strongly, particularly on OCR and visual inspection tasks, making it a serious contender. InternVL3 struggled with context errors and latency at higher settings, and Qwen2.5-Omni looked better suited to future audio/video workflows. The selection criteria emphasized local deployment, structured output, and integration into a system that provides audit trails and confirmation gates, and on those criteria Gemma 4 E2B was the best overall fit.
IMPACT In this test, edge AI model selection prioritized system integration and auditability over raw speed, a useful reference point for practical deployment decisions.