Scaling State-Space Models from Lines to Paragraphs: An Ablation of Mamba-based OCR

Mamba-based OCR models gain speed but lag on handwriting recognition

By the PulseAugur editorial team · [1 source] · 2026-06-22 16:07

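A new study examines how Mamba-based state-space models (SSMs) perform on optical character recognition (OCR), focusing on how well they scale from short lines to full paragraphs. The researchers found that SSMs are significantly faster than Transformers on long sequences and reach comparable accuracy on clean synthetic data, but struggle with real handwriting because training data is scarce. The study identifies decoder depth and state dimension as the key hyperparameters for improving SSM accuracy on long sequences in OCR.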
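The speed gap on long sequences follows from how each decoder touches its history. The sketch below is a rough operation count, not the paper's measurement: it assumes a Transformer decoder with a key-value cache that attends to every previous token at each step, and a diagonal SSM that updates a fixed-size state at each step. The function names and the sizes `d_model=256` and `d_state=16` are hypothetical, and projections, constants, and hardware effects are ignored.

```python
def attention_decode_ops(seq_len: int, d_model: int) -> int:
    """Approximate multiply-adds for autoregressive attention with a KV cache.

    At step t the new token scores against t cached keys and mixes t cached
    values, so the step costs about 2 * t * d_model operations.
    """
    return sum(2 * t * d_model for t in range(1, seq_len + 1))


def ssm_decode_ops(seq_len: int, d_model: int, d_state: int) -> int:
    """Approximate multiply-adds for a diagonal SSM recurrence.

    Each step updates a state of size d_model * d_state and reads it out,
    so the step costs about 2 * d_model * d_state operations, independent of t.
    """
    return seq_len * 2 * d_model * d_state


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the growth rates.
    d_model, d_state = 256, 16
    for seq_len in (128, 1024, 8192):
        attn = attention_decode_ops(seq_len, d_model)
        ssm = ssm_decode_ops(seq_len, d_model, d_state)
        print(f"len={seq_len:5d}  attention={attn:>14,d}  ssm={ssm:>12,d}")
```

Under these assumptions, doubling the sequence length doubles the SSM count but roughly quadruples the attention count, which is why the advantage grows when moving from lines to paragraphs. The SSM count also scales with `d_state`, so a larger state dimension raises per-step cost.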

Impact: SSMs show potential for faster long-text OCR, but handwriting recognition needs more data.

Ranking rationale: A research paper detailing an ablation study of a specific model architecture on a specific task.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV · Thierry Paquet · 2026-06-22 16:07

Scaling State-Space Models from Lines to Paragraphs: An Ablation of Mamba-based OCR

End-to-end OCR increasingly relies on autoregressive sequence models, where the quadratic cost of Transformer attention limits efficient transcription of long, paragraph-level text. State-Space Models (SSMs) such as Mamba offer linear-time decoding and have recently been shown to…

