Mamba-based OCR models gain speed but lag on handwriting recognition

By the PulseAugur editorial team · [2 sources] · 2026-06-22 16:07

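A new study examines how Mamba-based state-space models (SSMs) perform on optical character recognition (OCR), with a particular focus on how they scale from short lines to full paragraphs. The researchers found that SSMs are significantly faster than Transformers on long sequences and reach comparable accuracy on clean synthetic data, but struggle on real handwriting because training data is scarce. The study identifies decoder depth and state dimension as the key hyperparameters for improving SSM accuracy on long sequences in OCR.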
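To make the scaling claim concrete, here is a back-of-envelope sketch comparing how the cost of one self-attention layer and one SSM recurrence grows with sequence length. The function names, the operation-count formulas, and the default values of `d_model` and `d_state` are illustrative assumptions, not figures from the paper; constant factors, projections, and hardware effects are ignored.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for one self-attention layer.

    Counts the QK^T score matrix and the attention-weighted sum over V,
    each about seq_len * seq_len * d_model operations (assumed model).
    """
    return 2 * seq_len * seq_len * d_model


def ssm_scan_cost(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough multiply-add count for one SSM recurrence over the sequence.

    Counts a state update and a readout per token, each about
    d_model * d_state operations (assumed model).
    """
    return 2 * seq_len * d_model * d_state


def cost_ratio(seq_len: int, d_model: int = 256, d_state: int = 16) -> float:
    """How many times more work attention does than the SSM scan."""
    return attention_cost(seq_len, d_model) / ssm_scan_cost(seq_len, d_model, d_state)


if __name__ == "__main__":
    # Hypothetical lengths: a short line versus a full paragraph of tokens.
    for n in (128, 512, 2048):
        print(f"seq_len={n:5d}  attention/SSM cost ratio = {cost_ratio(n):.1f}")
```

Under these assumptions the ratio simplifies to `seq_len / d_state`, so the gap widens linearly as the text grows from a line to a paragraph. It also shows why state dimension is a trade-off: a larger `d_state` costs proportionally more per token.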

Impact: SSMs show potential for faster long-text OCR, but handwriting recognition needs more data.

Ranking rationale: A research paper detailing an ablation study of a specific model architecture on a specific task.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Merveilles Agbeti-Messan, Pierrick Tranouez, Stéphane Nicolas, Clément Chatelain, Thierry Paquet · 2026-06-24 04:00

A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR

arXiv:2604.00725v2 · Abstract: End-to-end OCR for historical newspapers remains challenging, as models must handle long text sequences, degraded print quality, and complex layouts. While Transformer-based recognizers dominate current research, their qua…

arXiv cs.CV TIER_1 English(EN) · Thierry Paquet · 2026-06-22 16:07

Scaling State-Space Models from Lines to Paragraphs: An Ablation of Mamba-based OCR

End-to-end OCR increasingly relies on autoregressive sequence models, where the quadratic cost of Transformer attention limits efficient transcription of long, paragraph-level text. State-Space Models (SSMs) such as Mamba offer linear-time decoding and have recently been shown to…
