DeepSeek OCR
PulseAugur coverage of DeepSeek OCR — every cluster mentioning DeepSeek OCR across labs, papers, and developer communities, ranked by signal.
DeepSeek OCR's R-SWA attention mechanism to be applied beyond OCR
Reference Sliding Window Attention (R-SWA), the core innovation of the Unlimited OCR model, is explicitly described as applicable to other sequence tasks such as automatic speech recognition (ASR) and translation. The mechanism could therefore see adoption beyond OCR, in speech and language processing more broadly.
Unlimited OCR addresses key limitations in long-document processing
Unlimited OCR uses Reference Sliding Window Attention (R-SWA) to keep the KV cache at a constant size, directly targeting the memory and speed bottlenecks that current OCR systems hit on long documents. This is a step toward efficient, single-pass transcription of multi-page documents.
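To make the constant-cache idea concrete, here is a minimal toy sketch in Python. It is not the Unlimited OCR implementation: it assumes, purely for illustration, that the cache holds a fixed block of "reference" entries plus a sliding window of the most recent generated entries. The class `BoundedKVCache`, the helper `full_cache_size`, and the sizes used are all hypothetical names and values chosen for the example.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: fixed reference entries plus a sliding window of recent entries.

    Assumption for illustration only; the real R-SWA layout may differ.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)      # kept for the whole decode
        self.recent = deque(maxlen=window)    # oldest entry is evicted when full

    def append(self, entry):
        self.recent.append(entry)

    def visible(self):
        # Entries a new token could attend to under this toy scheme.
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def full_cache_size(num_reference, num_generated):
    """Size of an unbounded cache that keeps every entry, for comparison."""
    return num_reference + num_generated


cache = BoundedKVCache(reference=[f"ref{i}" for i in range(4)], window=3)
sizes = []
for step in range(10):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(sizes)                    # grows until the window fills, then stays flat
print(full_cache_size(4, 10))   # an unbounded cache keeps growing with output length
```

In this toy setup the cache size stops growing once the window is full, while an unbounded cache grows with every generated token. That bounded footprint is the property the summary describes, whatever the exact mechanism inside R-SWA.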
DeepSeek OCR's Unlimited OCR to see integration with vLLM and SGLang
Baidu's release of Unlimited OCR, which builds on DeepSeek OCR, highlights its integration with inference engines such as vLLM and SGLang. This suggests a push to make the technology more accessible and performant in real-world applications, especially those that handle long documents.
-
Baidu releases Unlimited OCR, challenging long-context AI memory mechanisms · 1 source tracked
Baidu has open-sourced a new OCR model called Unlimited OCR, which excels at processing long documents by mimicking human reading habits. Unlike traditional OCR systems that process documents page by page and then stitc…
-
Open-source OCR models and benchmarks consolidated on Papers with Code
A new resource has been created to track open-source optical character recognition (OCR) models, consolidating information on top-performing models, benchmarks, and links to their papers and code. This initiative highli…
-
Unsloth Studio boosts GLM-5.2 support with 3x longer context
Unsloth has released version 0.1.471-beta, introducing support for GLM-5.2 and enhancing context length capabilities. The update features an auto-fit algorithm that allows for three times longer context windows, enablin…
-
Unlimited OCR model uses new attention to process long documents efficiently
Researchers have developed Unlimited OCR, a new model that addresses the memory and speed limitations of current OCR systems when processing long documents. By replacing standard attention layers with Reference Sliding …
-
Baidu releases Unlimited OCR with constant KV cache for long documents
Baidu has released Unlimited OCR, a 3-billion-parameter Mixture-of-Experts model designed for efficient long-document parsing. The model utilizes Reference Sliding Window Attention (R-SWA) to maintain a constant KV cach…
-
Unsloth Studio boosts context length by 3x with GLM 5.2 support
Unsloth Studio has released version 0.1.47-beta, introducing support for GLM 5.2 GGUFs and an improved auto-fit algorithm that enables three times longer context lengths. This update also brings enhanced features such a…
-
Spotlight system cuts DiT RL post-training costs using spot GPUs
Researchers have developed Spotlight, a novel system designed to significantly reduce the cost of post-training Diffusion Transformers (DiTs) for reinforcement learning. By leveraging insights into exploration tolerance…
-
Study finds PDF conversion quality crucial for RAG question-answering
A new study published on arXiv evaluates four open-source PDF-to-Markdown conversion frameworks for their impact on domain-specific question-answering accuracy within Retrieval-Augmented Generation (RAG) systems. The re…
-
New multi-agent system automates document processing, cuts costs and emissions
Researchers have developed MADP, a multi-agent system designed to automate document processing in enterprise settings. The system combines deep learning for classification and parsing with large language models for extr…
-
RTPrune boosts DeepSeek-OCR inference speed by 1.23x with novel token pruning
Researchers have developed RTPrune, a novel two-stage token pruning method designed to enhance the efficiency of DeepSeek-OCR inference. This method mimics the model's two-stage reading process, first prioritizing high-…
-
In the Arena: How LMSys changed LLM Benchmarking Forever
The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more d…