Which Chinese open-source parser is better for Japanese RAG? It's a crossover — BM25 says DeepDoc, dense says MinerU

Chinese parsers DeepDoc and MinerU show a crossover in Japanese RAG performance

By the PulseAugur editorial team · [1 source] · 2026-06-13 14:29

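A comparison of two Chinese open-source document parsers, DeepDoc and MinerU, in a Japanese RAG system shows that their ranking crosses over depending on the retrieval method. DeepDoc gives better results with BM25 retrieval, while MinerU does better with dense retrieval. The best parser therefore depends on the retrieval strategy; neither parser is better across the board.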

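Impact: The choice of document parser significantly affects RAG performance. On Japanese documents, dense retrieval favors MinerU and BM25 retrieval favors DeepDoc.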
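To make the crossover concrete, here is a minimal Python sketch of how one might check a parser-by-retriever score grid for it. The scores are placeholders invented for illustration and are not the article's measurements; the function names `best_parser_per_retriever` and `has_crossover` are likewise hypothetical, and the metric is assumed to be "higher is better" (for example recall@k).

```python
from typing import Dict, Tuple

# Placeholder scores only: these are NOT the article's results.
# Keys are (parser, retriever); values stand in for a higher-is-better metric.
PLACEHOLDER_SCORES: Dict[Tuple[str, str], float] = {
    ("DeepDoc", "bm25"): 0.6,
    ("DeepDoc", "dense"): 0.5,
    ("MinerU", "bm25"): 0.5,
    ("MinerU", "dense"): 0.6,
}


def best_parser_per_retriever(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Return the top-scoring parser for each retriever."""
    best: Dict[str, Tuple[str, float]] = {}
    for (parser, retriever), value in scores.items():
        # Keep the first parser seen on ties; replace only on a strictly higher score.
        if retriever not in best or value > best[retriever][1]:
            best[retriever] = (parser, value)
    return {retriever: parser for retriever, (parser, _) in best.items()}


def has_crossover(scores: Dict[Tuple[str, str], float]) -> bool:
    """True when different retrievers prefer different parsers."""
    winners = best_parser_per_retriever(scores)
    return len(set(winners.values())) > 1


if __name__ == "__main__":
    print(best_parser_per_retriever(PLACEHOLDER_SCORES))
    print(has_crossover(PLACEHOLDER_SCORES))
```

With real numbers from the linked repo substituted in, a `True` from `has_crossover` would mean the parser should be chosen per retrieval method instead of once for the whole pipeline.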

Ranking rationale: The article presents a comparative evaluation of two open-source tools on a specific technical task, including method and results, which qualifies it as research.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

dev.to — LLM tag TIER_1 English(EN) · elvisyao007 · 2026-06-13 14:29

Which Chinese open-source parser is better for Japanese RAG? It's a crossover — BM25 says DeepDoc, dense says MinerU

Final part of a series measuring Chinese open-source document parsing on Japanese documents.
Repo + raw 3×2 results: https://github.com/elvisyao007/eval-driven-llm/tree/main/reports/deepdoc-eval-v2 …
