PulseAugur

LLM pipeline preserves document structure in Marathi-to-English translation

Researchers have developed a framework for translating government documents from Marathi to English that addresses a specific challenge: preserving document structure and formatting. The system combines layout-aware OCR, coordinate-based text extraction, and large language models so that translated documents keep their original layout and hierarchical elements. In evaluations on real-world Marathi government PDFs, the authors report that this approach significantly improves structural preservation, translation coherence, and terminological consistency compared with standard text-only translation. The stated aim is to improve multilingual accessibility in e-governance.
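
The summary does not say how the pipeline carries layout through translation. As a rough illustration of the general idea behind coordinate-based extraction, here is a minimal Python sketch. It assumes each OCR'd region arrives as a text block with a bounding box and a hierarchy level. Only the text is swapped out, so coordinates and hierarchy survive for re-rendering, and a small glossary normalizes variant renderings of the same term. `TextBlock`, `translate_blocks`, the glossary, and the toy dictionary standing in for an LLM call are all hypothetical and are not the authors' code.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TextBlock:
    """One OCR'd region (hypothetical structure): text, bounding box, hierarchy level."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    level: int = 0  # assumption: 0 = body text, 1 = heading, 2 = subheading, ...


def reading_order(blocks: List[TextBlock]) -> List[TextBlock]:
    """Sort top-to-bottom, then left-to-right (a simple single-column heuristic)."""
    return sorted(blocks, key=lambda b: (b.y0, b.x0))


def apply_glossary(text: str, glossary: Dict[str, str]) -> str:
    """Replace variant renderings with one canonical term for consistency."""
    for variant, canonical in glossary.items():
        text = text.replace(variant, canonical)
    return text


def translate_blocks(
    blocks: List[TextBlock],
    translate_fn: Callable[[str], str],  # stand-in for an LLM translation call
    glossary: Dict[str, str],
) -> List[TextBlock]:
    """Translate each block's text; coordinates and level are copied unchanged."""
    return [
        replace(block, text=apply_glossary(translate_fn(block.text), glossary))
        for block in reading_order(blocks)
    ]


if __name__ == "__main__":
    # Toy "translator": a dictionary lookup in place of a real model.
    toy = {"शासन निर्णय": "Govt. Decision", "प्रस्तावना": "Preamble"}
    blocks = [
        TextBlock("प्रस्तावना", 50, 120, 300, 140, level=2),
        TextBlock("शासन निर्णय", 50, 40, 400, 70, level=1),
    ]
    result = translate_blocks(
        blocks, lambda s: toy.get(s, s), {"Govt. Decision": "Government Resolution"}
    )
    for b in result:
        print(b.level, (b.x0, b.y0), b.text)
    # 1 (50, 40) Government Resolution
    # 2 (50, 120) Preamble
```

Keeping geometry separate from text means a renderer can later place each translated string back into its original box. A real system would also have to handle multi-column layouts and translations that are longer than the source text, which the simple top-to-bottom sort here ignores.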

IMPACT Enhances accessibility of government documents across languages, potentially streamlining administrative processes and policy analysis.

RANK_REASON Academic paper detailing a novel technical approach to document translation.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Manasi Waghe, Danish Chandargi, Mohammad Aamir Rayyan, Raviraj Joshi, A. R. Deshpande

    Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

    arXiv:2606.28796v1 · Announce Type: new

    Abstract: Government documents in India are predominantly issued in regional languages such as Marathi, creating substantial accessibility barriers for non-native readers, interstate administrative bodies, and policy analysts. Although recent…