FormalASR system converts spoken Chinese to formal text end-to-end

By PulseAugur Editorial · [1 source] · 2026-05-19 02:27

Researchers have developed FormalASR, an end-to-end system that converts spoken Chinese directly into formal written text. The approach removes the separate large language model (LLM) post-editing stage, which reduces latency and computational cost for on-device applications. FormalASR fine-tunes Qwen3-ASR models at 0.6B and 1.7B parameters on two newly created datasets, WenetSpeech-Formal and Speechio-Formal, and reports significant reductions in character error rate along with improvements in text quality metrics.
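
For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between a system output and a reference, divided by the reference length. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function names and the example sentences are hypothetical and do not come from WenetSpeech-Formal or Speechio-Formal.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a formal written reference versus a verbatim
# transcript that keeps a spoken filler character at the start.
formal_reference = "我明天去北京"
verbatim_output = "嗯我明天去北京"
score = cer(formal_reference, verbatim_output)
```

Under this definition, a verbatim transcript that keeps filler words is penalized when scored against a formal reference, which is why a system trained to emit formal text directly can score lower CER on such references.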

IMPACT Offers a more efficient, on-device solution for spoken-to-written text conversion, reducing reliance on multi-stage LLM pipelines.

RANK_REASON The cluster describes a new academic paper detailing a novel model and datasets for speech recognition.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Yufei Zhang · 2026-05-19 02:27

FormalASR: End-to-End Spoken Chinese to Formal Text

Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR…

