Korean legal chatbot uses novel dataset generation for 91% accuracy

By the PulseAugur editorial team · [1 source] · 2026-05-08 08:32

Researchers have developed a method for generating large, labeled datasets for Korean legal chatbots, addressing the high cost of manual labeling. Their approach uses local grammar graphs (LGGs) to create diverse utterances together with their labels, which are then used to train a DIET classifier. The method produced 700 million utterances, and the resulting chatbot, LIGA, achieved a 91% F1-score in identifying relevant legal cases.
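To make the idea of generating utterances and labels from a graph concrete, here is a minimal sketch in Python. It is not the authors' LGG formalism or tooling: the staged structure `TOY_GRAPH`, the English placeholder phrases, and the label names are all hypothetical, chosen only to show how every path through a small graph can yield a distinct utterance with its labels attached.

```python
from itertools import product

# Hypothetical toy graph (an assumption, not taken from the paper):
# each stage lists alternative (text, label) pairs.
# A label of None means the branch contributes words but no annotation.
TOY_GRAPH = [
    [("my landlord", "party:landlord"), ("my employer", "party:employer")],
    [("refuses to", None), ("will not", None)],
    [("return my deposit", "issue:deposit"), ("pay my wages", "issue:wages")],
]


def generate(graph):
    """Yield (utterance, labels) for every path through the staged graph."""
    for path in product(*graph):
        text = " ".join(words for words, _ in path)
        labels = sorted(label for _, label in path if label is not None)
        yield text, labels


dataset = list(generate(TOY_GRAPH))
for text, labels in dataset[:3]:
    print(text, labels)
```

The number of generated utterances is the product of the alternatives at each stage (here 2 × 2 × 2 = 8), which is why a graph of modest size can expand into a very large labeled dataset.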

Impact: This dataset generation technique could improve access to legal information by enabling more accurate and cost-effective development of legal chatbots.

Ranking rationale: Academic paper detailing a new method for generating training data for a specific AI application.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · English (EN) · Eric Laporte · 2026-05-08 08:32

Generating training datasets for legal chatbots in Korean

Chatbots are robots that can communicate with humans using text or voice signals. Legal chatbots improve access to justice, since legal representation and legal advice by lawyers come with a high cost that excludes disadvantaged and vulnerable people. However, capturing the diver…
