Researchers have developed Naamah, a synthetic dataset of over 100,000 Sanskrit sentences designed to improve Named Entity Recognition (NER) for classical Sanskrit literature. The dataset was generated by combining entity extraction from DBpedia with a 24-billion-parameter hybrid reasoning model. Naamah is intended to address the scarcity of annotated resources and was used to benchmark the XLM-RoBERTa and IndicBERTv2 transformer models.
Impact: Provides a much-needed annotated dataset for NER in classical Sanskrit, potentially enabling new research and applications.
Ranking rationale: Academic paper introducing a new dataset for a specific NLP task.
AI-generated summary · Google Gemini · based on 2 sources.