Researchers have developed Naamah, a synthetic dataset of over 100,000 Sanskrit sentences designed to improve Named Entity Recognition (NER) for classical Sanskrit literature. The dataset was generated by combining entity extraction from DBpedia with a 24-billion-parameter hybrid reasoning model. Naamah aims to overcome the scarcity of annotated resources and was used to benchmark the XLM-RoBERTa and IndicBERTv2 transformer architectures.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Provides a synthetic NER dataset of over 100,000 sentences for classical Sanskrit, where annotated resources are scarce, potentially enabling new research and applications.
RANK_REASON Academic paper introducing a new dataset for a specific NLP task.