ChemQuests: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Papers
Researchers have developed ChemQuests, a new dataset containing 952 question-answer pairs extracted from chemistry papers on ChemRxiv. The dataset was created with a pipeline that combines OCR, GPT-4o for QA generation, and fuzzy-search verification, and it aims to support natural language processing in chemistry. ChemQuests is designed for applications such as retrieval-based QA systems, search engine development, and fine-tuning large language models for the chemistry domain.
AI IMPACT: Provides a specialized dataset to improve AI's understanding and application of chemistry knowledge.
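
The summary does not say how the fuzzy-search verification step works, so the following is only a minimal sketch of one plausible approach: check that a generated answer (or its supporting quote) appears, approximately, in the OCR text of the source paper. The function names, the 0.85 threshold, and the use of Python's standard-library difflib are all assumptions made for illustration, not details taken from the ChemQuests pipeline.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not matter."""
    return " ".join(text.lower().split())


def best_fuzzy_ratio(snippet: str, source: str) -> float:
    """Return the best similarity (0.0 to 1.0) between the snippet and any
    same-length window of the source text.

    Hypothetical helper: a sliding window with difflib stands in for
    whatever fuzzy-search method the dataset authors actually used.
    """
    snippet = normalize(snippet)
    source = normalize(source)
    if not snippet or not source:
        return 0.0
    if len(source) <= len(snippet):
        return SequenceMatcher(None, snippet, source, autojunk=False).ratio()

    window = len(snippet)
    best = 0.0
    # Slide one character at a time; slow but simple for a sketch.
    for start in range(len(source) - window + 1):
        chunk = source[start:start + window]
        ratio = SequenceMatcher(None, snippet, chunk, autojunk=False).ratio()
        best = max(best, ratio)
    return best


def is_supported(snippet: str, source: str, threshold: float = 0.85) -> bool:
    """Accept a QA pair only if its snippet fuzzily matches the source.

    The threshold is an assumed value and would need tuning on real data.
    """
    return best_fuzzy_ratio(snippet, source) >= threshold


if __name__ == "__main__":
    ocr_text = (
        "The catalyst was prepared by heating the precursor at 450 C "
        "for two hours under nitrogen."
    )
    # Simulated OCR-style typo ("heatlng") in the generated quote.
    quote = "catalyst was prepared by heatlng the precursor at 450 C"
    print(is_supported(quote, ocr_text))
```

A tolerance-based match like this lets a pair survive small OCR errors while still rejecting answers that have no counterpart in the paper's text. A production pipeline would likely use a faster fuzzy-matching library and a tuned threshold.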