Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP
Researchers have explored using reinforcement learning to train smaller language models for zero-shot Text-to-SPARQL generation, a task crucial for knowledge graph question answering. They applied Group-Relative Policy Optimization (GRPO) to the Qwen3-1.7B model, utilizing execution feedback and answer-level rewards instead of requiring gold query annotations. The GRPO-trained models showed significant improvement over a zero-shot baseline, demonstrating the viability of outcome-based reinforcement learning for this task when full supervision is unavailable.
AI IMPACT: Demonstrates a viable method for training smaller models on complex tasks without extensive labeled data, potentially lowering barriers to knowledge graph querying.
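
To make the idea of answer-level rewards concrete, here is a minimal Python sketch of how a group of sampled SPARQL queries for one question might be scored and turned into group-relative advantages. The summary does not state the reward definition used, so the set-overlap F1 reward, the zero reward for queries that fail to execute, and the function names `answer_reward` and `group_relative_advantages` are all assumptions made for illustration. Only the general GRPO step of normalizing each reward against its own group's mean and standard deviation is taken as given.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Set


def answer_reward(predicted: Optional[Set[str]], gold: Set[str]) -> float:
    """Hypothetical answer-level reward: F1 between answer sets.

    `predicted` is the set of answers returned by executing a generated
    query, or None if the query failed to execute (assumed to earn 0).
    No gold SPARQL query is needed, only the gold answers.
    """
    if predicted is None:
        return 0.0
    if not predicted and not gold:
        return 1.0  # both empty: treat as a correct empty answer
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one group of samples for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled queries for one question; the second failed to execute.
gold_answers = {"paper_1", "paper_2"}
executed = [{"paper_1", "paper_2"}, None, {"paper_1"}, {"paper_9"}]
rewards = [answer_reward(result, gold_answers) for result in executed]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's average receive positive advantages and the rest receive negative ones, which is what lets the policy learn from execution outcomes alone.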