Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 6d

Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

Researchers have explored using reinforcement learning to train smaller language models for zero-shot Text-to-SPARQL generation, a task crucial for knowledge graph question answering. They applied Group-Relative Policy Optimization (GRPO) to the Qwen3-1.7B model, utilizing execution feedback and answer-level rewards instead of requiring gold query annotations. The GRPO-trained models showed significant improvement over a zero-shot baseline, demonstrating the viability of outcome-based reinforcement learning for this task when full supervision is unavailable. AI

IMPACT Demonstrates a viable method for training smaller models on complex tasks without extensive labeled data, potentially lowering barriers to knowledge graph querying.

Qwen3-1.7B
GRPO
DBLP-QuAD