How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python
This tutorial introduces AgentTrove, a large open-source dataset of roughly 1.7M agentic interaction traces that can be streamed rather than downloaded in full. It covers inspecting the conversation schema, normalizing turns, and parsing agent outputs, including shell commands. It then summarizes statistics and visualizes patterns across thousands of traces, and builds a clean ShareGPT-style dataset for supervised fine-tuning.
IMPACT: Lets researchers analyze agent traces and fine-tune agent models on a large, openly accessible dataset without downloading it in full.
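
The normalization and parsing steps described above could look roughly like the following. This is a minimal sketch, not the tutorial's actual code. It makes several assumptions that the summary does not confirm: each record stores its turns under a "conversations" field (a list, or a JSON-encoded string), turns use either "role"/"content" or "from"/"value" keys, and agents emit shell commands inside fenced bash/sh/shell blocks. All function names are hypothetical. The functions accept any iterable of dict records, so they would also work on a stream obtained with the Hugging Face `datasets` library via `load_dataset(..., streaming=True)`; the dataset's repository ID is not given here and is left out.

```python
import json
import re
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, List, Optional

# Assumed role aliases; the real trace schema may use different names.
ROLE_MAP = {"system": "system", "user": "human", "human": "human",
            "assistant": "gpt", "gpt": "gpt", "tool": "tool"}

# Assumed convention: shell commands live in fenced bash/sh/shell blocks.
# The fence marker is built at runtime to keep this listing readable.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:bash|sh|shell)\n(.*?)" + FENCE, re.DOTALL)


def normalize_turn(turn: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Map one raw turn to ShareGPT form, or return None if unusable."""
    role = turn.get("role", turn.get("from"))
    text = turn.get("content", turn.get("value"))
    if not isinstance(role, str) or not isinstance(text, str):
        return None
    speaker = ROLE_MAP.get(role.strip().lower())
    text = text.strip()
    if speaker is None or not text:
        return None  # unknown role or empty message
    return {"from": speaker, "value": text}


def to_sharegpt(record: Dict[str, Any],
                key: str = "conversations") -> Optional[Dict[str, Any]]:
    """Convert one trace record into a ShareGPT-style sample."""
    raw = record.get(key)
    if isinstance(raw, str):  # some dumps store turns as a JSON string
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return None
    if not isinstance(raw, list):
        return None
    turns = [t for t in (normalize_turn(x) for x in raw if isinstance(x, dict)) if t]
    # Keep only samples with at least one human turn and one model turn.
    if not {"human", "gpt"} <= {t["from"] for t in turns}:
        return None
    return {"conversations": turns}


def extract_shell_commands(text: str) -> List[str]:
    """Return non-comment command lines found in fenced shell blocks."""
    commands = []
    for block in FENCE_RE.findall(text):
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("$ "):  # drop a leading prompt marker
                line = line[2:].strip()
            if line and not line.startswith("#"):
                commands.append(line)
    return commands


def clean_stream(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily yield clean samples, skipping records that fail to normalize."""
    for record in records:
        sample = to_sharegpt(record)
        if sample is not None:
            yield sample


def command_counts(samples: Iterable[Dict[str, Any]]) -> Counter:
    """Count the first token (the program name) of each model-issued command."""
    counts: Counter = Counter()
    for sample in samples:
        for turn in sample["conversations"]:
            if turn["from"] == "gpt":
                for cmd in extract_shell_commands(turn["value"]):
                    counts[cmd.split()[0]] += 1
    return counts
```

Because `clean_stream` is a generator, it can be combined with `itertools.islice` to process only the first few thousand traces of a stream, and each yielded sample can be written as one JSON line with `json.dumps` to produce the SFT file. The resulting `Counter` is one simple source for the kind of summary statistics and plots the tutorial describes.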