PulseAugur

Run RAG agent offline with LangGraph, Ollama, and embedded Qdrant

This article details how to run a Retrieval-Augmented Generation (RAG) agent entirely offline using LangGraph, Ollama, and an embedded Qdrant vector store. The setup avoids the need for API keys by configuring the system to use local models for both chat and embeddings. The author demonstrates how to swap between local Ollama and remote OpenAI providers through configuration, and how to switch between an embedded Qdrant instance and a remote server. The process involves setting up Ollama with specific models like Qwen3.5:9b for chat and BGE M3-Embedding for embeddings, and configuring Qdrant to persist data locally. The article highlights a method to dynamically determine embedding vector dimensions by probing the active embedder, ensuring compatibility when switching providers.
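The dimension probe and the embedded-versus-remote switch described in the summary can be sketched as follows. This is a minimal illustration, not the author's code: `Settings`, `probe_dimension`, `qdrant_client_kwargs`, and `StandInEmbedder` are hypothetical names, and the only assumptions about real libraries are that the embedder exposes an `embed_query(text)` method (as LangChain embedding classes do) and that `QdrantClient` accepts either `path=` for embedded storage or `url=` for a remote server.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Sequence


class Embedder(Protocol):
    """Anything with a LangChain-style embed_query method."""

    def embed_query(self, text: str) -> Sequence[float]: ...


@dataclass(frozen=True)
class Settings:
    # Hypothetical config fields; the article's real setting names may differ.
    qdrant_url: Optional[str] = None      # None means "use embedded Qdrant"
    qdrant_path: str = "./qdrant_data"    # local persistence directory


def probe_dimension(embedder: Embedder, sample: str = "dimension probe") -> int:
    """Embed a throwaway string and report the vector length.

    This avoids hard-coding a size that breaks when the provider changes.
    """
    vector = embedder.embed_query(sample)
    if not vector:
        raise ValueError("embedder returned an empty vector")
    return len(vector)


def qdrant_client_kwargs(settings: Settings) -> Dict[str, str]:
    """Pick keyword arguments for QdrantClient: remote URL or local path."""
    if settings.qdrant_url:
        return {"url": settings.qdrant_url}
    return {"path": settings.qdrant_path}


class StandInEmbedder:
    """Offline stand-in so the sketch runs without Ollama or OpenAI."""

    def __init__(self, dim: int) -> None:
        self._dim = dim

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * self._dim


if __name__ == "__main__":
    # Two stand-ins with different sizes mimic switching providers.
    for dim in (1024, 1536):
        print(probe_dimension(StandInEmbedder(dim)))
    print(qdrant_client_kwargs(Settings()))
```

In a real setup, the probed value would typically be passed as the vector size when the Qdrant collection is created, so a collection built for one embedder is never silently reused with vectors of a different length.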

IMPACT Enables local development and deployment of RAG agents, reducing reliance on cloud APIs and potentially lowering costs.

RANK_REASON The article describes a practical implementation: running a RAG agent offline with existing tools (LangGraph, Ollama, Qdrant). It is an application of those tools rather than a novel release or research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · duke ·

    Running a Whole RAG Agent Offline: LangGraph + Ollama + Embedded Qdrant (Zero API Keys)

    Most RAG tutorials open with "set your OPENAI_API_KEY." This one doesn't need it. In Part 1 (https://dev.to/javaking1129/running-a-langgraph-react-agent-in-production-openai-compatible-api-multi-model-gateway--emi) I claimed the LLM and embeddings are …