Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 4h

Building a Fully-Local Research RAG on 2 GTX 1080 Ti + an RTX 3090 — 3 Gotchas

A researcher detailed the process of building a local Retrieval-Augmented Generation (RAG) system for research papers using consumer-grade GPUs. The project, named paper-rag, involved setting up a hybrid retrieval system with dense and sparse embeddings, reranking, and a local LLM. Key challenges included an embedding model freezing GPUs, which was resolved by offloading to the CPU, and a large-context LLM running slowly due to excessive KV cache, fixed by capping the context size. The researcher also advised against merging older and newer GPUs for inference due to network bottlenecks. AI

IMPACT Provides practical insights for individuals building local RAG systems on consumer hardware.

Ollama
RTX 3090
qwen3.6:27b
BGE-M3
Qdrant
GTX 1080 Ti
fastembed
paper-rag