Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 2d

How I use the recent llama.cpp native tools to do web RAG, a.k.a. web_fetch (or anything else for that matter), directly from inside the llama-server's webui

A user on Reddit's r/LocalLLaMA shared a detailed method for running web retrieval for Retrieval-Augmented Generation (RAG), along with other command-line tasks, from within the llama.cpp server's web UI. The approach involves enabling native tools in llama-server, installing and configuring `firejail` for system-wide sandboxing, and setting up a dedicated user with a virtual machine container harness called `smolmachines`. The result is a multi-layered sandbox that lets the LLM safely execute commands, such as fetching web content with `wget`, directly from the web UI.
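To make the sandboxed-fetch idea concrete, here is a minimal Python sketch of how a `wget` call might be wrapped in `firejail` before a tool runs it. It is not the Reddit author's setup: the helper names (`build_sandboxed_fetch`, `fetch_page`) and the particular choice of `firejail` options are assumptions for illustration, and it covers only the `firejail` layer, not the `smolmachines` layer or the llama-server configuration.

```python
import subprocess
from urllib.parse import urlparse

# Assumed hardening options (illustrative choice, not taken from the post):
# throwaway home directory, no root inside the sandbox, no capabilities,
# and firejail's default seccomp filter.
FIREJAIL_FLAGS = ["--quiet", "--private", "--noroot", "--caps.drop=all", "--seccomp"]


def build_sandboxed_fetch(url: str, timeout_s: int = 15) -> list[str]:
    """Return an argv list that runs wget on `url` inside firejail (hypothetical helper)."""
    parsed = urlparse(url)
    # Only allow plain web URLs, so schemes such as file:// are rejected early.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing non-web URL: {url!r}")
    # -q: quiet, -O -: write the page to stdout, one attempt with a timeout.
    wget = ["wget", "-q", "-O", "-", f"--timeout={timeout_s}", "--tries=1", url]
    return ["firejail", *FIREJAIL_FLAGS, *wget]


def fetch_page(url: str, timeout_s: int = 15) -> str:
    """Run the sandboxed command and return the page body as text (hypothetical helper)."""
    cmd = build_sandboxed_fetch(url, timeout_s)
    # An argv list (no shell=True) keeps the URL from being interpreted by a shell.
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s + 5, check=True
    )
    return result.stdout
```

Calling `fetch_page("https://example.com")` requires both `firejail` and `wget` to be installed. Building the command in a separate function makes it possible to inspect or log what would run before anything is executed.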

IMPACT Enables more sophisticated RAG and command execution directly from local LLM interfaces, enhancing their utility for complex tasks.

llama.cpp
firejail
wget
smolmachines
Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf