Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

COMMENTARY · r/LocalLLaMA English(EN) · 3d

Local run for multi users: which software set?

A user on Reddit's r/LocalLLaMA subreddit is seeking advice on setting up a multi-user local LLM service. They have experimented with vLLM and llama.cpp, using llama-swap as a frontend, but are encountering limitations with concurrency and API key management. The user is looking for open-source software recommendations to enable external access, including HTTPS, a web chat interface, and API access with key management for fewer than 10 users. AI

IMPACT N/A
- Reddit
- llama.cpp
- vLLM
- r/LocalLLaMA
- LibreChat
- llama-swap
MEME · r/LocalLLaMA English(EN) · 1d

anybody got llama-swap working answering concurrent requests for a single model?

A user on the r/LocalLLaMA subreddit is seeking assistance with configuring llama-swap to handle concurrent requests for a single model. They have successfully set up Qwen 3.6 35B A3B with multi-GPU support and concurrency enabled via llama-server, but llama-swap appears to serialize requests instead of processing them in parallel. The user has explored various configuration options and issue trackers without success, specifically aiming to avoid running multiple llama-cpp instances to conserve GPU memory. AI

Local run for multi users: which software set?