Brief

last 24h

[7/7] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

COMMENTARY · r/LocalLLaMA English(EN) · 2d

Executing a plan under context constraints

A user running the Qwen 3.6 35B-A3B model locally ran into heavy context window usage while executing a refactoring plan: utilization reached 92.6% before auto-compaction kicked in. The user is asking how to manage context pressure during plan execution, and suggests approaches such as starting a new session with the previous plan pasted in.

IMPACT Users may need strategies to manage context window limitations when executing complex plans with local LLMs.
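The handoff idea mentioned in the post can be sketched as a small budget check. This is a minimal illustration, not the poster's setup: the `ContextBudget` class, the 0.7 threshold, and the prompt layout are all hypothetical, and the token counts would have to come from whatever the local runtime reports.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Tracks token usage against a fixed context window."""
    window_tokens: int              # context size configured for the model
    handoff_threshold: float = 0.7  # hypothetical: hand off well before auto-compaction

    def utilization(self, used_tokens: int) -> float:
        """Fraction of the context window currently in use."""
        return used_tokens / self.window_tokens

    def should_hand_off(self, used_tokens: int) -> bool:
        """True once usage crosses the handoff threshold."""
        return self.utilization(used_tokens) >= self.handoff_threshold


def build_handoff_prompt(plan: str, completed_steps: list[str]) -> str:
    """Compose the first message of a fresh session: the plan plus progress so far."""
    done = "\n".join(f"- [x] {step}" for step in completed_steps) or "- (none yet)"
    return f"Continue executing this plan.\n\nPLAN:\n{plan}\n\nALREADY DONE:\n{done}\n"
```

Checking `should_hand_off` between plan steps, rather than mid-step, keeps the handoff at a point where the progress list is accurate.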
COMMENTARY · r/LocalLLaMA English(EN) · 1w

Local vs Frontier on low-level systems engineering

A user reports that Anthropic's Claude Opus significantly outperformed other frontier and local models, including GPT-5, on complex low-level systems engineering tasks. In the project described, Opus reverse-engineered the firmware of an AirPlay speaker, identified its CRC structures, and automated the binary patching needed to disable an idle timer. The user concludes that Opus operates on a different level for demanding binary analysis work.

IMPACT Highlights Claude Opus's advanced capabilities in complex technical tasks, potentially influencing its adoption for specialized engineering and reverse-engineering applications.
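To illustrate why CRC structures matter for binary patching: changing any byte invalidates a stored checksum, so the patch has to refresh it. The sketch below assumes a hypothetical layout (payload followed by a 4-byte little-endian CRC-32), which is not taken from the post; the speaker's actual firmware format and CRC variant are not described in the summary.

```python
import struct
import zlib


def patch_and_refresh_crc(image: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Overwrite bytes at `offset`, then recompute the trailing CRC-32.

    Hypothetical layout: payload followed by a 4-byte little-endian CRC-32
    of the payload. Real firmware formats differ.
    """
    payload = bytearray(image[:-4])
    if offset < 0 or offset + len(new_bytes) > len(payload):
        raise ValueError("patch falls outside the payload")
    payload[offset:offset + len(new_bytes)] = new_bytes
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return bytes(payload) + struct.pack("<I", crc)


def crc_is_valid(image: bytes) -> bool:
    """Check the trailing CRC-32 against the payload it covers."""
    (stored,) = struct.unpack("<I", image[-4:])
    return stored == (zlib.crc32(image[:-4]) & 0xFFFFFFFF)
```

A patch applied without the refresh step fails `crc_is_valid`, which is typically the point at which a device would reject the image.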
TOOL · r/LocalLLaMA German(DE) · 1w

A Simple Coding Benchmark: Step 3.7 vs Qwen 3.5 122B-A10B vs Qwen 3.6 27B vs Qwen 3.6 35B-A3B

A Reddit user has published results from a simple coding benchmark comparing Step 3.7 against three Qwen models: Qwen 3.5 122B-A10B, Qwen 3.6 27B, and Qwen 3.6 35B-A3B. According to the results, Qwen 3.5 122B-A10B and Qwen 3.6 35B-A3B performed notably well.

IMPACT Provides insights into the coding capabilities of various Qwen models, useful for developers choosing models for coding tasks.
TOOL · r/LocalLLaMA English(EN) · 1w

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

A user ran Qwen 3.6-35B-A3B on an Intel Arc B70 Pro GPU using llama.cpp with the SYCL backend, reporting a prompt processing speed of 977 tokens per second with a 262k-token context window. With this setup the user built a working poker game without hitting model loops or crashes.

IMPACT Demonstrates high performance for local LLM inference on consumer GPUs, potentially lowering barriers to entry for advanced AI applications.
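For a sense of scale, the two reported figures can be combined into a rough prefill-time estimate. This is back-of-the-envelope arithmetic, assuming the 977 tk/s rate holds across the whole window, which is often not the case at long context; `prefill_seconds` is an illustrative helper, not part of llama.cpp.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt at a constant prompt-processing rate."""
    return prompt_tokens / tokens_per_second


# Filling the full 262k window at the reported 977 tk/s (constant-rate assumption).
full_window = prefill_seconds(262_000, 977)
print(f"{full_window:.0f} s, about {full_window / 60:.1f} min")
```

Under that assumption, a completely full window takes roughly four and a half minutes to ingest, so prompt caching matters far more than raw speed for long sessions.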
COMMENTARY · r/LocalLLaMA English(EN) · 1w

Stop asking what model to run. There are literally only two.

A post on r/LocalLLaMA argues that users should stop asking for model recommendations, claiming that only two viable local models currently exist: Qwen 3.6 35B-A3B and Qwen 3.6 27B. The author dismisses user hardware and specific use cases as irrelevant, advocating heavily quantized versions of these models even when they strain system resources. For anything more demanding, the post points users to commercial options such as Claude Code.

IMPACT This commentary highlights user frustration with local LLM accessibility and suggests a shift towards commercial models for advanced tasks.
MEME · r/LocalLLaMA English(EN) · 1w

anybody got llama-swap working answering concurrent requests for a single model?

A user on r/LocalLLaMA is asking for help configuring llama-swap to handle concurrent requests for a single model. They have set up Qwen 3.6 35B-A3B with multi-GPU support and concurrency enabled in llama-server, but llama-swap appears to serialize requests instead of processing them in parallel. The user has gone through configuration options and issue trackers without success, and specifically wants to avoid running multiple llama.cpp instances, which would cost extra GPU memory.
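One way to confirm whether requests are being serialized is to compare a single request against a batch sent at once. The sketch below is a generic probe, not a llama-swap feature: `send_request` is a hypothetical callable that would issue one completion request to the endpoint under test, and the function name and default batch size are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _timed(fn: Callable[[], None]) -> float:
    """Wall-clock seconds taken by one call to `fn`."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def concurrency_speedup(send_request: Callable[[], None], n_requests: int = 4) -> float:
    """Estimate how much faster a concurrent batch is than running it serially.

    Near 1.0 suggests the requests were processed one after another;
    near n_requests suggests they were processed in parallel.
    """
    solo = _timed(send_request)  # baseline: one request on its own

    def batch() -> None:
        with ThreadPoolExecutor(max_workers=n_requests) as pool:
            # Submit all requests at once, then wait for every one to finish.
            for future in [pool.submit(send_request) for _ in range(n_requests)]:
                future.result()

    return n_requests * solo / _timed(batch)
```

Running the same probe directly against llama-server and then through the proxy would show whether the serialization is introduced by the proxy layer or is already present upstream.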
TOOL · r/LocalLLaMA English(EN) · 1mo

Field report: coding with Qwen 3.6 35B-A3B on an M2 Macbook Pro with 32GB RAM

A user has configured Qwen 3.6 35B-A3B to run locally on an M2 MacBook Pro with 32GB of RAM for coding tasks. The setup involves building llama.cpp from source and downloading the model and vision adapter files from Hugging Face. The post gives detailed instructions and command-line arguments for running the model, and stresses closing other applications to stay within memory limits.

IMPACT Enables local execution of a capable coding LLM on consumer-grade hardware, reducing reliance on cloud services.
