Lawyer builds 16-GPU AI cluster for legal drafting with MoE models

By PulseAugur Editorial · [1 sources] · 2026-05-25 21:50

A lawyer has updated their local AI setup for legal drafting, now featuring twelve V100 SXM2 32GB GPUs and an additional box with four RTX 3090s and two V100 PCIe cards. They switched from vLLM to llama.cpp for running Mixture-of-Experts (MoE) GGUF models, finding that MoE models offer significantly better performance and context handling on their V100 hardware compared to dense models. The system now employs an orchestrator that routes tasks across multiple local models, utilizing all 16 GPUs for complex jobs like drafting affidavits and motions. AI

IMPACT Demonstrates effective local deployment of MoE models for specialized tasks, potentially reducing reliance on cloud services for niche applications.

RANK_REASON User-level deployment of hardware and software for a specific task.

Read on r/LocalLLaMA →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

Lawyer builds 16-GPU AI cluster for legal drafting with MoE models

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/TumbleweedNew6515 · 2026-05-25 21:50

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1tnn29i/update_on_12x32gb_sxm_v100_cluster_local_ai_for/"> <img alt="Update on 12x32gb sxm v100 cluster / local AI for legal drafting" src="https://preview.redd.it/4h07vk82uc3h1.jpeg?width=640&crop=smart&a…

COVERAGE [1]

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

RELATED ENTITIES

RELATED TOPICS