Brief · PulseAugur

TOOL · dev.to — LLM tag English(EN) · 6d

Free 35B Multimodal LLM Server on Kaggle GPU — Accessible from Any OpenAI-Compatible Client

A developer has created a method to run a 35 billion parameter multimodal LLM on free Kaggle GPUs, overcoming the typical limitations of such platforms. The solution involves using Qwen3.6-35B-A3B quantized to 4-bit, hosted on Kaggle's T4 GPUs for up to 12 hours per session. It leverages llama.cpp for inference and an OpenAI-compatible API, with Cloudflare Quick Tunnel providing a stable public URL that supports token streaming, unlike other free tunneling services. AI

IMPACT Enables developers to run powerful LLMs on free cloud GPUs, bypassing costly hardware or API fees.

OpenAI
Kaggle
llama.cpp
Unsloth
Qwen3.6-35B-A3B
Cloudflare Quick Tunnel