PulseAugur

Developer creates C#-native Ollama replacement for LLM inference

A developer has built an inference server for Large Language Models (LLMs) entirely in C#, on top of SpawnDev.ILGPU.ML. It is designed as a drop-in replacement for Ollama: it supports Ollama's API and reads models directly from the Ollama cache, so nothing has to be re-downloaded. The project is still in early development, but its interactive chat performance is reported as comparable to Ollama's, with token generation speeds approaching those of the established llama.cpp backend. The goal is a fully C#-native stack for running LLMs, covering the tokenizer, dequantization, and attention, with GPU kernels transpiled from C#.
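To make "drop-in replacement" concrete, here is a minimal client-side sketch in Python. It assumes the C# server listens on Ollama's default address (`http://localhost:11434`) and implements Ollama's `/api/generate` endpoint. The summary only says it supports "Ollama's API", so the exact endpoint coverage is an assumption, and the model name `llama3.2` is a hypothetical placeholder.

```python
import json
import urllib.request

# Ollama's default listen address; assumed (not confirmed by the source)
# to be what the C# replacement uses as well.
OLLAMA_DEFAULT_BASE = "http://localhost:11434"


def build_generate_request(prompt, model="llama3.2", base_url=OLLAMA_DEFAULT_BASE):
    """Build a POST request for Ollama's /api/generate endpoint.

    The model name is a hypothetical placeholder; use one that is
    already present in the local Ollama cache.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

If the server really is a drop-in replacement, an existing client like this should work unchanged whether Ollama or the C# server is the process behind that port.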

IMPACT Offers a C#-native alternative for local LLM inference, potentially simplifying integration for .NET developers.

RANK_REASON This is a third-party tool that integrates with existing LLM infrastructure, rather than a direct release from a frontier lab.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. dev.to — LLM tag TIER_1 English(EN) · Todd Tanner ·

    I Built a Drop-In Ollama Replacement in Pure C# - No llama.cpp, No Native Binaries, Just ILGPU Kernels

    A month ago I shipped a neural network engine written entirely in C# (https://dev.to/lostbeard/i-built-a-neural-network-engine-in-c-that-runs-in-your-browser-no-onnx-runtime-no-javascript-4aj3) - six GPU backends, no ONNX Runtime, no JavaScript bridge, no native bi…