Pulse

last 48h

[50/156] 97 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

TOOL · r/LocalLLaMA English(EN) · 52m · REDDIT

Since when is the RTX 6000 PRO priced at $13,250 USD on the official NVIDIA page?

The NVIDIA RTX 6000 PRO workstation GPU is now listed at $13,250 USD on NVIDIA's official marketplace. This high price point for the professional-grade graphics card has surprised users in the local LLM community. The GPU is designed for demanding AI and professional visualization tasks. AI

IMPACT High-end GPUs like the RTX 6000 PRO are crucial for local AI model training and inference, impacting the cost and accessibility of advanced AI development.
TOOL · r/LocalLLaMA English(EN) · 55m · REDDIT

[PSA] 5070ti 16GB is as low as $500.99 at Best Buy.

Nvidia's RTX 5070 Ti graphics card with 16GB of VRAM is currently on clearance at Best Buy for as low as $500.99. This price point is considered a significant value for the performance offered, making it an attractive option for consumers looking for a powerful GPU. AI

IMPACT GPU price drops can lower the barrier to entry for AI development and local model deployment.
TOOL · r/singularity Français(FR) · 4h · [2 sources] · REDDIT

Claude fable aka Claude Mythos in Google Cloud

Anthropic's Claude model is reportedly being integrated into Google Cloud under the codename "Claude fable" or "Claude Mythos." This suggests a potential partnership or offering where Google Cloud will host and provide access to Anthropic's AI capabilities. The exact nature of this integration, whether for internal use, specific customer offerings, or broader availability, remains to be detailed. AI

IMPACT This integration could expand access to Anthropic's models via Google's cloud infrastructure, potentially impacting enterprise AI adoption.
TOOL · Mastodon — fosstodon.org English(EN) · 7h · MASTO

Wall Street Journal: Meta launches ‘Workforce Academy’ to train workers to build data centers. This is an MSN-syndicated version of the article and has no paywa

Meta has introduced "Workforce Academy," a new five-week training program designed to equip individuals with the skills needed for data center construction. This initiative, a collaboration with CBRE and Associated Builders and Contractors, offers free training and guarantees employment at a Meta data center construction site upon completion. AI

IMPACT Meta's initiative aims to address labor shortages in data center construction, a critical infrastructure component for AI development and deployment.
TOOL · X — MiniMax AI English(EN) · 46m · [2 sources] · X

MiniMax is live on @RespanAI Gateway

MiniMax AI has announced its models are now available on the Respan AI Gateway. This integration aims to provide developers with easier access to MiniMax's suite of AI models for various applications including text, speech, image, video, and music. AI

IMPACT Increases accessibility of MiniMax AI models for developers building multimodal AI applications.
TOOL · r/StableDiffusion English(EN) · 2h · REDDIT

NAVA FP8 ComfyUI

A workaround has been developed to enable FP8 inference for Baidu's NAVA model within ComfyUI. This solution, available on GitHub, provides pre-configured workflow templates for various voice control features. Users can now integrate NAVA's capabilities into their ComfyUI projects for enhanced performance. AI

IMPACT Enables more efficient inference for a specific AI model on a popular creative platform.
TOOL · r/LocalLLaMA English(EN) · 2h · REDDIT

Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G

A live challenge is underway to optimize the inference speed of Google's Gemma 4 E4B model on a single A10G GPU. The competition, hosted on Hugging Face, invites participants to develop agents that can achieve faster processing times for the model. This event highlights efforts within the local LLM community to push the boundaries of hardware efficiency for AI models. AI

IMPACT Demonstrates community-driven efforts to improve inference efficiency for open-source models on consumer-grade hardware.
TOOL · X — Together (inference / OSS) English(EN) · 2h · X

The best AI infrastructure shouldn't be reserved for the biggest companies. Together AI is partnering with @pax8 to bring powerful, cost-efficient AI and leadi

Together AI has partnered with Pax8 to make advanced AI infrastructure and open-source models accessible to small and medium-sized businesses. This collaboration aims to democratize access to powerful AI tools, ensuring they are not exclusively available to large corporations. The partnership will focus on delivering cost-efficient AI solutions to a broader market. AI

IMPACT Expands access to AI tools for SMBs, potentially increasing adoption and innovation in smaller businesses.
TOOL · Mastodon — fosstodon.org Русский(RU) · 5h · MASTO

May Digest — CDN, New York, and City Networks If you're going to close out spring, do it like this: with growth to 150,000 clients, a fourfold increase in agents

Timeweb has released several updates in May, including improvements to their CDN, new agent capabilities for search and generation, and expanded data center locations. The company also saw significant growth, reaching 150,000 clients and quadrupling its agent count. These developments focus on the underlying infrastructure, such as networks and hardware, alongside new product features. AI

IMPACT Enhances AI agent capabilities for search and generation, potentially improving user experience and efficiency for AI-powered services.
TOOL · Mastodon — fosstodon.org English(EN) · 9h · MASTO

ALEPH — biologically-inspired AI runtime on embedded hardware. Security by design: immune system architecture, SHA256 whitelist, stateful iptables, anomaly clas

ALEPH is a new AI runtime designed for embedded hardware, drawing inspiration from biological immune systems for security. It features a SHA256 whitelist, stateful iptables, and an anomaly classifier to differentiate between inference loads and denial-of-service attacks. The system operates without cloud connectivity, pre-trained weights, or large language models, and has reportedly run for over 407,000 ticks without any crashes. AI

IMPACT This novel runtime could enable more secure and self-sufficient AI applications on resource-constrained embedded devices.
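
To make the "SHA256 whitelist" idea concrete, here is a minimal sketch of hash-based allowlisting in general. It is not ALEPH's code: the payload bytes, `ALLOWED_DIGESTS`, and `is_allowed` are hypothetical names chosen for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical allowlist: digests of payloads the runtime agrees to load.
# A real system would ship these digests with the firmware, not compute them here.
ALLOWED_DIGESTS = frozenset({sha256_hex(b"trusted-module-v1")})


def is_allowed(payload: bytes, allowed=ALLOWED_DIGESTS) -> bool:
    """Accept a payload only if its digest is on the allowlist."""
    return sha256_hex(payload) in allowed


print(is_allowed(b"trusted-module-v1"))  # payload whose digest is listed
print(is_allowed(b"tampered-module"))    # any other bytes are rejected
```

Anything whose digest is not listed is rejected by default, which is the property that makes an allowlist stricter than a blocklist.
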
TOOL · Mastodon — mastodon.social Italiano(IT) · 10h · MASTO

⚡ Asynchronous Neural Networks: AI Aims to Consume Up to 100x Less, Paving the Way for More Efficient and Sustainable Models. #AI #Sustainability 🔗 https:/

Researchers are developing asynchronous neural networks that could significantly reduce AI's energy consumption, potentially by up to 100 times. This advancement aims to create more efficient and sustainable AI models. The breakthrough could pave the way for widespread adoption of AI by addressing its substantial environmental footprint. AI

IMPACT Could drastically lower the operational costs and environmental impact of AI, enabling more widespread and sustainable deployment.
TOOL · Mastodon — mastodon.social Italiano(IT) · 5h · MASTO

How to Set Up Your First Local LLM with Ollama in 5 Minutes. Installation in 3 commands. No cost, total privacy. https://ollama.ai #AI #Ollama #Pr

Ollama provides a straightforward method for users to set up their first local Large Language Model (LLM) in under five minutes. The installation process requires only three commands, offering a cost-free and privacy-focused solution for running AI models on personal devices. AI

IMPACT Enables easier local deployment and experimentation with LLMs for individuals.
TOOL · r/LocalLLaMA English(EN) · 5h · REDDIT

People are making single-slot, half height pcie v100 with nvlink in China

A Chinese company called "GPU god" has developed a single-slot, half-height PCIe version of the NVIDIA V100 GPU. This custom-designed card retains the full performance of the V100 core and is intended for passive cooling, with an option for higher power delivery. The 16GB version is expected to retail for under $220 USD, with a 32GB model also planned. AI

IMPACT Offers a more compact and potentially lower-cost option for AI hardware deployments, especially in space-constrained environments.
TOOL · r/LocalLLaMA English(EN) · 6h · REDDIT

Apple announced new on device inference engine for Apple Silicon

Apple has introduced CoreAI, a new on-device inference engine designed to replace CoreML and offer an alternative to existing frameworks like MLX and llama.cpp. This engine is optimized for Apple Silicon, particularly for mobile devices, and supports larger models, including a 20 billion parameter foundation model. While performance comparisons are pending, CoreAI aims to enable the deployment of more sophisticated AI models directly within applications. AI

IMPACT Enables larger, more sophisticated AI models to run directly on Apple devices, potentially increasing adoption of on-device AI features.
TOOL · r/LocalLLaMA English(EN) · 6h · REDDIT

I put together a Rust-native, CPU-only implementation of LFM2.5-8B-A1B

A developer has created a Rust-native, CPU-only implementation of the LFM2.5-8B-A1B language model. This project, still in progress, has been published as a cargo crate and includes features like tool use callbacks. The implementation offers a decode speed of approximately 37 tokens/s on a Ryzen 7950x and can run on systems with as little as 16GB of RAM, with memory usage around 7GB. AI

IMPACT Enables running a specific LLM on consumer hardware without dedicated GPUs.
TOOL · Mastodon — fosstodon.org 日本語(JA) · 9h · [2 sources] · MASTO

News RLWRLD and NVIDIA Announce Initiatives to Build Next-Generation Industrial Foundation for Humanoid AI – AI Watch https://www.yayafa.com/2818676/ #AgenticAi #AI #AIUtilization #ArtificialGeneralIntelligence #Artifici

Osmo has developed a system that digitizes smell, significantly reducing AI costs by 200x through the use of Meta's Llama model on AWS. Separately, RLWRLD and NVIDIA are collaborating to build the next generation of industrial infrastructure for humanoid AI. AI

IMPACT Osmo's cost reduction highlights efficiency gains in AI deployment, while the RLWRLD-NVIDIA partnership signals progress in physical AI infrastructure.
TOOL · Mastodon — fosstodon.org English(EN) · 10h · MASTO

We just launched the AI WiFi Survey Agent — and it's live on the Microsoft Commercial Marketplace 🚀 Upload a floor plan, and it reads the walls + materials, pre

Excoms AI has launched its AI WiFi Survey Agent, now available on the Microsoft Commercial Marketplace. This tool allows users to upload floor plans, which the AI then analyzes to determine wall materials and predict wireless coverage across various frequencies. The agent generates a branded PDF report with recommended access point placements, eliminating the need for on-site visits or specialized equipment. AI

IMPACT This tool automates WiFi network planning, potentially streamlining deployment for IT professionals and reducing the need for manual site surveys.
TOOL · dev.to — LLM tag English(EN) · 11h · [2 sources] · MASTO

redb.Route 3.1.0 — LLM(AI) as just another connector: `.To("llm://claude")` and tools-as-routes

The redb.Route integration framework has released version 3.1.0, introducing two new transports: redb.Route.Llm and redb.Route.Exec. The LLM transport allows developers to treat language models as addressable endpoints, similar to Kafka or HTTP, enabling seamless integration of LLM calls within existing integration workflows. This release also introduces the capability to define agent tools as routes with an `.AsLlmTool()` aspect, unifying AI functionalities within the framework's existing DSL and infrastructure. AI

IMPACT Enables developers to integrate LLMs as standard endpoints within existing integration frameworks, simplifying AI adoption.
TOOL · Mastodon — fosstodon.org English(EN) · 12h · MASTO

The paper that could pop the trillion dollar AI bubble Alternatives to current Transformer architectures could eliminate its greatest weakness: The inference ef

A new research paper proposes an alternative to the Transformer architecture, which powers most large language models. This alternative aims to address the significant computational cost associated with Transformer inference. If successful, this could potentially reduce the massive financial investment currently driving the AI industry. AI

IMPACT Potential for significantly reduced inference costs could reshape AI infrastructure and investment.
TOOL · Mastodon — fosstodon.org English(EN) · 3h · MASTO

Google Fi just made overseas travel less painful with these upgrades and perks Google Fi’s huge roaming upgrade includes faster 5G and a massive price cut. http

Google Fi has announced significant upgrades to its international roaming services, including faster 5G speeds and reduced prices for data usage abroad. These enhancements aim to make international travel more convenient and affordable for its users. The changes are part of Google Fi's ongoing efforts to improve its global connectivity offerings. AI

IMPACT Minimal direct impact on AI operators; primarily a consumer telecom service improvement.
TOOL · Mastodon — fosstodon.org English(EN) · 5h · MASTO

"Xbox Is Unable to Meet Demand for New Consoles", Rethinking Approach to Project Helix Is the demand for #Xbox consoles in the room with us right now? 🤭 Intere

Xbox is reportedly struggling to meet the demand for its new consoles, with speculation pointing to the ongoing AI boom as a contributing factor to hardware shortages. The article suggests that Microsoft's own involvement in the AI sector may be exacerbating the issue by driving up prices and demand for essential components like RAM and GPUs. This situation is prompting a reevaluation of Microsoft's "Project Helix." AI

IMPACT AI's demand for hardware components is creating supply chain pressures that affect other tech sectors like gaming consoles.
TOOL · r/LocalLLaMA English(EN) · 9h · REDDIT

Jetson Orin NX Build for Hermes Agent + Benchmarking

A user has configured a Jetson Orin NX to run the Hermes Agent. The build prioritizes silence and aesthetics while delivering over 10 tokens/sec for text generation and 300 tokens/sec for prompt processing. The setup supports a context window of at least 65,000 tokens; in testing, a Gemma 4 26B model reached 10.21 tokens/sec at 60,000 tokens of context. AI

IMPACT Demonstrates efficient local LLM deployment on compact hardware, enabling advanced agent capabilities.
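
As a rough worked example of what these throughput figures mean in wall-clock time, the sketch below divides token counts by the reported rates. It assumes the 300 tokens/sec prompt-processing rate holds across a full 60,000-token prompt, and the 500-token reply length is a hypothetical value; neither is stated in the post.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time needed to process a number of tokens at a steady rate."""
    return tokens / tokens_per_second


# Reported figures: 300 tok/s prompt processing, 10.21 tok/s generation at 60k context.
prefill_seconds = seconds_for(60_000, 300)  # reading the whole prompt
decode_seconds = seconds_for(500, 10.21)    # hypothetical 500-token reply

print(f"prefill: {prefill_seconds:.0f} s, decode: {decode_seconds:.0f} s")
```

Under these assumptions, reading a full prompt takes a few minutes, while the reply itself takes under a minute.
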
TOOL · Mastodon — fosstodon.org English(EN) · 10h · MASTO

The fastest way to hit Google AI Pro limits (and how to avoid it) I spent hours pushing Gemini's limits, and the biggest quota killer wasn't what I expected. ht

A user discovered that Google's Gemini AI Pro has a hidden rate limit that is easily triggered by frequent API calls, even for simple tasks. This limit is not clearly documented and can be hit within hours of consistent usage, unlike other more predictable usage caps. The user found that making many small, rapid API requests, rather than complex or long-running ones, was the primary cause of hitting these limits. AI

IMPACT Highlights potential friction for developers integrating Gemini Pro via API due to undocumented rate limits.
TOOL · Mastodon — sigmoid.social English(EN) · 11h · MASTO

Dev Update #2 for smista․ai is out. This dev update follows our second milestone, which was about building smista-storage, the crate that gives user sessions a

Smista.ai has released its second development update, detailing the creation of smista-storage. This component is designed to manage AI user sessions, which are complex structures involving messages, tool calls, and routing decisions. The team selected SurrealDB for its flexibility, enabling both local, embedded use and future scalability for a SaaS offering, adhering to their 'local-first, but not local-only' principle. AI

IMPACT This development focuses on improving the underlying infrastructure for managing AI sessions, potentially enhancing user experience and enabling future SaaS capabilities.
TOOL · Mastodon — mastodon.social Italiano(IT) · 11h · MASTO

🤖 OpenRouter simplifies access to AI models: compare costs and performance, integrate via API, and choose the most convenient option. #AI #OpenRouter 🔗 https://

OpenRouter is a platform designed to simplify access to various AI models. It allows users to compare the costs and performance of different models and integrate them via API. The service aims to help users select the most cost-effective AI options for their needs. AI

IMPACT Provides a centralized platform for developers to compare and integrate various AI models, potentially streamlining AI adoption.
TOOL · Mastodon — sigmoid.social English(EN) · 11h · MASTO

A bit tired of clunkiness of Ollama+AnythingLLM, I decided to try something new. #LocalAI is a great piece of software. All-in-one solution for downloading mod

A user found LocalAI to be a superior alternative to Ollama and AnythingLLM for running AI models locally. They highlighted LocalAI's all-in-one solution for model downloads, backends, and a WebUI, all manageable within Docker with GPU acceleration. The user also noted impressive performance, achieving 95 tokens/second on their RX7900XTX GPU. AI

IMPACT LocalAI offers a streamlined, high-performance solution for running AI models locally, potentially simplifying adoption for hobbyists and developers.
TOOL · Mastodon — mastodon.social English(EN) · 14h · MASTO

Defend against frontier cyber models: Cloudflare's architecture as customer zero https://blog.cloudflare.com/frontier-model-defense/ #Security #AI #Networkin

Cloudflare is leveraging its own infrastructure to defend against advanced AI-powered cyber threats. The company is using its extensive network and security architecture as a testing ground, or "customer zero," to develop and deploy defenses against sophisticated attacks. This proactive approach aims to stay ahead of evolving cyber threats that utilize frontier AI models. AI

IMPACT Demonstrates how large infrastructure companies are applying AI to enhance cybersecurity defenses.
TOOL · Mastodon — fosstodon.org English(EN) · 5h · MASTO

🧵Domesticated AI The Apple Cloud grounds personal context, and can be extended to include the context of family members. Apple's on-device models extended by se

Apple is developing its "Domesticated AI" services, which will leverage on-device models and a secure Private Cloud Compute infrastructure. These AI capabilities will be integrated with Apple's cloud services, potentially offered for free to most users or as part of an Apple Cloud upgrade for expanded token usage. The system aims to ground personal context and can be extended to include family members' contexts. AI

IMPACT This integration could enhance user experience by providing personalized AI features within the Apple ecosystem.
TOOL · Mastodon — fosstodon.org English(EN) · 17h · MASTO

🔥 Trending 📢 GIGABYTE Showcases Full-Stack AI Infrastructure from Rack-Scale Systems to Real-World Deployment at COMPUTEX 2026 - afp.com 🔗 https://news.google.com/

Gigabyte is presenting its comprehensive AI infrastructure solutions at COMPUTEX 2026. Their display spans from large-scale rack systems to practical deployment applications. The company aims to highlight its end-to-end capabilities in the AI hardware sector. AI

IMPACT Demonstrates the breadth of AI hardware solutions available for deployment.
TOOL · r/LocalLLaMA English(EN) · 19h · REDDIT

New MLX LM Server From Apple

Apple has released MLX LM Server, a new tool designed to enhance the performance of large language models on Mac hardware. It leverages the M5 chip's neural accelerators for faster prompt processing and employs continuous batching to manage multiple requests concurrently. For extremely large models, the server supports distributed inference across multiple Macs using Thunderbolt RDMA. AI

IMPACT Enhances LLM inference capabilities on Apple hardware, potentially improving local AI development and deployment.
TOOL · Mastodon — fosstodon.org English(EN) · 23h · [2 sources] · MASTO

Apple Core AI Framework https://developer.apple.com/documentation/coreai/ #ai #apple

Apple has released its Core AI Framework, a new set of tools for developers to integrate machine learning capabilities into their applications. The framework is detailed in documentation available on Apple's developer portal. This release aims to empower developers to build more intelligent and responsive apps across Apple's ecosystem. AI

IMPACT Enables developers to more easily integrate advanced AI features into applications across Apple devices.
TOOL · Mastodon — fosstodon.org 日本語(JA) · 12h · MASTO

To an Era Where the 'Face of Threat' is Visible — Cloudflare's Attacker Name Blocking Feature Changes Information Asymmetry in Security Operations. Cloudflare Integrates Threat Intelligence into WAF, Enabling Blocking by Attacker Name and Past Targeted Industries. The Shift from 'Passive' to 'Contextual' Security Defense Begins. 🔗 https://techscop

Cloudflare has integrated threat intelligence into its Web Application Firewall (WAF), allowing users to block attacks based on the attacker's name and their targeted industries. This move shifts security defenses from a passive approach to a more contextual one, aiming to provide greater visibility into threats. The new feature is expected to change how security operations manage information asymmetry in the face of evolving cyber threats. AI

IMPACT Enhances security tooling by providing more context for threat blocking.
TOOL · r/LocalLLaMA English(EN) · 17h · REDDIT

ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants by yomaytk · Pull Request #24225 · ggml-org/llama.cpp

A pull request for the llama.cpp project introduces optimizations for k-quantized models, significantly improving prefill speeds. The changes focus on the matrix multiplication (matmul) operations for various quantization levels, including Q4, Q5, and Q8. Benchmarks on an M2 Pro chip show speedups of up to 3.78x for certain quantizations, enhancing the performance of local large language models. AI

IMPACT Improves performance for running local LLMs, potentially enabling more complex models on consumer hardware.
TOOL · r/singularity English(EN) · 19h · REDDIT

SpaceX has just revealed its first AI satellite design

SpaceX has unveiled its initial design for an AI-powered satellite. This satellite is intended to enhance SpaceX's Starlink internet constellation by integrating artificial intelligence capabilities directly into its space-based infrastructure. The move signifies a significant step in merging AI technology with satellite operations for improved performance and functionality. AI

IMPACT Integrates AI into satellite infrastructure, potentially improving Starlink's performance and capabilities.
TOOL · r/LocalLLaMA (TL) · 20h · REDDIT

Pipeline parallelism in llama.cpp may be wasting your VRAM

A user discovered that the default pipeline parallelism in llama.cpp may be wasting VRAM without providing any speed benefits. By compiling llama.cpp with the flag -DGGML_SCHED_MAX_COPIES=1, users can avoid this unnecessary VRAM allocation. This optimization is particularly relevant when all model layers are offloaded to the GPU. AI

IMPACT Users can reclaim VRAM by disabling default pipeline parallelism in llama.cpp, potentially allowing for larger models or contexts.
TOOL · Mastodon — fosstodon.org 日本語(JA) · 21h · [2 sources] · MASTO

Microsoft unveils AI development-specific mini PC "Surface RTX Spark Dev Box" with 120 billion parameters... https://www.yayafa.com/2818336/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntel

Microsoft has unveiled the Surface RTX Spark Dev Box, a compact PC specifically designed for AI development. The company also announced Scout, an autonomous agent built on the OpenClaw framework with MCP support. These announcements highlight Microsoft's continued investment in AI infrastructure and agentic AI capabilities. AI

IMPACT These tools could streamline AI development workflows and enable new agentic applications.
TOOL · r/LocalLLaMA English(EN) · 22h · REDDIT

Quick note on the QAT of recent

A Reddit user has identified issues with Google's quantization process for large language models, specifically noting that the llama-quantize function is hardcoded incorrectly and misaligns block groups. The user suggests that the unsloth Q4_K_XL quantization method is a more reliable alternative for now. A patch is reportedly in development to address these quantization errors. AI

IMPACT Highlights potential issues in LLM quantization tools, impacting model efficiency and performance.
TOOL · Mastodon — mastodon.social English(EN) · 1d · MASTO

Apple expanded its developer tools at WWDC 2026 to route AI tasks between on-device models, Private Cloud Compute, and external servers. The move ties Foundatio

Apple is enhancing its developer tools to better integrate AI capabilities across various platforms. Developers can now route AI tasks between on-device models, Apple's Private Cloud Compute, and external servers. This integration aims to deepen AI functionality within Siri and offer developers more flexibility in processing, though device memory and regional availability remain limitations. AI

IMPACT Developers gain more control over AI processing location, potentially optimizing performance and privacy for AI-powered applications.
TOOL · r/LocalLLaMA English(EN) · 1d · REDDIT

GLM-5.1 and Kimi K2.6 THE CHEAPEST WAY TO RUN

Users on the r/LocalLLaMA subreddit are discussing the most cost-effective hardware configurations for running the GLM-5.1 and Kimi K2.6 large language models. Participants are seeking advice on achieving inference speeds of 15-20 tokens per second with minimal expense. Suggestions range from high-end consumer GPUs like the RTX 5090 paired with substantial RAM, to professional-grade hardware such as Threadripper CPUs, Mac Studio Ultra machines, or multiple V100 GPUs. AI

IMPACT Users are seeking optimal hardware setups for running specific LLMs, indicating a focus on efficient deployment and accessibility.
TOOL · Mastodon — fosstodon.org English(EN) · 1d · MASTO

Lookspan now bills reasoning tokens at their own rate. If your model pricing sets a reasoning rate, reasoning tokens (a subset of output, OpenAI o-series style)

Lookspan has updated its billing system to specifically track and charge for reasoning tokens. This change ensures that if a model's pricing includes a distinct rate for reasoning, those specific tokens will be billed accordingly, preventing double-charging with general output tokens. The update aims to provide more precise cost-per-span calculations for models that utilize reasoning capabilities. AI

IMPACT Provides more accurate cost tracking for AI model usage, aiding operators in managing expenses.
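
The billing rule described here can be sketched as follows. This is an illustration of the idea (reasoning tokens are a subset of output tokens and are charged once, at their own rate), not Lookspan's implementation; `Pricing`, `span_cost`, and the example rates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pricing:
    """Hypothetical per-model rates in USD per 1M tokens."""
    input_rate: float
    output_rate: float
    reasoning_rate: Optional[float] = None  # None means no separate reasoning rate


def span_cost(p: Pricing, input_tokens: int, output_tokens: int,
              reasoning_tokens: int = 0) -> float:
    """Cost of one span; reasoning tokens are counted inside output_tokens."""
    if reasoning_tokens > output_tokens:
        raise ValueError("reasoning tokens must be a subset of output tokens")
    if p.reasoning_rate is None:
        # No separate rate: all output, reasoning included, uses the output rate.
        total = input_tokens * p.input_rate + output_tokens * p.output_rate
    else:
        # Subtract reasoning from output first so it is not charged twice.
        plain_output = output_tokens - reasoning_tokens
        total = (input_tokens * p.input_rate
                 + plain_output * p.output_rate
                 + reasoning_tokens * p.reasoning_rate)
    return total / 1_000_000


pricing = Pricing(input_rate=1.0, output_rate=4.0, reasoning_rate=2.0)
print(span_cost(pricing, input_tokens=1000, output_tokens=500, reasoning_tokens=200))
```

The subtraction step is what prevents double-charging: without it, the 200 reasoning tokens would be billed once as output and again as reasoning.
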
TOOL · Hacker News — AI stories ≥50 points English(EN) · 1d · HN

Apple Core AI Framework

Apple has released its Core AI framework, a new set of tools designed to help developers integrate artificial intelligence capabilities into their applications. The framework provides access to on-device machine learning models and functionalities, enabling richer and more responsive AI experiences within the Apple ecosystem. Developers can leverage Core AI to build features such as image analysis, natural language processing, and predictive text directly into their iOS, macOS, and other Apple platform applications. AI

IMPACT Enables developers to more easily integrate on-device AI features into Apple applications, potentially leading to more intelligent and responsive user experiences.
TOOL · Mastodon — mastodon.social English(EN) · 1d · MASTO

Uber burned its entire 2026 AI budget by April. Teams are watching token counters the way AOL subscribers watched the clock in 1993. Per-token pricing is a tran

Uber exhausted its entire 2026 AI budget by April, indicating a significant overspend on AI services. Teams are reportedly watching token counters the way AOL subscribers watched the clock in 1993. The post argues that per-token pricing for AI inference is unsustainable and that current costs will not persist as the technology evolves. AI

IMPACT Highlights the potential for high operational costs in AI adoption, pressuring companies to find more efficient inference methods.
TOOL · Mastodon — sigmoid.social English(EN) · 1d · MASTO

One healthcare organization saw token usage grow 8-10% monthly, adding $6M in unplanned costs before finance caught it. The gap is driving adoption of AI gatewa

A healthcare organization saw AI token usage grow 8-10% per month, adding $6 million in unplanned costs before finance caught it. Gaps like this are driving adoption of AI gateways and observability tools for better spend attribution. The situation highlights a broader industry challenge in tracking AI expenditures, with a call for standards in tokenomics to improve cost transparency. AI

IMPACT Adoption of AI cost management tools and standards is crucial for enterprises to control burgeoning AI expenditures and ensure financial accountability.
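
A quick compounding calculation shows why 8-10% monthly growth is easy to underestimate. The 12-month horizon below is an assumption for illustration; the post does not say how long the growth went unnoticed.

```python
def growth_factor(monthly_rate: float, months: int) -> float:
    """Multiplier on usage after compounding a monthly growth rate."""
    return (1 + monthly_rate) ** months


# Assumed horizon: 12 months of steady growth at each end of the reported range.
for rate in (0.08, 0.10):
    print(f"{rate:.0%} per month for 12 months -> {growth_factor(rate, 12):.2f}x usage")
```

At those rates usage ends the year at roughly 2.5x to 3.1x its starting level, so a budget sized on the first month's spend is overrun well before year end.
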
TOOL · Mastodon — mastodon.social English(EN) · 1d · MASTO

I managed to set up and self-host AI models on my home server. This was way easier than I thought. Well, the biggest benefit is privacy; the biggest drawback is that

A user successfully set up and self-hosted AI models on their home server, finding the process easier than anticipated. The primary advantage of this setup is enhanced privacy. However, the main disadvantage is the slow performance, attributed to hardware limitations. AI

IMPACT Enables individuals to run AI models locally, prioritizing privacy over speed and potentially lowering barriers to AI experimentation.
TOOL · r/LocalLLaMA English(EN) · 1d · REDDIT

Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper

A user shared optimization tips for running the DeepSeek v4 Flash model locally, achieving nearly 200 tokens per second on a Hopper system. By utilizing specific quants from Canada-Quant and patching the MTP code in vLLM, the user managed to significantly improve inference speed. The post also details the cost implications, noting that electricity costs for token generation currently exceed revenue. AI

IMPACT Provides practical insights for optimizing local LLM inference speeds, potentially reducing operational costs for users.
TOOL · The Register — AI English(EN) · 1d · [2 sources] · MASTO

Canonical sends Ubuntu into the AI agent era

Canonical has introduced a new approach to AI agent development using Ubuntu, leveraging LXD "containervisors" and snap packaging. This system creates isolated sandboxes for LLM agents, granting them controlled access to resources like GPUs and specific files while preventing access to sensitive personal data. The initiative aims to simplify the installation and execution of AI agents while enhancing security through resource limitation. AI

IMPACT Simplifies AI agent deployment and enhances security, potentially accelerating adoption of LLM-based tools.
TOOL · Mastodon — mastodon.social 日本語(JA) · 1d · [2 sources] · MASTO

Vertiv Partners with NVIDIA for Digital Twin in AI Data Center Design – ZDNET Japan https://www.yayafa.com/2818161/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #N

Meta is reportedly planning to establish data centers within temporary structures, a move detailed by GIGAZINE. Concurrently, ZDNET Japan reports that Vertiv is integrating NVIDIA's digital twin technology into its AI data center designs. These developments highlight evolving strategies in AI infrastructure, focusing on both novel deployment methods and advanced design tools. AI

IMPACT These infrastructure strategies and design tools are crucial for scaling AI capabilities, impacting the efficiency and deployment of future AI systems.
TOOL · Mastodon — mastodon.social English(EN) · 1d · [4 sources] · MASTO

🤖 End-to-end encrypted ML inference with Amazon SageMaker AI and FHE This blog has previously discussed FHE for ML inference in the post Enable fully homomorphi

Amazon SageMaker is now supporting end-to-end encrypted machine learning inference using fully homomorphic encryption (FHE). This advancement allows for secure processing of sensitive data without decryption, enhancing privacy in AI applications. The integration builds upon previous discussions about FHE's potential for secure ML inference. AI

IMPACT Enhances privacy and security for AI applications processing sensitive data.
TOOL · r/LocalLLaMA English(EN) · 1d · REDDIT

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

Luce Spark is a new open-source system that allows large Mixture-of-Experts (MoE) language models, specifically 33-35 billion parameters, to run on a single 16GB GPU. It achieves this by intelligently keeping only the currently active experts on the GPU, while the rest are stored in system RAM and swapped in as needed. This method avoids the performance penalty typically associated with offloading, enabling models that would otherwise not fit to run efficiently. AI

IMPACT Enables running large MoE models on consumer-grade hardware, democratizing access to advanced AI capabilities.
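
The general idea of keeping only the active experts on the GPU can be sketched with a small cache. This is not Luce Spark's code: the least-recently-used eviction policy, the `ExpertCache` class, and the dictionaries standing in for system RAM and GPU memory are all assumptions made for illustration.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy model of a fixed number of GPU slots for MoE expert weights."""

    def __init__(self, capacity: int, host_experts: dict):
        self.capacity = capacity   # how many experts fit on the GPU at once
        self.host = host_experts   # stand-in for experts held in system RAM
        self.gpu = OrderedDict()   # stand-in for experts resident on the GPU
        self.loads = 0             # number of host-to-GPU transfers performed

    def get(self, expert_id):
        """Return an expert's weights, loading them onto the GPU if needed."""
        if expert_id in self.gpu:
            self.gpu.move_to_end(expert_id)  # mark as most recently used
            return self.gpu[expert_id]
        if len(self.gpu) >= self.capacity:
            self.gpu.popitem(last=False)     # evict the least recently used expert
        self.gpu[expert_id] = self.host[expert_id]
        self.loads += 1
        return self.gpu[expert_id]


host = {i: f"weights-{i}" for i in range(8)}
cache = ExpertCache(capacity=2, host_experts=host)
for expert in [0, 1, 0, 2, 0]:  # hypothetical routing decisions, one per token
    cache.get(expert)
print(cache.loads, list(cache.gpu))
```

The fewer distinct experts the router selects in a short window, the fewer transfers are needed, which is where a scheme like this would recover speed compared with naive offloading.
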
TOOL · r/StableDiffusion English(EN) · 1d · REDDIT

"Testing LCM on a GTX 750 Ti 4GB: Surprisingly Usable for Low-VRAM AI Image Generation"

A user tested the Latent Consistency Model (LCM) on an older GTX 750 Ti GPU with 4GB of VRAM. The results showed that LCM significantly speeds up AI image generation, with initial model loads taking around 30 seconds and subsequent generations completing in 12-15 seconds. While image quality is slightly reduced compared to higher-step generations, it remains usable for previews and concept art, making older hardware viable for AI image creation. AI

IMPACT LCM technology allows users with low-VRAM GPUs to generate AI images faster, extending the life of older hardware.