Brief

last 24h

[6/6] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 1d

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

Researchers have developed DCC, a novel data-centric compiler designed to optimize machine learning kernels for Processing-In-Memory (PIM) architectures. This compiler addresses the challenges of data rearrangement and compute code optimization by jointly optimizing these interdependent processes. DCC supports multiple PIM backends through a multi-layer abstraction and has demonstrated significant speedups, achieving up to 7.68x on HBM-PIM and 13.17x on AttAcc PIM compared to GPU-only execution. For end-to-end LLM inference, DCC on AttAcc accelerated GPT-3 and LLaMA-2 by an average of 4.52x. AI

IMPACT Enables significant acceleration for LLM inference and other ML workloads on specialized Processing-In-Memory hardware.
- GPT-3
- LLaMA-2
- Machine Learning
- DCC
- Processing-In-Memory
- HBM-PIM
- AttAcc PIM
COMMENTARY · Forbes — Innovation English(EN) · 6d

AI’s Dirty Secret: It Mostly Speaks English

Despite claims of multilingual capabilities, most AI systems primarily operate in English due to training data imbalances. Large language models are predominantly trained on English content, with studies indicating up to 90% of training tokens are English. This linguistic bias means AI often processes information through an English-centric lens, even when translating outputs, potentially overlooking cultural nuances and local contexts. Consequently, AI performance can be weaker and error rates higher in non-English languages, impacting its effectiveness in diverse global applications. AI

IMPACT AI systems' English-centric training limits their effectiveness and cultural nuance in non-English languages, impacting global applications.
SIGNIFICANT · Fortune English(EN) · 6d · [2 sources]

Exclusive: AI startup Viktor raises $75 million to put a virtual ‘coworker’ in Slack and Teams

AI startup Viktor has secured $75 million in Series A funding to develop its virtual coworker agent, designed to integrate with platforms like Slack and Microsoft Teams. The agent aims to automate tedious knowledge work by connecting to various business systems and learning organizational workflows. This funding round was led by Accel Partners, with participation from other venture capital firms and angel investors, including co-founders of Slack. AI

IMPACT This funding could accelerate the development of AI agents designed to integrate into team workflows, potentially changing how knowledge workers collaborate.
RESEARCH · Medium — MLOps tag English(EN) · 1w · [4 sources]

Your LLM Server Is Wasting 80% of Its GPU Memory — Here’s How vLLM Fixes That

Large language models (LLMs) face a significant bottleneck in serving efficiency due to the memory demands of KV cache, which stores intermediate attention calculations. This KV cache, essential for enabling faster responses and handling longer context windows, can consume up to 80% of GPU memory. Innovations like vLLM's PagedAttention, inspired by operating system memory management, are addressing this by optimizing KV cache storage and reducing memory fragmentation, leading to substantial improvements in inference throughput. AI

IMPACT Optimizing KV cache and memory usage is crucial for reducing LLM serving costs and improving inference speed, enabling wider adoption of AI applications.
- Claude
- GPT-4
- LLM
- KV cache
- vLLM
- GPU
- PagedAttention
- Llama-2-7b-hf
- Llama-2
- Medium
- LLMs
- Tensormesh
- SemiAnalysis
- dev.to
TOOL · dev.to — LLM tag English(EN) · 4d · [43 sources]

Hot To Run LLMs Locally

This series of guides provides comprehensive instructions for setting up and running large language models (LLMs) locally on Linux systems. It details hardware and software prerequisites, recommends using llama.cpp for its balance of performance and ease of use, and covers model selection, quantization, and API integration. The guides also include steps for setting up systemd services for 24/7 operation, monitoring performance, and optimizing for various hardware constraints. AI

IMPACT Enables developers to run and experiment with LLMs locally, reducing reliance on cloud services and facilitating custom application development.
- Cursor
- Ollama
- Continue.dev
- VS Code
- Large Language Models
- Qwen2.5-coder
- Claude API
- Llama-3
- OpenAI API
- RTX 4090
- Apple Silicon
- Qwen 2.5
- DeepSeek-R1
- RTX 3090
- NVIDIA GPU
- NVIDIA RTX 3060
- Mac
- llama.cpp
- Mistral-7B
- Ubuntu
- CPU
- RAM
- VRAM
- Linux
- RTX 3060
- Q4_K_M
- Q5_K_M
- NVIDIA
- Llama 2
- Qwen
- CodeLlama
- Phi-3
- Q8_0
- AMD
SIGNIFICANT · Replit blog English(EN) · 34mo · [3 sources]

Replit Storage: The Next Generation

Replit has launched Expandable Storage, a new infrastructure that significantly increases storage limits for its users. The platform now supports up to 1 TiB of storage per account, with tiered increases for free, Hacker, and Pro plans, and offers additional à la carte options. This upgrade, powered by a new system called Margarine that uses incremental snapshots, addresses a long-standing user request and is crucial for developing larger projects, including AI applications that require substantial storage for models and dependencies. Alongside this, Replit has also overhauled its filetree system for improved performance and accessibility, utilizing virtualization and local caching to handle large projects more efficiently. AI

IMPACT Enables developers to build and deploy larger AI applications on the platform by removing storage constraints.
- Replit
- Llama 2
- NFS
- Margarine
- btrfs
- Expandable Storage
- LVM thin pools

Brief

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

AI’s Dirty Secret: It Mostly Speaks English

Exclusive: AI startup Viktor raises $75 million to put a virtual ‘coworker’ in Slack and Teams

Your LLM Server Is Wasting 80% of Its GPU Memory — Here’s How vLLM Fixes That

Hot To Run LLMs Locally

Replit Storage: The Next Generation