Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

COMMENTARY · X — Together (inference / OSS) English(EN) · 6d

"One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - @realDanFu, VP of Kernels

Inference benchmarks may not accurately reflect real-world production workloads, according to Dan Fu, VP of Kernels at Together. The gap is especially pronounced for workloads that run many concurrent coding agents, each of which needs a large context window. Fu suggests that benchmarks should better reflect these complex, high-demand scenarios.

IMPACT Highlights a potential disconnect between AI model evaluation and practical application, suggesting a need for more relevant benchmarks.
- Together
- Dan Fu

TOOL · Together AI blog English(EN) · 1mo

Inside the Together AI kernels team

The Together AI kernels team includes researchers Dan Fu and Tri Dao, who developed FlashAttention, an optimized GPU implementation of transformer attention. By applying database-systems principles to how data moves through GPU memory, FlashAttention delivered 2-3x speedups and challenged the notion that attention was already fully optimized. The team's subsequent work, including the ThunderKittens library, aims to speed up kernel development for new hardware such as NVIDIA's Blackwell GPUs, addressing the critical gap between AI software and hardware.
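
FlashAttention's memory-movement idea is tiling: rather than writing the full matrix of attention scores to slow GPU memory, it streams blocks of keys and values through fast on-chip memory and merges them with an "online" softmax that rescales partial results whenever a new running maximum appears. The pure-Python sketch below illustrates only that arithmetic on the CPU. It assumes one block of keys stands in for one on-chip tile; it is not the CUDA kernel, and `naive_attention`, `tiled_attention`, and `block_size` are hypothetical names chosen for illustration.

```python
import math
import random

# Illustrative sketch (hypothetical names): CPU-only, not the FlashAttention CUDA kernel.


def naive_attention(q, k, v):
    """Reference softmax(q k^T / sqrt(d)) v that builds every full score row at once."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf  # largest score seen so far for this query
        running_sum = 0.0        # softmax denominator, relative to running_max
        acc = [0.0] * dv         # unnormalized weighted sum of value rows
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum (exp(-inf) == 0.0 on the first block).
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                for c in range(dv):
                    acc[c] += w * vj[c]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


if __name__ == "__main__":
    random.seed(0)
    q = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    k = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    v = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
    ref = naive_attention(q, k, v)
    tiled = tiled_attention(q, k, v, block_size=2)
    max_diff = max(abs(a - b) for ra, rb in zip(ref, tiled) for a, b in zip(ra, rb))
    print(f"max abs difference: {max_diff:.2e}")
```

Because each block is rescaled to the running maximum before it is accumulated, the tiled version matches the reference up to floating-point rounding for any `block_size`; the tiling changes how data moves, not what attention computes.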

IMPACT Optimizes AI inference and training by bridging the software-hardware gap, potentially lowering costs and improving responsiveness.
- NVIDIA
- Stanford
- Together AI
- Andrej Karpathy
- Tesla
- GPU
- FlashAttention
- ThunderKittens
- Tri Dao
- Dan Fu

COMMENTARY · Together AI blog English(EN) · 5mo

Research POV: Yes, AGI Can Happen – A Computational Perspective

Together AI's VP of Kernels, Dan Fu, argues that the pursuit of AGI is not hitting a hardware wall. He posits that current AI systems significantly underutilize existing hardware: training runs often reach only 20% Model FLOPs Utilization (MFU), and inference often lands in the single digits. Fu suggests that advances in software-hardware co-design and techniques such as FP4 training could unlock substantial performance gains, and that the compute from next-generation hardware has yet to be fully put to use.
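
MFU compares the FLOP/s a model actually performs with the hardware's peak. The sketch below uses the common rule of thumb for dense transformers (about 2N FLOPs per token for a forward pass and about 6N for training, with N the parameter count, ignoring attention-score FLOPs). The function names, the 7B-parameter model, the throughputs, and the 1e15 FLOP/s per-accelerator peak are all hypothetical values chosen for illustration, not figures from the article.

```python
def model_flops_per_token(n_params: float, training: bool) -> float:
    """Rough FLOPs per token for a dense transformer, ignoring attention-score FLOPs.

    A forward pass costs about 2 * n_params FLOPs per token; training adds a
    backward pass of roughly twice that, for about 6 * n_params in total.
    """
    return (6.0 if training else 2.0) * n_params


def mfu(tokens_per_second: float, n_params: float, n_accelerators: int,
        peak_flops_per_accelerator: float, training: bool) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s divided by aggregate peak FLOP/s."""
    achieved = tokens_per_second * model_flops_per_token(n_params, training)
    peak = n_accelerators * peak_flops_per_accelerator
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not measurements from the article.
    PEAK = 1e15  # assumed per-accelerator peak, in FLOP/s
    train = mfu(38_000, 7e9, n_accelerators=8, peak_flops_per_accelerator=PEAK, training=True)
    infer = mfu(3_000, 7e9, n_accelerators=1, peak_flops_per_accelerator=PEAK, training=False)
    print(f"training MFU:  {train:.1%}")
    print(f"inference MFU: {infer:.1%}")
```

With these made-up inputs the training case lands near 20% and the inference case near 4%, mirroring the ranges Fu describes. Since the model FLOPs per token are fixed by model size, a low MFU means the accelerators spend most of their time stalled on memory, communication, or overhead rather than computing.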

IMPACT Argues that significant performance gains are achievable through software-hardware co-design, potentially accelerating AGI development.
- DeepSeek-V3
- AGI
- Together AI
- Llama-4
- Dan Fu
