TorchInductor
PulseAugur coverage of TorchInductor: every cluster mentioning TorchInductor across labs, papers, and developer communities, ranked by signal.
-
PassNet uses LLMs to generate compiler passes for performance optimization
Researchers have introduced PassNet, a framework that uses large language models (LLMs) to generate compiler passes, which are crucial for optimizing code performance. Existing tensor compilers strug…
-
CuTeDSL emerges as new GPU kernel path for LLM inference, challenging CUTLASS
GPU kernel engineering for LLM inference is shifting, with CuTeDSL emerging as a potential successor to C++ CuTe/CUTLASS. This shift is highlighted by industry trends in technologies like FlashAtten…
-
GraphMend compiler technique fixes PyTorch 2 graph breaks, boosting performance
Researchers have developed GraphMend, a compiler technique that fixes FX graph breaks in PyTorch 2 programs. These breaks, caused by dynamic control flow and unsupported Python constructs, oft…