dvlt.cu: inference engine written from scratch in CUDA/C++ for NVIDIA's DVLT 3D transformer model
dvlt.cu is a new inference engine for NVIDIA's DVLT 3D transformer model, written from scratch in CUDA/C++. It ships as a standalone 5 MB binary whose only dependencies are cuBLASLt and the header-only CUTLASS library. It handles bf16 weights, moves them to the GPU in a single bulk upload, and produces deterministic output, which makes it suitable for 3D reconstruction tasks.
AI IMPACT Provides a specialized, dependency-light inference engine for 3D transformer models, potentially improving performance for applications such as 3D reconstruction.
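To make the "bf16 weights, single bulk upload" idea concrete, here is a minimal host-side sketch in C++. It is not taken from dvlt.cu; the names `WeightArena`, `TensorSlot`, and `float_to_bf16`, and the 256-byte alignment, are all assumptions chosen for illustration. The sketch packs every tensor into one contiguous byte buffer so that a single device copy could move all weights at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Convert an fp32 value to bf16 bits using round-to-nearest-even.
inline uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: keep the sign and force a quiet-NaN mantissa bit.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// Hypothetical record of where one tensor lives inside the arena.
struct TensorSlot {
    std::string name;
    std::size_t offset;  // byte offset from the start of the arena
    std::size_t count;   // number of bf16 elements
};

// Hypothetical host-side staging buffer holding all weights back to back.
struct WeightArena {
    static constexpr std::size_t kAlign = 256;  // assumed alignment
    std::vector<uint8_t> bytes;
    std::vector<TensorSlot> slots;

    // Append one tensor as bf16 and return its byte offset.
    std::size_t add(const std::string& name, const std::vector<float>& w) {
        std::size_t offset = (bytes.size() + kAlign - 1) / kAlign * kAlign;
        assert(offset % kAlign == 0);
        bytes.resize(offset + w.size() * sizeof(uint16_t), 0);  // zero padding
        for (std::size_t i = 0; i < w.size(); ++i) {
            uint16_t h = float_to_bf16(w[i]);
            std::memcpy(bytes.data() + offset + i * sizeof h, &h, sizeof h);
        }
        slots.push_back({name, offset, w.size()});
        return offset;
    }
};
```

With the arena filled, the upload itself would reduce to one `cudaMalloc` of `bytes.size()` bytes and one `cudaMemcpy` with `cudaMemcpyHostToDevice`; each tensor's device pointer is then the base pointer plus its recorded offset. Because the layout depends only on the order in which tensors are added, it is identical from run to run. That keeps the memory layout reproducible, but it does not by itself guarantee deterministic kernel output.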