Warp-types: GPU shuffle UB becomes a compile error (Rust, Lean proof, zero cost) (github.com)

modelmiser 13 hours ago

GPU shuffle operations read from other lanes in a warp. If those lanes are inactive (diverged), you get undefined behavior — not a crash, just silently wrong results. NVIDIA deprecated an entire API family over this. Their own reference code still contains the bug (cuda-samples#398). A plasma physics simulation ran for months with the UB undetected.

warp-types tracks active lane masks in Rust's type system. Warp<All> has shuffle methods. Warp<Even> doesn't — they're not checked and rejected, they literally don't exist on the type. Diverge produces complementary sub-warps; merge requires proof they're complements. The types erase completely — Warp<S> is PhantomData, so the generated PTX is byte-identical to untyped code.

345 tests, 8 real-bug examples (NVIDIA, PyTorch, OpenCV, TVM), real GPU execution on RTX 4000 Ada, and a Lean 4 soundness proof (28 theorems, zero sorry, zero axioms).