SVDQuant+NVFP4: 4× Smaller, 3× Faster FLUX with 16-bit Quality on Blackwell GPUs (hanlab.mit.edu)
With Moore's law slowing down, hardware vendors are shifting toward low-precision inference. NVIDIA's latest Blackwell architecture introduces a new 4-bit floating point format (NVFP4), improving upon the previous MXFP4 format. NVFP4 features more precise scaling factors and a smaller microscaling group size (16 v.s. 32), enabling it to maintain 16-bit model accuracy even at 4-bit precision while delivering 4× higher peak performance.