DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs
(github.com/deepseek-ai)
FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.