Hacker News with Generative AI: Matrix Multiplication

Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
In this post, I will share all the steps to write an optimized FP32 matrix multiplication on an AMD RDNA3 GPU that outperforms rocBLAS by 60%. I will cover some basics and explain every optimization I implemented, building them up iteratively across 8 different kernels.
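The post's GPU kernels are not reproduced here. As a rough, CPU-only illustration of the core idea behind most such kernels, the sketch below computes the same product tile by tile. On a GPU, each output tile would typically map to a workgroup, with the A and B slices staged in shared memory (LDS on AMD GPUs). The function names and the `tile` parameter are hypothetical and chosen for illustration only.

```python
def matmul_naive(a, b):
    """Reference matmul: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, accumulated one (tile x tile) output block at a time.

    For each output block (i0, j0), we walk the shared dimension in slices
    of width `tile`. A GPU kernel would load those A/B slices into fast
    on-chip memory once and reuse them for every element of the block.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]  # continue the partial sum for this element
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_tiled(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Tiling changes only the order in which work is scheduled, not the arithmetic, so both functions return the same result here.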
Karatsuba Matrix Multiplication and Its Efficient Hardware Implementations (arxiv.org)
While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions required minimize its benefits for smaller integers of more commonly-used bitwidths.
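To see the trade-off described above, here is a minimal sketch of classic Karatsuba on Python integers. It is not the paper's hardware design or its matrix formulation. Splitting each operand in half would normally take four half-size multiplications. Karatsuba gets by with three, at the cost of extra additions and subtractions. The `cutoff_bits` parameter is a hypothetical threshold below which plain multiplication is used, mirroring the point that small bitwidths gain little.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 32) -> int:
    """Multiply two non-negative integers using Karatsuba's 3-multiplication split."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # small operands: the extra additions would not pay off
    h = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << h) - 1
    x1, x0 = x >> h, x & mask  # x = x1 * 2**h + x0
    y1, y0 = y >> h, y & mask  # y = y1 * 2**h + y0
    z2 = karatsuba(x1, y1, cutoff_bits)
    z0 = karatsuba(x0, y0, cutoff_bits)
    # One multiplication plus two subtractions replaces x1*y0 + x0*y1
    z1 = karatsuba(x1 + x0, y1 + y0, cutoff_bits) - z2 - z0
    return (z2 << (2 * h)) + (z1 << h) + z0


print(karatsuba(3**100, 7**80) == 3**100 * 7**80)  # True
```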
Experiments with Byte Matrix Multiplication (github.com/serge-sans-paille)
It's quite common in machine learning operations to multiply a matrix of unsigned bytes by a matrix of signed bytes.
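As a hedged sketch of what this operation computes, the reference below multiplies a uint8 matrix by an int8 matrix with exact integer accumulation, checking that each result fits in int32. It also emulates one 16-bit lane of the x86 `PMADDUBSW` instruction (unsigned bytes times signed bytes, adjacent products summed with signed 16-bit saturation). That instruction illustrates why fast u8×s8 kernels need care. Whether the linked experiments use it is an assumption, and all function names here are hypothetical.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def u8s8_matmul(a, b):
    """Reference C = A @ B with A as uint8, B as int8, accumulated exactly."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(0 <= v <= 255 for row in a for v in row), "A must be uint8"
    assert all(-128 <= v <= 127 for row in b for v in row), "B must be int8"
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # each product lies in [-32640, 32385]
            assert INT32_MIN <= acc <= INT32_MAX, "result would overflow int32"
            c[i][j] = acc
    return c


def maddubs_lane(u0, s0, u1, s1):
    """One int16 lane of PMADDUBSW: u0*s0 + u1*s1, saturated to int16."""
    return max(INT16_MIN, min(INT16_MAX, u0 * s0 + u1 * s1))


print(u8s8_matmul([[255, 0], [1, 2]], [[-128, 127], [3, -4]]))  # [[-32640, 32385], [-122, 119]]
print(maddubs_lane(255, 127, 255, 127))  # 32767, although the exact sum is 64770
```

A single product always fits in int16, but the sum of two may not. A kernel built on such pairwise instructions can therefore silently saturate on extreme inputs, while the reference above does not.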
Matrix Multiplication in Finite Fields (fileforma.substack.com)
ffGEMM is a fixed-point arithmetic library for fast matrix multiplications on CPU. This article introduces the underlying mathematics for Fileforma’s ffGEMM library.
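To make "matrix multiplication in a finite field" concrete, here is a minimal sketch over the prime field GF(p). It does not show ffGEMM's API or its actual construction. The modulus `P = 65521` (the largest prime below 2**16) is a hypothetical choice. Products are accumulated as exact integers and reduced modulo p once per output element.

```python
P = 65521  # hypothetical prime modulus, not taken from ffGEMM


def matmul_mod_p(a, b, p=P):
    """C = A @ B with every entry reduced into the range [0, p)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                acc += a[i][t] * b[t][j]  # exact integer accumulation
            c[i][j] = acc % p  # single (delayed) reduction per output element
    return c


# [[2, 1], [1, 1]] has determinant 1, so its inverse [[1, -1], [-1, 2]]
# is written with -1 represented as p - 1 in the field.
a = [[2, 1], [1, 1]]
a_inv = [[1, P - 1], [P - 1, 2]]
print(matmul_mod_p(a, a_inv))  # [[1, 0], [0, 1]]
```

Because arithmetic in GF(p) is exact, a matrix times its inverse gives exactly the identity, with no rounding error.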
Fast Multidimensional Matrix Multiplication on CPU from Scratch (2022) (siboehm.com)