Hacker News with Generative AI: Matrix Multiplication

I verified DeepMind's AlphaEvolve matrix multiplication breakthrough with Claude (github.com/PhialsBasement)
Verification of Google DeepMind's AlphaEvolve 48-multiplication matrix algorithm, a breakthrough in matrix multiplication after 56 years.

Verification, Machine Learning, Matrix Multiplication, AI, DeepMind

30 points by Phiality 54 days ago | 0 comments

X X^t can be faster (arxiv.org)
We present a new algorithm RXTX that computes product of matrix by its transpose $XX^{t}$. RXTX uses $5\%$ less multiplications and additions than State-of-the-Art and achieves accelerations even for small sizes of matrix $X$. The algorithm was discovered by combining Machine Learning-based search methods with Combinatorial Optimization.

Machine Learning, Optimization, Algorithms, Matrix Multiplication

201 points by robinhouston 55 days ago | 60 comments

How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024) (alexarmbr.github.io)
This post details my recent efforts to write an optimized matrix multiplication kernel in CUDA using tensor cores on a NVIDIA Tesla T4 GPU. The goal is to compute $D = \alpha * A * B + \beta * C$, as fast as possible. In this equation $D,A,B$ and $C$ are large matrices full of half precision floating point numbers, and $\alpha$, $\beta$ are constants. This problem is usually referred to as a Half-precision Generalized Matrix Multiply, or HGEMM for short.

CUDA, GPU Programming, Matrix Multiplication, Optimization, Tensor Cores

147 points by skidrow 82 days ago | 17 comments

Optimizing Matrix Multiplication (coffeebeforearch.github.io)
Matrix multiplication is an incredibly common operation across numerous domains. It is also known as being “embarrassingly parallel”. As such, one common optimization is parallelization across threads on a multi-core CPU or GPU. However, parallelization is not a panacea. Poorly parallelized code may provide minimal speedups (if any).

Optimization, Matrix Multiplication, Performance, Parallel Computing, Computer Science

3 points by jxmorris12 97 days ago | 0 comments

Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
In this post, I will share with you all the steps to write an optimized FP32 matrix multiplication on AMD RDNA3 GPU outperforming rocBLAS by 60%. I will cover some basics and explain all the optimizations I have implemented. This will be done in a iterative way in 8 differents Kernels.

GPU Optimization, AMD RDNA3, Matrix Multiplication, Performance Tuning

118 points by skidrow 108 days ago | 26 comments

Karatsuba Matrix Multiplication and Its Efficient Hardware Implementations (arxiv.org)
While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions required minimize its benefits for smaller integers of more commonly-used bitwidths.

Algorithms, Matrix Multiplication, Hardware, Computer Science

5 points by emacs28 124 days ago | 0 comments

Experiments with Byte Matrix Multiplication (github.com/serge-sans-paille)
It's quite common in machine learning operations to multiply a matrix of unsigned byte by a matrix of signed byte.

Machine Learning, Programming, Computer Science, Matrix Multiplication, Open Source

41 points by serge-ss-paille 181 days ago | 4 comments

Matrix Multiplication in Finite Fields (fileforma.substack.com)
ffGEMM is a fixed-point arithmetic library for fast matrix multiplications on CPU. This article introduces the underlying mathematics for Fileforma’s ffGEMM library.

Matrix Multiplication, Finite Fields, Computer Science, Libraries, Algorithms

4 points by muragekibicho 297 days ago | 0 comments

Fast Multidimensional Matrix Multiplication on CPU from Scratch (2022) (siboehm.com)

CPU, Matrix Multiplication, Performance, Algorithms, Programming

74 points by georgehill 344 days ago | 23 comments