Hacker News with Generative AI: Tensor Cores

How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024) (alexarmbr.github.io)
This post details my recent efforts to write an optimized matrix multiplication kernel in CUDA using tensor cores on an NVIDIA Tesla T4 GPU. The goal is to compute $D = \alpha * A * B + \beta * C$ as fast as possible. In this equation, $D$, $A$, $B$, and $C$ are large matrices of half-precision floating-point numbers, and $\alpha$ and $\beta$ are constants. This problem is usually referred to as a Half-precision Generalized Matrix Multiply, or HGEMM for short.
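
As a point of reference for the equation in this summary, here is a minimal NumPy sketch of what an HGEMM computes. It is not the linked post's CUDA kernel: the function name `hgemm_reference`, the matrix sizes, and the choice to accumulate in float32 before rounding back to half precision are all assumptions made for illustration.

```python
import numpy as np

def hgemm_reference(alpha, A, B, beta, C):
    """Reference D = alpha * A @ B + beta * C for float16 matrices.

    Hypothetical helper for illustration only. Accumulating in float32
    and rounding once at the end is an assumption of this sketch.
    """
    acc = alpha * (A.astype(np.float32) @ B.astype(np.float32))
    acc = acc + beta * C.astype(np.float32)
    return acc.astype(np.float16)

# Small, arbitrary problem size (M x K times K x N).
rng = np.random.default_rng(0)
M, N, K = 64, 48, 32
A = rng.standard_normal((M, K)).astype(np.float16)
B = rng.standard_normal((K, N)).astype(np.float16)
C = rng.standard_normal((M, N)).astype(np.float16)

D = hgemm_reference(2.0, A, B, 0.5, C)
print(D.shape, D.dtype)  # (64, 48) float16
```

A slow reference like this is mainly useful for checking the output of an optimized kernel within a half-precision tolerance.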

CUDA, GPU Programming, Matrix Multiplication, Optimization, Tensor Cores

147 points by skidrow 330 days ago | 17 comments

Nvidia Tensor Core Programming (leimao.github.io)
NVIDIA Tensor Cores are dedicated accelerators for general matrix multiplication (GEMM) operations, available on NVIDIA GPUs since the Volta architecture.

Nvidia GPUs, Tensor Cores, Machine Learning, Programming

8 points by bjourne 427 days ago | 0 comments

Zen, CUDA, and Tensor Cores – Part 1 [video] (youtube.com)

Zen, CUDA, Tensor Cores, Video, Programming

30 points by surprisetalk 557 days ago | 13 comments