Hacker News with Generative AI: GPU Optimization

An Almost Pointless Exercise in GPU Optimization (speechmatics.com)
Not everyone is able to write funky fused operators to make ML models run faster on GPUs using clever quantisation tricks. However, lots of developers work with algorithms that feel like they should be able to leverage the thousands of cores in a GPU to run faster than on the dozens of cores in a server CPU. To see what is possible and what is involved, I revisited the first problem I ever considered trying to accelerate with a GPU.

GPU Optimization, Machine Learning, Performance, Algorithms

87 points by atomlib 61 days ago | 3 comments

A handy metric is needed for gauging if GPUs are being used optimally (theregister.com)
GPU accelerators used in AI processing are costly items, so making sure you get the best usage out of them ought to be a priority, yet the industry lacks an effective way of measuring this, says the Uptime Institute.

GPU Optimization, Artificial Intelligence, Hardware, Performance Measurement

8 points by cyberhost 62 days ago | 2 comments

Mipmap selection in too much detail (pema.dev)
In this post, I want to shed some light on something I’ve been wondering about for a while: How exactly are mipmap levels selected when sampling textures on the GPU?

Computer Graphics, Texture Mapping, GPU Optimization

99 points by luu 71 days ago | 25 comments
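For orientation, the textbook isotropic rule (as given in the OpenGL specification) scales the screen-space derivatives of the texture coordinates into texels, takes the longer of the two pixel-step footprints as ρ, and selects level λ = log2(ρ). The sketch below assumes normalized UVs and that isotropic approximation. The function name and parameters are hypothetical, and real hardware (the subject of the post) may deviate from this formula.

```python
import math


def mip_level(dudx, dvdx, dudy, dvdy, tex_width, tex_height, num_levels):
    """Textbook isotropic LOD selection (OpenGL-style approximation).

    dudx, dvdx, dudy, dvdy are screen-space derivatives of normalized
    texture coordinates (u, v in [0, 1]). They are scaled to texel units
    before measuring the pixel footprint. Hypothetical helper, not a
    description of any specific GPU's behaviour.
    """
    # Length of one pixel step along screen x and screen y, in texels.
    len_x = math.hypot(dudx * tex_width, dvdx * tex_height)
    len_y = math.hypot(dudy * tex_width, dvdy * tex_height)
    rho = max(len_x, len_y)
    # lambda = log2(rho); rho <= 1 means magnification, i.e. the base level.
    lod = math.log2(rho) if rho > 0 else 0.0
    # Clamp to the levels that actually exist in the mip chain.
    return min(max(lod, 0.0), num_levels - 1)
```

For a 256×256 texture where each pixel step advances u and v by 1/64, the footprint is 4 texels and the sketch returns level 2.0. A trilinear sampler would blend the two levels around a fractional result rather than picking one.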

Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
In this post, I will share with you all the steps to write an optimized FP32 matrix multiplication on an AMD RDNA3 GPU that outperforms rocBLAS by 60%. I will cover some basics and explain all the optimizations I have implemented. This will be done iteratively, across 8 different kernels.

GPU Optimization, AMD RDNA3, Matrix Multiplication, Performance Tuning

118 points by skidrow 118 days ago | 26 comments
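The summary does not list the eight kernels. As a rough illustration of the kind of step such write-ups commonly start from, here is a tiled (blocked) matrix multiplication in plain Python. It is a hypothetical CPU-side sketch, not code from the post: the `tile` parameter stands in for a workgroup tile size, and on RDNA3 the sub-tiles would be staged in LDS (local data share) rather than read from Python lists.

```python
def tiled_matmul(A, B, tile=2):
    """Compute C = A @ B one output tile at a time (lists of lists).

    On a GPU, each (i0, j0) output tile would typically map to one
    workgroup, and the A/B sub-tiles used in each k0 step would be loaded
    once into fast on-chip memory and reused by every thread in the tile.
    """
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions must match")
    p = len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # output tile row
        for j0 in range(0, p, tile):      # output tile column
            for k0 in range(0, m, tile):  # walk the shared K dimension tile by tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

The result is identical to a naive triple loop; only the order of the work changes, which is what lets a GPU kernel reuse each loaded element across a whole tile instead of re-fetching it from global memory.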

Sorting for Rendering (linebender.org)
Many rendering algorithms (including a proposed sparse strip technique for path rendering, and also Gaussian Splatting) rely on sorting. Because the GPU has a different architecture to the CPU, programs running on the GPU have different performance characteristics, and this changes which sorting algorithms are optimal for a particular context. In particular, sorting algorithms that exploit parallelism tend to be more suited to the GPU.

Rendering, Algorithms, GPU Optimization, Computer Graphics

7 points by matt_d 263 days ago | 0 comments
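To make "exploit parallelism" concrete, bitonic sort is a classic GPU-friendly sorting network, chosen here purely for illustration; the post may favour other algorithms. Within each (k, j) stage, every compare-exchange touches a disjoint pair of indices, so a GPU could run one thread per element per stage. The Python below simulates those stages sequentially and assumes a power-of-two input length; `bitonic_sort` is a hypothetical name.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list with a bitonic sorting network.

    All compare-exchanges inside one (k, j) stage are independent, which
    is why the network maps well onto data-parallel hardware.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare distance within this merge step
            # Data-parallel part: on a GPU, one thread per index i.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this block's direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The network always performs the same comparisons regardless of the data, so it does O(n log² n) work versus O(n log n) for a good sequential sort; the trade-off is that every stage is fully parallel with no data-dependent branching in control flow.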