Hacker News with Generative AI: Performance Tuning

Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
In this post, I will share with you all the steps to write an optimized FP32 matrix multiplication on an AMD RDNA3 GPU, outperforming rocBLAS by 60%. I will cover some basics and explain all the optimizations I have implemented. This will be done iteratively, across 8 different kernels.
Java Throughput/Latency and Power Efficiency Tuning for AMD EPYC Turin (phoronix.com)
Last month I looked at the impact of AMD's BIOS tuning guide on AI / machine learning workloads for new 5th Gen EPYC "Turin" processors. In today's article I am looking at the performance and power efficiency impact of AMD EPYC 9005 series processors with AMD's BIOS tuning recommendations for Java workloads on Linux.
Bpftune uses BPF to auto-tune Linux systems (github.com/oracle)
How to get the most out of Postgres memory settings (tembo.io)