Hacker News with Generative AI: GPU Programming

Next-Gen GPU Programming: Hands-On with Mojo and Max Modular HQ (youtube.com)
PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch (arxiv.org)
CUDA Graphs -- a recent hardware feature introduced for NVIDIA GPUs -- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs faces several challenges today due to the static structure of a graph. It also incurs performance overhead due to data copy. In fact, we show a counter-intuitive result -- deploying CUDA Graphs hurts performance in many cases.
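To make the "capture once, launch as a DAG" idea concrete, here is a minimal pure-Python sketch. It is a toy model only: `ToyGraph`, `add_task`, and `instantiate` are hypothetical names invented for illustration, not the CUDA or PyTorch API, and nothing here runs on a GPU. It shows a dependency order being resolved once and then replayed, with the graph structure frozen after instantiation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ToyGraph:
    """Toy stand-in for a captured task graph (hypothetical, not the CUDA API)."""

    def __init__(self):
        self._tasks = {}  # task name -> callable taking a buffer dict
        self._deps = {}   # task name -> set of prerequisite task names

    def add_task(self, name, fn, deps=()):
        self._tasks[name] = fn
        self._deps[name] = set(deps)

    def instantiate(self):
        """Resolve the execution order once and return a replayable launcher."""
        for name, deps in self._deps.items():
            missing = deps - set(self._tasks)
            if missing:
                raise ValueError(f"{name} depends on unknown tasks: {sorted(missing)}")
        # The order is computed a single time; later launches skip this work.
        order = tuple(TopologicalSorter(self._deps).static_order())
        tasks = dict(self._tasks)  # snapshot: later add_task calls are ignored

        def launch(buffers):
            for name in order:
                tasks[name](buffers)
            return buffers

        return launch
```

In this toy, a launcher built from a `scale` task and a dependent `shift` task can be replayed on new inputs, but a task added after `instantiate()` never runs. That is a loose analogy for the static graph structure described in the abstract; the sketch says nothing about real launch overhead.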
CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU (github.com/tracel-ai)
With CubeCL, you can program your GPU using Rust, taking advantage of zero-cost abstractions to develop maintainable, flexible, and efficient compute kernels.
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024) (alexarmbr.github.io)
This post details my recent efforts to write an optimized matrix multiplication kernel in CUDA using tensor cores on an NVIDIA Tesla T4 GPU. The goal is to compute $D = \alpha * A * B + \beta * C$ as fast as possible. In this equation, $D$, $A$, $B$, and $C$ are large matrices of half-precision floating-point numbers, and $\alpha$ and $\beta$ are constants. This problem is usually referred to as a Half-precision Generalized Matrix Multiply, or HGEMM for short.
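For reference, the HGEMM formula can be written out as a slow, pure-Python baseline. This is only a sketch of the arithmetic, not the post's CUDA kernel: the helper names `to_half` and `hgemm_reference` are hypothetical, and accumulating in Python floats before rounding the result back to half precision is an assumption made here for simplicity.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hgemm_reference(alpha, a, b, beta, c):
    """Return D = alpha * A * B + beta * C for row-major lists of lists.

    Inputs are rounded to half precision, products are accumulated in
    Python floats (an assumption of this sketch), and each output element
    is rounded back to half precision.
    """
    m, k = len(a), len(b)
    n = len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != n for row in b):
        raise ValueError("A and B shapes do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must have shape (m, n)")
    d = []
    for i in range(m):
        out_row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += to_half(a[i][p]) * to_half(b[p][j])
            out_row.append(to_half(alpha * acc + beta * to_half(c[i][j])))
        d.append(out_row)
    return d
```

A triple loop like this is useful mainly as a correctness check for small matrices; an optimized kernel would be compared against such a baseline within a half-precision tolerance.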
Zig and GPUs (alichraghi.github.io)
GPU programming used to mean wrangling C++ compilers, bloated SDKs, and vendor-specific toolchains. That’s changing. You can now write GPU code in modern languages like Rust and Zig with fewer layers. This post walks through the current state of Zig’s GPU backends and how they stack up across Vulkan, OpenCL, and native ISAs.
Rust CUDA Project (github.com/Rust-GPU)
An ecosystem of libraries and tools for writing and executing extremely fast GPU code fully in Rust.
Show HN: HipScript – Run CUDA in the browser with WebAssembly and WebGPU (lights0123.com)
Online compiler for HIP and NVIDIA® CUDA® code to WebGPU
GPU Programming Glossary (modal.com)
Using Libc for GPUs (llvm.org)
Once you have finished building the GPU C library it can be used to run libc or libm functions directly on the GPU. Currently, not all C standard functions are supported on the GPU. Consult the list of supported functions for a comprehensive list.
Rust GPU: The future of GPU programming (rust-gpu.github.io)
Finally, a modern language for GPUs. Rust GPU makes it possible to write and run GPU software in Rust, leveraging the language's powerful safety and concurrency features to enhance performance and reliability. With Rust GPU, you can seamlessly develop for both CPU and GPU using a unified codebase, all while benefiting from Rust’s existing ecosystem.
GPU Puzzles (github.com/srush)
This notebook is an attempt to teach beginner GPU programming in a completely interactive fashion. Instead of providing text with concepts, it throws you right into coding and building GPU kernels.
Ask HN: Resources for GPU Compilers? (ycombinator.com)
Taichi: Productive, portable, and performant GPU programming in Python (github.com/taichi-dev)
Gpu.cpp: A lightweight library for portable low-level GPU computation (answer.ai)
ILGPU: Write GPU programs with C# and F# (github.com/m4rs-mt)
Show HN: Metashade – a Pythonic GPU shading/compute EDSL (github.com/ppenenko)