Hacker News with Generative AI: CUDA

Show HN: HipScript – Run CUDA in the browser with WebAssembly and WebGPU (lights0123.com)
Online compiler for HIP and NVIDIA® CUDA® code to WebGPU
Show HN: Lightweight Llama3 Inference Engine – CUDA C (github.com/abhisheknair10)
Llama3.cu is a CUDA-native implementation of the LLaMA3 architecture for causal language modeling.
Show HN: A GPU-accelerated MD5 Hash Cracker, Written Using Rust and CUDA (github.com/vaktibabat)
MD5 hash cracking with CUDA and Rust, implemented from scratch
Show HN: Cudair – live-reloading for developing CUDA applications (github.com/ei-sugimoto)
cudair enables live-reloading for developing CUDA applications, similar to golang-air. I recommend using Docker.
Train a Mnist VAE with C and CUDA (github.com/ggerganov)
Hi, I just want to share what I have been working on recently. This is an example of training an MNIST VAE. The goal is to use only the ggml pipeline and its implementation of the Adam optimizer.
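
For context, here is a minimal sketch of a generic Adam update step written as a standalone CUDA kernel. It illustrates the optimizer the post refers to, not ggml's actual implementation; all names and parameters are illustrative.

    // Generic Adam update for one tensor of n parameters.
    // m and v are the running first/second moment estimates; t is the step count.
    __global__ void adam_step(float *param, const float *grad,
                              float *m, float *v, int n,
                              float lr, float beta1, float beta2,
                              float eps, int t)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];
        v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i];

        // Bias-corrected moment estimates.
        float m_hat = m[i] / (1.0f - powf(beta1, (float)t));
        float v_hat = v[i] / (1.0f - powf(beta2, (float)t));

        param[i] -= lr * m_hat / (sqrtf(v_hat) + eps);
    }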
Fast LLM Inference From Scratch (using CUDA) (andrewkchan.dev)
This post is about building an LLM inference engine using C++ and CUDA from scratch without libraries.
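
As a flavor of what "from scratch" involves, below is a minimal sketch of the kind of matrix-vector multiply kernel that dominates single-batch LLM inference. It is a generic illustration assuming a row-major weight layout, not code from the post; production kernels add vectorized loads, fused dequantization, and so on.

    // y = W * x, where W is (rows x cols), row-major, and x has cols elements.
    // One thread computes one output row.
    __global__ void matvec(const float *W, const float *x, float *y,
                           int rows, int cols)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= rows) return;

        float acc = 0.0f;
        for (int c = 0; c < cols; c++) {
            acc += W[row * cols + c] * x[c];
        }
        y[row] = acc;
    }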
Check if your performance intuition still works with CUDA (wordsandbuttons.online)
For those of you who don't know what CUDA is, let me explain. Imagine buses were never invented. There are cars, trains, planes, and motorcycles, just not buses. And one day someone smart asks himself: “Wouldn't it be splendid to have cars that would fit a lot of people? One guy could be driving, and all the rest will enjoy the ride.” “Right, like trucks but for people!” “No-no-no, who on earth would ever want to travel by truck?”
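
To make the analogy concrete, here is a minimal CUDA kernel in which thousands of threads each handle one element of a vector sum; this is a generic example, not taken from the article.

    // Each thread ("passenger") processes exactly one element.
    __global__ void vec_add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Host-side launch: enough blocks of 256 threads to cover n elements.
    // vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);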
CUDA Programming Course – High-Performance Computing with GPUs [video] (youtube.com)
John Nickolls "ultimately willed CUDA into existence" (twitter.com)
Initial CUDA Performance Lessons (probablydance.com)
I am somehow very late to learning CUDA. I didn’t even know until recently that CUDA is just C++ with a small amount of extra stuff. If I had known that there is so little friction to learning it, I would have checked it out much earlier. But if you come in with C++ habits, you’ll write suboptimal code, so here are some lessons I had to learn to get things to run fast.
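
One lesson of this kind (a generic illustration, not necessarily one from the post): porting a serial C++ loop as-is leaves the GPU idle, whereas a grid-stride loop exposes enough parallelism to keep it busy.

    // Naive port of a serial loop: launched as scale_bad<<<1, 1>>>(...),
    // a single thread does all the work while the rest of the GPU sits idle.
    __global__ void scale_bad(float *x, int n, float s)
    {
        for (int i = 0; i < n; i++) x[i] *= s;
    }

    // Grid-stride loop: every thread handles a strided subset of elements,
    // so the launch can cover the whole GPU regardless of n.
    __global__ void scale_good(float *x, int n, float s)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x) {
            x[i] *= s;
        }
    }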
Zen, CUDA, and Tensor Cores – Part 1 [video] (youtube.com)
Zen, CUDA, and Tensor Cores, Part I: The Silicon (computerenhance.com)
Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels (mobiusml.github.io)
LibreCUDA – Launch CUDA code on Nvidia GPUs without the proprietary runtime (github.com/mikex86)
Open-Source AMD GPU Implementation of CUDA "Zluda" Has Been Taken Down (phoronix.com)
How to optimize a CUDA matmul kernel for cuBLAS-like performance (2022) (siboehm.com)
Run CUDA, unmodified, on AMD GPUs (scale-lang.com)
Show HN: UNet diffusion model in pure CUDA (github.com/clu0)
The One Billion Row Challenge in CUDA (tspeterkim.github.io)