Hacker News with Generative AI: GPU

GPU-Driven Clustered Forward Renderer (logdahl.net)

GPU, Rendering, Graphics, Computer Graphics

116 points by logdahl 54 days ago | 28 comments

CUDA version of GROMACS is faster on AMD than HIP port (scale-lang.com)
With the release of version 1.3.1, SCALE has reached a major compatibility milestone: the ability to run the CUDA version of GROMACS on AMD GPUs.

CUDA, AMD, GPU, Scientific Computing, Software

15 points by msond 54 days ago | 3 comments

Arm's Bifrost Architecture and the Mali-G52 (chipsandcheese.com)
Arm (the company) is best known for its Cortex CPU line. But Arm today has expanded to offer a variety of licensable IP blocks, ranging from interconnects to IOMMUs to GPUs.

Computer Architecture, Arm, GPU, Chips, Technology

7 points by klelatti 64 days ago | 1 comment

Linear Programming for Fun and Profit (modal.com)
If you haven’t noticed, the GPU market is highly volatile. NVIDIA repeatedly spews out new chip architectures, doubling FLOPS every few years. Everyone shifts towards the newest cards, causing temporary supply crunches and high prices. But Modal’s customers don’t want to think about these price fluctuations. They want GPUs of all kinds at predictable and good prices, and the ability to demand thousands of GPUs on a moment’s notice, without having to worry about pricing, capacity planning, or supply.

Linear Programming, Cloud Computing, GPU, Artificial Intelligence

62 points by hmac1282 65 days ago | 15 comments

Doom GPU Flame Graphs (brendangregg.com)
AI Flame Graphs are now open source and include Intel Battlemage GPU support, which means they can also generate full-stack GPU flame graphs that provide new insights into gaming performance, especially when coupled with FlameScope (an older open source project of mine). Here's an example of GZDoom, and I'll start with flame scopes for both CPU and GPU utilization, with details annotated:

Gaming Performance, Open Source, GPU, Flame Graphs, Software

107 points by zdw 73 days ago | 30 comments

GPU Price Tracker (unitedcompute.ai)
Track current prices, specifications, and historical trends for the most popular GPUs

GPU, Hardware, Pricing, Trends

54 points by ushakov 77 days ago | 45 comments

EGPU: Extending eBPF Programmability and Observability to GPUs (aptaracorp.com)
Precise GPU observability and programmability are essential for optimizing performance in AI workloads and other computationally intensive high-performance computing (HPC) applications.

GPU, Programming, Observability, AI

15 points by tanelpoder 94 days ago | 4 comments

GPU Server with 8 RTX 4090 (a16z.com)
In today’s AI-driven world, the ability to train AI models locally and perform fast inference on GPUs at an optimal cost is more important than ever.

GPU, AI, Hardware, Training, Inference

5 points by m3at 99 days ago | 2 comments

The Asus Ascent GX10 a Nvidia GB10 Mini PC with 128GB of Memory and 200GbE (servethehome.com)
NVIDIA’s platform, previously codenamed Project DIGITS, is a hit at GTC 2025. Apparently, big customers are asking if they can get a DGX Spark thrown in with large GPU purchases. The reason is simple: this is a mini PC form factor that packs a co-packaged Arm CPU and Blackwell GPU, 128GB of shared LPDDR5x memory, multiple USB4 ports, and even a ConnectX-7 NIC for 200GbE clustering.

Mini PCs, NVIDIA, Arm, GPU, Networking

46 points by rbanffy 115 days ago | 50 comments

AMD Radeon RX 9070 Series Linux GPU Compute Performance (phoronix.com)
In addition to the Radeon RX 9070 series Linux gaming/graphics benchmarks with today's embargo lift, I've also spent some time working on some GPU compute benchmarks for these first RDNA4 graphics cards.

AMD, Linux, GPU, Graphics Cards, Performance

13 points by mfiguiere 129 days ago | 1 comment

DeepSeek open source DeepEP – library for MoE training and Inference (github.com/deepseek-ai)
DeepEP is a communication library tailored for Mixture-of-Experts (MoE) and expert parallelism (EP). It provides high-throughput and low-latency all-to-all GPU kernels, which are also known as MoE dispatch and combine. The library also supports low-precision operations, including FP8.

Machine Learning, Open Source, Software, Deep Learning, GPU

536 points by helloericsf 138 days ago | 71 comments

DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs (github.com/deepseek-ai)
FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.

GPU, Open Source, Machine Learning, Decoding

441 points by helloericsf 139 days ago | 108 comments

The Ultra-Scale Playbook: Training LLMs on GPU Clusters (huggingface.co)

Machine Learning, GPU, Training, AI

33 points by jxmorris12 144 days ago | 3 comments

Ask HN: Confused about how DeepSeek hurts Nvidia (ycombinator.com)
I’m genuinely confused about why people think DeepSeek’s results will mean fewer GPUs being needed in the future.

Artificial Intelligence, GPU, Hardware

36 points by prng2021 166 days ago | 43 comments

Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel (2021) (bartwronski.com)
See this ugly pixel shift when upsampling a downsampled image? My post describes where it can come from and how to avoid it!

Image Processing, Computer Graphics, GPU, Pixel Manipulation, Downsampling/Upsampling

136 points by fanf2 167 days ago | 23 comments

Show HN: A GPU-accelerated MD5 Hash Cracker, Written Using Rust and CUDA (github.com/vaktibabat)
MD5 hash cracking with CUDA and Rust, implemented from scratch

Security, Programming, Rust, GPU, CUDA

4 points by vaktibabat 194 days ago | 0 comments

Chinese GPU designers received key technologies from British company Imagination (tomshardware.com)

Chinese Companies, GPU, Technology, Semiconductors

15 points by ksec 205 days ago | 2 comments

GPU Glossary (modal.com)

GPU, Glossary, Graphics, Hardware, Technology

5 points by georgehill 207 days ago | 0 comments

Show HN: Svader – Create GPU-rendered Svelte components (github.com/sockmaster27)
Create GPU-rendered Svelte components with WebGL and WebGPU fragment shaders.

Web Development, Svelte, GPU, Graphics, WebGPU

189 points by sokmastr 211 days ago | 41 comments

GPU Glossary (modal.com)
We wrote this glossary to solve a problem we ran into working with GPUs here at Modal: the documentation is fragmented, making it difficult to connect concepts at different levels of the stack, like Streaming Multiprocessor Architecture, Compute Capability, and nvcc compiler flags.

GPU, Glossary, Hardware, Computer Science, Documentation

15 points by iamwil 212 days ago | 1 comment

Exploring inference memory saturation effect: H100 vs. MI300x (dstack.ai)
GPU memory plays a critical role in LLM inference, affecting both performance and cost. This benchmark evaluates memory saturation’s impact on inference using NVIDIA's H100 and AMD's MI300x with Llama 3.1 405B FP8.

GPU, Benchmarking, Inference, Hardware

57 points by latchkey 220 days ago | 12 comments

Compilation on the GPU? A Feasibility Study (dl.acm.org)

GPU, Compilation, Feasibility Study, Performance

27 points by luu 221 days ago | 13 comments

Compilation on the GPU? A Feasibility Study (2022) (dl.acm.org)
The emergence of highly parallel architectures has led to a renewed interest in parallel compilation.

GPU, Compilation, Feasibility Study, Computer Science

5 points by fanf2 222 days ago | 0 comments

Scale (run CUDA on AMD GPUs without mods) supports gfx900 and gfx1102 (scale-lang.com)

GPU, Open Source, Programming, AMD

28 points by mgl 223 days ago | 2 comments

Optimizing a Rust GPU matmul kernel (rust-gpu.github.io)
I read the excellent post Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance by Zach Nussbaum and thought it might be fun to reimplement it with Rust GPU.

Rust, GPU, Performance Optimization, WebGPU

97 points by lukastyrychtr 225 days ago | 12 comments

Optimizing a Rust GPU matmul kernel (rust-gpu.github.io)
I read the excellent post Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance by Zach Nussbaum and thought it might be fun to reimplement it with Rust GPU.

Rust, GPU, Performance Optimization

5 points by LegNeato 230 days ago | 0 comments

$2 H100s: How the GPU Rental Bubble Burst (latent.space)
H100s used to be $8/hr if you could get them. Now there are 7 different places sometimes selling them under $2. What happened?

GPU, Cloud Computing, Market Trends, Hardware

403 points by swyx 275 days ago | 279 comments

Scuda – Virtual GPU over IP (github.com/kevmo314)
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

Remote Computing, Virtualization, GPU, Software, Networking

207 points by kevmo314 277 days ago | 40 comments

Show HN: Squey, an open-source GPU-accelerated data visualization software (squey.org)
Squey 5.0 is out! Check out the new Parquet plugin and the revamped UI.

Open Source, Data Visualization, Software, GPU

66 points by jbleonesio 278 days ago | 13 comments

炊紙(kashikishi) is a text editor that utilizes GPU to edit text in a 3D space (github.com/mitoma)
炊紙 (kashikishi) is a text editor that lets you edit text in three-dimensional space. The name is pronounced "kashikishi".

Text Editors, GPU, 3D, Software, Programming

240 points by hiroshi3110 282 days ago | 101 comments