Ask HN: Confused about how DeepSeek hurts Nvidia
(ycombinator.com)
I’m genuinely confused about why people think DeepSeek’s results will mean fewer GPUs being needed in the future.
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel (2021)
(bartwronski.com)
See this ugly pixel shift when upsampling a downsampled image? My post describes where it can come from and how to avoid those!
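The shift usually comes down to how a destination pixel index is mapped back to a source coordinate. A minimal 1D sketch, assuming linear filtering with clamp-to-edge addressing (our choice, not necessarily the post's code), compares the pixel-center ("half pixel") convention with naive index scaling:

```python
def src_coord(dst_index, scale, half_pixel=True):
    """Map a destination pixel index to a continuous source coordinate.

    scale = src_size / dst_size.
    """
    if half_pixel:
        # Pixel centers sit at i + 0.5: align centers, then shift back to index space.
        return (dst_index + 0.5) * scale - 0.5
    # Naive convention: aligns pixel *corners* at index 0, which shifts the image.
    return dst_index * scale


def resample_linear(signal, dst_size, half_pixel=True):
    """1D linear resampling of a list of floats (clamp-to-edge assumed)."""
    n = len(signal)
    scale = n / dst_size
    out = []
    for i in range(dst_size):
        x = src_coord(i, scale, half_pixel)
        x = min(max(x, 0.0), n - 1)  # clamp-to-edge addressing
        x0 = int(x)                  # floor, since x >= 0 after clamping
        x1 = min(x0 + 1, n - 1)
        t = x - x0
        out.append((1 - t) * signal[x0] + t * signal[x1])
    return out


ramp = [float(v) for v in range(8)]
print(resample_linear(ramp, 4, half_pixel=True))
print(resample_linear(ramp, 4, half_pixel=False))
```

Downsampling the ramp 0..7 by 2 gives [0.5, 2.5, 4.5, 6.5] with centers aligned (each output is the average of a pixel pair), but [0.0, 2.0, 4.0, 6.0] with naive scaling, i.e. everything moves by half a source pixel.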
Show HN: A GPU-accelerated MD5 Hash Cracker, Written Using Rust and CUDA
(github.com/vaktibabat)
MD5 hash cracking with CUDA and Rust, implemented from scratch
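Brute-force cracking parallelizes well because every candidate is independent. A rough CPU sketch (not the project's code) shows the usual trick of decoding a flat keyspace index into a candidate string, which is what lets each GPU thread pick its own candidate from its thread ID; the lowercase charset and length limit are assumptions:

```python
import hashlib
import string


def index_to_candidate(index, charset, length):
    """Decode a flat keyspace index into a fixed-length candidate (mixed radix).

    On a GPU, `index` would be computed from the block and thread IDs.
    """
    base = len(charset)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(reversed(chars))


def crack_md5(target_hex, charset=string.ascii_lowercase, max_len=4):
    """Return the first candidate whose MD5 digest matches, or None."""
    target = bytes.fromhex(target_hex)
    base = len(charset)
    for length in range(1, max_len + 1):
        # Each index is independent work; this is the loop a GPU spreads across threads.
        for index in range(base ** length):
            candidate = index_to_candidate(index, charset, length).encode()
            if hashlib.md5(candidate).digest() == target:
                return candidate.decode()
    return None


print(crack_md5(hashlib.md5(b"gpu").hexdigest(), max_len=3))
```

The keyspace grows as len(charset) ** length, which is why the GPU version's throughput matters far more than any per-candidate cleverness.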
Show HN: Svader – Create GPU-rendered Svelte components
(github.com/sockmaster27)
Create GPU-rendered Svelte components with WebGL and WebGPU fragment shaders.
GPU Glossary
(modal.com)
We wrote this glossary to solve a problem we ran into working with GPUs here at Modal: the documentation is fragmented, making it difficult to connect concepts at different levels of the stack, like Streaming Multiprocessor Architecture, Compute Capability, and nvcc compiler flags.
Exploring inference memory saturation effect: H100 vs. MI300x
(dstack.ai)
GPU memory plays a critical role in LLM inference, affecting both performance and cost. This benchmark evaluates memory saturation’s impact on inference using NVIDIA's H100 and AMD's MI300x with Llama 3.1 405B FP8.
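Why memory capacity matters here can be seen with back-of-envelope arithmetic: whatever the FP8 weights leave free is what the KV cache can use. The sketch below is ours, not the benchmark's; the 8-GPU node, exactly 1 byte per parameter, 16-bit KV cache, and the Llama 3.1 405B shape (126 layers, 8 KV heads, head dim 128) are all assumptions, and activations and runtime overhead are ignored:

```python
# Back-of-envelope memory budget for serving Llama 3.1 405B in FP8.
# ASSUMPTIONS (not from the benchmark): 8 GPUs per node, weights at exactly
# 1 byte/parameter, 16-bit KV cache, no activation or runtime overhead.
GB = 1e9


def weight_gb(params=405e9, bytes_per_param=1):
    return params * bytes_per_param / GB


def headroom_gb(gpu_mem_gb, num_gpus=8, params=405e9, bytes_per_param=1):
    """Memory left for the KV cache after loading the weights."""
    return gpu_mem_gb * num_gpus - weight_gb(params, bytes_per_param)


def kv_bytes_per_token(layers=126, kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


for name, mem in [("H100", 80), ("MI300x", 192)]:
    free = headroom_gb(mem)
    tokens = int(free * GB // kv_bytes_per_token())
    print(f"{name}: {free:.0f} GB free, ~{tokens:,} cached tokens")
```

Under these assumptions the H100 node has 235 GB of headroom versus 1131 GB on MI300x, so the H100 saturates at a much smaller total of cached tokens (batch size × context length).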
Compilation on the GPU? A Feasibility Study (2022)
(dl.acm.org)
The emergence of highly parallel architectures has led to a renewed interest in parallel compilation.
Optimizing a Rust GPU matmul kernel
(rust-gpu.github.io)
I read the excellent post Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance by Zach Nussbaum and thought it might be fun to reimplement it with Rust GPU.
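Matmul optimization write-ups of this kind typically start from a naive kernel (one invocation per output element) and move to tiling. As a CPU illustration only (the post's kernels are Rust GPU shaders, and the tile size here is an arbitrary assumption), the two loop structures look like this:

```python
def matmul_naive(a, b):
    """One 'thread' per output element: c[i][j] = sum_p a[i][p] * b[p][j]."""
    m, k, n = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile=2):
    """Same result, but the K dimension is walked in tiles.

    On a GPU, each (i0, j0) tile would map to a workgroup, and the tile-sized
    slices of A and B would be staged in fast shared memory and reused.
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_naive(a, b))
print(matmul_tiled(a, b))
```

Both return [[58, 64], [139, 154]] (the tiled version as floats); on a CPU the tiling buys nothing, and its payoff only appears on hardware where reusing staged tiles saves global-memory traffic.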
$2 H100s: How the GPU Rental Bubble Burst
(latent.space)
H100s used to be $8/hr if you could get them. Now there are 7 different places sometimes selling them for under $2. What happened?
Scuda – Virtual GPU over IP
(github.com/kevmo314)
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
Show HN: Squey, an open-source GPU-accelerated data visualization software
(squey.org)
Squey 5.0 is out! Check out the new Parquet plugin and the revamped UI.
炊紙 (kashikishi) is a text editor that uses the GPU to edit text in a 3D space
(github.com/mitoma)
炊紙 is a text editor that lets you edit text in three-dimensional space. It is pronounced “kashikishi”.
Hetzner introduces GPU server for AI training
(hetzner.com)
Discover the next level of performance with the new GEX130 dedicated GPU server. Equipped with the NVIDIA RTX™ 6000 Ada generation graphics card, it can put ideas into practice even faster and highly complex tasks can be completed efficiently.
Writing Portable Rendering Code with Nvrhi
(nvidia.com)
Modern graphics APIs, such as Direct3D 12 and Vulkan, are designed to provide relatively low-level access to the GPU and eliminate the GPU driver overhead associated with API translation.
Show HN: Oblivus GPU Cloud – On-Demand H100s from $1.98/hr – $25 Free Credit
(oblivus.com)
Democratized GPU Cloud starting at only $0.12/hr! No quotas, no restrictions.
GPU Debug Scopes
(wunkolo.github.io)
Rendering APIs these days tend to capture their GPU workloads into a serialized form, such as a command buffer or command list, to be dispatched at a later time into a work queue.
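The record-now, execute-later model is why debug scopes are themselves recorded as commands: the labels have to travel with the work so tools can group it when it finally runs. A toy sketch with hypothetical `CommandBuffer` and `Queue` classes (Vulkan's real counterpart is the `VK_EXT_debug_utils` label commands; nothing here is the post's code):

```python
from contextlib import contextmanager


class CommandBuffer:
    """Hypothetical recorder: commands are appended, not executed."""

    def __init__(self):
        self.commands = []

    def record(self, name, *args):
        self.commands.append((name, args))

    @contextmanager
    def debug_scope(self, label):
        # Begin/end markers are recorded like any other command,
        # so whoever replays the buffer can group the work between them.
        self.record("begin_label", label)
        try:
            yield self
        finally:
            self.record("end_label", label)


class Queue:
    """Hypothetical work queue: only here does recorded work 'execute'."""

    def __init__(self):
        self.log = []

    def submit(self, cmd_buf):
        scopes = []
        for name, args in cmd_buf.commands:
            if name == "begin_label":
                scopes.append(args[0])
            elif name == "end_label":
                scopes.pop()
            else:
                # Tag each executed command with its enclosing scope path.
                self.log.append(("/".join(scopes), name, args))


cb = CommandBuffer()
with cb.debug_scope("frame"):
    with cb.debug_scope("shadow pass"):
        cb.record("draw", 36)
    cb.record("dispatch", 8, 8, 1)
q = Queue()
q.submit(cb)
print(q.log)
```

Nothing reaches the queue's log until `submit`, and each command comes out tagged with its scope path ("frame/shadow pass" for the draw, "frame" for the dispatch).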