Hacker News with Generative AI: PyTorch

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch (arxiv.org)
CUDA Graphs -- a recent hardware feature introduced for NVIDIA GPUs -- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs today faces several challenges due to the static structure of a graph, and it incurs performance overhead from the data copies that the static structure requires. In fact, we show a counter-intuitive result: deploying CUDA Graphs hurts performance in many cases.
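For orientation, here is a minimal sketch of CUDA Graph capture and replay using PyTorch's public torch.cuda.CUDAGraph API; the toy model and shapes are illustrative, and the copy_ into the static input buffer is the kind of data copy the abstract refers to:

```python
import torch

# Toy model plus a static input buffer: a captured graph replays fixed
# memory addresses, so fresh data must be copied into this buffer.
model = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.zeros(8, 1024, device="cuda")

# Warm up on a side stream before capture, as the capture API requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture the forward pass as a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)

# Replay: copy new data into the static buffer, then launch the whole DAG
# with a single CPU-side call. The copy_ is the data-copy cost at issue.
new_batch = torch.randn(8, 1024, device="cuda")
static_in.copy_(new_batch)
g.replay()
print(static_out.sum().item())
```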
Show HN: Keep your PyTorch model in VRAM by hot swapping code (github.com/valine)
This is an example of how to hot-swap PyTorch training code without unloading your model weights from VRAM.
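The underlying pattern is a sketch of the general idea, not the repo's actual code; the train_step module and its step function here are hypothetical. The trick is to keep the model object alive in a long-running process and reload only the code around it:

```python
import importlib
import torch

import train_step  # hypothetical module containing the training-step code

# The model and optimizer live in this long-running process; as long as
# these objects are never recreated, the weights stay resident in VRAM.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for iteration in range(1000):
    # Re-import the step code each iteration (or on a file-change signal).
    # Only the code object is replaced; model state is untouched.
    importlib.reload(train_step)
    batch = torch.randn(32, 512, device="cuda")
    loss = train_step.step(model, optimizer, batch)  # hypothetical signature
    print(f"iter {iteration}: loss {loss:.4f}")
```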
Simple Denoising Diffusion (github.com/utkuozbulak)
This repository contains a bare-bones implementation of denoising diffusion [1,2] in PyTorch, with the majority of its code taken from The Annotated Diffusion and Phil Wang's diffusion repository.
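The heart of any such implementation is the closed-form forward (noising) process; a minimal sketch of sampling x_t ~ q(x_t | x_0) under a linear beta schedule:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of alphas

def q_sample(x0, t, noise):
    """Sample x_t from q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alphas_bar[t].view(-1, 1, 1, 1)  # broadcast over image dims
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

x0 = torch.randn(4, 3, 32, 32)       # stand-in for a batch of images
t = torch.randint(0, T, (4,))        # random timestep per sample
noise = torch.randn_like(x0)
xt = q_sample(x0, t, noise)          # the network learns to predict `noise`
```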
PyTorch Internals: Ezyang's Blog (ezyang.com)
This post is a long-form essay version of a talk about PyTorch internals that I gave at the PyTorch NYC meetup on May 14, 2019.
Ways to Use Torch.compile (ezyang.com)
On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't.
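The entry point itself is a one-liner on a module or a plain function; a minimal sketch with a toy model and illustrative shapes:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)

# Compile the whole module; the first call triggers tracing/compilation,
# and subsequent calls with compatible shapes reuse the compiled code.
compiled = torch.compile(model)

x = torch.randn(64, 128)
out = compiled(x)

# torch.compile also works as a decorator on plain functions.
@torch.compile
def gelu_like(x):
    return 0.5 * x * (1 + torch.tanh(0.79788456 * (x + 0.044715 * x**3)))

y = gelu_like(x)
```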
Using uv with PyTorch (astral.sh)
The PyTorch ecosystem is a popular choice for deep learning research and development. You can use uv to manage PyTorch projects and PyTorch dependencies across different Python versions and environments, even controlling for the choice of accelerator (e.g., CPU-only vs. CUDA).
PyTorch 101: Understanding Graphs, Automatic Differentiation and Autograd (digitalocean.com)
PyTorch is one of the foremost Python deep learning libraries out there. It’s the go-to choice for deep learning research, and with each passing day, more and more companies and research labs are adopting the library.
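As a minimal sketch of what the tutorial covers: tensors with requires_grad=True are recorded into a dynamic computation graph, and backward() traverses that graph in reverse to fill in .grad:

```python
import torch

# Leaf tensors with requires_grad=True are tracked by autograd.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

# Each op appends a node to the dynamic computation graph.
y = w * x**2        # dy/dx = 2*w*x = 12, dy/dw = x**2 = 4

y.backward()        # reverse-mode traversal accumulates gradients
print(x.grad)       # tensor(12.)
print(w.grad)       # tensor(4.)
```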
PyTorch Native Architecture Optimization: Torchao (pytorch.org)
We’re happy to officially launch torchao, a PyTorch-native library that makes models faster and smaller by leveraging low-bit dtypes, quantization, and sparsity. torchao is an accessible toolkit of techniques written (mostly) in easy-to-read PyTorch code, spanning both inference and training. This blog will help you pick which techniques matter for your workloads.
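As a flavor of the API, here is a minimal sketch using the quantization entry points from the launch-era README (quantize_ with int8_weight_only); consult the current docs for exact names:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# A toy model; torchao's weight-only int8 path is typically used with
# bfloat16 activations on CUDA.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).cuda().to(torch.bfloat16)

# quantize_ swaps the Linear weights for int8 versions in place,
# shrinking the model and speeding up memory-bound inference.
quantize_(model, int8_weight_only())

x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
out = model(x)
```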
Show HN: LeanRL: Fast PyTorch RL with Torch.compile and CUDA Graphs (github.com/pytorch-labs)
LeanRL is a lightweight library consisting of single-file, PyTorch-based implementations of popular Reinforcement Learning (RL) algorithms.
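LeanRL's headline combination is available to any PyTorch user: compiling a small network with mode="reduce-overhead" asks torch.compile to wrap it in CUDA Graphs. A minimal sketch with a stand-in policy network:

```python
import torch

# Stand-in policy network; real LeanRL scripts define their own models.
policy = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2)
).cuda()

# mode="reduce-overhead" has torch.compile wrap the region in CUDA Graphs,
# amortizing per-kernel CPU launch cost, which dominates in RL loops that
# run small networks on tiny batches.
fast_policy = torch.compile(policy, mode="reduce-overhead")

obs = torch.randn(1, 8, device="cuda")
for _ in range(5):          # first few calls warm up and capture the graph
    logits = fast_policy(obs)
print(logits.shape)         # torch.Size([1, 2])
```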
PyTorch 2.4 Now Supports Intel GPUs for Faster Workloads (pytorch.org)
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention (pytorch.org)
LightRAG: The PyTorch Library for Large Language Model Applications (github.com/SylphAI-Inc)
Official PyTorch Documentary Revisits Its Past, and Its Future (thenewstack.io)
Build and train GPT-2 from scratch using PyTorch (differ.blog)
LeRobot: Machine Learning for Real-World Robotics in PyTorch (github.com/huggingface)
What's the best PyTorch model visualization tool? (ycombinator.com)