Hacker News with Generative AI: Systems Architecture

How to scale your model: A systems view of LLMs on TPUs (jax-ml.github.io)
Training LLMs often feels like alchemy, but understanding and optimizing the performance of your models doesn't have to. This book aims to demystify the science of scaling language models on TPUs: how TPUs work and how they communicate with each other, how LLMs run on real hardware, and how to parallelize your models during training and inference so they run efficiently at massive scale.
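The book itself covers these topics in depth; as a rough flavor of what "parallelizing a model" means in JAX, here is a minimal, hypothetical sketch (not taken from the book) of data-parallel sharding with jax.sharding. The mesh axis name, array shapes, and toy layer are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = jax.devices()
# 1D mesh over all visible devices; the axis name "data" is an arbitrary label.
mesh = Mesh(mesh_utils.create_device_mesh((len(devices),)), axis_names=("data",))

# Shard activations along the batch axis; keep the weights replicated.
batch = len(devices) * 4
x = jax.device_put(jnp.ones((batch, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 1024)), NamedSharding(mesh, P(None, None)))

@jax.jit
def layer(x, w):
    # Each device computes its batch shard; XLA inserts any needed communication.
    return jnp.dot(x, w)

y = layer(x, w)
print(y.sharding)  # output stays sharded along the "data" axis
```

The same sharding machinery generalizes to tensor and pipeline parallelism by splitting the weight axes across mesh dimensions instead of replicating them, which is the kind of trade-off the book analyzes.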
Production Twitter on One Machine? 100Gbps NICs and NVMe Are Fast (thume.ca)
In this post I’ll attempt the fun stunt of designing a system that could serve the full production load of Twitter with most of the features intact on a single (very powerful) machine.