Hacker News with Generative AI: Scaling

Scaling with PostgreSQL without boiling the ocean (shayon.dev)
“Postgres was great when we started but now that our service is being used heavily we are running into a lot of ‘weird’ issues”
Value-Based Deep RL Scales Predictably (arxiv.org)
Scaling data and compute is critical to the success of machine learning. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment.
How to scale your model: A systems view of LLMs on TPUs (jax-ml.github.io)
Training LLMs often feels like alchemy, but understanding and optimizing the performance of your models doesn't have to. This book aims to demystify the science of scaling language models on TPUs: how TPUs work and how they communicate with each other, how LLMs run on real hardware, and how to parallelize your models during training and inference so they run efficiently at massive scale.
S1: Simple Test-Time Scaling (github.com/simplescaling)
This repository provides an overview of all resources for the paper "s1: Simple test-time scaling".
How we scaled Slack to support 1000s of developers (railway.com)
Railway makes software infrastructure for humans. Our pitch is simple. You give us a Docker image or GitHub repo. We deploy and scale it, no friction.
Bottleneck Dirty Webs (staysaasy.com)
Delegation, specialization, and federation are critical to scaling companies. But scaling doesn’t mean stepping back from everything. Especially for unsavory, cross-functional, time-intensive tasks, leaders should position themselves as bottlenecks: owners who feel pressure when the work grows too much, forcing them to find ways to push back on the growth in time and effort.
Ask HN: What are your experiences with scaling a company? (ycombinator.com)
Let's say your company has 20 employees (marketing, product, dev) and offers one stable product. The plan is to introduce a second product and maybe add new members to the team.
Facebook's Little Red Book (map.cv)
In 2012, Facebook was facing a challenge as it hit a billion users: rapid scaling was outpacing their ability to maintain focus on the big picture. Narratives became fragmented, and with them, the essence of what tied the company to Zuckerberg's vision began to fade.
The Practical Guide to Scaling Django (slimsaas.com)
Most Django scaling guides focus on theoretical maximums. But real scaling isn’t about handling hypothetical millions of users - it’s about systematically eliminating bottlenecks as you grow. Here’s how to do it right, based on patterns that work in production.
Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP (epochai.org)
Data movement bottlenecks limit LLM scaling beyond 2e28 FLOP, with a "latency wall" at 2e31 FLOP. We may hit these in ~3 years. Aggressive batch size scaling could potentially overcome these limits.
Possible futures for the Ethereum protocol, part 2: The Surge (eth.limo)
At the beginning, Ethereum had two scaling strategies in its roadmap. One (e.g., see this early paper from 2015) was "sharding": instead of verifying and storing all of the transactions in the chain, each node would only need to verify and store a small fraction of the transactions. This is how any other peer-to-peer network (e.g., BitTorrent) works too, so surely we could make blockchains work the same way.
Upgrading Uber's MySQL Fleet (uber.com)
What can we do to make games scale? (twitter.com)
Does It Scale (Down)? (bugsink.com)
It’s 2024, and software is in a ridiculous state.
How we built ngrok's data platform (ngrok.com)
At ngrok, we manage an extensive data lake with an engineering team of one (me!).
How Discord stores trillions of messages (2023) (discord.com)
In 2017, we wrote a blog post on how we store billions of messages.
Sharding and Scaling PostgreSQL, No Citus (pg-sharding.tech)
Scaling Rails and Postgres to millions of users at Microsoft (stepchange.work)
We run migrations across 2,800 microservices (monzo.com)
How OpenAI Scaled Kubernetes to 7,500 Nodes by Removing One Plugin (betterstack.com)
Scaling One Million Checkboxes to 650M checks (eieio.games)
SPQR 1.5.0: a production-ready system for horizontal scaling of PostgreSQL (github.com/pg-sharding)
Our Wandering Path to Supporting 1000s of Domain Names (fusionauth.io)
Building and scaling Notion's data lake (notion.so)
SPQR: Scaling PostgreSQL via Sharding (mintlify.app)
Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data (epochai.org)
Nexus zkVM: Efficient, massively-parallel, zero-knowledge proving (nexus.xyz)
Pinterest Scaled to 11M Users with Only 6 Engineers (medium.com)
Meritocracy at Scale (scale.com)
Scaling Clubhouse From 10K to 10M Users In 6 Months With Postgres (stepchange.work)