Hacker News with Generative AI: Performance

What Do Van Halen and Brown M&M's Have to Do with Safety? (safetydimensions.com.au)
There’s a long tradition of musicians and actors adding absurd demands to their performance contracts just because they could.
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
TL;DR: Video generation with DiTs is painfully slow – HunyuanVideo takes 16 minutes to generate just a 5-second video on an H100 with FlashAttention-3. Our sliding tile attention (STA) slashes this to 5 minutes with zero quality loss, no extra training required. Specifically, STA accelerates attention alone by 2.8–17x over FlashAttention-2 and 1.6–10x over FlashAttention-3.
RPython Compilation Quick Reference (tilde.town)
Well first things first, this is way more ram hungry than you'd expect, and you're restricted to python 2. BUT, you do end up with nice fast binaries at the end. So let's proceed!
Kafka at the low end: how bad can it get? (broot.ca)
There is oft-quoted advice that Kafka does poorly as a job queue. I’ve experienced this myself, and I wanted to formalize it a bit.
One year after switching from Java to Go (glasskube.dev)
I always told people memory is cheap, black magic is OK and efficiency doesn't matter in most cases, but boy, how wrong was I...
Making my debug build run 100x faster so that it is finally usable (gaultier.github.io)
SIMD and dedicated silicon to the rescue.
Nvidia GeForce GTX 980 Through GeForce RTX 5080/5090 GPU Compute Performance (phoronix.com)
Complementing the recent Linux GPU benchmarks of the NVIDIA GeForce RTX 5080 and GeForce RTX 5090 looking at both the Linux / Steam Play gaming performance as well as GPU compute and other areas, in today's testing is a wide multi-generation look seeing how the NVIDIA GeForce performance has evolved going back to the GeForce GTX 980 Maxwell GPUs up through the newest GeForce RTX 5080/5090 graphics cards.
High-School Band Contests Turn Marching into a Sport–and an Art (newyorker.com)
Band kids today don’t just parade up and down the field playing fight songs. They flow across it in shifting tableaux, with elaborate themes and spandex-clad dancers.
The Impact of Metadata Configurations on Text-to-SQL Performance [pdf] (corraldata.com)
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors (lemire.me)
The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions.
Linux kernel cgroups writeback high CPU troubleshooting (dasl.cc)
We’ve been upgrading the operating system from CentOS to Ubuntu on hosts across our fleet. Our CentOS hosts run an outdated Linux kernel version (3.10), whereas our Ubuntu hosts run a more modern kernel version (6.8). In August 2024, we began rolling out the Ubuntu upgrade across our Apache web servers. When we migrated larger portions of our fleet to Ubuntu, we began seeing elevated listen overflow errors.
Rust: Doubling Throughput with Continuous Profiling and Optimization (polarsignals.com)
“68.37% of CPU [...] with a one-line code change [...] went down to 31.82%”
Tiny JITs for a Faster FFI (railsatscale.com)
Can we have a faster FFI for CRuby? Yes.
The average CPU performance of PCs and notebooks fell for the first time (cpubenchmark.net)
Over 1,000,000 CPUs Benchmarked
Linux 6.13 Performance for 250Hz vs. 1000Hz Timer Frequency Comparison (phoronix.com)
Given the recent patch proposal to raise the Linux kernel's default timer frequency from 250Hz to 1000Hz, I ran some fresh benchmarks looking at the 250Hz vs. 1000Hz comparison on some modern desktop hardware.
Show HN: Fluvio 38.8x faster than Kafka (infinyon.com)
At InfinyOn for the past 6 years we have obsessed over developer ergonomics, functionality, and reliability of Fluvio and Stateful DataFlow. It’s not trivial to build a distributed streaming engine from the ground up.
Do It Yourself Database CDN with Embedded Replicas (turso.tech)
Imagine you have a user in Singapore, and your database is in the US. Every time the user makes a request, it has to travel halfway around the world, which can lead to high latency and poor performance.
Git clone --depth 2 is vastly better than --depth 1 if you want to Git push later (stackoverflow.com)
I've done a shallow clone of a large repo (git clone --depth 1 [email protected]:myOrg/myRepo.git). I can push new changes to the remote but the first push is horribly slow. Subsequent pushes are fine. The command indicates that the first push writes a lot of data to the remote:
The first yearly drop in average CPU performance in its 20 years of benchmarks (tomshardware.com)
Go's new map implementation in 1.24 is powered by Swiss Tables (twitter.com)
Can Cheap MiniPC with FreeBSD 14 Outperform MacBook Pro M1 Pro? (interfacecraft.online)
I put my €1800 MacBook Pro M1 Pro head-to-head with a €300 mini PC and found the cheaper option surprisingly fast.
The Lengthiest HTTP Headers (fastly.com)
Large web page bodies make your page load slowly, but what about large headers?
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately these days a cheap nvme ssd can supply data much faster than a query interpreter can process it.
Scaling with PostgreSQL without boiling the ocean (shayon.dev)
“Postgres was great when we started but now that our service is being used heavily we are running into a lot of ‘weird’ issues”
JEP draft: 4-byte Object Headers (Experimental) (openjdk.org)
Reduce the size of object headers in the HotSpot JVM from between 64 and 128 bits down to 32 bits on 64-bit architectures. This will reduce heap size, improve deployment density, and increase data locality.
No-Libc Zig Now Outperforms Glibc Zig (ziglang.org)
No-Libc Zig Now Outperforms Glibc Zig
GCC 15 Compiler Showing Off Nice Performance Improvements on AMD Zen 5 (phoronix.com)
There were a number of other applications with small but consistent performance improvements when built by GCC 15.
Solving Postgres' Search Limitations (paradedb.com)
We recently completed one of our biggest engineering bets to date: migrating pg_search, a Postgres extension for full text search and analytics, to Postgres' block storage system. In doing so, pg_search is the first-ever extension to port an external file format to Postgres block storage.
From hours to 360ms: over-engineering a puzzle solution (danielh.cc)
In January 2025, Jane Street posted an interesting puzzle: