Hacker News with Generative AI: Performance

GNU Coreutils 9.6 Released with Changes for POSIX 2024, More AVX2, AVX-512 use (phoronix.com)
GNU Coreutils 9.6 released today as the updated version of these core utilities common to Linux systems and elsewhere.
Skyvern Browser Agent 2.0: How We Reached State of the Art in Evals (skyvern.com)
We’ve been working hard cooking up something new to share with you all!
Intel Arc B570 Graphics Performance on Linux Review (phoronix.com)
Last month, when Intel formally introduced Battlemage graphics, their initial products in the B-Series were the B570 and B580 graphics cards.
Is there such a thing as a web-safe font? (highperformancewebfonts.com)
Tools and resources on how to use web fonts without sacrificing page loading speeds
Is WebAssembly Memory64 worth using? (spidermonkey.dev)
After many long years, the Memory64 proposal for WebAssembly has finally been released in both Firefox 134 and Chrome 133. In short, this proposal adds 64-bit pointers to WebAssembly.
LLM runs faster since we switched from AMD's driver to our AM driver (twitter.com)
YJIT 3.4: Even Faster and More Memory-Efficient (railsatscale.com)
It’s 2025, and this year again, the YJIT team brings you a new version of YJIT that is even faster, more stable, and more memory-efficient.
Scaling LLMs with Golang: How we serve millions of LLM requests (assembled.com)
While the LLM ecosystem is overwhelmingly Python-first, we've found Go to be exceptionally well-suited for production deployments. Our Go-based infrastructure handles millions of monthly LLM requests with minimal performance tuning. Beyond Go's well-documented advantages (see Rob Pike’s excellent distillation of Go's benefits), three capabilities have proven particularly valuable for LLM workloads: static type checking for handling model outputs, goroutines for managing concurrent API calls, and interfaces for building composable response pipelines.
JUring: Experimental IO_uring for Java with Big Performance Gains (phoronix.com)
For those looking toward better I/O performance with Java, there is JUring for making use of IO_uring, and the reported performance benefits are very enticing.
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately these days a cheap nvme ssd can supply data much faster than a query interpreter can process it.
Show HN: JUring – Java bindings for io_uring file I/O (github.com/davidtos)
JUring is a high-performance Java library that provides bindings to Linux's io_uring asynchronous I/O interface using Java's Foreign Function & Memory API. For random reads, JUring achieves 33% better performance than Java NIO FileChannel operations for local files and 78% better performance for remote files.
Why I Chose Common Lisp (djhaskin.com)
After ~7 years, I was done with Clojure. I was writing some CLI apps, and I hated how long they took to start up. The community at large seemed not to care about this problem, except for the babashka folks. However, I spent long, hard hours banging my head against native-image and it just wasn't working out. It was incredibly painful, and at the end of it, I still didn't have standalone, fast-starting native executables.
uv: An extremely fast Python package and project manager, written in Rust. (github.com/astral-sh)
An extremely fast Python package and project manager, written in Rust.
Using black magic to make a fast circular buffer (2017) (calho.st)
Yesterday, I took a glance at the Wikipedia page for the circular buffer and was intrigued by an alleged optimization technique that I was not familiar with:
MessagePack: It's like JSON, but fast and small. (msgpack.org)
MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it's faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.
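The two size claims in the excerpt above can be illustrated with a minimal hand-rolled sketch. It covers only the "positive fixint" and "fixstr" cases of the MessagePack format; the function names `pack_small_int` and `pack_short_str` are hypothetical and are not part of any MessagePack library.

```python
def pack_small_int(n: int) -> bytes:
    """Encode an integer in 0..127 as a MessagePack positive fixint (one byte)."""
    if not 0 <= n <= 0x7F:
        raise ValueError("positive fixint only covers 0..127")
    # The value itself is the single encoded byte.
    return bytes([n])


def pack_short_str(s: str) -> bytes:
    """Encode a string of at most 31 UTF-8 bytes as a MessagePack fixstr."""
    data = s.encode("utf-8")
    if len(data) > 31:
        raise ValueError("fixstr only covers strings of up to 31 bytes")
    # One header byte: the bits 101 followed by the 5-bit length, then the raw bytes.
    return bytes([0xA0 | len(data)]) + data


if __name__ == "__main__":
    print(pack_small_int(5).hex())        # one byte for a small integer
    print(pack_short_str("hello").hex())  # one header byte plus the five string bytes
```

Larger integers and longer strings use other, wider encodings that this sketch deliberately leaves out.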
Understanding JVM Garbage Collector Performance (mill-build.org)
Garbage collectors are a core part of many programming languages. While they generally work well, on occasion when they go wrong they can fail in very unintuitive ways. This article will discuss the fundamental design of how garbage collectors work, and tie it to real benchmarks of how GCs perform on the Java Virtual Machine.
A Deep Dive into JVM Start Up (inside.java)
Make sure to check the video description.
If GPUs Are So Good, Why Do We Still Use CPUs at All? (codingstuff.substack.com)
There’s this old video from 2009 that’s been going viral on Twitter recently. It’s supposed to give viewers an intuition of the difference between CPUs and GPUs.
Musings on Tracing in PyPy (pypy.org)
Last summer, Shriram Krishnamurthi asked on Twitter:
AMD Ryzen 9 9950X3D and 9900X3D claims 20% faster gaming performance vs. Intel (tomshardware.com)
HTTP/2 Flow Control Deadlock (ycombinator.com)
Our real-world experience with flow control deadlock that manifested as infinite HTTP request hangs.
Lord of the Io_uring (2020) (unixism.net)
io_uring is a powerful new way to do asynchronous I/O programming under Linux. Doing away with various limitations of previous generation I/O subsystems, io_uring holds immense promise. For more details on what io_uring brings to the table, please see the chapter What is io_uring?
Optimizing uint64_t Digit Counting: A Method that Beats Lemire's by up to 143% (github.com/RealTimeChris)
Optimizing Ruby's JSON, Part 5 (byroot.github.io)
In the previous post, we showed how we eliminated two malloc/free pairs of calls when generating small JSON documents, and how that put us ahead of Oj when reusing the JSON::State object.
Multi-Path TCP: revolutionizing connectivity, one path at a time (cloudflare.com)
The Internet is designed to provide multiple paths between two endpoints. Attempts to exploit multi-path opportunities are almost as old as the Internet, culminating in RFCs documenting some of the challenges. Still, today, virtually all end-to-end communication uses only one available path at a time.
Do Files want to be Actors? (lewiscampbell.tech)
In the world of high performance Linux apps, io_uring is changing how we communicate with the operating system.
Visually Compare Retry Algorithms (compareretries.com)
Compare linear, exponential, and capped exponential backoff strategies with configurable jitter
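As a rough illustration of the three strategies named above, here is a minimal sketch. The function names, the default base delay and cap, and the choice of "full jitter" (a uniform draw between zero and the computed delay) are assumptions made for this example, not details taken from the linked tool.

```python
import random


def linear_backoff(attempt: int, base: float = 1.0) -> float:
    """Delay grows by a fixed step: base, 2*base, 3*base, ..."""
    return base * attempt


def exponential_backoff(attempt: int, base: float = 1.0) -> float:
    """Delay doubles on every attempt: base, 2*base, 4*base, ..."""
    return base * 2 ** (attempt - 1)


def capped_exponential_backoff(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential growth that never exceeds the cap."""
    return min(cap, exponential_backoff(attempt, base))


def with_full_jitter(delay: float, rng: random.Random) -> float:
    """Pick a random delay in [0, delay] so that clients do not retry in lockstep."""
    return rng.uniform(0.0, delay)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so repeated runs look the same
    for attempt in range(1, 8):
        capped = capped_exponential_backoff(attempt)
        print(attempt, linear_backoff(attempt), exponential_backoff(attempt),
              capped, round(with_full_jitter(capped, rng), 2))
```

Attempts are numbered from 1 here; a real client would sleep for the returned number of seconds before retrying.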
Production Twitter on One Machine? 100Gbps NICs and NVMe Are Fast (thume.ca)
In this post I’ll attempt the fun stunt of designing a system that could serve the full production load of Twitter with most of the features intact on a single (very powerful) machine.
Women are closing in on men when it comes to ultra-endurance events (medicalxpress.com)
Men are dominant at most athletic events but ultra-endurance sports (exercising for six hours or more) represent a unique domain where the performance gap between men and women is narrowing significantly.
An Unreasonable Amount of Time (allenpike.com)
Years ago, Teller performed a magic trick.