Hacker News with Generative AI: Performance

Fast Allocations in Ruby 3.5 (railsatscale.com)
Many Ruby applications allocate objects. What if we could make allocating objects six times faster? We can! Read on to learn more!
Improving performance of rav1d video decoder (ohadravid.github.io)
Making the rav1d Video Decoder 1% Faster
Oodle 2.9.14 and Intel 13th/14th gen CPUs (wordpress.com)
There’s a hardware problem affecting Intel 13th/14th gen CPUs, mostly desktop ones. This made the rounds through the press last year and has been on forums etc. for much longer than that. For months, we thought this was a rare bug in the decoder, but from stats in Epic’s crash reports for Fortnite (as well as stats for other Unreal Engine and Oodle licensees) it was fairly striking that this issue really seemed to affect only some CPUs.
Direct TLS can speed up your connections (marc-bowes.com)
A few months ago, one of my Aurora DSQL teammates reported a curious finding.
JavaScript Ecosystem Performance (e18e.dev)
e18e (Ecosystem Performance) is an initiative to connect the folks and projects working to improve the performance of JS packages.
An Almost Pointless Exercise in GPU Optimization (speechmatics.com)
Not everyone is able to write funky fused operators to make ML models run faster on GPUs using clever quantisation tricks. However, lots of developers work with algorithms that feel like they should be able to leverage the thousands of cores in a GPU to run faster than on the dozens of cores in a server CPU. To see what is possible and what is involved, I revisited the first problem I ever considered trying to accelerate with a GPU.
“ZLinq”, a Zero-Allocation LINQ Library for .NET (medium.com)
I released ZLinq v1 last month! By building on structs and generics, it achieves zero allocations. It includes extensions like LINQ to Span, LINQ to SIMD, LINQ to Tree (FileSystem, JSON, GameObject, etc.), a drop-in replacement Source Generator for arbitrary types, and support for multiple platforms including .NET Standard 2.0, Unity, and Godot. It has now exceeded 2,000 GitHub stars.
The Lost Decade of Small Data? (duckdb.org)
TL;DR: We benchmark DuckDB on a 2012 MacBook Pro to decide: did we lose a decade chasing distributed architectures for data analytics?
LLM-D: Kubernetes-Native Distributed Inference (llm-d.ai)
llm-d is a Kubernetes-native, high-performance distributed LLM inference framework: a well-lit path for anyone to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
Visual Studio Code: Text Buffer Reimplementation (2018) (visualstudio.com)
The Visual Studio Code 1.21 release includes a brand new text buffer implementation which is much more performant, both in terms of speed and memory usage.
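The linked post builds its new buffer on the piece-table idea. Below is a minimal sketch of a plain piece table, assuming a single-string original buffer and an append-only "add" buffer; it is not VS Code's implementation, which keeps the pieces in a balanced tree and also tracks line breaks, both of which this sketch omits. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    buffer: str  # which buffer the span lives in: "original" or "add"
    start: int   # offset into that buffer
    length: int  # number of characters in the span


class PieceTable:
    """Minimal piece table: the original text is never modified, inserted
    text is appended to a separate add buffer, and the document is the
    concatenation of the spans listed in self.pieces."""

    def __init__(self, text: str) -> None:
        self.original = text
        self.add = ""
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def __len__(self) -> int:
        return sum(p.length for p in self.pieces)

    def _buf(self, name: str) -> str:
        return self.original if name == "original" else self.add

    def text(self) -> str:
        # Materialize the document by reading each span from its buffer.
        return "".join(self._buf(p.buffer)[p.start:p.start + p.length] for p in self.pieces)

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        if not text:
            return
        new_piece = Piece("add", len(self.add), len(text))
        self.add += text  # append-only, so existing spans stay valid
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                # Split the piece containing pos and put the new span between the halves.
                split = pos - offset
                left = Piece(p.buffer, p.start, split)
                right = Piece(p.buffer, p.start + split, p.length - split)
                self.pieces[i:i + 1] = [q for q in (left, new_piece, right) if q.length > 0]
                return
            offset += p.length
        self.pieces.append(new_piece)  # only reached when the document is empty

    def delete(self, pos: int, length: int) -> None:
        if pos < 0 or length < 0 or pos + length > len(self):
            raise IndexError("delete range out of range")
        if length == 0:
            return
        end = pos + length
        kept, offset = [], 0
        for p in self.pieces:
            p_end = offset + p.length
            if offset < pos:  # part of this piece lies before the deleted range
                kept.append(Piece(p.buffer, p.start, min(p_end, pos) - offset))
            if p_end > end:   # part of this piece lies after the deleted range
                skip = max(end, offset) - offset
                kept.append(Piece(p.buffer, p.start + skip, p.length - skip))
            offset = p_end
        self.pieces = kept


pt = PieceTable("hello world")
pt.insert(5, ",")
pt.delete(0, 1)
pt.insert(0, "H")
print(pt.text())  # Hello, world
```

Edits only rewrite the small list of pieces and append to the add buffer, so the original text is never copied. Finding the piece for a given offset is a linear scan here; a real editor needs faster offset and line lookups, which is what a tree of pieces provides.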
Real-Time Grass Simulation in the Browser – Over 1M Blades at 60 FPS (techredux.co)
FLOWING GRASS FIELDS
SDB Scans the Ruby Stack Without the GVL (github.com/yfractal)
Monitoring Node.js: Key Metrics You Should Track (last9.io)
Understand which metrics matter in Node.js applications, why they’re important, and how to track them effectively in production.
Making iText's table rendering faster (itextpdf.com)
Here at Apryse, we occasionally have some free time at the end of our iText development sprints where we're encouraged to use our initiative to "work on whatever" we fancy.
Pglocks.org (pglocks.org)
Community Stewardship of Faster CPython (python.org)
Pallene: A statically typed ahead-of-time compiled sister language to Lua, with a focus on performance (github.com/pallene-lang)
Pallene is a statically typed and ahead-of-time compiled sister language to Lua, with a focus on performance.
Comparing Parallel Functional Array Languages: Programming and Performance (arxiv.org)
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability.
Ryzen AI Max+ "Strix Halo" Delivers Best Performance on Linux over Windows 11 (phoronix.com)
Having shown the very strong Linux performance of the AMD Ryzen AI Max+ PRO 395 "Strix Halo" SoC with its integrated Radeon 8060S graphics, you may be wondering how Microsoft Windows 11 compares on the same hardware.
The fastest Postgres inserts (hatchet.run)
At Hatchet, we spent the past half year running hundreds of benchmarks against different Postgres configurations. We set out with a simple question: at what scale does Postgres break?
Repair Time Requirements to Prevent Data Resurrection in Cassandra & Scylla (msun.io)
Cassandra and ScyllaDB share well-known issues with race conditions between repair and garbage collection processes that can cause deleted data to resurrect.
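The race can be illustrated with a toy model. The usual Cassandra guidance is to repair every node at least once within gc_grace_seconds, because a replica that missed a delete only learns about it from the tombstone, and tombstones become eligible for garbage collection once gc_grace_seconds has passed. The sketch below assumes three replicas, last-write-wins merging, and a single compaction pass; hinted handoff, read repair, and consistency levels are not modeled, and it does not reproduce the linked post's specific repair-time requirements. All names (`Cell`, `Replica`, `repair`, `simulate`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone (a deletion marker)
    timestamp: int        # write time; the newest cell wins when replicas merge


@dataclass
class Replica:
    cells: Dict[str, Cell] = field(default_factory=dict)

    def write(self, key: str, cell: Cell) -> None:
        current = self.cells.get(key)
        if current is None or cell.timestamp >= current.timestamp:
            self.cells[key] = cell

    def compact(self, now: int, gc_grace: int) -> None:
        # Garbage-collect tombstones that are older than gc_grace.
        self.cells = {
            k: c for k, c in self.cells.items()
            if not (c.value is None and now - c.timestamp > gc_grace)
        }


def repair(replicas: List[Replica]) -> None:
    # Anti-entropy: copy the newest cell for every key to every replica.
    keys = set().union(*(r.cells for r in replicas))
    for key in keys:
        newest = max((r.cells[key] for r in replicas if key in r.cells),
                     key=lambda c: c.timestamp)
        for r in replicas:
            r.write(key, newest)


def read(replicas: List[Replica], key: str) -> Optional[str]:
    cells = [r.cells[key] for r in replicas if key in r.cells]
    return max(cells, key=lambda c: c.timestamp).value if cells else None


def simulate(repair_time: int, gc_grace: int = 10, compaction_time: int = 20) -> Optional[str]:
    replicas = [Replica() for _ in range(3)]
    for r in replicas:
        r.write("k", Cell("v1", timestamp=0))
    for r in replicas[:2]:  # the delete at t=1 misses the third (down) replica
        r.write("k", Cell(None, timestamp=1))
    # Run repair and compaction in time order (compaction first on a tie).
    for t, kind in sorted([(repair_time, "repair"), (compaction_time, "compact")]):
        if kind == "repair":
            repair(replicas)
        else:
            for r in replicas:
                r.compact(now=t, gc_grace=gc_grace)
    return read(replicas, "k")


print(simulate(repair_time=5))   # None: repair spread the tombstone before it was purged
print(simulate(repair_time=30))  # v1: the tombstone was purged first, so the delete is lost
```

In the second run, compaction drops the tombstone on the two replicas that saw the delete, and the later repair treats the stale value on the third replica as the newest data and copies it back everywhere.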
Linux Swap Table Code Shows the Potential for Performance Gains (phoronix.com)
With Swap Tables the hope is for lower memory use, higher performance, dynamic swap allocation and growth, greater extensibility, and other improvements over the existing swap code within the Linux kernel.
Lessons from Mixing Rust and Java: Fast, Safe, and Practical (medium.com)
ZJIT has been merged into Ruby (railsatscale.com)
Following Maxime’s presentation at RubyKaigi 2025, the Ruby developers meeting, and Matz-san’s approval, ZJIT has been merged into Ruby. Hurray! In this post, we will give a high-level overview of the project, which is very early in development.
New high-quality hash measures 71GB/s on M4 (github.com/Nicoshev)
rapidhash is wyhash's official successor, with improved speed, quality and compatibility.
SPITBOL – high performance implementation of SNOBOL for x64 (github.com/spitbol)
Lock-Free Rust: How to Build a Rollercoaster While It's on Fire (yeet.cx)