Hacker News with Generative AI: Optimization

Flattening ASTs with Arena Allocation (cs.cornell.edu)
Arenas, a.k.a. regions, are everywhere in modern language implementations.
Bit-twiddling optimizations in Zed's Rope (zed.dev)
A couple of weeks ago I came across one of Antonio's PRs, titled "Speed up point translation in the Rope" — now who doesn't stop to take a closer look at a PR with that title?
Representing filesystems in databases efficiently with Hierarchical Ordering (danthegoodman.substack.com)
The way the we interface with files in a file system and rows in a database are fundamentally not the same.
Optimizers: The Low-Key MVP (duckdb.org)
TL;DR: The query optimizer is an important part of any analytical database system as it provides considerable performance improvements compared to hand-optimized queries, even as the state of your data changes.
Reducing the cost of a single Google Cloud Dataflow Pipeline by Over 60% (allegro.tech)
In this article we’ll present methods for efficiently optimizing physical resources and fine-tuning the configuration of a Google Cloud Platform (GCP) Dataflow pipeline in order to achieve cost reductions.
Speeding up the Rust edit-build-run cycle (davidlattimore.github.io)
There are two main aspects to compile times that matter to developers. Cold build times, when building from scratch and warm build times when you’ve already built and you’re rebuilding following an edit. This article focuses on warm build times, which for rapid iteration during development is what generally matters most.
SQLite Index Visualization (mrsuh.com)
After learning about indexes, I understood their basic structure, but I wanted to dig deeper — to explore the data structure, understand the algorithm, and learn how the index data is stored on disk.
Histogramming Bytes with Positional Popcount (blogspot.com)
A while ago, after some back and forth on twitter/X with @corsix, I dropped some implementation of byte histogramming without explaining anything. This post aims to rectify that lack of explanation.
TinyCC Memory and Bounds Checks (bellard.org)
TinyCC (aka TCC) is a small but hyper fast C compiler.
Intel Spots 3888.9% Performance Improvement in Linux Kernel from 1 Line of Code (phoronix.com)
Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week.
Assembly Optimization Tips by Mark Larson (2004) (masm32.com)
The most important thing to remember is to TIME your code. Trying different tricks might or might not speed up your code. So it is very important to time your code to see if you do get a speedup as you try each trick.
Evolving a NoSQL Database Schema (karmanivero.us)
In a NoSQL environment, Entity Manager organizes the physical distribution of data to support efficient query operations.
Speculations on arenas and custom strings in C++ (nullprogram.com)
My techniques with arena allocation and strings are oriented around C. I’m always looking for a better way, and lately I’ve been experimenting with building them using C++ features. What are the trade-offs? Are the benefits worth the costs? In this article I lay out my goals, review implementation possibilities, and discuss my findings. Following along will require familiarity with those previous two articles.
Texture-Less Text Rendering (poniesandlight.co.uk)
Sometimes, all you want is to quickly print some text into a Renderpass. But traditionally, drawing text requires you first to render all possible glyphs of a font into an atlas, to bind this atlas as a texture, and then to render glyphs one by one by drawing triangles on screen, with every triangle picking the correct glyph from the font atlas texture.
Converting ASCII strings to lower case at crazy speeds with AVX-512 (lemire.me)
AMD Zen 4 and Zen 5, as well as server-side recent Intel processors, support an advanced set of instructions called AVX-512. They are powerful SIMD (Single Instruction, Multiple Data) instructions. Importantly, they allow ‘masked’ operations. That is, you can compute a mask and only do an operation on bytes indicated by the mask. Thus you can easily store only the first k bytes of a block of 64 bytes of memory as one instruction.
Linus Torvalds lands a 2.6% performance improvement with minor Linux kernel patc (phoronix.com)
Linus Torvalds merged a patch on Wednesday that he authored that with reworking a few lines of code is able to score a 2.6% improvement within Intel's well-exercise "will it scale" per-thread-ops benchmark test case.
Windows Memory Mapped File IO (jeremyong.com)
Memory-mapping is my preferred way to do file I/O, on pretty much every platform I write code for (desktop and console). In this post, we’ll start by discussing some of the key advantages of this approach, as well as some disadvantages. Then, we’ll look at some Windows-specific undocumented user-mode functions which help mitigate some of those disadvantages. In particular, we’ll address an important perceived limitation of memory-mapped I/O, namely, file resizing.
WebSockets cost us $1M on our AWS bill (recall.ai)
IPC is something that is rarely top-of-mind when it comes to optimising cloud costs. But it turns out that if you IPC 1TB of video per second on AWS it can result in enormous bills when done inefficiently.
FFmpeg: A 94x speed improvement demonstrated using handwritten assembly (twitter.com)
Static Basic Block Versioning (dagstuhl.de)
Basic Block Versioning (BBV) is a compilation technique for optimizing program execution.
Analyzing Go Build Times (2023) (howardjohn.info)
Go is often praised for its fast build times. While they are pretty quick, they are slow enough that I spend a lot of time waiting for them, enough that it prompted me to go down the rabbit hole of thoroughly analyzing them. This post covers all aspects of what makes Go builds fast or slow.
Demystifying the regular expression that checks if a number is prime (2016) (illya.sh)
A while back I was researching the most efficient way to check if a number is prime. This lead me to find the following piece of code:
The carefulness knob (surfingcomplexity.blog)
It’s easy to forget that there is a fundamental tradeoff between how careful we can be and how much time it will take us to perform a task.
Tip of the day #2: A safer arena allocator (gaultier.github.io)
The most transformative action you can do to dramatically improve your code in a programming language where you are in control of the memory is: to use arenas.
Using less memory to look up IP addresses in Mess With DNS (jvns.ca)
I’ve been having problems for the last 3 years or so where Mess With DNS periodically runs out of memory and gets OOM killed.
We shrunk our Javascript monorepo git size (jonathancreamer.com)
This isn't click bait. We really did this! We work in a very large Javascript monorepo at Microsoft we colloquially call 1JS. It's large not only in terms of GB, but also in terms of sheer volume of code and contributions. We recently crossed the 1,000 monthly active users mark, about 2,500 packages, and ~20million lines of code! The most recent clone I did of the repo clocked in at an astonishing 178GB.
Goodhart’s law isn’t as useful as you might think (2023) (commoncog.com)
Goodhart’s Law is a famous adage that goes “when a measure becomes a target, it ceases to be a good measure.”
Why those particular integer multiplies? (wordpress.com)
The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point.
Universal optimality of Dijkstra via beyond-worst-case heaps (arxiv.org)
This paper proves that Dijkstra's shortest-path algorithm is universally optimal in both its running time and number of comparisons when combined with a sufficiently efficient heap data structure.
Minotaur: A SIMD-Oriented Synthesizing Superoptimizer [pdf] (cs.utah.edu)