Hacker News with Generative AI: Optimization

Good Docker Files (gooddockerfiles.com)
Not sure about your Dockerfile? Confused? Overwhelmed? Get expert guidance for production-ready containers that are faster, smaller and more secure.
ε, a Nuisance No More (zna.do)
For a while now I have been advocating for tuning ε in various parts of the modern deep learning stack, and in this post I’ll explain why.
Optimizing Ruby's JSON, Part 7 (byroot.github.io)
In the previous post, we started covering some parser optimizations. There’s just a handful more to cover before we reach the state of the currently released version of ruby/json.
Exploring ways to mipmap alpha-tested textures (lisyarus.github.io)
In my village building game I'm using alpha-tested transparency for foliage — trees, bushes, grass, etc. This works simply by discarding (i.e. not drawing) any pixels having an alpha value of less than 0.5, and keeping the others. This is much cheaper than proper transparency, which requires sorting objects or pixels by distance to camera.
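A minimal sketch of why alpha testing and mipmapping interact badly, assuming a plain 2x2 box filter for the mip level (the filter, the sample texture, and all function names here are illustrative assumptions, not taken from the post). It applies the 0.5 cutoff described above and compares how much of the texture survives before and after downsampling:

```python
ALPHA_CUTOFF = 0.5  # threshold quoted in the snippet above


def alpha_test(alpha):
    """Keep the pixel only if its alpha is at least the cutoff."""
    return alpha >= ALPHA_CUTOFF


def coverage(alphas):
    """Fraction of texels that pass the alpha test."""
    flat = [a for row in alphas for a in row]
    return sum(1 for a in flat if alpha_test(a)) / len(flat)


def box_downsample(alphas):
    """Build the next mip level by averaging each 2x2 block (assumed filter)."""
    height, width = len(alphas), len(alphas[0])
    return [
        [
            (alphas[y][x] + alphas[y][x + 1]
             + alphas[y + 1][x] + alphas[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Hypothetical 4x4 alpha channel: one opaque texel in each 2x2 block.
level0 = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
level1 = box_downsample(level0)
```

In this contrived case a quarter of the texels pass the test at level 0, but every averaged texel at level 1 falls to 0.25 and is discarded, so thin foliage can vanish at a distance.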
Using black magic to make a fast circular buffer (2017) (calho.st)
Yesterday, I took a glance at the Wikipedia page for the circular buffer and was intrigued by an alleged optimization technique that I was not familiar with:
A Deep Dive into JVM Start Up (inside.java)
Make sure to check the video description.
Optimizing uint64_t Digit Counting: A Method that Beats Lemire's by up to 143% (github.com/RealTimeChris)
Minimum bipartite matching via Riemann optimization (2023) (ocramz.github.io)
Some time ago I ran into two separate instances of the same combinatorial optimization problem in the span of a few days, and decided to read up a little on the fundamentals. The two applications were object tracking in videos, and peak alignment in chromatography data.
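For orientation, the underlying problem can be stated as picking one column per row of a cost matrix so that the total cost is minimal. The sketch below solves a tiny instance by brute force; it is not the Riemannian method from the post, and the cost matrix and function name are made up for illustration:

```python
from itertools import permutations


def best_assignment(cost):
    """Try every row-to-column assignment and return (total_cost, columns).

    Brute force is O(n!) and only usable for very small n; it is shown
    here purely to define the problem.
    """
    n = len(cost)
    return min(
        (sum(cost[row][col] for row, col in enumerate(perm)), perm)
        for perm in permutations(range(n))
    )


# Hypothetical costs, e.g. distances between tracked objects in two frames.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, columns = best_assignment(cost)
```

Here `columns[i]` is the column matched to row `i`.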
Size Optimization Tricks (2022) (justine.lol)
This blog post will cover some of the tricks I've used in the past to make C / C++ / Python binaries smaller using x86 assembly.
Optimizing Ruby's JSON, Part 5 (byroot.github.io)
In the previous post, we showed how we eliminated two malloc/free pairs of calls when generating small JSON documents, and how that put us ahead of Oj when reusing the JSON::State object.
Learning to Write Less Slow C, C++, and Assembly Code (github.com/ashvardanian)
Learning how to write "Less Slow" code in C++20, from numerical micro-kernels and SIMD to coroutines, ranges, and polymorphic state machines
Performance (On an HP48GX Graphing Calculator) (blogspot.com)
I'm trying to code a simple action game for a calculator. Since the calculator is horribly slow in doing anything, I have to optimize everything. I will describe summaries of my thoughts, attempts and observations in this blog. Comments and criticism are welcome.
Beyond Gradient Averaging in Parallel Optimization (arxiv.org)
We introduce Gradient Agreement Filtering (GAF) to improve on gradient averaging in distributed deep learning optimization.
The smallest Hello World program (lohr.dev)
So, initially, I just wanted to see what the smallest binary size for a ‘Hello World’ program written in Rust would be. Why? Out of curiosity - it's probably just a simple compiler flag anyway, right? Well, turns out there are some that help, but you need a lot more work to get a truly minimal binary. Much of it is not even related to Rust!
CuClarabel: GPU Acceleration for a Conic Optimization Solver (arxiv.org)
We present the GPU implementation of the general-purpose interior-point solver Clarabel for convex optimization problems with conic constraints.
Reciprocal Approximation with 1 Subtraction (ycombinator.com)
Today's find: You can get a floating-point approximation of 1/x that's accurate to 3 bits with a single integer subtraction instruction.
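A rough sketch of the idea, assuming IEEE 754 single precision and the simple constant 0x7F000000 (the linked post may use a differently tuned constant; the constant and the function names here are assumptions). The float's bit pattern is reinterpreted as an integer, subtracted from the constant, and reinterpreted back:

```python
import struct

# Assumed constant: twice the bit pattern of 1.0f (0x3F800000).
MAGIC = 0x7F000000


def float_to_bits(x):
    """Reinterpret a value, rounded to float32, as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as a float32."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def approx_reciprocal(x):
    """Approximate 1/x for a positive, normal float32 with one subtraction."""
    return bits_to_float((MAGIC - float_to_bits(x)) & 0xFFFFFFFF)
```

With this constant the result is exact for powers of two and overestimates by at most 12.5% otherwise (for example, 1.5 maps to 0.75 instead of 0.667), which corresponds to roughly 3 bits of accuracy.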
Unintuitive optimization for performing paths union (minus-ze.ro)
I once had the task of processing whatever vector graphics the user would input into the software, and compute the contour of all shapes from that. That contour would then be sent to a printer. Most of the time users would upload vector art they made in something like Illustrator, and that art could get very complex. In this blog post I’ll take one real SVG I had to work with which consists of about 1981 paths.
SBCL "user-guided optimization" notice (github.com/sbcl)
Take a big shortcut when compiling (lambda () nil)
SQL query optimization: a comprehensive developer's guide (aiven.io)
An SQL optimization guide for developers. With best practices, warnings, and pro tips to speed up your SQL query optimization.
O3 Evals (lifearchitect.ai)
How bloom filters made SQLite 10x faster (avi.im)
This is the fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster for analytical queries.
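As background, a Bloom filter is a compact bit array that can answer "definitely not present" or "possibly present" for a key. The sketch below is a generic textbook version, not SQLite's implementation; the class name, sizes, and the SHA-256-based hashing are assumptions chosen for clarity rather than speed:

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # If any bit is unset, the key was never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

In a join, one plausible use is to add every key from the smaller table to the filter and then skip rows of the larger table whose key fails `might_contain`, avoiding a more expensive lookup for most non-matching rows.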
JEP 483: Ahead-of-Time Class Loading and Linking (openjdk.org)
Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot Java Virtual Machine starts. Achieve this by monitoring the application during one run and storing the loaded and linked forms of all classes in a cache for use in subsequent runs. Lay a foundation for future improvements to both startup and warmup time.
Dividing unsigned 8-bit numbers (0x80.pl)
Division is quite an expensive operation. For instance, latency of the 32-bit division varies between 10 and 15 cycles on the Cannon Lake CPU, and for Zen4 this range is from 9 to 14 cycles. The latency of 32-bit multiplication is 3 or 4 cycles on both CPU models.
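Given that latency gap, one common way to avoid the divide instruction is to multiply by a precomputed reciprocal and shift. The sketch below illustrates that general idea for 8-bit operands; it is not necessarily one of the SIMD methods the article benchmarks, and the table name, the 16-bit shift, and the function name are assumptions:

```python
SHIFT = 16

# RECIPROCALS[d] = ceil(2**16 / d); index 0 is a placeholder and never used.
RECIPROCALS = [0] + [((1 << SHIFT) + d - 1) // d for d in range(1, 256)]


def div_u8(n, d):
    """Compute n // d for 0 <= n <= 255 and 1 <= d <= 255 without dividing.

    The rounding error e = RECIPROCALS[d] * d - 2**16 is below d, so
    n * e < 255 * 255 < 2**16, which keeps the shifted product equal to
    the exact quotient for every 8-bit input.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    return (n * RECIPROCALS[d]) >> SHIFT
```

The table costs 256 entries, and each quotient then needs one lookup, one multiplication, and one shift.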
Show HN: K8s Cleaner – Roomba for Kubernetes (projectsveltos.io)
Designed for Kubernetes administrators, K8s Cleaner efficiently identifies and removes unused resources to enhance cluster performance and reduce operational costs.
If constexpr requires requires { requires } in C++ (think-cell.com)
Probably the two most useful features added to C++20 are requires and requires. They make it so much easier to control overload resolution, and when combined with if constexpr in C++17, they allow basic reflection-based optimizations in templates.
No More Adam: Learning Rate Scaling at Initialization Is All You Need (arxiv.org)
In this work, we question the necessity of adaptive gradient methods for training deep neural networks.
Tell HN: Deduplicating a 10.4 TiB game preservation archive (WIP) (ycombinator.com)
Waste Makes Haste? (anukari.com)
I've been really digging into macOS optimizations over the last few days. Being ALU-bound is quite a pain in the butt, because unlike being memory-bound, there are a lot fewer big changes I can make to speed things up. Mostly I've been working on instruction-level optimizations, none of which have had a big impact. I've gotten to where I don't see anything else that is really worth optimizing at this level.
A journey of optimization of cloud-based geospatial data processing (terrafloww.com)
The rapid growth of Earth observation data in cloud storage, which is expected to keep growing exponentially as companies like SpaceX drive down rocket launch prices, has pushed us to think about how we access and analyze satellite imagery.
New LLM optimization technique slashes memory costs (venturebeat.com)
Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications on top of large language models (LLMs) and other Transformer-based models.