Bit-twiddling optimizations in Zed's Rope(zed.dev) A couple of weeks ago I came across one of Antonio's PRs, titled "Speed up point translation in the Rope" — now who doesn't stop to take a closer look at a PR with that title?
Optimizers: The Low-Key MVP(duckdb.org) TL;DR: The query optimizer is an important part of any analytical database system as it provides considerable performance improvements compared to hand-optimized queries, even as the state of your data changes.
Speeding up the Rust edit-build-run cycle(davidlattimore.github.io) There are two main aspects to compile times that matter to developers. Cold build times, when building from scratch and warm build times when you’ve already built and you’re rebuilding following an edit. This article focuses on warm build times, which for rapid iteration during development is what generally matters most.
113 points by tempaccount420 6 days ago | 122 comments
SQLite Index Visualization(mrsuh.com) After learning about indexes, I understood their basic structure, but I wanted to dig deeper — to explore the data structure, understand the algorithm, and learn how the index data is stored on disk.
Histogramming Bytes with Positional Popcount(blogspot.com) A while ago, after some back and forth on twitter/X with @corsix, I dropped some implementation of byte histogramming without explaining anything. This post aims to rectify that lack of explanation.
Assembly Optimization Tips by Mark Larson (2004)(masm32.com) The most important thing to remember is to TIME your code. Trying different tricks might or might not speed up your code. So it is very important to time your code to see if you do get a speedup as you try each trick.
Speculations on arenas and custom strings in C++(nullprogram.com) My techniques with arena allocation and strings are oriented around C. I’m always looking for a better way, and lately I’ve been experimenting with building them using C++ features. What are the trade-offs? Are the benefits worth the costs? In this article I lay out my goals, review implementation possibilities, and discuss my findings. Following along will require familiarity with those previous two articles.
45 points by KeyurRamoliya 11 days ago | 6 comments
Texture-Less Text Rendering(poniesandlight.co.uk) Sometimes, all you want is to quickly print some text into a Renderpass. But traditionally, drawing text requires you first to render all possible glyphs of a font into an atlas, to bind this atlas as a texture, and then to render glyphs one by one by drawing triangles on screen, with every triangle picking the correct glyph from the font atlas texture.
Converting ASCII strings to lower case at crazy speeds with AVX-512(lemire.me) AMD Zen 4 and Zen 5, as well as server-side recent Intel processors, support an advanced set of instructions called AVX-512. They are powerful SIMD (Single Instruction, Multiple Data) instructions. Importantly, they allow ‘masked’ operations. That is, you can compute a mask and only do an operation on bytes indicated by the mask. Thus you can easily store only the first k bytes of a block of 64 bytes of memory as one instruction.
Windows Memory Mapped File IO(jeremyong.com) Memory-mapping is my preferred way to do file I/O, on pretty much every platform I write code for (desktop and console). In this post, we’ll start by discussing some of the key advantages of this approach, as well as some disadvantages. Then, we’ll look at some Windows-specific undocumented user-mode functions which help mitigate some of those disadvantages. In particular, we’ll address an important perceived limitation of memory-mapped I/O, namely, file resizing.
WebSockets cost us $1M on our AWS bill(recall.ai) IPC is something that is rarely top-of-mind when it comes to optimising cloud costs. But it turns out that if you IPC 1TB of video per second on AWS it can result in enormous bills when done inefficiently.
Analyzing Go Build Times (2023)(howardjohn.info) Go is often praised for its fast build times.
While they are pretty quick, they are slow enough that I spend a lot of time waiting for them, enough that it prompted me to go down the rabbit hole of thoroughly analyzing them.
This post covers all aspects of what makes Go builds fast or slow.
The carefulness knob(surfingcomplexity.blog) It’s easy to forget that there is a fundamental tradeoff between how careful we can be and how much time it will take us to perform a task.
137 points by todsacerdoti 25 days ago | 41 comments
We shrunk our Javascript monorepo git size(jonathancreamer.com) This isn't click bait. We really did this! We work in a very large Javascript monorepo at Microsoft we colloquially call 1JS. It's large not only in terms of GB, but also in terms of sheer volume of code and contributions. We recently crossed the 1,000 monthly active users mark, about 2,500 packages, and ~20million lines of code! The most recent clone I did of the repo clocked in at an astonishing 178GB.
140 points by yagizdegirmenci 26 days ago | 81 comments
Why those particular integer multiplies?(wordpress.com) The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point.