Hacker News with Generative AI: Computer Architecture

What Every Hacker Should Know About TLB Invalidation [pdf] (grsecurity.net)
Finite Field Assembly: A Language for Emulating GPUs on CPU (leetarxiv.substack.com)
FF-asm is a programming language founded on the thesis: Math is mostly invented, rarely discovered.
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately, these days a cheap NVMe SSD can supply data much faster than a query interpreter can process it.
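To make the interpretation overhead concrete, here is a minimal sketch (not taken from the article) of a hypothetical expression tree evaluated two ways. The first is a tree-walking interpreter that re-dispatches on node type for every row. The second is closure compilation, one commonly used intermediate step between interpreting and generating machine code, which does the dispatch once per query. The node format and function names are made up for illustration.

```python
from typing import Any, Callable

# Hypothetical node format: ("col", name), ("const", value), or (op, left, right)

def interpret(expr, row: dict) -> Any:
    """Tree-walking interpreter: inspects the node type again for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    left = interpret(expr[1], row)
    right = interpret(expr[2], row)
    if kind == "+":
        return left + right
    if kind == ">":
        return left > right
    raise ValueError(f"unknown node {kind!r}")

def compile_expr(expr) -> Callable[[dict], Any]:
    """Closure compilation: dispatch on node type once, return a per-row function."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        return lambda row: row[name]
    if kind == "const":
        value = expr[1]
        return lambda row: value
    left = compile_expr(expr[1])
    right = compile_expr(expr[2])
    if kind == "+":
        return lambda row: left(row) + right(row)
    if kind == ">":
        return lambda row: left(row) > right(row)
    raise ValueError(f"unknown node {kind!r}")
```

In CPython both versions are slow in absolute terms; the point of the sketch is where the per-node dispatch happens (per row versus once per query), not a measured speedup.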
Customasm – An assembler for custom, user-defined instruction sets (github.com/hlorenzi)
customasm is an assembler that allows you to provide your own custom instruction sets to assemble your source files!
Reverse Engineering the Constants in the Pentium FPU (righto.com)
Intel released the powerful Pentium processor in 1993, establishing a long-running brand of high-performance processors. The Pentium includes a floating-point unit that can rapidly compute functions such as sines, cosines, logarithms, and exponentials. But how does the Pentium compute these functions? Earlier Intel chips used binary algorithms called CORDIC, but the Pentium switched to polynomials, which approximate these transcendental functions much faster. The polynomials have carefully optimized coefficients that are stored in a special ROM inside the chip's floating-point unit.
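As an illustration of the polynomial approach (not the Pentium's actual algorithm or constants), the sketch below approximates sin(x) on a reduced range |x| <= pi/4 with a degree-9 odd polynomial evaluated by Horner's rule. The coefficients here are plain Taylor-series terms, an assumption chosen for readability; the Pentium's ROM holds carefully optimized coefficients instead. A real implementation also needs argument reduction, which is omitted.

```python
import math

# Taylor coefficients for sin(x) = x * P(x^2). These are NOT the Pentium's
# ROM constants; they are the textbook series terms, used here for clarity.
SIN_COEFFS = [1.0, -1 / 6, 1 / 120, -1 / 5040, 1 / 362880]

def horner(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + ... using one multiply-add per coefficient."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

def poly_sin(x):
    """Approximate sin(x) for |x| <= pi/4 (no argument reduction performed)."""
    return x * horner(SIN_COEFFS, x * x)
```

Horner's rule needs only one multiply and one add per coefficient, which is part of why polynomial evaluation maps well onto floating-point hardware.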
RISC-V is making moves, but it has work to do if it wants to hit the mainstream (theregister.com)
RISC-V has been talked up as a challenger to Arm and x86, offering an open royalty-free architecture that promises flexibility and innovation without licensing costs. But for all the noise, you're more likely to find it buried inside IoT gadgets and obscure embedded systems than powering anything that'll typically grab a headline.
Emulating the FMAdd Instruction, Part 1: 32-bit Floats (drilian.com)
At work, I had to write an emulation of the FMAdd (fused multiply-add) instruction for hardware where it isn't natively supported (specifically, I was writing a SIMD implementation, but the idea is the same). Since I've already been posting about how float rounding works, I thought I'd share a little bit about how FMAdd works.
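The key property of FMAdd is a single rounding of a * b + c. The sketch below, which is a shortcut and not the post's method (the post targets hardware without this luxury), approximates a 32-bit FMAdd by doing the arithmetic in binary64. The product of two float32 values is exact in binary64 because 24 + 24 significand bits fit in 53, but the addition and the final conversion are still two roundings, so rare double-rounding cases can differ from a true fused result. The helper names are hypothetical.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mul_add_unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded to float32 before the sum."""
    return to_f32(to_f32(a * b) + c)

def fma_f32_via_f64(a: float, b: float, c: float) -> float:
    """Approximate float32 fused multiply-add using binary64 arithmetic.

    a * b is exact in binary64 for float32 inputs; the sum is rounded to
    binary64 and then to binary32, so this is close to, but not always
    identical to, a correctly rounded fused multiply-add.
    """
    return to_f32(a * b + c)
```

For example, with a = 1 + 2^-23, b = 1 - 2^-23 and c = -1, the exact product 1 - 2^-46 rounds to 1.0 in float32, so the unfused version returns 0.0, while the fused version keeps the tiny difference and returns -2^-46.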
Rambus DRAM (RDRAM) (wikipedia.org)
Rambus DRAM (RDRAM), and its successors Concurrent Rambus DRAM (CRDRAM) and Direct Rambus DRAM (DRDRAM), are types of synchronous dynamic random-access memory (SDRAM) developed by Rambus from the 1990s through to the early 2000s.
Execution units are often pipelined (xoria.org)
In the context of out-of-order microarchitectures, I was under the impression that execution units remain occupied until the µop they’re processing is complete. This is often not the case.
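A toy timing model (an assumption, not a description of any particular core) shows why this matters. With latency L and n independent µops, a pipelined unit that accepts a new µop every cycle finishes the batch at cycle n - 1 + L, while a unit that stays occupied until each µop completes needs n * L cycles.

```python
def completion_cycle(n_ops: int, latency: int, pipelined: bool) -> int:
    """Cycle at which the last of n independent µops finishes on a single unit.

    Simplified model: all µops are ready at cycle 0, the unit starts at most
    one µop per cycle, and a non-pipelined unit is busy for the full latency
    of each µop before it can accept the next one.
    """
    if n_ops == 0:
        return 0
    issue_interval = 1 if pipelined else latency
    last_start = (n_ops - 1) * issue_interval
    return last_start + latency
```

completion_cycle(10, 4, True) returns 13 and completion_cycle(10, 4, False) returns 40: the same per-µop latency, but very different throughput.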
Dividing unsigned 8-bit numbers (0x80.pl)
Division is quite an expensive operation. For instance, the latency of 32-bit division varies between 10 and 15 cycles on Cannon Lake CPUs, and between 9 and 14 cycles on Zen4. The latency of 32-bit multiplication is only 3 or 4 cycles on both CPU models.
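Because multiplication is so much cheaper, one common workaround (a swapped-in technique, not necessarily one of the approaches the article benchmarks) is to multiply by a fixed-point reciprocal and shift. For 8-bit operands, m = floor(2^16 / b) + 1 keeps the error below the gap to the next integer, so (a * m) >> 16 equals a // b for every pair; the function checks this exhaustively. Note that m needs 17 bits when b = 1. The names are hypothetical.

```python
def div_u8(a: int, b: int) -> int:
    """Divide unsigned 8-bit a by b (1..255) using one multiply and one shift.

    m is a fixed-point reciprocal of b, rounded up; for a, b < 256 the
    overestimate is smaller than 1/b, so the floor of the scaled product
    equals the true quotient.
    """
    m = (1 << 16) // b + 1
    return (a * m) >> 16

def check_all() -> bool:
    """Exhaustively compare div_u8 with exact integer division for all 8-bit pairs."""
    return all(div_u8(a, b) == a // b for a in range(256) for b in range(1, 256))
```

In a vectorized setting, m would presumably be computed once per divisor (for example from a small table), leaving only the multiply and shift per element.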
SVC16: Simplest Virtual Computer (github.com/JanNeuendorf)
This is the specification for an extremely simple "virtual computer" that can be emulated.
Bit-permuting 16 u32s at once with AVX-512 (blogspot.com)
The basic trick to apply the same bit-permutation to each of the u32s is to view them as a matrix of 16 rows by 32 columns, transpose it into 32 u16s, permute those u16s in the same way that we wanted to permute the bits of the u32s [1], then transpose back to 16 u32s. Easy:
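The transpose trick can be checked with a scalar Python model (a sketch, not the post's AVX-512 code). Here the two transposes and the u16 permutation are done one bit at a time, whereas the post does them with vector instructions. The function names are hypothetical; perm[k] names the source bit that ends up at position k.

```python
def transpose_16x32(words):
    """View 16 u32s as a 16x32 bit matrix; return its 32 columns as u16s.

    Bit i of column j equals bit j of words[i].
    """
    assert len(words) == 16
    return [sum(((w >> j) & 1) << i for i, w in enumerate(words)) for j in range(32)]

def transpose_32x16(cols):
    """Inverse of transpose_16x32: rebuild the 16 u32 rows from 32 u16 columns."""
    assert len(cols) == 32
    return [sum(((c >> i) & 1) << j for j, c in enumerate(cols)) for i in range(16)]

def permute_bits_via_transpose(words, perm):
    """Apply one bit permutation to all 16 words by permuting whole columns."""
    cols = transpose_16x32(words)
    permuted = [cols[perm[k]] for k in range(32)]  # column k <- old column perm[k]
    return transpose_32x16(permuted)

def permute_bits_direct(word, perm):
    """Reference: bit k of the result is bit perm[k] of a single u32."""
    return sum(((word >> perm[k]) & 1) << k for k in range(32))
```

Permuting 32 whole columns replaces 16 separate 32-way bit shuffles, which is what makes the approach attractive with wide vector registers.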
The Chiplet Revolution – Communications of the ACM (cacm.acm.org)
Reducing demands on a single chip by using smaller chips dedicated to specific functions.
Computer Architecture, Fifth Edition: A Quantitative Approach (2011) (dl.acm.org)
The computing world is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation today.
Computing with Time: Microarchitectural Weird Machines (cacm.acm.org)
Microarchitectural weird machines (µWM) can be used as a powerful obfuscation engine where computation operates based on events unobservable to conventional anti-obfuscation tools.
Amdahl's Law (wikipedia.org)
In computer architecture, Amdahl's law (or Amdahl's argument) is a formula that shows how much faster a task can be completed when more resources are added to the system, given that only part of the task benefits from them.
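In its usual form, if a fraction p of a task's run time benefits from a speedup factor s, the overall speedup is S = 1 / ((1 - p) + p / s). A small helper (the name is illustrative) makes the limit visible: as s grows without bound, S approaches 1 / (1 - p).

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is accelerated by a factor s.

    S = 1 / ((1 - p) + p / s). The unaccelerated part (1 - p) caps S
    below 1 / (1 - p) no matter how large s becomes.
    """
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)
```

With p = 0.9 and s = 10 the overall speedup is only about 5.26, and no value of s can push it past 10.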
VSI OpenVMS v9.2-3 for x86-64 (vmssoftware.com)
VSI OpenVMS V9.2-3 for x86-64 is now available as part of our ongoing development of the port to the industry-standard CPU architecture.
M4 chips: E and P cores (eclecticlight.co)
In the two previous articles (links at the end), I explored some of the features and properties of Performance (P) cores in Apple’s latest M4 chips. This article looks at their Efficiency (E) cores by comparison.
The Soul of an Old Machine: Revisiting the Timeless von Neumann Architecture (ankush.dev)
This conversation between two protagonists in HCF describes two types of people in computing:
RISC-V Vector Extension overview (0x80.pl)
The goal of this text is to provide an overview of the RISC-V Vector extension (RVV) and to compare it, where applicable, with widespread SIMD instruction sets: SSE, AVX, AVX-512, ARM Neon and SVE.
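One well-known difference between RVV and fixed-width sets such as SSE or AVX is that the vector length is requested at run time, so a loop can be strip-mined without a separate scalar tail. The Python model below only simulates that control flow. vlmax is a hypothetical hardware maximum, and min(remaining, vlmax) is one of the vl values the specification permits vsetvli to return (implementations have some freedom when the remaining count is less than twice VLMAX).

```python
def vadd_strip_mined(a, b, vlmax=8):
    """Model an RVV-style strip-mined loop computing a[i] + b[i].

    vlmax is a hypothetical hardware vector length in elements; real code
    asks the hardware for vl (via vsetvli) instead of hard-coding it.
    """
    assert len(a) == len(b)
    out = []
    i, n = 0, len(a)
    while i < n:
        vl = min(n - i, vlmax)  # one permitted vsetvli result for the remaining count
        # one "vector instruction" operating on vl elements
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out
```

With fixed-width SIMD, the last n mod width elements typically need separate handling; here the final strip simply runs with a smaller vl.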
Ask HN: Are There Any Fully-Documented Computers? (ycombinator.com)
I am writing an operating system as a side project. I am still in the design phase of the project, and I’ve been brushing up on my low-level programming skills. Recently there was a discussion here about the Asahi Linux team’s efforts reverse-engineering ARM Macs, and it had me thinking: are there any fully-documented modern computers? I know that RISC-V and POWER are open source.
Broadwell's EDRAM: VCache Before VCache Was Cool (chipsandcheese.com)
Up to Haswell’s 2013 release, Intel’s “tick-tock” strategy seemed unstoppable. Broadwell sought to continue Intel’s juggernaut by porting Haswell to a new 14 nm node.
Cyrix 6x86 (wikipedia.org)
The Cyrix 6x86 is a line of sixth-generation, 32-bit x86 microprocessors designed and released by Cyrix in 1995.
Microprocessor's Romance with Integers (freecodecamp.org)
One of the first things we learn about computers is that they only understand 0s and 1s, or bits.
Why those particular integer multiplies? (wordpress.com)
The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point.
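For reference, the architectural results of a few of these operations can be modeled per lane in Python. This covers only what they compute, not Intel's implementation details, which are the post's subject. Lanes are 16-bit bit patterns held in ints. The function names mirror the SSE2 mnemonics PMULLW, PMULHW, PMULHUW and PMADDWD, but this is a sketch of their semantics, not any intrinsics API.

```python
def to_signed16(x: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pmullw(a, b):
    """Low 16 bits of each 16x16-bit product (same bits for signed and unsigned)."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

def pmulhw(a, b):
    """High 16 bits of each signed 16x16-bit product, as a 16-bit pattern."""
    return [((to_signed16(x) * to_signed16(y)) >> 16) & 0xFFFF for x, y in zip(a, b)]

def pmulhuw(a, b):
    """High 16 bits of each unsigned 16x16-bit product."""
    return [((x & 0xFFFF) * (y & 0xFFFF)) >> 16 for x, y in zip(a, b)]

def pmaddwd(a, b):
    """Signed 16x16-bit products, adjacent pairs summed into 32-bit lanes (wrapping)."""
    prods = [to_signed16(x) * to_signed16(y) for x, y in zip(a, b)]
    return [(prods[i] + prods[i + 1]) & 0xFFFFFFFF for i in range(0, len(prods), 2)]
```

The asymmetry is visible even in this model: there are separate low-half, signed high-half and unsigned high-half variants, plus a multiply-and-add that widens to 32 bits.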
Kronos: Soviet Processor Family for High-Level Languages (2006) [pdf] (hal.science)
Open-Source, Chiplet-Compatible RISC-V Controller (semiengineering.com)
A new technical paper titled “ControlPULPlet: A Flexible Real-time Multi-core RISC-V Controller for 2.5D Systems-in-package” was published by researchers at ETH Zurich and University of Bologna.
Solving the Mystery of ARM7TDMI Multiply Carry Flag (bmchtech.github.io)
The Game Boy Advance has a pretty neat CPU: the ARM7TDMI. This CPU is quite complicated; it allows the program counter to be used as a general-purpose register, which means it can be used as the output of any data processing instruction. That's like allowing a drunk driver to change their tires while going 30 over the speed limit near a school. It's a unique feature that can lead to very funny instructions.