Hacker News with Generative AI: Computer Architecture

32 bits that changed microprocessor design (ieee.org)
In the late 1970s, a time when 8-bit processors were state of the art and CMOS was the underdog of semiconductor technology, engineers at AT&T’s Bell Labs took a bold leap into the future.
Intel's Lunar Lake intricacies revealed in new high-resolution die shots (tomshardware.com)
Reverse engineering the 386 processor's prefetch queue circuitry (righto.com)
In 1985, Intel introduced the groundbreaking 386 processor, the first 32-bit processor in the x86 architecture.
Intel: Winning and Losing (abortretry.fail)
This article continues a lengthy series. You may be interested in the start of Silicon Valley, Fairchild, the founding of Intel, the start of the x86 architecture, Intel’s pivot to become a processor company, the i960 and i486, the Intel Inside campaign, the FDIV bug and the Pentium Pro, MMX, the Pentium II, the Pentium III, the Pentium M, and the launch of Intel Core.
Arm's Bifrost Architecture and the Mali-G52 (chipsandcheese.com)
Arm (the company) is best known for its Cortex CPU line. But Arm today has expanded to offer a variety of licensable IP blocks, ranging from interconnects to IOMMUs to GPUs.
Bootstrapping Lisp in a Boot Sector (github.com/jart)
sectorlisp is a 512-byte implementation of LISP that's able to bootstrap John McCarthy's meta-circular evaluator on bare metal.
The complicated circuitry for the 386 processor's registers (righto.com)
The groundbreaking Intel 386 processor (1985) was the first 32-bit processor in the x86 architecture.
Computer Architects Can't Find the Average (dgsq.net)
Computer architects can’t agree on a way to find the average.
Fundamental flaws of SIMD ISAs (2021) (bitsnbites.eu)
According to Flynn’s taxonomy, SIMD refers to a computer architecture that can process multiple data streams with a single instruction (i.e. “Single Instruction stream, Multiple Data streams”).
The Dauug House - Dauug|36 minicomputer documentation (cs.wright.edu)
Dauug|36 is a 36-bit architecture for owner-built CPUs, controllers, and minicomputers. Only maker-scale assembly tools are necessary, so this architecture can be implemented anywhere on the planet without a semiconductor foundry. All you need is a bare circuit board, about 300 components, and some soldering practice.
Long-term L1 execution layer proposal: replace the EVM with RISC-V (ethereum-magicians.org)
How MOS 6502 Illegal Opcodes Work (2008) (pagetable.com)
The original NMOS version of the MOS 6502, used in computers like the Commodore 64, the Apple II and the Nintendo Entertainment System (NES), is well-known for its illegal opcodes: Out of 256 possible opcodes, 151 are defined by the architecture, but many of the remaining 105 undefined opcodes do useful things.
Efficient Architecture for RISC-V Vector Memory Access (arxiv.org)
Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns.
INT 10h (int10h.org)
Learning Assembly for Fun, Performance and Profit (thechipletter.substack.com)
Low-level languages have been in the news recently. Use of Nvidia’s PTX has been revealed as part of DeepSeek’s ‘secret sauce’. And there is still plenty of interest in learning assembly language. A recent Substack post advocating learning assembly language for the venerable but well-loved 6502 as a first step garnered over 240 ‘upvotes’ and more than 290 comments on Hacker News.
PDP-11/Hack de luxe (vcfed.org)
Not really satisfied with my breadboard attempt to build another PDP-11/Hack, and also because I wanted to investigate a little bit more how the DCJ11 really works, I decided to make another PDP-11/Hack. So I designed a Eurocard DCJ11-based single-board computer. The PCBs arrived today, so there is nothing better than to put them to use. But this time the PDP-11/Hack comes with an expansion slot.
Banked Memories for Soft SIMT Processors (arxiv.org)
Recent advances in soft GPGPU architectures have shown that a small (<10K LUT), high performance (770 MHz) processor is possible in modern FPGAs.
Undocumented 8086 instructions, explained by the microcode (righto.com)
What happens if you give the Intel 8086 processor an instruction that doesn't exist?
Notes on the Pentium's microcode circuitry (righto.com)
Most people think of machine instructions as the fundamental steps that a computer performs. However, many processors have another layer of software underneath: microcode. With microcode, instead of building the processor's control circuitry from complex logic gates, the control logic is implemented with code known as microcode, stored in the microcode ROM. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
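The idea in the snippet above can be sketched in a few lines of Python. This is a toy model, not the Pentium's actual microcode: the three machine instructions, the micro-operation names, and the `MICROCODE_ROM` table are all hypothetical, invented only to show one machine instruction expanding into several simpler micro-instructions.

```python
# Toy microcoded machine. Every name here (opcodes, micro-ops, registers)
# is a hypothetical illustration, not taken from any real processor.
MICROCODE_ROM = {
    "LOAD":  ["addr_to_mar", "mem_to_mdr", "mdr_to_acc"],
    "ADD":   ["addr_to_mar", "mem_to_mdr", "alu_add"],
    "STORE": ["addr_to_mar", "acc_to_mdr", "mdr_to_mem"],
}


def run(program, memory):
    """Execute (opcode, operand) pairs by stepping through their micro-ops."""
    state = {"acc": 0, "mar": 0, "mdr": 0}
    micro_steps = 0
    for opcode, operand in program:
        # Look up the micro-instruction sequence for this machine instruction.
        for micro_op in MICROCODE_ROM[opcode]:
            if micro_op == "addr_to_mar":
                state["mar"] = operand
            elif micro_op == "mem_to_mdr":
                state["mdr"] = memory[state["mar"]]
            elif micro_op == "mdr_to_acc":
                state["acc"] = state["mdr"]
            elif micro_op == "alu_add":
                state["acc"] += state["mdr"]
            elif micro_op == "acc_to_mdr":
                state["mdr"] = state["acc"]
            elif micro_op == "mdr_to_mem":
                memory[state["mar"]] = state["mdr"]
            micro_steps += 1
    return state["acc"], micro_steps


memory = [5, 7, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
acc, steps = run(program, memory)
print(acc, memory, steps)
```

Here three machine instructions turn into nine micro-steps, and changing an instruction's behavior means editing a table entry rather than rewiring control logic.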
RISC architecture did change everything (wired.com)
“RISC architecture is gonna change everything.” Those absurdly geeky, incredibly prophetic words were spoken 30 years ago. Today, they’re somehow truer than ever.
An Interview with Zen Chief Architect Mike Clark (computerenhance.com)
Zen is one of the most important microarchitectures in the history of the x86 ecosystem. Not only is it the reigning champion in many x64 benchmarks, but it is also the architecture that enabled AMD’s dramatic rise in CPU marketshare over the past eight years: from 10% when the first Zen processor was launched, to 25% at the introduction of Zen 5.
RISC-V Processor Design – Lec 6 – EXU and Co-Simulation (ycombinator.com)
In this lecture, we stitch together a custom Instruction Set Simulator I created with the RISC-V CPU (now with the execution stage) and see the first instructions flowing in the pipeline.
Bypassing the Branch Predictor (nicula.xyz)
A couple of days ago I was thinking about what you can do when the branch predictor is effectively working against you, and thus pessimizing your program instead of optimizing it.
An Active Message Inspired Reconfigurable Architecture for Irregular Workloads (arxiv.org)
Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices as they balance high performance, energy efficiency, and programmability well.
C Is Not a Low-level Language: Your computer is not a fast PDP-11 (2018) (dl.acm.org)
In the wake of the recent Meltdown and Spectre vulnerabilities, it’s worth spending some time looking at root causes.
The Pentium contains a complicated circuit to multiply by three (righto.com)
In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line.
Comparing Two Verilog CPU Implementations Using EBMC (philipzucker.com)
About a year ago my friends and I built a 4-bit CPU out of a kit from AliExpress. https://www.philipzucker.com/td4-4bit-cpu/ It’s a lot of fun. I also think the system is so simple that it is kind of a nice target for tinkering around with formal methods.
SVDQuant+NVFP4: 4× Smaller, 3× Faster FLUX with 16-bit Quality on Blackwell GPUs (hanlab.mit.edu)
With Moore's law slowing down, hardware vendors are shifting toward low-precision inference. NVIDIA's latest Blackwell architecture introduces a new 4-bit floating point format (NVFP4), improving upon the previous MXFP4 format. NVFP4 features more precise scaling factors and a smaller microscaling group size (16 vs. 32), enabling it to maintain 16-bit model accuracy even at 4-bit precision while delivering 4× higher peak performance.
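Why a smaller scaling group can help is easy to see in a rough sketch. The code below is not NVFP4 or MXFP4: it assumes a simple signed integer grid of 15 levels with one absmax scale per group, and the function name `quantize_groups` and its parameters are hypothetical. It only illustrates that values sharing a scale with a large neighbor lose precision.

```python
def quantize_groups(values, group_size, levels=7):
    """Quantize and dequantize values using one absmax scale per group.

    Assumption: a signed integer grid in [-levels, levels] stands in for a
    4-bit format; real FP4 formats use a non-uniform floating-point grid.
    """
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        absmax = max(abs(v) for v in group)
        scale = absmax / levels if absmax > 0 else 1.0
        # Round each value to the nearest grid point, then clamp to the grid.
        codes = [max(-levels, min(levels, round(v / scale))) for v in group]
        out.extend(c * scale for c in codes)
    return out


data = [1.0, 0.1]
shared = quantize_groups(data, group_size=2)    # 0.1 shares a scale with 1.0
separate = quantize_groups(data, group_size=1)  # each value has its own scale
print(abs(shared[1] - 0.1), abs(separate[1] - 0.1))
```

With a shared scale, the small value is rounded to the nearest multiple of 1/7; with its own scale it is recovered almost exactly. Smaller groups cost more scale storage, which is the trade-off a group size of 16 versus 32 adjusts.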
Explaining my fast 6502 code generator (2023) (pubby.games)
To learn how optimizing compilers are made, I built one targeting the 6502 architecture. In a bizarre twist, my compiler generates faster code than GCC, LLVM, and every other compiler I compared it to.
MESI Cache Coherency Protocol Visualization (scss.tcd.ie)