Getting started with bare-metal assembly(johv.dk) Seeing a program you wrote running directly on the bare metal is deeply satisfying to anyone who enjoys writing software. And it turns out that creating such a program from scratch is easier than it sounds. The hardest part is figuring out where to start. In this tutorial, I'll show you how to write, build and run the classic "Hello, World!" in pure assembly.
Minor 387 Documentation Mystery(os2museum.com) So here I am, writing a bit of test code to figure out the behavior of x87 FPUs with regard to saving and loading the FPU state (FSTENV/FLDENV and FSAVE/FRSTOR instructions in different modes and formats).
Agner Fog's Software Optimization Resources(agner.org) This series of five manuals describes everything you need to know about optimizing code for x86 and x86-64 family microprocessors, including optimization advice for C++ and assembly language, details about the microarchitecture and instruction timings of most Intel, AMD and VIA processors, and details about different compilers and calling conventions.
Write Your Own Virtual Machine (2022)(jmeiners.com) In this tutorial, I will teach you how to write your own virtual machine (VM) that can run assembly language programs, such as my friend’s 2048 or my Roguelike.
114 points by thunderbong 61 days ago | 43 comments
Rules to avoid common extended inline assembly mistakes(nullprogram.com) GCC and Clang inline assembly is an interface between high and low level programming languages. It is subtle and treacherous. Many are ensnared in its traps, usually unknowingly. As such, the asm keyword is essentially the unsafe keyword of C and C++. Nearly every inline assembly tutorial, including the awful ibiblio page at the top of search engines for decades, propagates fundamental, serious mistakes, and most examples are incorrect. The dangerous part is that the examples usually produce the expected results!
Bit-permuting 16 u32s at once with AVX-512(blogspot.com) The basic trick to apply the same bit-permutation to each of the u32s is to view them as a matrix of 16 rows by 32 columns, transpose it into 32 u16s, permute those u16s in the same way that we wanted to permute the bits of the u32s [1], then transpose back to 16 u32s. Easy:
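The transpose, permute, transpose-back idea in the excerpt above can be modelled in plain scalar Python. This is only a sketch of the data movement, not the AVX-512 code from the linked post; the function names and the `perm[dst] = src` convention are assumptions made for illustration.

```python
# Scalar model of the trick: treat 16 u32s as a 16x32 bit matrix.
# All names here are hypothetical; the linked post uses AVX-512 instead.

def transpose_to_u16(words):
    """16 u32s -> 32 u16s; bit i of column j is bit j of words[i]."""
    return [sum(((words[i] >> j) & 1) << i for i in range(16))
            for j in range(32)]

def transpose_to_u32(cols):
    """32 u16s -> 16 u32s; inverse of transpose_to_u16."""
    return [sum(((cols[j] >> i) & 1) << j for j in range(32))
            for i in range(16)]

def permute_bits_all(words, perm):
    """Apply one bit permutation to all 16 words at once.

    Assumed convention: perm[dst] = src, i.e. output bit dst
    takes its value from input bit src.
    """
    cols = transpose_to_u16(words)
    # Permuting bits of every u32 is now just reordering whole u16s.
    permuted = [cols[perm[dst]] for dst in range(32)]
    return transpose_to_u32(permuted)

def permute_bits_one(word, perm):
    """Reference: permute the bits of a single u32, one bit at a time."""
    return sum(((word >> perm[dst]) & 1) << dst for dst in range(32))
```

Comparing `permute_bits_all(words, perm)` against `[permute_bits_one(w, perm) for w in words]` is a convenient way to check that the two transposes really cancel out.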
BareMetal OS(github.com/ReturnInfinity) BareMetal OS is an exokernel-based operating system crafted entirely in x86-64 assembly and is designed to provide unparalleled levels of flexibility and efficiency.
Assembly Optimization Tips by Mark Larson (2004)(masm32.com) The most important thing to remember is to TIME your code. Trying different tricks might or might not speed up your code. So it is very important to time your code to see if you do get a speedup as you try each trick.
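The "time your code" advice above applies in any language. A minimal sketch using Python's standard `timeit` module follows; the two candidate functions and the `best_time` helper are hypothetical stand-ins for "before" and "after" versions of a routine, not anything taken from the linked tips.

```python
import timeit

def sum_loop(n):
    """Baseline: add up 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total

def sum_formula(n):
    """Candidate 'trick': closed-form sum of 0..n-1."""
    return n * (n - 1) // 2

def best_time(func, arg, repeat=5, number=100):
    """Return the best observed seconds per call.

    Taking the minimum over several repeats reduces the influence of
    noise from other processes on the measurement.
    """
    runs = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(runs) / number

if __name__ == "__main__":
    n = 10_000
    # Check both versions agree before comparing their speed.
    assert sum_loop(n) == sum_formula(n)
    print("loop   :", best_time(sum_loop, n))
    print("formula:", best_time(sum_formula, n))
```

The measured numbers depend on the machine and interpreter, so the point is the comparison between runs on the same system rather than any absolute figure.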
12 points by thunderbong 109 days ago | 4 comments
Why those particular integer multiplies?(wordpress.com) The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point.
tolower() small string performance(dotat.at) When processing very small string fragments, what is the cross-over point between scalar code and AVX-512 with masked loads and stores?
Implementing and Detecting Anti-Debugging with Fork()(netlify.app) As I continue my journey into reverse engineering macOS and iOS applications, I’m currently focusing on ARM assembly. I’m also working on a follow-up to my previous post on reverse engineering. In this next post, I’ll be tackling the challenge of cracking a macOS app’s license, and since I’ll be dealing with a release build, assembly language will be essential.
Writing a Lisp compiler (Lisp to assembly) from scratch in JavaScript (2018)(eatonphil.com) In this post we'll write a simple compiler in JavaScript (on Node) without any third-party libraries. Our goal is to take an input program like (+ 1 (+ 2 3)) and produce an output assembly program that does these operations to produce 6 as the exit code. The resulting compiler can be found here.
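To make the goal described above concrete, here is a minimal sketch in Python (not the post's JavaScript, and not its code-generation strategy) that turns a nested `+` expression into assembly text. It assumes a simple stack-based scheme, Intel syntax for the GNU assembler, and the Linux x86-64 `exit` system call; all function names are hypothetical, and the output has not been run through an assembler here.

```python
def tokenize(src):
    """Split '(+ 1 (+ 2 3))' into ['(', '+', '1', '(', ...]."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested list such as ['+', 1, ['+', 2, 3]]."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ')'
        return node
    return int(tok) if tok.isdigit() else tok

def emit(node, out):
    """Emit code that leaves the value of `node` on top of the stack."""
    if isinstance(node, int):
        out.append(f"    push {node}")
        return
    op, left, right = node
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    emit(left, out)
    emit(right, out)
    out.append("    pop rbx")        # right operand
    out.append("    pop rax")        # left operand
    out.append("    add rax, rbx")
    out.append("    push rax")       # result back on the stack

def compile_program(src):
    """Return assembly text that exits with the expression's value."""
    out = ["    .intel_syntax noprefix", "    .global _start", "_start:"]
    emit(parse(tokenize(src)), out)
    out.append("    pop rdi")        # exit status = result (assumption: Linux)
    out.append("    mov rax, 60")    # Linux x86-64 syscall number for exit
    out.append("    syscall")
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    print(compile_program("(+ 1 (+ 2 3))"))
```

Pushing every intermediate result onto the stack is wasteful but keeps the generator to a few lines, since each subexpression can be compiled without tracking which registers are free.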
SectorC: A C Compiler in 512 bytes (2023)(xorvoid.com) SectorC (github) is a C compiler written in x86-16 assembly that fits within the 512 byte boot sector of an x86 machine. It supports a subset of C that is large enough to write real and interesting programs. It is quite likely the smallest C compiler ever written.