Hacker News with Generative AI: Encoding

Branchless UTF-8 Encoding (cceckman.com)
Can you encode UTF-8 without branches?
The WTF-8 Encoding (simonsapin.github.io)
WTF-8 (Wobbly Transformation Format − 8-bit) is a superset of UTF-8 that encodes surrogate code points if they are not in a pair. It represents, in a way compatible with UTF-8, text from systems such as JavaScript and Windows that use UTF-16 internally but don’t enforce the well-formedness invariant that surrogates must be paired.
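As a rough illustration of the excerpt above, here is a minimal Python sketch of the idea: paired surrogates are first combined into a single supplementary code point, and only unpaired surrogates are encoded on their own as three-byte sequences. The function name `to_wtf8` is hypothetical, and the sketch leans on Python's `surrogatepass` error handler as a stand-in for the encoder; it is not taken from the WTF-8 specification.

```python
def to_wtf8(s: str) -> bytes:
    """Encode a str that may contain surrogate code points as WTF-8-style bytes.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0
    while i < len(s):
        cp = ord(s[i])
        # A high surrogate immediately followed by a low surrogate forms a pair:
        # combine the two into one supplementary code point.
        if (
            0xD800 <= cp <= 0xDBFF
            and i + 1 < len(s)
            and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF
        ):
            low = ord(s[i + 1])
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            # Ordinary code point or an unpaired surrogate: keep as-is.
            i += 1
        out.append(chr(cp))
    # "surrogatepass" lets the UTF-8 codec emit the remaining lone surrogates
    # as three-byte sequences instead of raising UnicodeEncodeError.
    return "".join(out).encode("utf-8", "surrogatepass")


lone = to_wtf8("\ud800")          # unpaired high surrogate, 3 bytes
pair = to_wtf8("\ud83d\ude00")    # surrogate pair, same bytes as UTF-8 for U+1F600
```

Text with no surrogates encodes exactly as it would in UTF-8, which is the sense in which WTF-8 is a superset.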
Charset="WTF-8" (xn--stpie-k0a81a.com)
This value is not valid
Windows dynamic linking depends on the active code page (nullprogram.com)
Windows paths have been WTF-16-encoded for decades, but module names in the import tables of Portable Executable are octets. If a name contains values beyond ASCII — technically out of spec — then the dynamic linker must somehow decode those octets into Unicode in order to construct a lookup path. There are multiple ways this could be done, and the most obvious is the process’s active code page (ACP), which is exactly what happens.
In MySQL, never use "UTF8". Use "utf8mb4" (medium.com)
Today’s bug: I tried to store a UTF-8 string in a MariaDB “utf8”-encoded database, and Rails raised a bizarre error:
Unicode shenanigans: Martine écrit en UTF-8 (poisson.chat)
On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):
FFmpeg 7.1 release: a tons of codecs (jbkempf.com)
FFmpeg 7.1 is released today: a major release with numerous features that nevertheless maintains API compatibility with 7.0. It features a full native VVC decoder, a new MV-HEVC decoder, a new LC-EVC decoder, and a new xHE-AAC decoder; it finishes the IAMF decoder; and it adds Vulkan hardware encoding, VVC encoding, ARM64 and RISC-V optimizations, and other hardware accelerations.
The absolute minimum you must know about Unicode and encodings (joelonsoftware.com)
UTF-8 Everywhere (utf8everywhere.org)
How to chop off bytes of an UTF-8 string to fit into a small slot and look nice (plix.at)
llIlI.lI: a URL shortener using only I/l to encode domains (llili.li)
Decoding UTF8 with parallel extract (nrk.neocities.org)
You can't just assume UTF-8 (csvbase.com)