Hacker News with Generative AI: Data Compression

NNCP: Lossless Data Compression with Neural Networks (bellard.org)
NNCP is an experiment to build a practical lossless data compressor with neural networks.
Lossless Log Aggregation – Reduce Log Volume by 99% Without Dropping Data (kevinslin.com)
On a rainy March day in 1538, Thomas Howard, the Duke of Norfolk, found himself confined within the cold stone walls of his grand estate. His fingers reluctantly penned a letter to sell his cherished lands to settle longstanding debts.
Huffman Coding (wikipedia.org)
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
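The excerpt above defines Huffman codes but doesn't show one being built. A minimal illustrative sketch (not from the linked article): repeatedly merge the two lowest-frequency nodes with a heap, then read codes off the resulting tree, so more frequent symbols end up with shorter bitstrings.

```python
# Minimal Huffman-coding sketch: build an optimal prefix code for a byte string.
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict[int, str]:
    """Return a symbol -> bitstring map; frequent symbols get shorter codes."""
    # Heap entries are (frequency, tiebreaker, node); the tiebreaker keeps
    # comparisons from ever reaching the node itself.
    heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    if len(heap) == 1:  # edge case: one distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:  # merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (left, right)))
        i += 1
    codes: dict[int, str] = {}
    def walk(node, prefix):  # leaves are ints, internal nodes are 2-tuples
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(b"abracadabra")
# 'a' occurs 5 times, 'd' once, so 'a' receives a strictly shorter code.
assert len(codes[ord("a")]) < len(codes[ord("d")])
```

Because the codes form a prefix code, a decoder can walk the bitstream symbol by symbol without any separators.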
The $5000 Compression Challenge (2001) (patrickcraig.co.uk)
Mike Goldman makes another offer: I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant.
Binary vector embeddings are so cool (emschwartz.me)
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup. Let's get into how this works and why it's so crazy.
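The 32x figure in the excerpt falls out directly from the representation: keeping only the sign of each float32 dimension stores 1 bit instead of 32. A hedged sketch of the idea (my own illustration, not the author's code), using NumPy bit-packing and Hamming distance as the binary similarity measure:

```python
# Binary quantization sketch: keep only the sign bit of each embedding dimension,
# then compare vectors by Hamming distance on the packed bits.
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """float32 (n, d) -> packed uint8 (n, d/8): one bit per dimension, 32x smaller."""
    return np.packbits(vecs > 0, axis=-1)

def hamming_distance(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between a packed (1, d/8) query and packed corpus rows."""
    return np.unpackbits(np.bitwise_xor(query, corpus), axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 256)).astype(np.float32)
packed = binarize(corpus)  # 1024 bytes per vector -> 32 bytes per vector

# A slightly perturbed copy of vector 42 should still land near it.
query = corpus[42] + 0.1 * rng.standard_normal(256).astype(np.float32)
best = int(np.argmin(hamming_distance(binarize(query[None]), packed)))
```

The speedup comes from the same place as the compression: XOR plus popcount over a few machine words replaces hundreds of floating-point multiply-adds per comparison.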
Consistently faster and smaller compressed bitmaps with Roaring (2016) (arxiv.org)
Compressed bitmap indexes are used in databases and search engines.
Show HN: Vortex – a high-performance columnar file format (github.com/spiraldb)
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.
Parallel PNG Proposal (2021) (github.com/DavidBuchanan314)
This is a proof-of-concept implementation of a parallel-decodable PNG format, based on ideas from https://github.com/brion/mtpng
Large Text Compression Benchmark (mattmahoney.net)
This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006.
Compression Dictionary Transport (ietf.org)
This document specifies a mechanism for dictionary-based compression in the Hypertext Transfer Protocol (HTTP). By utilizing this technique, clients and servers can reduce the size of transmitted data, leading to improved performance and reduced bandwidth consumption. This document extends existing HTTP compression methods and provides guidelines for the delivery and use of compression dictionaries within the HTTP protocol.
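The core idea of the draft can be demonstrated without HTTP at all: DEFLATE already supports preset dictionaries, so the Python standard library's `zlib` can show the effect. A small sketch (the dictionary contents here are hypothetical; the real mechanism negotiates dictionaries over HTTP headers):

```python
# Dictionary-based compression sketch: a dictionary shared out of band lets the
# compressor reference byte sequences the payload never has to repeat.
import zlib

# Hypothetical shared dictionary, e.g. a typical response skeleton.
dictionary = b'{"status": "ok", "items": [], "error": null}'

def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob)

payload = b'{"status": "ok", "items": [1, 2, 3], "error": null}'
plain = zlib.compress(payload)
shared = compress_with_dict(payload, dictionary)

assert decompress_with_dict(shared, dictionary) == payload
assert len(shared) < len(plain)  # dictionary removes redundancy known in advance
```

Both sides must hold the identical dictionary, which is exactly the delivery problem the IETF document addresses for HTTP.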
Faster Inverse BWT (2021) (blogspot.com)
Compressing data with sample points and polynomial interpolation (johndcook.com)
Popular introduction to Huffman, arithmetic, ANS coding [video] (youtube.com)
Show HN: Python Compression Suite for Pandas DataFrames, CSV and Excel Files (github.com/MNoorFawi)
Building a data compression utility in Haskell using Huffman codes (lazamar.github.io)
Compressing graphs and indexes with recursive graph bisection (2016) (arxiv.org)
Data Compression Explained (2011) (mattmahoney.net)
Show HN: Kanzi, fast lossless data compression (github.com/flanglet)
LANL Achieves Yottabyte-Scale Data Compression in Neutron Transport Equations (hpcwire.com)
LZW and GIF explained (eecis.udel.edu)