Hacker News with Generative AI: Data Compression

NNCP: Lossless Data Compression with Neural Networks (bellard.org)
NNCP is an experiment to build a practical lossless data compressor with neural networks.
Lossless Log Aggregation – Reduce Log Volume by 99% Without Dropping Data (kevinslin.com)
On a rainy March day in 1538, Thomas Howard, the Duke of Norfolk, found himself confined within the cold stone walls of his grand estate. His fingers reluctantly penned a letter to sell his cherished lands to settle longstanding debts.
Huffman Coding (wikipedia.org)
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
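The excerpt above defines Huffman codes but doesn't show one being built. A minimal illustrative sketch (not from the linked article): repeatedly merge the two lowest-frequency nodes with a heap, then read codes off the resulting tree, so more frequent symbols end up with shorter bitstrings.

```python
# Minimal Huffman-coding sketch: build an optimal prefix code for a byte string.
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict[int, str]:
    """Return a symbol -> bitstring map; frequent symbols get shorter codes."""
    # Heap entries are (frequency, tiebreaker, node); the tiebreaker keeps
    # comparisons from ever reaching the node itself.
    heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    if len(heap) == 1:  # edge case: one distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:  # merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (left, right)))
        i += 1
    codes: dict[int, str] = {}
    def walk(node, prefix):  # leaves are ints, internal nodes are 2-tuples
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(b"abracadabra")
# 'a' occurs 5 times, 'd' once, so 'a' receives a strictly shorter code.
assert len(codes[ord("a")]) < len(codes[ord("d")])
```

Because the codes form a prefix code, a decoder can walk the bitstream symbol by symbol without any separators.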
The $5000 Compression Challenge (2001) (patrickcraig.co.uk)
Mike Goldman makes another offer: I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant.
Binary vector embeddings are so cool (emschwartz.me)
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup. Let's get into how this works and why it's so crazy.
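The 32x figure in the excerpt falls out directly from the representation: keeping only the sign of each float32 dimension stores 1 bit instead of 32. A hedged sketch of the idea (my own illustration, not the author's code), using NumPy bit-packing and Hamming distance as the binary similarity measure:

```python
# Binary quantization sketch: keep only the sign bit of each embedding dimension,
# then compare vectors by Hamming distance on the packed bits.
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """float32 (n, d) -> packed uint8 (n, d/8): one bit per dimension, 32x smaller."""
    return np.packbits(vecs > 0, axis=-1)

def hamming_distance(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between a packed (1, d/8) query and packed corpus rows."""
    return np.unpackbits(np.bitwise_xor(query, corpus), axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 256)).astype(np.float32)
packed = binarize(corpus)  # 1024 bytes per vector -> 32 bytes per vector

# A slightly perturbed copy of vector 42 should still land near it.
query = corpus[42] + 0.1 * rng.standard_normal(256).astype(np.float32)
best = int(np.argmin(hamming_distance(binarize(query[None]), packed)))
```

The speedup comes from the same place as the compression: XOR plus popcount over a few machine words replaces hundreds of floating-point multiply-adds per comparison.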
Consistently faster and smaller compressed bitmaps with Roaring (2016) (arxiv.org)
Compressed bitmap indexes are used in databases and search engines.
Show HN: Vortex – a high-performance columnar file format (github.com/spiraldb)
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.
Parallel PNG Proposal (2021) (github.com/DavidBuchanan314)
This is a proof-of-concept implementation of a parallel-decodable PNG format, based on ideas from https://github.com/brion/mtpng
Large Text Compression Benchmark (mattmahoney.net)
This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006.
Compression Dictionary Transport (ietf.org)
This document specifies a mechanism for dictionary-based compression in the Hypertext Transfer Protocol (HTTP). By utilizing this technique, clients and servers can reduce the size of transmitted data, leading to improved performance and reduced bandwidth consumption. This document extends existing HTTP compression methods and provides guidelines for the delivery and use of compression dictionaries within the HTTP protocol.
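The core idea of the draft can be demonstrated without HTTP at all: DEFLATE already supports preset dictionaries, so the Python standard library's `zlib` can show the effect. A small sketch (the dictionary contents here are hypothetical; the real mechanism negotiates dictionaries over HTTP headers):

```python
# Dictionary-based compression sketch: a dictionary shared out of band lets the
# compressor reference byte sequences the payload never has to repeat.
import zlib

# Hypothetical shared dictionary, e.g. a typical response skeleton.
dictionary = b'{"status": "ok", "items": [], "error": null}'

def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob)

payload = b'{"status": "ok", "items": [1, 2, 3], "error": null}'
plain = zlib.compress(payload)
shared = compress_with_dict(payload, dictionary)

assert decompress_with_dict(shared, dictionary) == payload
assert len(shared) < len(plain)  # dictionary removes redundancy known in advance
```

Both sides must hold the identical dictionary, which is exactly the delivery problem the IETF document addresses for HTTP.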
Faster Inverse BWT (2021) (blogspot.com)
Compressing data with sample points and polynomial interpolation (johndcook.com)
Popular introduction to Huffman, arithmetic, ANS coding [video] (youtube.com)
Show HN: Python Compression Suite for Pandas DataFrames, CSV and Excel Files (github.com/MNoorFawi)
Building a data compression utility in Haskell using Huffman codes (lazamar.github.io)
Compressing graphs and indexes with recursive graph bisection (2016) (arxiv.org)
Data Compression Explained (2011) (mattmahoney.net)
Show HN: Kanzi, fast lossless data compression (github.com/flanglet)
LANL Achieves Yottabyte-Scale Data Compression in Neutron Transport Equations (hpcwire.com)
LZW and GIF explained (eecis.udel.edu)